Natural Language Processing

Related Projects

Hydra - Distributed processing framework for search solutions

Hydra gives a search solution the tools it needs to modify data efficiently and flexibly before that data is indexed. It does this by providing a scalable pipeline that documents pass through before being indexed into the search engine. Architecturally, Hydra sits between the source integration and the search engine.

Aperture - Java framework for getting data and metadata

Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems. It can crawl and extract information from file systems, websites, mailboxes and mail servers. It supports various file formats such as Office, PDF, Zip and many more, and it also extracts metadata from image files. Aperture has a strong focus on semantics: extracted metadata can be mapped to predefined properties.

Gate - General Architecture for Text Engineering

GATE excels at text analysis of all shapes and sizes. It provides components for diverse language processing tasks, including parsers, morphology, tagging, information retrieval tools, information extraction components for various languages, and many others. It also provides support for measuring, evaluating, modeling and persisting the data structures involved. It can analyze text or speech. It has built-in support for machine learning, and plugins add support for other machine learning implementations.

TextTeaser - Automatic Summarization Algorithm

TextTeaser is an automatic summarization algorithm that combines natural language processing and machine learning. It can provide the gist of an article or better previews in news readers.

OpenPipe - Document Pipeline

OpenPipe is an open source, scalable platform for manipulating a stream of documents. A pipeline is an ordered set of steps (operations) performed on a document to convert it from its raw form into something ready to be put into the index. The operations performed on documents include language detection, field manipulation, POS tagging, entity extraction and submitting the document to a search engine.
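
The idea of a pipeline as an ordered set of steps can be illustrated with a minimal Java sketch. This is not OpenPipe's API: the class DocumentPipeline, its methods, and the toy steps trimFields and detectLanguage are hypothetical names chosen for illustration, and a document is assumed to be a simple map of field names to values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a document pipeline; none of these names come from OpenPipe.
final class DocumentPipeline {
    // Steps are applied in the order in which they were added.
    private final List<UnaryOperator<Map<String, String>>> steps = new ArrayList<>();

    DocumentPipeline addStep(UnaryOperator<Map<String, String>> step) {
        steps.add(step);
        return this;
    }

    // Runs every step on a copy of the raw document and returns the result.
    Map<String, String> process(Map<String, String> rawDocument) {
        Map<String, String> doc = new LinkedHashMap<>(rawDocument);
        for (UnaryOperator<Map<String, String>> step : steps) {
            doc = step.apply(doc);
        }
        return doc;
    }

    // Field manipulation step: strips surrounding whitespace from every field.
    static Map<String, String> trimFields(Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>();
        doc.forEach((field, value) -> out.put(field, value.trim()));
        return out;
    }

    // Toy stand-in for language detection: looks for the English word "the".
    static Map<String, String> detectLanguage(Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        String body = " " + doc.getOrDefault("body", "").toLowerCase(Locale.ROOT) + " ";
        out.put("language", body.contains(" the ") ? "en" : "unknown");
        return out;
    }

    public static void main(String[] args) {
        DocumentPipeline pipeline = new DocumentPipeline()
                .addStep(DocumentPipeline::trimFields)
                .addStep(DocumentPipeline::detectLanguage);

        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("body", "  The quick brown fox  ");
        System.out.println(pipeline.process(raw));
    }
}
```

In a real deployment the final step would typically hand the processed document to a search engine; that step is left out here because it depends on the engine in use.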

Flendex - Flesh-Index tool

A tool for computing the Flesh Index for German texts, written in Java. Read more on topic Flesh-Index at

Afos - Analyzer of phonological characteristics of written text

The goal of AFOS is to analyze the phonological characteristics of written text. Currently it works with texts in Serbian (Croatian, Bosnian, Serbo-Croatian), but it will be adapted for other languages, too. While its goal is to be a linguistic tool, it includes applications that may be useful in daily use (such as hyphenation).

Dreamofalgorithm - algorithm study

An algorithm study about language analysis.

Text-processing-and-analysis-tool-for-turkish - A Text Processing and Analysis Tool for Turkish

Analyzing Turkish texts is important for the study of the Turkish language and literature and for a wide spectrum of other areas. Counting language structures in texts manually is a complicated task, so a computer application that processes and analyzes Turkish text documents or document sets (corpora) is beneficial. In this study, a text processing and analysis tool was developed that analyzes texts and computes various phonetic, syllable, affix, stem, word and sentence frequencies. The text processing

Oa4j - Java Client for OpenAmplify

Overview: OA4J is a Java client for version 2.1 of the OpenAmplify web service. It requires Java 1.6+. It needs an API key to be used; the key is free and takes a couple of minutes to obtain.

Installation: Add oa4j-x.x.x.jar to your classpath. Java 1.6+ is required.

Usage:

    import static java.lang.System.out;
    import;
    import com.linguamathematica.oa4j.Analysis;
    import com.linguamathematica.oa4j.AnalysisService;
    import com.linguamathematica.oa4j.DefaultAnalysisService;
    public class Test{
        public sta