GATE excels at text analysis of all shapes and sizes. It provides support for diverse language processing tasks such as parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. It provides support to measure, evaluate, model and persist the data structure. It could analyze text or speech. It has built-in support for machine learning and also adds support for different implementation of machine learning via plugin.
It has family of products:
GATE Developer: An integrated development environment for language processing components bundled with a very widely used Information Extraction system and a comprehensive set of other plugins.
GATE Teamware: A collaborative annotation environment for factory-style semantic annotation projects built around a workflow engine and a heavily-optimized backend service infrastructure.
GATE Embedded: An object library optimized for inclusion in diverse applications giving access to all the services used by GATE Developer and more.
Tags | text-extraction text-analysis content-connector text-processing nlp |
Implementation | Java |
License | LGPL AGPL |
Platform | OS-Independent |
OpenPipe is an open source scalable platform for manipulating a stream of documents. A pipeline is an ordered set of steps / operations performed on a document to convert from its raw form to something ready to be put into the index. The operations performed on documents include language detection, field manipulation, POS tagging, entity extraction or submitting the document to a search engine.
content-connector text-analysis nlp document-pipeline text-processingTextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.
nlp nltk pattern python-3 python-2 natural-language-processingAperture is a Java framework for extracting and querying full-text content and metadata from various information systems. It could crawl and extract information from File system, Websites, Mail boxes and Mail servers. It supports various file formats like Office, PDF, Zip and lot more. Metadata information is extracted from image files. Aperture has a strong focus on semantics, metadata extracted could be mapped to predefined properties.
document-pipeline connector content-connector text-analysis text-extraction crawler web-crawlerThe Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
nlp natural-language-processing machine-learning parts-of-speech categorization auto-tagging sentence-detectorThis curated list comprises awesome resources, libraries, information sources about computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics, HLT (Human Language Technology) and can be brought in conjunction with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other related disciplines. This list comes from our day to day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.
machine-learning natural-language-processing nlp sentiment-analysis pos-tag awesome awesome-list list computational-linguistics rubynlp rubymlApache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
text-extraction document-extraction office-documents open-xml export conversionThe Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
text-analysis text-extraction library content-analysis document-analysisDerive useful insights from your data using Python. Learn the techniques related to natural language processing and text analytics, and gain the skills to know which technique is best suited to solve a particular problem. A structured and comprehensive approach is followed in this book so that readers with little or no experience do not find themselves overwhelmed. You will start with the basics of natural language and Python and move on to advanced analytical and machine learning concepts. You will look at each technique and algorithm with both a bird's eye view to understand how it can be used as well as with a microscopic view to understand the mathematical concepts and to implement them to solve your own problems.
text-analytics text-summarization text-classification natural-language natural-language-processing clustering sentiment semantic sentiment-analysis nltk stanford-nlp spacy pattern scikit-learn gensimprose is Go library for text (primarily English at the moment) processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more. The library's functionality is split into subpackages designed for modular use.See the GoDoc documentation for more information.
readability prose nlp part-of-speech-tagger tokenization natural-language-processing change-case summarization summary-statisticsPyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotatation). The library is a divided into several packages and modules. It works on Python 2.7, as well as Python 3.
nlp computational-linguistics linguistics library folia machine-learning language-modelling search-algorithms evaluation-metrics text-processing nlp-library natural-language-processingYou will always begin by creating a NL type calling nlp.New(), the NL type is a Natural Language Processor that owns 3 funcs, RegisterModel(), Learn() and P().RegisterModel takes 3 parameters, an empty struct, a set of samples and some options for the model.
nlp parse natural-language-processing text-extraction textStanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities. It provides the foundational building blocks for higher level text understanding applications.
natural-language-processing nlp nlp-parsing named-entity-recognition stanford-nlpThis is a package for Chinese text segmentation, keyword extraction and speech tagging.
cppjieba jieba chinese-text-segmentation lexical-analysis chinese nlpA lightweight open source chinese tokenizer with keywords, key sentences, summary extracts support, offered the latest lucene,solr,elasticsearch API.
jcseg mmseg chinese-word-segmentation natural-language-processing pos-tagging nlp nlp-keywords-extraction lucene-analyzer lucene-tokenizer solr-plugin elasticsearch-analyzer chinese-text-segmentation chinese-nlp keywords-extraction jcseg-analyzerUIMA analyzes large volumes of unstructured information in order to discover knowledge that is relevant to an end user. It is a framework with different set of components. The components include Language Identification, Language specific segmentation, Sentence boundary detection, Entity detection (person/place names) etc. The framework manages these components and the data flows between them.
document-pipeline connector content-connector text-extraction document-processing unstructuredSMILA is an extensible framework for building search solutions to access unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, like connectors to most relevant data sources. Using the framework as their basis will enable developers to concentrate on the creation of higher value solutions, like semantic driven applications etc.
document-pipeline connector content-connector text-extraction document-processing unstructuredThis repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford. This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high level themes forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course the practical implementation of such models on CPU and GPU hardware is also discussed.
deep-learning machine-learning natural-language-processing nlp oxfordTCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF Supports UTF-8, Unicode, RTL languages, XHTML, Javascript, digital signatures, barcodes and much more.
text-extraction pdf pdf-library pdf-text-extractionSnips NLU (Natural Language Understanding) is a Python library that allows to parse sentences written in natural language and extracts structured information. To find out how to use Snips NLU please refer to our documentation, it will provide you with a step-by-step guide on how to use and setup our library.
nlp nlu machine-learning data-science text-classification intent-classification ner named-entity-recognition slot-filling intent-parser information-extraction snips natural-language-processingTextTeaser is an automatic summarization algorithm that combines the power of natural language processing and machine learning to produce good results. It can provide provide a gist of an article, Better previews in news readers.
summarization nlp text-processing text-analysis summary
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.