Hydra - Distributed processing framework for search solutions

  •    Java

Hydra is designed to give a search solution the tools it needs to modify the data to be indexed in an efficient and flexible way. It does this by providing a scalable and efficient pipeline through which documents pass before being indexed into the search engine. Architecturally, Hydra sits between the search engine and the source integration.

Aperture - Java framework for getting data and metadata

  •    Java

Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems. It can crawl and extract information from file systems, websites, mailboxes, and mail servers. It supports various file formats such as Office, PDF, ZIP, and many more, and extracts metadata from image files. Aperture has a strong focus on semantics: extracted metadata can be mapped to predefined properties.

Gate - General Architecture for Text Engineering

  •    Java

GATE excels at text analysis of all shapes and sizes. It provides components for diverse language processing tasks, such as parsers, morphological analysis, tagging, information retrieval tools, and information extraction components for various languages, among many others. It also provides support for measuring, evaluating, modeling, and persisting data structures. It can analyze text or speech, has built-in support for machine learning, and can add different machine learning implementations via plugins.

Apache Tika - A content analysis toolkit

  •    Java

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

OpenPipe - Document Pipeline

  •    Java

OpenPipe is an open source, scalable platform for manipulating a stream of documents. A pipeline is an ordered set of steps (operations) performed on a document to convert it from its raw form into something ready to be put into the index. The operations performed on documents include language detection, field manipulation, POS tagging, entity extraction, and submitting the document to a search engine.

TextTeaser - Automatic Summarization Algorithm

  •    Scala

TextTeaser is an automatic summarization algorithm that combines the power of natural language processing and machine learning to produce good results. It can provide the gist of an article or better previews in news readers.

jekyll - Jekyll-based static site for The Programming Historian

  •    HTML

This is the main repository for the Programming Historian (http://programminghistorian.org), where we keep the files for the live website. For tutorials in submission, please see: Programming Historian Submissions.

whatlang-rs - Natural language detection library for Rust

  •    Rust

Natural language detection for Rust with a focus on simplicity and performance. For more details (e.g. how to blacklist some languages), please check the documentation.

ore - An R interface to the Onigmo regular expression library

  •    C

Oniguruma (or rather, the Onigmo fork of it) is the regular expression library used by the Ruby programming language, and ore is somewhat inspired by Ruby's regular expression features, although it is implemented in a way that aims to be natural for R users, including full vectorisation. This README covers the package's R interface only and assumes that the reader is already familiar with regular expressions. Please see the official reference document for details of supported regular expression syntax.

YoastSEO.js - Analyze content on a page and give SEO feedback as well as render a snippet preview.

  •    JavaScript

Text analysis and assessment library in JavaScript. This library can generate interesting metrics about a text and assess them to give feedback that can be used to improve the text. Also included is a preview of the Google search result snippet, which can itself be assessed using the library.

ciseau - Tokenize and clean strings in Python

  •    Python

Word and sentence tokenization in Python. sent_tokenize can keep the whitespace as-is with the flags keep_whitespace=True and normalize_ascii=False.
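
A minimal usage sketch (assuming ciseau is installed from PyPI; the sample text and variable names are illustrative):

    import ciseau

    text = "Joey was a great sailor. He sailed around the world!"

    # Word-level tokenization: returns a flat list of token strings.
    words = ciseau.tokenize(text)

    # Sentence-level tokenization: returns one list of tokens per sentence.
    # keep_whitespace=True keeps the original whitespace attached to tokens;
    # normalize_ascii=False leaves Unicode punctuation untouched.
    sentences = ciseau.sent_tokenize(text, keep_whitespace=True, normalize_ascii=False)

    print(words)
    print(sentences)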

textclean - Tools for cleaning and normalizing text data

  •    R

textclean is a collection of tools to clean and normalize text. Many of these tools have been taken from the qdap package and revamped to be more intuitive, better named, and faster. Tools are geared at checking for substrings that are not optimal for analysis and replacing or removing them (normalizing) with more analysis-friendly substrings (see Sproat, Black, Chen, Kumar, Ostendorf, & Richards, 2001, doi:10.1006/csla.2001.0169) or extracting them into new variables. For example, emoticons are often used in text but not always easily handled by analysis algorithms; the replace_emoticon() function replaces emoticons with word equivalents.

Other R packages provide some of the same functionality (e.g., english, gsubfn, mgsub, stringi, stringr, qdapRegex). textclean differs from these packages in that it is designed to handle all of the common cleaning and normalization tasks with a single, consistent, pre-configured toolset (note that textclean uses many of these terrific packages as a backend). This means that the researcher spends less time on munging, leading to quicker analysis.

This package is meant to be used jointly with the textshape package, which provides text extraction and reshaping functionality. textclean also works well with the qdapRegex package, which provides tooling for substring substitution and extraction of pre-canned regular expressions. In addition, the functions of textclean are designed to work within the piping of the tidyverse framework by consistently using the first argument of functions as the data source; the subbing and replacement tools are particularly effective within a dplyr::mutate statement.

afterwriting-labs - Post-processing for Fountain screenplays

  •    JavaScript

afterwriting-labs provides post-processing tools for screenplays written in the Fountain format.

aylien_textapi_go - AYLIEN's officially supported Go client library for accessing Text API

  •    Go

This is the Go client library for AYLIEN's APIs. If you haven't already done so, you will need to sign up. See the Developers Guide for additional documentation.

aylien_textapi_python - AYLIEN's officially supported Python client library for accessing Text API

  •    Python

This is the Python client library for AYLIEN's APIs. If you haven't already done so, you will need to sign up. See the Developers Guide for additional documentation.
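
A minimal usage sketch, assuming the aylien-apiclient package is installed; the application ID, key, and sample text below are placeholders, and the Sentiment call follows the client's documented pattern:

    from aylienapiclient import textapi

    # Placeholder credentials; substitute the ID and key from your AYLIEN account.
    client = textapi.Client("YourApplicationID", "YourApplicationKey")

    # Run sentiment analysis on a short piece of text.
    sentiment = client.Sentiment({'text': 'John is a very good football player!'})
    print(sentiment)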

aylien_textapi_ruby - AYLIEN's officially supported Ruby client library for accessing Text API

  •    Ruby

This is the Ruby client library for AYLIEN's APIs. If you haven't already done so, you will need to sign up. See the Developers Guide for additional documentation.