Hydra gives a search solution the tools to modify data efficiently and flexibly before it is indexed. It does this by providing a scalable pipeline that every document passes through before it is indexed into the search engine. Architecturally, Hydra sits between the source integration and the search engine.

A common use case is to have Apache ManifoldCF crawl a filesystem and send the documents to Hydra, which processes them and dispatches the processed documents to Solr for indexing.
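
The crawl, process, and index flow described above can be illustrated with a minimal sketch. This is not Hydra's API: the stage functions, the `run_pipeline` helper, and the dict-based document model are all hypothetical names chosen for illustration, and the `index` callback only stands in for a real dispatch to a search engine such as Solr.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: a document is modeled as a flat dict of field name -> text value.
Document = Dict[str, str]
# Assumption: a stage takes a document and returns a (possibly modified) copy.
Stage = Callable[[Document], Document]


def strip_whitespace(doc: Document) -> Document:
    """Hypothetical stage: trim leading and trailing whitespace from every field."""
    return {name: value.strip() for name, value in doc.items()}


def add_title_from_path(doc: Document) -> Document:
    """Hypothetical stage: derive a title from the file name if none is set."""
    out = dict(doc)
    if "title" not in out and "path" in out:
        out["title"] = out["path"].rsplit("/", 1)[-1]
    return out


def run_pipeline(
    docs: Iterable[Document],
    stages: List[Stage],
    index: Callable[[Document], None],
) -> None:
    """Pass each document through every stage in order, then hand it to the indexer."""
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        index(doc)


if __name__ == "__main__":
    # Stand-in for crawler output (for example, documents found on a filesystem).
    crawled = [{"path": "/docs/report.pdf", "body": "  quarterly numbers  "}]
    # Stand-in for the search engine: collect the processed documents in a list.
    indexed: List[Document] = []
    run_pipeline(crawled, [strip_whitespace, add_title_from_path], indexed.append)
    print(indexed)
```

Keeping each stage as a small function from document to document means stages can be reordered or added without touching the source integration or the indexer, which is the separation the description above attributes to a processing pipeline.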

Hydra - Distributed processing framework for search solutions

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

Apache Tika - A content analysis toolkit
