Behemoth - Large Scale Document Processing based on Apache Hadoop


Behemoth is an open source platform for large scale document processing based on Apache Hadoop. It consists of a simple annotation-based implementation of a document and a number of modules operating on these documents. One of Behemoth's main goals is to simplify the large-scale deployment of document analysers.

https://github.com/jnioche/behemoth
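
To make the idea of an annotation-based document concrete, here is a minimal Python sketch of a document that carries raw text plus typed annotations, with one toy module that adds annotations to it. This is not Behemoth's API: the names `Annotation`, `Document`, `covered_text`, and `tokenize` are hypothetical and chosen only for illustration, and the example runs locally rather than on Hadoop.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span over the document text (hypothetical class, not Behemoth's)."""
    type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    """Raw text plus the annotations that processing modules have attached."""
    url: str
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, annotation):
        # Return the slice of text that the annotation spans.
        return self.text[annotation.start:annotation.end]


def tokenize(doc):
    """Toy module: add one 'Token' annotation per whitespace-separated word."""
    position = 0
    for word in doc.text.split():
        # Search from the end of the previous token so repeated words
        # resolve to the correct offsets.
        start = doc.text.index(word, position)
        end = start + len(word)
        doc.annotations.append(Annotation("Token", start, end))
        position = end
    return doc


doc = tokenize(Document(url="http://example.com/a", text="large scale processing"))
tokens = [doc.covered_text(a) for a in doc.annotations]
```

Because every module reads and writes the same document structure, modules of this kind can in principle be chained, each one adding its own annotation types on top of the previous ones.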



Related Projects

OpenOffice - leading open-source office software suite


OpenOffice is the leading open-source office software suite for word processing, spreadsheets, presentations, graphics, databases and more. It is available in many languages and works on all common computers. It stores all your data in an international open standard format and can also read and write files from other common office software packages.

KOffice - office suite for KDE


KOffice consists of several applications, each offering a task-specific interface to a shared set of content components. The suite covers word processing, spreadsheets, presentations, graphics, databases, and project planning.

Rainbow - portal development made easy


Rainbow CMS, available today in 29 languages, allows content authoring to be safely delegated to role-based team members who need little or no knowledge of HTML. Rainbow optionally supports a two-step approve-and-publish process. The standard release now includes 75 plug-in modules, and it is fairly easy to build your own custom modules.

ProcessMaker - Open source workflow and business process management (BPM) software suite


ProcessMaker is an open source workflow and business process management (BPM) software suite that allows small to medium-sized organizations to automate document-intensive, approval-based processes across various systems, including finance, HR and operations.

Hydra - Distributed processing framework for search solutions


Hydra is designed to give a search solution the tools it needs to modify data efficiently and flexibly before that data is indexed. It does this by providing a scalable pipeline that documents pass through before being indexed into the search engine. Architecturally, Hydra sits between the search engine and the source integration.

GoldenOrb - Scalable Graph Analysis


GoldenOrb is a cloud-based project for massive-scale graph analysis, built upon Apache Hadoop and modeled after Google's Pregel architecture. It aims to provide solutions to complex data problems, remove limits to innovation, and contribute to the emerging ecosystem that spans all aspects of big data analysis. It enables users to run analytics on entire data sets instead of samples.

Apache Accumulo - Key Value Store based on Google BigTable


The Apache Accumulo sorted, distributed key/value store is a robust, scalable, high-performance data storage and retrieval system. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.

XOM - XML object model in Java


XOM is a new XML object model. It is a tree-based API for processing XML with Java that strives for correctness, simplicity, and performance, in that order.

Spark - Fast Cluster Computing


Apache Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

jBilling


jBilling is a web-based enterprise billing and rating system. It manages your subscribers with automatic invoicing (email and PDF) and payment processing (credit cards, checks, direct deposit). Robust, well documented and easy to use!






