
docx4j - JAXB-based Java library for Word docx, PowerPoint pptx, and Excel xlsx files

  •    Java

docx4j is a library that helps you work with the Office Open XML file format, as used in docx documents, pptx presentations, and xlsx spreadsheets.
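
To make "Office Open XML file format" concrete: a docx, pptx, or xlsx file is a ZIP package of XML parts, which libraries such as docx4j unmarshal for you. The sketch below is plain JDK code, not docx4j's API; it lists a package's part names. The class `OoxmlParts` and its helpers are hypothetical, and `demoPackage()` builds a placeholder package (real part names, dummy contents) only so the example runs without a file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical stdlib-only sketch: lists the parts of an OOXML package (not docx4j code). */
class OoxmlParts {

    // Returns the name of every part (ZIP entry) in the package, in stored order.
    static List<String> listParts(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a tiny in-memory package using two real docx part names.
    // The part contents are placeholders, so Word would not open it.
    static byte[] demoPackage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"[Content_Types].xml", "word/document.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Pass a path to a real .docx/.pptx/.xlsx, or run without arguments for the demo.
        InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(demoPackage());
        listParts(in).forEach(System.out::println);
    }
}
```

Run against a real document, the listing typically also shows relationship parts and, for pptx and xlsx, the `ppt/` and `xl/` folders that docx4j maps onto its object model.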

Apache POI - Java API To Access Microsoft Document File Formats

  •    Java

APIs for manipulating various file formats based on Office Open XML (ECMA-376) and Microsoft's OLE 2 Compound Document format, using pure Java. Apache POI is your Java Excel, Word and PowerPoint solution. We have a complete API for porting other OOXML and OLE 2 Compound Document formats and welcome others to participate.

Hydra - Distributed processing framework for search solutions

  •    Java

Hydra is designed to give search solutions the tools they need to modify data efficiently and flexibly before it is indexed. It does this by providing a scalable, efficient pipeline that every document passes through before being indexed into the search engine. Architecturally, Hydra sits between the source integration and the search engine.
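
The pipeline idea can be sketched as an ordered list of stages, each taking a document and returning the modified document that the next stage receives. This is a hypothetical, single-process illustration, not Hydra's API: Hydra distributes its stages, whereas `PipelineSketch` runs them sequentially in one JVM, and modelling a document as a field-to-value map is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical in-process sketch of a pre-indexing pipeline; not Hydra's API. */
class PipelineSketch {

    // Each stage receives a document (field -> value) and returns it, possibly modified.
    private final List<UnaryOperator<Map<String, String>>> stages = new ArrayList<>();

    PipelineSketch addStage(UnaryOperator<Map<String, String>> stage) {
        stages.add(stage);
        return this;
    }

    // Copies the input, then passes the copy through every stage in order.
    // The result is what would be handed to the search engine for indexing.
    Map<String, String> process(Map<String, String> doc) {
        Map<String, String> current = new LinkedHashMap<>(doc);
        for (UnaryOperator<Map<String, String>> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch()
                // Stage 1: normalise the title to lower case.
                .addStage(d -> { d.put("title", d.get("title").toLowerCase()); return d; })
                // Stage 2: derive a word count from the body text.
                .addStage(d -> { d.put("words", String.valueOf(d.get("body").split("\\s+").length)); return d; });

        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Quarterly Report");
        doc.put("body", "revenue grew again");
        System.out.println(pipeline.process(doc)); // {title=quarterly report, body=revenue grew again, words=3}
    }
}
```

Keeping stages as independent functions is what makes such a pipeline flexible: stages can be added, removed, or reordered without touching the source integration or the search engine.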

SMILA - Unified information access architecture

  •    Java

SMILA is an extensible framework for building search solutions that access unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, such as connectors to the most relevant data sources. Using the framework as a basis lets developers concentrate on creating higher-value solutions, such as semantically driven applications.

Behemoth - Large Scale Document Processing based on Apache Hadoop

  •    Java

Behemoth is an open-source platform for large-scale document processing based on Apache Hadoop. It consists of a simple annotation-based implementation of a document and a number of modules that operate on these documents. One of Behemoth's main goals is to simplify the deployment of document analysers at scale.
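
An "annotation-based document" can be pictured as raw text plus typed spans that modules add, with later modules free to build on earlier ones' annotations. The sketch below is hypothetical and not Behemoth's actual classes: it runs in a single JVM with no Hadoop, and the names `Doc`, `Annotation`, `Module`, `TOKENIZER`, and `CAPITALISED` are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-JVM sketch of an annotation-based document; not Behemoth's classes. */
class AnnotatedDocSketch {

    // Stand-off annotation: a typed character span [start, end) over the text.
    static final class Annotation {
        final String type;
        final int start;
        final int end;
        Annotation(String type, int start, int end) {
            this.type = type; this.start = start; this.end = end;
        }
        @Override public String toString() { return type + "[" + start + "," + end + ")"; }
    }

    // A document is its raw text plus the annotations modules have added.
    static final class Doc {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Doc(String text) { this.text = text; }

        long count(String type) {
            return annotations.stream().filter(a -> a.type.equals(type)).count();
        }
    }

    // A module reads a document and adds annotations to it.
    interface Module { void process(Doc doc); }

    // Module 1: marks every whitespace-delimited token.
    static final Module TOKENIZER = doc -> {
        Matcher m = Pattern.compile("\\S+").matcher(doc.text);
        while (m.find()) {
            doc.annotations.add(new Annotation("Token", m.start(), m.end()));
        }
    };

    // Module 2: builds on Module 1's output, marking capitalised tokens.
    static final Module CAPITALISED = doc -> {
        for (Annotation a : new ArrayList<>(doc.annotations)) { // copy: we add while iterating
            if (a.type.equals("Token") && Character.isUpperCase(doc.text.charAt(a.start))) {
                doc.annotations.add(new Annotation("Capitalised", a.start, a.end));
            }
        }
    };

    public static void main(String[] args) {
        Doc doc = new Doc("Hadoop runs on clusters");
        for (Module module : List.of(TOKENIZER, CAPITALISED)) {
            module.process(doc);
        }
        // prints [Token[0,6), Token[7,11), Token[12,14), Token[15,23), Capitalised[0,6)]
        System.out.println(doc.annotations);
    }
}
```

Because each module only reads and appends annotations on one document, the same loop can in principle run independently per document, which is the property a Hadoop-based platform exploits to scale out.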

UIMA - Unstructured information management architecture

  •    Java

UIMA analyzes large volumes of unstructured information to discover knowledge that is relevant to an end user. It is a framework built around a set of different components, including language identification, language-specific segmentation, sentence boundary detection, and entity detection (person and place names). The framework manages these components and the data flows between them.
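
One way to picture a framework "managing the data flows" between components is dependency-driven scheduling: each component declares which results it reads and which it writes, and the framework runs each one as soon as its inputs exist. This is a hypothetical sketch, not UIMA's API (UIMA has its own analysis-engine and CAS abstractions); `FlowSketch`, its shared result map, the stub language identifier, and the naive capital-letter "entity" heuristic are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of a framework scheduling components by data dependency; not UIMA's API. */
class FlowSketch {

    static final class Component {
        final String name;
        final Set<String> requires;                          // result keys it reads
        final String produces;                               // result key it writes
        final Function<Map<String, Object>, Object> analyse; // the analysis itself

        Component(String name, Set<String> requires, String produces,
                  Function<Map<String, Object>, Object> analyse) {
            this.name = name;
            this.requires = requires;
            this.produces = produces;
            this.analyse = analyse;
        }
    }

    // Framework: repeatedly runs the first pending component whose inputs are
    // available, storing its output in the shared map. Returns the run order.
    static List<String> run(Map<String, Object> results, List<Component> components) {
        List<Component> pending = new ArrayList<>(components);
        List<String> order = new ArrayList<>();
        while (!pending.isEmpty()) {
            Component ready = null;
            for (Component c : pending) {
                if (results.keySet().containsAll(c.requires)) {
                    ready = c;
                    break;
                }
            }
            if (ready == null) {
                throw new IllegalStateException("No component has all its inputs available");
            }
            results.put(ready.produces, ready.analyse.apply(results));
            order.add(ready.name);
            pending.remove(ready);
        }
        return order;
    }

    // Deliberately listed out of dependency order; the framework sorts it out.
    static List<Component> demoComponents() {
        return List.of(
            new Component("entities", Set.of("tokens"), "entities",
                r -> ((List<?>) r.get("tokens")).stream()
                        .map(Object::toString)
                        .filter(t -> Character.isUpperCase(t.charAt(0))) // naive name heuristic
                        .collect(Collectors.toList())),
            new Component("tokens", Set.of("text"), "tokens",
                r -> Arrays.asList(((String) r.get("text")).split("\\s+"))),
            new Component("language", Set.of("text"), "language",
                r -> "en"));                                           // stub identifier
    }

    public static void main(String[] args) {
        Map<String, Object> results = new HashMap<>();
        results.put("text", "Alice met Bob in Paris");
        System.out.println(run(results, demoComponents())); // [tokens, entities, language]
        System.out.println(results.get("entities"));        // [Alice, Bob, Paris]
    }
}
```

Declaring inputs and outputs lets components stay unaware of one another; only the framework knows the overall flow.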

Aspose.Words for Java Examples

  •    Java

This project contains example code for Aspose.Words for Java. Aspose.Words is a class library for generating, converting, and rendering word-processing documents. Aspose.Words supports DOC, OOXML, RTF, HTML, MHTML, TXT, OpenDocument, PDF, XPS, EPUB, SWF, SVG, Image, printing ...

documents4j - Java library for converting documents into another document format

  •    Java

documents4j is a Java library for converting documents into another document format. It does this by delegating the conversion to any native application that understands how to convert the given file into the desired target format.
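
Conversion by delegation can be sketched as a converter that never parses documents itself and instead hands the bytes to the first registered backend that supports the requested format pair. This is a hypothetical illustration, not documents4j's API: `NativeBackend` stands in for an external application (a real backend would typically launch it, for example via `ProcessBuilder`), and the `fake` backend merely tags the input so the sketch runs anywhere.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of conversion by delegation; not documents4j's API. */
class DelegatingConverterSketch {

    // Stands in for a native application that knows how to perform certain conversions.
    interface NativeBackend {
        boolean canConvert(String sourceFormat, String targetFormat);
        byte[] convert(byte[] source, String sourceFormat, String targetFormat);
    }

    private final List<NativeBackend> backends = new ArrayList<>();

    DelegatingConverterSketch register(NativeBackend backend) {
        backends.add(backend);
        return this;
    }

    // The converter never parses the document itself: it hands the bytes to
    // the first registered backend that understands the requested conversion.
    byte[] convert(byte[] source, String from, String to) {
        for (NativeBackend backend : backends) {
            if (backend.canConvert(from, to)) {
                return backend.convert(source, from, to);
            }
        }
        throw new UnsupportedOperationException("No backend converts " + from + " to " + to);
    }

    // Fake backend for demos/tests: handles one format pair and just tags the
    // bytes with the target format instead of really converting them.
    static NativeBackend fake(String supportedFrom, String supportedTo) {
        return new NativeBackend() {
            @Override public boolean canConvert(String f, String t) {
                return f.equals(supportedFrom) && t.equals(supportedTo);
            }
            @Override public byte[] convert(byte[] source, String f, String t) {
                String tagged = t + ":" + new String(source, StandardCharsets.UTF_8);
                return tagged.getBytes(StandardCharsets.UTF_8);
            }
        };
    }

    public static void main(String[] args) {
        DelegatingConverterSketch converter = new DelegatingConverterSketch()
                .register(fake("docx", "pdf"))
                .register(fake("xlsx", "csv"));
        byte[] out = converter.convert("hello".getBytes(StandardCharsets.UTF_8), "xlsx", "csv");
        System.out.println(new String(out, StandardCharsets.UTF_8)); // csv:hello
    }
}
```

The design trade-off is that conversion fidelity comes from the native application rather than a Java reimplementation of each format, at the cost of needing that application installed wherever the conversion runs.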