IFilter Text Extracter


A simple component that extracts just the text from any file type for which an IFilter is installed. It is available both as a C++ COM component and as a C# .NET library.
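Text extraction through an IFilter follows a chunk-then-text loop: call `IFilter::GetChunk` until it reports `FILTER_E_END_OF_CHUNKS`, and for each text chunk (flagged `CHUNK_TEXT` in its `STAT_CHUNK`) call `IFilter::GetText` until it reports `FILTER_E_NO_MORE_TEXT` or `FILTER_S_LAST_TEXT`. Because the real IFilter is a Windows COM interface, the minimal C++ sketch below models that loop with portable stand-ins so it compiles anywhere. `FilterStatus`, `BreakType`, `ChunkInfo`, `ChunkedFilter`, `extractText` and the `InMemoryFilter` test double are hypothetical names for illustration only; they are not part of this component or of the Windows SDK, and this is not the component's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for IFilter return codes: Ok ~ S_OK,
// LastText ~ FILTER_S_LAST_TEXT, NoMoreText ~ FILTER_E_NO_MORE_TEXT,
// EndOfChunks ~ FILTER_E_END_OF_CHUNKS.
enum class FilterStatus { Ok, LastText, NoMoreText, EndOfChunks };

// Hypothetical mirror of CHUNK_BREAKTYPE: the kind of break that separates
// the previous chunk from the current one.
enum class BreakType { NoBreak, EndOfWord, EndOfSentence, EndOfParagraph };

// Hypothetical mirror of the STAT_CHUNK fields this sketch needs.
struct ChunkInfo {
    bool isText = false;  // true ~ CHUNK_TEXT, false ~ CHUNK_VALUE
    BreakType breakType = BreakType::NoBreak;
};

// Hypothetical portable interface shaped like IFilter::GetChunk / GetText.
class ChunkedFilter {
public:
    virtual ~ChunkedFilter() = default;
    virtual FilterStatus getChunk(ChunkInfo& info) = 0;
    // Appends at most maxChars characters of the current chunk to out.
    virtual FilterStatus getText(std::wstring& out, std::size_t maxChars) = 0;
};

// The extraction loop: walk every chunk, skip value-only chunks, and drain
// each text chunk in buffer-sized pieces.
std::wstring extractText(ChunkedFilter& filter, std::size_t bufferChars = 4) {
    if (bufferChars == 0) bufferChars = 1;  // a zero-sized buffer would never make progress
    std::wstring result;
    ChunkInfo info;
    while (filter.getChunk(info) != FilterStatus::EndOfChunks) {
        if (!info.isText) continue;  // e.g. property values, not body text
        // Keep words from adjacent chunks from running together.
        if (!result.empty() && info.breakType != BreakType::NoBreak)
            result += (info.breakType == BreakType::EndOfParagraph) ? L'\n' : L' ';
        for (;;) {
            FilterStatus s = filter.getText(result, bufferChars);
            if (s == FilterStatus::NoMoreText || s == FilterStatus::LastText) break;
        }
    }
    return result;
}

// In-memory test double standing in for a real IFilter loaded for a file.
class InMemoryFilter : public ChunkedFilter {
public:
    struct Part { bool isText; BreakType breakType; std::wstring text; };
    explicit InMemoryFilter(std::vector<Part> parts) : parts_(std::move(parts)) {}

    FilterStatus getChunk(ChunkInfo& info) override {
        if (next_ >= parts_.size()) return FilterStatus::EndOfChunks;
        current_ = next_++;
        offset_ = 0;
        info.isText = parts_[current_].isText;
        info.breakType = parts_[current_].breakType;
        return FilterStatus::Ok;
    }

    FilterStatus getText(std::wstring& out, std::size_t maxChars) override {
        const std::wstring& text = parts_[current_].text;
        if (offset_ >= text.size()) return FilterStatus::NoMoreText;
        std::size_t n = std::min<std::size_t>(maxChars, text.size() - offset_);
        out.append(text, offset_, n);
        offset_ += n;
        return offset_ >= text.size() ? FilterStatus::LastText : FilterStatus::Ok;
    }

private:
    std::vector<Part> parts_;
    std::size_t next_ = 0;
    std::size_t current_ = 0;
    std::size_t offset_ = 0;
};
```

The small default buffer is deliberate: it forces each chunk to be read in several `getText` calls, which is the case a real extractor must handle. On Windows, a real implementation would obtain the filter for a file (for example with `LoadIFilter`) and call `IFilter::Init` before entering the loop; that COM wiring is omitted here.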





Related Projects

CodeTextBox

CodeTextBox is a Windows Forms control for colorizing code while you are typing in the text box.

IndexTank - Search Engine that Powers Reddit

IndexTank is a search engine that powers search on Reddit, the social bookmarking site. IndexTank was acquired by LinkedIn, which released the project as open source. Its features include variable boosts, facets and faceted search, snippeting, custom scoring functions, suggest, and autocomplete.

Rainbow - portal development made easy

Rainbow CMS, available today in 29 languages, allows content authoring to be safely delegated to role-based team members who need little or no knowledge of HTML. Rainbow optionally supports a two-step approve-then-publish process. The standard release now includes 75 plug-in modules, and it is fairly easy to build your own custom modules.

ImageMagick

ImageMagick is a software suite to create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, and TIFF. Use ImageMagick to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.

iText - Java PDF library

iText is a popular and widely used PDF library for generating PDF documents dynamically. Web developers in particular use it to generate PDF documents and reports from data in an XML file or a database and serve them to the browser. It supports adding bookmarks and watermarks, encryption, form filling, and much more.

MARY - Text-to-Speech System

MARY is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It supports German, British and American English, Telugu, Turkish, and Russian.

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Semantic Vectors - Creating and Searching Semantic Vectors Using Lucene

The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. Other methods supported by the package include Latent Semantic Analysis (LSA) and Reflective Random Indexing. Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. This library is used in semantic analysis and text mining.

wxWidgets - Cross Platform GUI Library

wxWidgets is a C++ library that lets developers create applications for Windows, OS X, Linux and UNIX on 32-bit and 64-bit architectures, as well as several mobile platforms including Windows Mobile, iPhone SDK and embedded GTK+. It has the usual basic controls such as text and bitmap buttons, text entry, scrolling lists, combo boxes, checkboxes, and so on. In addition, it provides a powerful event system, printing support, and more.

Jackrabbit - Content Repository in Java

Apache Jackrabbit is a content repository that fully conforms to the JCR specification. The Jackrabbit content repository is a hierarchical content store with support for structured and unstructured content, full-text search, versioning, transactions, observation, and more.