Data Extracting SDK


Data Extracting SDK helps you extract information from web resources in a simple way.

http://extracting.codeplex.com/



Related Projects

DotNetZip Library


DotNetZip is a FAST, FREE class library and toolset for manipulating zip files. Use VB, C# or any .NET language to easily create, extract, or update zip files.

PDFBox - Java PDF library


Apache PDFBox is an open source Java PDF library for working with PDF documents. It allows you to create new PDF documents, manipulate existing documents, and extract content from documents. It supports bookmarks, fonts, text extraction, encryption, PDF printing and a lot more.

Ghostscript - Document Rendering and Conversion


Ghostscript is a rendering and conversion engine for page description languages, including PostScript and PDF. It can convert PostScript language files to many raster formats, display them on screen, and print them on printers that don't have PostScript language capability built in.

Semantic Vectors - Creating and Searching Semantic Vector using Lucene


The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. Other methods supported by the package include Latent Semantic Analysis (LSA) and Reflective Random Indexing. Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. This library is used in semantic analysis and text mining.

Aperture - Java framework for getting data and metadata


Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems. It can crawl and extract information from file systems, websites, mailboxes, and mail servers. It supports various file formats, such as Office, PDF, and Zip, and can also extract metadata from image files. Aperture has a strong focus on semantics: extracted metadata can be mapped to predefined properties.

iText - Java PDF library


iText is a popular and widely used PDF library for generating PDF documents dynamically. It is particularly useful to web developers who need to generate PDF documents and reports from data in an XML file or a database and serve them to the browser. It supports adding bookmarks, watermarks, encryption, form filling and a lot more.

iOS-Artwork-Extractor


Extracts iOS artwork and emoji symbols into PNG files and generates PNG files for glossy buttons.

ANTLR - ANother Tool for Language Recognition


ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day.

jMIR


jMIR is intended for use in music information retrieval research involving the study of music in both audio and symbolic formats. The jMIR suite includes software for performing feature extraction, applying data mining algorithms and managing metadata.

delayed_job - Database backed asynchronous priority queue -- Extracted from Shopify


A database-backed asynchronous priority queue, extracted from Shopify.






