Data Extracting SDK


Data Extracting SDK helps you extract information from web resources in a simple way.

http://extracting.codeplex.com/





Related Projects

.net HTTP Data Extractor


The nWeb Data Extractor Library supports extracting data from the HTML of an HTTP response. It lets the user convert the response HTML to XML and then extract the desired data from the generated XML file.
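The HTML-to-XML-then-extract workflow described above can be sketched in a few lines. This is only an illustration using the Python standard library, not the nWeb library's own API; the class HtmlToXml, the function html_to_xml, and the sample markup are hypothetical names chosen for this sketch, and it assumes simple, well-nested HTML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Void elements have no closing tag in HTML, so they are closed immediately.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HtmlToXml(HTMLParser):
    """Hypothetical helper: builds an XML element tree from well-nested HTML."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; store them as empty strings.
        self.builder.start(tag, {name: value or "" for name, value in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)

    def handle_endtag(self, tag):
        # Void elements were already closed in handle_starttag.
        if tag not in VOID_TAGS:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def html_to_xml(html):
    """Convert an HTML string to an ElementTree root element (assumed helper)."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return parser.builder.close()


if __name__ == "__main__":
    # Stand-in for the body of an HTTP response.
    page = "<html><body><ul><li class='item'>alpha</li><li class='item'>beta</li></ul></body></html>"
    root = html_to_xml(page)
    # Step 1: the response HTML is now available as XML.
    print(ET.tostring(root, encoding="unicode"))
    # Step 2: extract the desired data from the XML tree.
    print([li.text for li in root.iter("li")])
```

Real-world HTML is often not well nested, so a production converter would need error recovery that this sketch leaves out.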

magical_code - Code for The Magical Art of Extracting Meaning From Data: Data Mining For The Web



pn2200-g3data


Grab graph data, a program for extracting data from graphs

Aperture - Java framework for getting data and metadata


Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems. It can crawl and extract information from file systems, websites, mailboxes, and mail servers. It supports various file formats, including Office, PDF, and Zip. Metadata is also extracted from image files. Aperture has a strong focus on semantics: extracted metadata can be mapped to predefined properties.

NewPipeExtractor - Core part of NewPipe


NewPipe Extractor is a library for extracting things from streaming sites. It is a core component of NewPipe, but can be used independently. NewPipe Extractor is available at JitPack's Maven repo.



Data-Extractor - Collection of algorithms for extracting useful data from strings.



web-extractor - Framework for parsing/extracting data from web pages into a more manageable format



py-extractor - Extracting data from HTML pages.



extractr - Extract text from pdfs using various web and local APIs


xpdf is the default method. The three method options (xpdf, gs) of the extract() function all return the same structure: a simple list with a slot for metadata attached to the PDF and a slot for data (the extracted text). NOTE: Some of the code in this package has been adapted from the tm R package (GPL-3 licensed); its code for extracting text from PDFs was borrowed and then modified.

Content-Based Cross-Site Web Data Mining


The Content-Based Cross-Site Mining (CCM) of Web Data Records algorithm combines techniques for extracting data records based on document structure (HTML tags) with an analysis of the semantics of the content, for better data record extraction.

TiebaPostGrabber - This crawler grabs the text content of posts in threads. For Baidu::Tieba



TiebaImageGrabber - Crawler that grabs the signature pictures of posts from Baidu-Tieba



MiniCrawler - Super-tiny crawler script that will grab links or images from a web page



podcast - Web crawler to grab iTunes podcast info



pdf-highlight-extractor - A simple MacRuby tool for extracting highlighted passages from PDF files



extractor - API for extracting terms from text



class-extractor - Tool for extracting PHP classes from libraries



news-extractor - Nutch plugin for extracting news stories from well formatted news websites.

