.NET HTTP Data Extractor


The nWeb Data Extractor library supports extracting data from the HTML of an HTTP response. It lets the user convert the response HTML to XML and then extract the desired data from the generated XML.

http://nwebdataextractor.codeplex.com/
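The page does not show the library's own API, so the following is only a minimal sketch of the same two-step idea (turn response HTML into well-formed XML, then query the XML) in Python, using only the standard library (`html.parser` and `xml.etree.ElementTree`). The `html_to_xml` helper, the `AUTO_CLOSE` balancing table and the sample page are hypothetical illustrations, not part of nWeb Data Extractor. Real tag balancers such as NekoHTML or HtmlCleaner apply far more rules, and ElementTree understands only a subset of XPath.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

# Hypothetical, heavily simplified balancing rule: opening the key tag
# implicitly closes any of these elements still open on top of the stack.
AUTO_CLOSE = {
    "p": {"p"},
    "li": {"li"},
    "tr": {"tr", "td", "th"},
    "td": {"td", "th"},
    "th": {"td", "th"},
}


class HtmlToXmlBuilder(HTMLParser):
    """Builds a well-formed ElementTree from possibly malformed HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.root = ET.Element("document")  # synthetic root element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Implicitly close an unterminated sibling such as <td> or <li>.
        while len(self.stack) > 1 and self.stack[-1].tag in AUTO_CLOSE.get(tag, ()):
            self.stack.pop()
        # XML attributes need values, so drop valueless ones like "disabled".
        attrib = {name: value for name, value in attrs if value is not None}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close up to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # ElementTree keeps text in .text, or in the previous child's .tail.
        parent = self.stack[-1]
        if len(parent):
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html):
    """Step 1: convert HTML text into an XML element tree."""
    builder = HtmlToXmlBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# A made-up page standing in for a real HTTP response body.
page = """<html><body>
<table id="prices">
<tr><td>Apple<td>1.20
<tr><td>Pear<td>0.90
</table>
<p>Prices in EUR &amp; subject to change<br>
</body></html>"""

root = html_to_xml(page)

# Step 2: extract data with ElementTree's limited XPath support.
rows = [
    ["".join(td.itertext()).strip() for td in tr.findall("td")]
    for tr in root.findall(".//table[@id='prices']/tr")
]
print(rows)  # [['Apple', '1.20'], ['Pear', '0.90']]
print(ET.tostring(root, encoding="unicode"))  # the generated, well-formed XML
```

Keeping conversion and querying separate means that, once the markup is well-formed XML, any XPath-capable tool can perform the extraction step.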





Related Projects

TagSoup - HTML/XML parser for Haskell


TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

Hpricot - HTML parser for Ruby


Hpricot is a fast, flexible HTML parser. Hpricot can be handy for reading broken XML files, since many of the same techniques can be used. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out.

Html-parser - An HTML parser which converts an HTML document into a tree-like data structure.



simple-crawler - A simple crawler that uses Apache HttpClient and the NekoHTML parser

nightcrawler - Web crawler and HTML parser





goquery - A little like that j-thing, only in Go.


goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS selector library cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), detach()) have been left off. Also, because the net/html parser requires UTF-8 encoding, goquery does too: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the wiki for various options to do this.

xquery - xquery lets you extract data from, or evaluate expressions against, HTML/XML documents using XPath


A Go package that lets you extract data from HTML/XML documents using XPath expressions. The list of supported XPath functions can be found in the XPath package.

Neko HTML Parser - simple HTML scanner


NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.

htmlparser - HTML Parser - Java library used to parse HTML in either a linear or nested fashion



HTML Purifier - Standards compliant HTML filter written in PHP


HTML Purifier is an HTML filtering solution that uses a unique combination of robust whitelists and aggressive parsing to ensure not only that XSS attacks are thwarted, but also that the resulting HTML is standards compliant.

Html Agility Pack


This is an HTML parser that builds a read/write DOM from "real world" HTML files. It supports XPath and XSLT and is tolerant of "real world" malformed HTML.

jsForm


A jQuery-based form library for handling data in JavaScript objects. It is a UI support library for a JSON communication backend.

html-parser - Simple HTML Parser / Serializer library for JavaScript



Babelfish.NET


Babelfish was created as a common framework for navigating several different node-to-node structured data sources, such as HTML, CSS, JavaScript, XML and JSON. It is developed in C# on .NET 3.5.

HtmlCleaner - HTML parser in Java


HtmlCleaner is an HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed and unsuitable for further processing. HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those most web browsers use to create the Document Object Model. However, users may provide a custom tag and rule set for tag filtering and balancing.

JTidy - HTML parser and pretty printer in Java


JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, which effectively lets you use JTidy as a DOM parser for real-world HTML.

Jericho HTML Parser


Jericho HTML Parser is a Java library that allows analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.

perl-HTML-StripScripts-Parser - HTML::StripScripts::Parser - XSS filter using HTML::Parser



LavaHTMLFactory - Lava HTML Factory - Write HTML with OOP PHP

