System.Html


An HTML parser that turns badly formatted HTML into XPath-queryable XML. Similar to HTML Tidy and Html Agility Pack; I suppose you can call it "Just Another Html Parser". Written in C#, it requires nothing beyond what ships with the .NET Framework.

http://systemhtml.codeplex.com/
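
To make the idea of turning malformed HTML into XPath-queryable XML concrete, here is a minimal sketch in Python using only the standard library. It is not System.Html's implementation (which is written in C#); the names `TolerantTreeBuilder`, `html_to_xml`, and the synthetic `document` root element are assumptions made for illustration, and the recovery rules are deliberately simple.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TolerantTreeBuilder(HTMLParser):
    """Builds an ElementTree from possibly malformed HTML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # hypothetical synthetic root
        self.stack = [self.root]            # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) become empty strings.
        attrib = {name: (value or "") for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element;
        # an end tag with no matching open element is ignored.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text that follows a closed child belongs in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html):
    """Return the root Element of a well-formed tree built from `html`."""
    builder = TolerantTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Usage: unclosed <p> and <b> tags still yield a tree that can be queried.
root = html_to_xml('<p>Unclosed <b>bold<p>See <a href="http://foo.com/">foo</a>')
for link in root.findall(".//a[@href]"):
    print(link.get("href"), link.text)
```

ElementTree supports only a subset of XPath, so this sketch is limited to simple path and attribute predicates. It also does not apply HTML's implicit-closing rules, so a second unclosed `<p>` ends up nested inside the first rather than beside it.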





Related Projects

html-to-xml-sharp - Informal conversion of HTML to XML. Useful for screen-scraping with XPath.



html-to-xml - Informal conversion of HTML to XML. Useful for screen-scraping with XPath.



Light HTML to XML converter Umbraco xslt wrapper


An XSLT extension for Umbraco that wraps the functionality of the Light HTML to XML converter by Alain COUTHURES: http://sourceforge.net/projects/light-html2xml/ The extension helps reformat bad HTML into XML when fetching external content, i.e. screen scraping.

Nokogiri - HTML, XML, SAX, and Reader parser with XPath and CSS selector support


Nokogiri (鋸) is an HTML, XML, SAX, and DOM parser. Among Nokogiri's many features are the ability to search documents via XPath or CSS3 selectors, an XML/HTML builder, and an XSLT transformer. Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.

Arabica


Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.



Html Agility Pack


This is an HTML parser that builds a read/write DOM from "real world" HTML files. It supports XPath and XSLT and is tolerant of "real world" malformed HTML.

TagSoup - HTML/XML parser for Haskell


TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

That weird screen-scraping thing


Atropine is a library for assisting with screen-scraping tasks, particularly that of making exhaustive assertions about the structure of HTML documents. It is built on top of the fantastic BeautifulSoup HTML parser.

Beautiful Soup - Python HTML/XML parser


Beautiful Soup is a Python HTML/XML parser designed for quick-turnaround projects like screen-scraping. Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose urls match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."
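
The first three example queries above can be sketched with the Python standard library alone. This is a stand-in that avoids any third-party dependency; it is not Beautiful Soup's API, and the helper names `LinkCollector`, `collect_links`, `links_with_class`, and `links_matching` are hypothetical.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the attributes of every <a> start tag as a dict."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Valueless attributes are stored as empty strings.
            self.links.append({name: (value or "") for name, value in attrs})


def collect_links(html):
    """'Find all the links': one attribute dict per <a> tag, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def links_with_class(links, css_class):
    """'Find all the links of class X': class is a space-separated list."""
    return [link for link in links
            if css_class in link.get("class", "").split()]


def links_matching(links, pattern):
    """'Find all the links whose urls match ...': regex search on href."""
    return [link for link in links
            if re.search(pattern, link.get("href", ""))]


# Usage on deliberately sloppy markup (unquoted attributes, unclosed tags).
page = ('<p><a href="http://foo.com/a" class="externalLink big">A</a> '
        '<a href="/local">B</a><a href=http://bar.org class=externalLink>C')
links = collect_links(page)
print(len(links))
print([link["href"] for link in links_with_class(links, "externalLink")])
print([link["href"] for link in links_matching(links, r"foo\.com")])
```

The sketch only inspects start tags, so it cannot answer the fourth query (the table heading containing bold text), which needs a real document tree.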

XPath Parser


An XPath parser in C# source code. Close to the one System.Xml uses in XslCompiledTransform. Currently supports the XPath 1.0 grammar.

symbiosis - A Symphony CMS ensemble where XML, HTML, CSS, XSLT, XPath and SVG come to play



nakula-firequark


Firequark is an extension to Firebug that aids HTML screen scraping. It automatically extracts the CSS selector for one or more HTML nodes from a web page using Firebug (a web development plugin for Firefox). The generated CSS selector can be given as input to HTML screen scrapers such as Scrapi to extract information. Firequark is built to unleash the power of CSS selectors for HTML screen scraping.

xquery - extract data from HTML/XML documents, or evaluate them, using XPath expressions


A Go package that lets you extract data from HTML/XML documents using XPath expressions. The list of supported XPath functions can be found in the XPath package.

xpath - XPath package for Golang


XPath is a Go package for selecting nodes from XML, HTML, or other documents using XPath expressions. XQuery: lets you extract data from HTML/XML documents using the XPath package.

saoparser - simple, scan-only-once HTML XPath parser



mochiweb_xpath - XPath support for mochiweb's html parser



learn-xquery-xslt-xpath - XPath 2.0, XSLT 2.0 and XQuery 1.0 learn repo



nokogiri - HTML parser for PHP


A PHP library for parsing HTML, including invalid markup. In essence it wraps DomDocument with UTF-8 handling. Elements are found with CSS selectors, which are converted internally to XPath expressions. Compiled XPath expressions are cached unless the second parameter of get is set to false.

xmlspectrum - XSLT 2.0 coded syntax highlighter for XSLT 2.0, XPath 2.0...

