Html Agility Pack


This is an HTML parser that builds a read/write DOM from "real world" HTML files. It supports XPath and XSLT and is tolerant of malformed "real world" HTML.

http://htmlagilitypack.codeplex.com/


Related Projects

HtmlAgilityPackContrib - Logical extension to HtmlAgilityPack


HtmlAgilityPackContrib - A logical extension to HtmlAgilityPack for parsing HTML using jQuery-like methods, inspired by jSoup

System.Html


An HTML parser that turns badly formatted HTML into XML that can be queried with XPath. Similar to HTML Tidy and Html Agility Pack; I suppose you can call it "Just Another Html Parser". Written in C#, it requires nothing beyond what ships with the .NET Framework.

Jumony


Jumony is an HTML analysis and processing engine based on .NET Framework 3.5. It makes manipulating HTML documents in C# easy and convenient, and it can also act as a web page engine.

Babelfish.NET


Babelfish was created as a common framework for navigating several different node-to-node structured data sources, such as HTML, CSS, JavaScript, XML and JSON. Developed in C# on .NET 3.5.

JTidy - HTML parser and pretty printer in Java


JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, which effectively lets you use JTidy as a DOM parser for real-world HTML.



HtmlAgilityPack XPath Finder


Each web browser generates a different DOM for the same web page. This tool makes it easy to find the XPath for specific HTML elements in the HtmlAgilityPack DOM.

Nokogiri - HTML, XML, SAX, and Reader parser with XPath and CSS selector support


Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features are the ability to search documents via XPath or CSS3 selectors, an XML/HTML builder, and an XSLT transformer. Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.

hext - Extensions to the HtmlAgilityPack library


A set of extensions to the HTML Agility Pack library designed to make your code more readable, maintainable, and concise.

Neko HTML Parser - simple HTML scanner


NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.

php4-html-dom: Fast HTML Parser for PHP


Lightweight, fault-tolerant, high-speed single-pass HTML parser. Builds an HTML DOM that is accessed much like the browser's DOM in JavaScript. Compatible with PHP4 and higher. Send in your feature requests.

CodeIgniter-dom-parser - CodeIgniter DOM Parser library by PHP Simple HTML DOM Parser


CodeIgniter DOM Parser library by PHP Simple HTML DOM Parser

php-simple-html-dom-parser - PHP Simple HTML DOM Parser adaptation for Composer and PSR-0


PHP Simple HTML DOM Parser adaptation for Composer and PSR-0

goquery - A little like that j-thing, only in Go.


goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS Selector library cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), detach()) have been left off. Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the wiki for various options to do this.
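
The UTF-8 requirement in the goquery entry can be handled by the caller before parsing. The following is a minimal sketch using only Go's standard library. The helper names ensureUTF8 and latin1ToUTF8 are hypothetical and are not part of goquery or net/html, and the fallback to ISO-8859-1 is an assumption made for illustration; real pages may use other encodings and would need proper charset detection.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1ToUTF8 converts ISO-8859-1 bytes to a UTF-8 string.
// Every Latin-1 byte maps to the Unicode code point with the same value.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// ensureUTF8 (hypothetical helper) returns src unchanged when it is already
// valid UTF-8; otherwise it ASSUMES the input is ISO-8859-1 and converts it.
func ensureUTF8(src []byte) string {
	if utf8.Valid(src) {
		return string(src)
	}
	return latin1ToUTF8(src)
}

func main() {
	// "<p>café</p>" encoded as Latin-1: 0xE9 is 'é' and is not valid UTF-8 here.
	latin1 := []byte{'<', 'p', '>', 'c', 'a', 'f', 0xE9, '<', '/', 'p', '>'}
	fmt.Println(ensureUTF8(latin1)) // prints "<p>café</p>"
}
```

The resulting string could then be wrapped in a reader (for example with strings.NewReader) and handed to whichever HTML parser is in use.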

PHP Simple HTML DOM Parser


A simple HTML DOM parser written in PHP5+ that supports invalid HTML and provides a very easy way to handle HTML elements.

Delphi Dom HTML Parser and Converter


DOM core interface, HTML parser, HTML -> Unicode converter, HTML -> XHTML converter

kohana-simple-html-dom - Port for PHP Simple HTML DOM Parser to Kohana


Port for PHP Simple HTML DOM Parser to Kohana

TagSoup - HTML/XML parser for Haskell


TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

HTML-TagParser - Yet another HTML document parser with DOM-like methods


Yet another HTML document parser with DOM-like methods

Hpricot - HTML parser for Ruby


Hpricot is a fast, flexible HTML parser. Hpricot can be handy for reading broken XML files, since many of the same techniques can be used. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out.