We have collection of more than 1 Million open source products ranging from Enterprise product to
small libraries in all platforms. We aggregate information from all open source repositories.
Search and find the best for your needs. Check out projects section.
Hpricot is a fast, flexible HTML parser. Hpricot can be handy for reading broken XML files, since many of the same techniques can be used. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out. Source code location: http://github.com/hpricot/hpricot
Hpricot is a standalone library. It requires no other libraries. Just Ruby!
Hpricot works hard to sort out bad HTML and pays a small penalty in order to get that right.
If you can see it in Firefox, then Hpricot should parse it.
Primarily, Hpricot is used for reading HTML and tries to sort out troubled HTML by having some idea of what good HTML is.
TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
Tablesorter is a jQuery plugin for turning a standard HTML table with THEAD and TBODY tags into a sortable table without page refreshes. tablesorter can successfully parse and sort many types of data including linked data in a cell.
JTidy is a Java port of HTML Tidy, a HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document that is being processed, which effectively makes you able to use JTidy as a DOM parser for real-world HTML.
HtmlCleaner is HTML parser written in Java. HTML found on Web is usually dirty, ill-formed and unsuitable for further processing. HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows similar rules that the most of web browsers use in order to create Document Object Model. However, user may provide custom tag and rule set for tag filtering and balancing.
TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.
Jackson is a multi-purpose Java library for processing JSON data format. This project contains core low-level incremental ("streaming") parser and generator abstractions used by Jackson Data Processor. It also includes the default implementation of handler types (parser, generator) that handle JSON format.