RichSnippetExtractor (By NowFloats)

  •        0

A library to extract Rich Snippet from HTML source documents or direct URL



comments powered by Disqus

Related Projects

TagSoup - SAX-compliant parser in Java

TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

Emmet — The Essential Toolkit for Web Developers

Emmet (formerly Zen Coding) is a web-developer’s toolkit that can greatly improve your HTML & CSS workflow. With Emmet, you can type CSS-like expressions that can be dynamically parsed, and produce output depending on what you type in the abbreviation. Emmet is developed and optimised for web-developers whose workflow depends on HTML/XML and CSS, but can be used with programming languages too.

Tablesorter -Flexible client-side table sorting

Tablesorter is a jQuery plugin for turning a standard HTML table with THEAD and TBODY tags into a sortable table without page refreshes. tablesorter can successfully parse and sort many types of data including linked data in a cell.


Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.


Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

TagSoup - HTML/XML parser for Haskell

TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.


Small Python library with various things such as Configuration file parsing (in Python syntax), HTML and PDF parsing. Used in others of my projects.


hcxselect is a small and fast CSS selector engine for C++. It parses CSS selector expressions and applies them to a set of document nodes (or a whole tree) parsed via htmlcxx, a simple non-validating HTML parser. Thus, it allows you to use CSS selectors in your C++ program without much bloat.

Hpricot - HTML parser for Ruby

Hpricot is a fast, flexible HTML parser. Hpricot can be handy for reading broken XML files, since many of the same techniques can be used. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out.

Beautiful Soup - Python HTML/XML parser

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose urls match "", or "Find the table heading that's got bold text, then give me that text."

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.

Tag Cloud >>