Automatically extract body content (and other cool stuff) from an HTML document
content-extraction html scraping scrape web-page body-text
This is a port of the algorithm used by the Readability bookmarklet, which extracts the relevant content from web pages, to a SAX parser. Its advantages over other ports, e.g. arrix/node-readability, are a smaller memory footprint and much faster execution. In my tests, most pages, even large ones, finished within 15ms (on node, see below for more information). It works with Rhino, so it runs on YQL, which may have interesting uses. It also works in a browser.
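To illustrate why a SAX-style approach can get by with little memory, here is a minimal sketch of event-driven scoring. It is not this project's API or its actual scoring rules: `createExtractor`, the handler names (`onopentag`, `ontext`, `onclosetag`), `getBest`, and the score weights are all hypothetical, chosen only to show the idea. Paragraphs are scored as they close and the score is credited to their parent, so only the stack of currently open elements is kept rather than a full DOM tree.

```javascript
// Hypothetical sketch, not the library's real interface.
function createExtractor() {
  const stack = []; // currently open elements, innermost last
  let best = null;  // highest-scoring container closed so far

  // Assumed heuristic: longer paragraphs with more commas look like prose.
  function scoreParagraph(text) {
    if (text.length < 25) return 0;
    const commas = text.split(",").length - 1;
    return 1 + commas + Math.min(Math.floor(text.length / 100), 3);
  }

  return {
    onopentag(name, attrs) {
      const id = (attrs && attrs.id) || null;
      stack.push({ name, id, text: "", score: 0, paragraphs: [] });
    },
    ontext(text) {
      // Text is credited to the innermost open element.
      if (stack.length > 0) stack[stack.length - 1].text += text;
    },
    onclosetag() {
      const node = stack.pop();
      if (!node) return;
      const parent = stack[stack.length - 1];
      if (node.name === "p") {
        // A finished paragraph contributes its score and text to its parent.
        const text = node.text.trim();
        if (parent) {
          parent.score += scoreParagraph(text);
          parent.paragraphs.push(text);
        }
        return;
      }
      // Other elements (e.g. <em>, <a>) hand their text up to the parent.
      if (parent) parent.text += node.text;
      if (node.score > 0 && (best === null || node.score > best.score)) {
        best = node;
      }
    },
    getBest() {
      return best;
    },
  };
}

// Usage: events are fed by hand here; a real SAX parser would emit them.
const extractor = createExtractor();
extractor.onopentag("body", {});
extractor.onopentag("div", { id: "nav" });
extractor.onopentag("p", {});
extractor.ontext("Home");
extractor.onclosetag("p");
extractor.onclosetag("div");
extractor.onopentag("div", { id: "article" });
extractor.onopentag("p", {});
extractor.ontext("Streaming parsers emit events, so the scorer keeps only open elements, ");
extractor.onopentag("em", {});
extractor.ontext("not a full tree");
extractor.onclosetag("em");
extractor.ontext(", which keeps memory use low.");
extractor.onclosetag("p");
extractor.onclosetag("div");
extractor.onclosetag("body");

const result = extractor.getBest();
console.log(result.id, result.paragraphs);
```

The short navigation paragraph scores zero under the assumed threshold, so the container holding the longer paragraph is the one reported. A real implementation would need further signals, such as link density and class or id names, which this sketch leaves out.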
readability html content-extraction instapaper