node-htmlparser - Forgiving HTML/XML/RSS Parser in JS for *both* Node and Browsers

  •        24

#NodeHtmlParser A forgiving HTML/XML/RSS parser written in JS for both the browser and NodeJS (yes, despite the name it works just fine in any modern browser). The parser can handle streams (chunked data) and supports custom handlers for writing custom DOMs/output.

http://github.com/tautologistics/node-htmlparser

Tags
Implementation
License
Platform

   




Related Projects

parse5 - HTML parsing/serialization toolset for Node

  •    Javascript

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.parse5 provides nearly everything you may need when dealing with HTML. It's the fastest spec-compliant HTML parser for Node to date. It parses HTML the way the latest version of your browser does. It has proven itself reliable in such projects as jsdom, Angular2, Polymer and many more.

htmlparser2 - forgiving html and xml parser

  •    Javascript

A forgiving HTML/XML/RSS parser. The parser can handle streams and provides a callback interface. A live demo of htmlparser2 is available here.

posthtml - PostHTML is a tool to transform HTML/XML with JS plugins

  •    Javascript

PostHTML is a tool for transforming HTML/XML with JS plugins. PostHTML itself is very small. It includes only a HTML parser, a HTML node tree API and a node tree stringifier. All HTML transformations are made by plugins. And these plugins are just small plain JS functions, which receive a HTML node tree, transform it, and return a modified tree.

HtmlParser

  •    

HtmlParser is a collection of Html processing and querying projects. The first element, the low level parser, is based on and extends Html::Parser from CPAN. This core is an event producing document parser with all other tools and libraries acting as subscribers. Using this...

RSS Parser and XML Parser for PHP 5+

  •    PHP

A full XML Parser for PHP with RSS Parser specific functionsl; think of it as an interface to the PHP DOM which allows easy access to your XML based documents. Auto encoding conversion to UTF-8 + Array to XML Conversion. V3 is now a commercial product


TagSoup - HTML/XML parser for Haskell

  •    Haskell

TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

Babelfish.NET

  •    DotNet

Babelfish was created as a common framework for navigating several different node-to-node structured data sources, such as HTML, CSS, Javascript, XML & JSON. Developed in C# .NET 3.5

Nokogiri - HTML, XML, SAX, and Reader parser with XPath and CSS selector support

  •    Ruby

Nokogiri (?) is an HTML, XML, SAX, DOM parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors, XML/HTML builder, XSLT transformer. Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.

RSParser - Parser for RSS, Atom, JSON Feed, RSS-inJSON, OPML, and HTML.

  •    HTML

This framework was developed for NetNewsWire and is made available here for developers who just need the parsing code. It has no depencies that aren’t provided by the system. It also includes Objective-C wrappers for libXML2’s XML SAX and HTML SAX parsers. You can write your own parsers on top of these.

HTML Parser for PHP 4

  •    PHP

Object oriented PHP based HTML parser. The HtmlParser class allows you to interate through HTML nodes and get their attributes, names and values. It also comes with an example class for converting HTML to formatted ASCII text.

xml-stream - XML stream parser based on Expat. Made for Node.

  •    Javascript

XmlStream is a Node.js XML stream parser and editor, based on node-expat (libexpat SAX-like parser binding). When working with large XML files, it is probably a bad idea to use an XML to JavaScript object converter, or simply buffer the whole document in memory. Then again, a typical SAX parser might be too low-level for some tasks (and often a real pain).

TagSoup - SAX-compliant parser in Java

  •    Java

TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

Fuzi - A fast & lightweight XML & HTML parser in Swift with XPath & CSS support

  •    Swift

Fuzi is based on a Swift port of Mattt Thompson's Ono(斧), using most of its low level implementaions with moderate class & interface redesign following standard Swift conventions, along with several bug fixes. Fuzi(斧子) means "axe", in homage to Ono(斧), which in turn is inspired by Nokogiri (鋸), which means "saw".

XML::RSS - RSS feeds using Perl

  •    Perl

This module provides a basic framework for creating and maintaining RDF Site Summary (RSS) files. This distribution also contains many examples that allow you to generate HTML from an RSS, convert between 0.9, 0.91, and 1.0 version, and other nifty things.

oga - Moved to https://gitlab.com/yorickpeterse/oga

  •    Ruby

NOTE: my spare time is limited which means I am unable to dedicate a lot of time on Oga. If you're interested in contributing to FOSS, please take a look at the open issues and submit a pull request to address them where possible. Oga is an XML/HTML parser written in Ruby. It provides an easy to use API for parsing, modifying and querying documents (using XPath expressions). Oga does not require system libraries such as libxml, making it easier and faster to install on various platforms. To achieve better performance Oga uses a small, native extension (C for MRI/Rubinius, Java for JRuby).

Kanna - Kanna(鉋) is an XML/HTML parser for Swift.

  •    Swift

Kanna(鉋) is an XML/HTML parser for cross-platform(macOS, iOS, tvOS, watchOS and Linux!). It was inspired by Nokogiri(鋸).

node-xml2js - XML to JavaScript object converter.

  •    CoffeeScript

Simple XML to JavaScript object converter. It supports bi-directional conversion. Uses sax-js and xmlbuilder-js.Note: If you're looking for a full DOM parser, you probably want JSDom.

node-rss - RSS feed generator for Node.

  •    Javascript

RSS feed generator. Add RSS feeds to any project. Supports enclosures and GeoRSS. An item can be used for a blog entry, project update, log entry, etc. Your RSS feed can have any number of items. Most feeds use 20 or fewer items.

ROME

  •    Java

ROME is an set of Java tools for parsing, generating and publishing RSS and Atom feeds. The core ROME library depends only on the JDOM XML parser and supports parsing, generating and converting all of the popular RSS and Atom formats including RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and Atom 1.0. You can parse to an RSS object model, an Atom object model or an abstract SyndFeed model that can model either family of formats.

Beautiful Soup - Python HTML/XML parser

  •    Python

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose urls match "foo.com", or "Find the table heading that's got bold text, then give me that text."