Arabica is a full SAX2 implementation, including the optional interfaces and helper classes. It provides uniform SAX2 wrappers for the Expat parser, Xerces, Libxml2 and, on Windows, for the Microsoft XML parser. It implements the DOM Level 2 Core on top of the SAX layer. Arabica implements XPath 1.0 over its DOM implementation. It builds XSLT over its XPath engine. In addition to the XML parser, Arabica includes Taggle, an HTML parser derived from TagSoup. Arabica is written in Standard C++ and should be portable to most platforms. It is parameterised on string type. Out of the box, it can provide UTF-8 encoded std::strings or UTF-16 encoded std::wstrings, but can easily be customised for arbitrary string types.
comments powered by Disqus
TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
ROME is an set of Java tools for parsing, generating and publishing RSS and Atom feeds. The core ROME library depends only on the JDOM XML parser and supports parsing, generating and converting all of the popular RSS and Atom formats including RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and Atom 1.0. You can parse to an RSS object model, an Atom object model or an abstract SyndFeed model that can model either family of formats.
libxml++ is a C++ wrapper for the libxml XML parser library.
dom4j is a simple and flexible open source library for working with XML, XPath and XSLT on the Java platform using the Java Collections Framework with full integration with DOM, SAX and JAXP.
JsonFx v2.0 - JSON serialization framework for .NET. It has unified interface for reading / writing JSON, BSON, XML, JsonML. It implements LINQ-to-JSON, Supports reading/writing using DataContract, XmlSerialization, JsonName, attributes and lot more.
TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.