parse5 provides nearly everything you may need when dealing with HTML. It's the fastest spec-compliant HTML parser for Node to date. It parses HTML the way the latest version of your browser does. It has proven itself reliable in such projects as jsdom, Angular2, Polymer and many more.
https://github.com/inikulin/parse5
Tags | html-parsing html html5 serialization serializer parser whatwg specification fast html-parser html5-parser htmlparser parse5 html-serializer htmlserializer sax simple-api parse tokenize serialize tokenizer |
Implementation | JavaScript |
License | MIT |
Platform | Node.js |
Web scraping and HTML reprocessing. The easy way. ineed doesn't build and traverse a DOM tree; it operates on a sequence of HTML tokens instead. All processing is done in a single pass, which makes it blazing fast. The token stream is produced by parse5, which parses HTML exactly the same way modern browsers do.
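To make the token-stream idea concrete, here is a minimal sketch of one-pass scraping that never builds a DOM tree. It is not ineed's API: ineed is a Node.js library fed by parse5, whereas this sketch swaps in Python's standard-library `html.parser.HTMLParser` as the tokenizer, which is tolerant of bad markup but does not implement the full WHATWG parsing algorithm. The `LinkCollector` class and `scrape()` function are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical sketch: collect the title and link targets in one pass, no DOM."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False  # True while between <title> and </title>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text tokens are examined once and discarded unless they are title text.
        if self._in_title:
            self.title += data


def scrape(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.title, parser.links


if __name__ == "__main__":
    page = '<title>Demo</title><p><a href="/a">A</a> <a href="/b">B</a><a>no href</a>'
    print(scrape(page))  # ('Demo', ['/a', '/b'])
```

Because each token is handled once as it is emitted, the only state kept is what the handlers choose to store (here, a title string and a list of links); no tree of nodes is allocated.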
Tags | html scraper reprocessor parser serializer |
html5ever is an HTML parser developed as part of the Servo project. It can parse and serialize HTML according to the WHATWG specs (aka "HTML5"). There are some omissions at present, most of which are documented in the bug tracker. html5ever passes all tokenizer tests from html5lib-tests, and most tree builder tests outside of the unimplemented features. The goal is to pass all html5lib tests, and also provide all hooks needed by a production web browser, e.g. document.write.
AngleSharp is a .NET library that gives you the ability to parse angle-bracket-based hypertexts like HTML, SVG, and MathML. XML without validation is also supported by the library. An important aspect of AngleSharp is that CSS can also be parsed. The included parser is built upon the official W3C specification. It produces a perfectly portable HTML5 DOM representation of the given source code and ensures compatibility with the results of evergreen browsers. Standard DOM features such as querySelector and querySelectorAll also work for tree traversal.
Tags | anglesharp dom c-sharp html parser library html-parser |
SwiftSoup is a pure Swift library, cross-platform (macOS, iOS, tvOS, watchOS and Linux!), for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. SwiftSoup implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do. After parsing a document and finding some elements, you'll want to get at the data inside those elements.
HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. It is stable, is used in many production websites, and has well over one million downloads. HTML5 provides the following features.
Tags | xml-namespaces html5-parser dom domdocument html5-php html5-document html5lib |
TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification and can be used to parse either well-formed XML or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.
Tags | html-parser parser xml-parser |
Nokogiri (鋸) is an HTML, XML, SAX, and DOM parser. Among its many features are searching documents via XPath or CSS3 selectors, an XML/HTML builder, and an XSLT transformer. Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.
Tags | xml-parser html-parser sax-reader sax dom xml css-selectors |
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
Tags | html-parser parser xml-parser sax |
Jodd is a developer-friendly set of Java microframeworks, tools and utilities, under 1.7 MB. It is built with common sense to make things simple, but not simpler. Its features include a slick IoC container, an elegant MVC framework, a unique AOP engine, a thin DB-object mapper, a standalone transaction manager, a focused validation tool, versatile HTML parsers, a pages decorator, super properties, a powerful BeanUtil, the timeless JDateTime, easy email, many super utilities... and more.
Tags | micro-framework ioc aop database http-client html-parser mail json-parser utility-library |
html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers. By default, the parsed document is an xml.etree element instance. Whenever possible, html5lib chooses the accelerated ElementTree implementation (i.e. xml.etree.cElementTree on Python 2.x). Two other tree types are supported: xml.dom.minidom and lxml.etree.
Tags | html-library html-parser html |
NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.
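The tag-balancing behavior described above can be approximated with a stack of open elements over a token stream. The sketch below is a simplified, hypothetical illustration and not NekoHTML's actual algorithm (NekoHTML is a Java library built on Xerces). It uses Python's standard-library `html.parser` as the tokenizer, closes an open `p` or `li` when another one starts directly inside it, closes intervening elements when a mismatched end tag arrives, drops stray end tags, and closes anything still open at end of input. It does not add missing parent elements or follow the full HTML5 tree-construction rules; `TagBalancer`, `balance()` and `IMPLICIT_CLOSE` are illustrative names.

```python
from html import escape
from html.parser import HTMLParser

# Void elements never take an end tag (the standard HTML void-element list).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Hypothetical simplification of optional end tags: a new <p> or <li>
# closes an open element of the same name sitting on top of the stack.
IMPLICIT_CLOSE = {"p", "li"}


class TagBalancer(HTMLParser):
    """Re-emit HTML with every non-void element explicitly closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # text arrives unescaped
        self.out = []
        self.stack = []  # names of currently open elements, innermost last

    def _close_through(self, tag):
        # Pop and close elements until (and including) the innermost `tag`.
        while self.stack:
            name = self.stack.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_starttag(self, tag, attrs):
        if tag in IMPLICIT_CLOSE and self.stack and self.stack[-1] == tag:
            self._close_through(tag)
        attr_text = "".join(
            f' {name}="{escape(value or "", quote=True)}"' for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Mismatched nesting: close any elements opened after `tag` first.
            self._close_through(tag)
        # An end tag with no matching open element is simply dropped.

    def handle_data(self, data):
        # Simplification: script/style contents are escaped like ordinary text.
        self.out.append(escape(data, quote=False))


def balance(html_text):
    parser = TagBalancer()
    parser.feed(html_text)
    parser.close()
    # Close anything still open at end of input, innermost first.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


if __name__ == "__main__":
    print(balance("<div><p>one<p>two <b>bold</div>"))
    # <div><p>one</p><p>two <b>bold</b></p></div>
```

Stray end tags are dropped silently rather than reported; the full HTML5 algorithm handles such cases with more elaborate rules (for example, the adoption agency algorithm for formatting elements).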
Tags | html-parser parser xerces |
A fast, standards-compliant, C-based HTML 5 parser for Python. It is over thirty times as fast as pure-Python parsers such as html5lib. See the documentation for details.
Hpricot is a fast, flexible HTML parser. Hpricot can be handy for reading broken XML files, since many of the same techniques can be used. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out.
Tags | html-parser parser |
HtmlParser is a collection of HTML processing and querying projects. The first element, the low-level parser, is based on and extends Html::Parser from CPAN. This core is an event-producing document parser, with all other tools and libraries acting as subscribers. Using this...
Tags | html-parser |
Fuzi is based on a Swift port of Mattt Thompson's Ono (斧), using most of its low-level implementations with a moderate class and interface redesign following standard Swift conventions, along with several bug fixes. Fuzi (斧子) means "axe", in homage to Ono (斧), which in turn is inspired by Nokogiri (鋸), which means "saw".
Tags | xml xml-parsing xml-parser xpath html html-parsing html-parser css ios |
A set of PHP classes, each representing a Markdown flavor, and a command-line tool for converting Markdown files to HTML files. The implementation focuses on being fast (see benchmark) and extensible. Parsing Markdown to HTML is as simple as calling a single method (see Usage), providing a solid implementation that gives most expected results even in non-trivial edge cases.
Tags | markdown-parser markdown markdown-to-html markdown-converter markdown-flavors markdown-extra gfm |
Noggit is the world's fastest streaming JSON parser for Java. It is used in Apache Solr.
Tags | json-parser json-streaming json-serialization json |
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you actually don't HAVE to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to the one System.Xml proposes, but for HTML documents (or streams).
Tags | htmlagilitypack xpath parse hap html-parser |
MyHTML is a fast HTML parser using threads, implemented as a pure C99 library with no outside dependencies. The project asks users to switch to the HTML parser from the Lexbor project instead, which is stable, has more features, and (yes) is very fast.
Tags | html html-parser pure-c |
A WHATWG-compliant HTML parser with CSS selectors in Objective-C and Foundation. It parses HTML just like a browser. Copy the files in the Sources folder into your project.