htmlquery - htmlquery, an XPath query package for HTML document.

  •        2

htmlquery is an XPath query package for HTML, lets you extract data or evaluate from HTML documents by an XPath expression. Please let me know if you have any questions.

https://github.com/antchfx/htmlquery

Tags
Implementation
License
Platform

   




Related Projects

html-agility-pack - Html Agility Pack (HAP)

  •    CSharp

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).Have a question? Ask questions and find answers from over 2500 questions.

Nokogiri - HTML, XML, SAX, and Reader parser with XPath and CSS selector support

  •    Ruby

Nokogiri (?) is an HTML, XML, SAX, DOM parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors, XML/HTML builder, XSLT transformer. Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.

System.Html

  •    CSharp

A html parser that turns badly formatted html into XPath query able xml. Similar to html tidy and html agility pack; I suppose you can call it "Just Another Html Parser". Written in c# and does not require anything that isn't found in the dot net framework.

Fuzi - A fast & lightweight XML & HTML parser in Swift with XPath & CSS support

  •    Swift

Fuzi is based on a Swift port of Mattt Thompson's Ono(斧), using most of its low level implementaions with moderate class & interface redesign following standard Swift conventions, along with several bug fixes. Fuzi(斧子) means "axe", in homage to Ono(斧), which in turn is inspired by Nokogiri (鋸), which means "saw".

Html Agility Pack

  •    

This is an HTML parser that builds a read/write DOM from “real world” HTML files. It supports XPATH or XSLT and is tolerant with "real world" malformed HTML.


oga - Moved to https://gitlab.com/yorickpeterse/oga

  •    Ruby

NOTE: my spare time is limited which means I am unable to dedicate a lot of time on Oga. If you're interested in contributing to FOSS, please take a look at the open issues and submit a pull request to address them where possible. Oga is an XML/HTML parser written in Ruby. It provides an easy to use API for parsing, modifying and querying documents (using XPath expressions). Oga does not require system libraries such as libxml, making it easier and faster to install on various platforms. To achieve better performance Oga uses a small, native extension (C for MRI/Rubinius, Java for JRuby).

Arbica

  •    C++

Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.

parse5 - HTML parsing/serialization toolset for Node

  •    Javascript

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.parse5 provides nearly everything you may need when dealing with HTML. It's the fastest spec-compliant HTML parser for Node to date. It parses HTML the way the latest version of your browser does. It has proven itself reliable in such projects as jsdom, Angular2, Polymer and many more.

Neko HTML Parser - simple HTML scanner

  •    Java

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements. Automatically closes elements with optional end tags and can handle mismatched inline element tags.

TagSoup - HTML/XML parser for Haskell

  •    Haskell

TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

posthtml - PostHTML is a tool to transform HTML/XML with JS plugins

  •    Javascript

PostHTML is a tool for transforming HTML/XML with JS plugins. PostHTML itself is very small. It includes only a HTML parser, a HTML node tree API and a node tree stringifier. All HTML transformations are made by plugins. And these plugins are just small plain JS functions, which receive a HTML node tree, transform it, and return a modified tree.

Hpricot - HTML parser for Ruby

  •    C

Hpricot is a fast, flexible HTML parser. Hpricot can be handy for reading broken XML files, since many of the same techniques can be used. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out.

php-simple-html-dom-parser - PHP Simple HTML DOM Parser adaptation for Composer and PSR-0

  •    HTML

A HTML DOM parser written in PHP5+ let you manipulate HTML in a very easy way! Require PHP 5+. Supports invalid HTML. Find tags on an HTML page with selectors just like jQuery. Extract contents from HTML in a single line.

HtmlCleaner - HTML parser in Java

  •    Java

HtmlCleaner is HTML parser written in Java. HTML found on Web is usually dirty, ill-formed and unsuitable for further processing. HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows similar rules that the most of web browsers use in order to create Document Object Model. However, user may provide custom tag and rule set for tag filtering and balancing.

JTidy - HTML parser and pretty printer in Java

  •    Java

JTidy is a Java port of HTML Tidy, a HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document that is being processed, which effectively makes you able to use JTidy as a DOM parser for real-world HTML.

TagSoup - SAX-compliant parser in Java

  •    Java

TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

htmlparser2 - forgiving html and xml parser

  •    Javascript

A forgiving HTML/XML/RSS parser. The parser can handle streams and provides a callback interface. A live demo of htmlparser2 is available here.

XPath Explorer

  •    Java

XPath Explorer (XPE) is a GUI application that lets you interactively experiment with XPath. Given an xpath and URL (to an HTML or XML document), it displays matching nodes and their values. This makes it easy to play with and debug your XPath expression

goquery - A little like that j-thing, only in Go.

  •    Go

goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS Selector library cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), detach()) have been left off.Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the wiki for various options to do this.

jspoon - Annotation based HTML to Java parser + Retrofit converter

  •    Java

jspoon is a Java library that provides parsing HTML into Java objects basing on CSS selectors. It uses jsoup underneath as a HTML parser. It looks for the first occurrence in HTML and sets its value to a field.