Html Agility Pack


This is an HTML parser that builds a read/write DOM from "real world" HTML files. It supports XPath or XSLT and tolerates malformed "real world" HTML.

http-parser - HTTP request/response parser for C


This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed for use in high-performance HTTP applications. It makes no syscalls or allocations, does not buffer data, and can be interrupted at any time. Depending on your architecture, it requires only about 40 bytes of data per message stream (in a web server, that is per connection).

ANTLR - ANother Tool for Language Recognition


ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day.

TagSoup - SAX-compliant parser in Java


TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.




mathjs - An extensive math library for JavaScript and Node.js


Math.js is an extensive math library for JavaScript and Node.js. It features a flexible expression parser with support for symbolic computation, comes with a large set of built-in functions and constants, and offers an integrated solution to work with different data types like numbers, big numbers, complex numbers, fractions, units, and matrices.

sanitize-html - Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis


Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis
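
The per-element allowlist idea can be sketched with Python's standard html.parser module. This is only an illustration of the concept, not sanitize-html's API or behavior: the names AllowlistSanitizer and sanitize are hypothetical, text inside dropped tags is kept as escaped text, and URL scheme checks (for example javascript: links) are deliberately left out, so it is not suitable for production use as written.

```python
from html import escape
from html.parser import HTMLParser


class AllowlistSanitizer(HTMLParser):
    """Hypothetical sketch: keep only allowed tags and, per tag, allowed attributes."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed  # maps tag name -> set of permitted attribute names
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return  # drop the tag itself; its text content still passes through
        kept = [
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in self.allowed[tag]
        ]
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so it can never be read back as markup.
        self.out.append(escape(data, quote=False))


def sanitize(html_text, allowed):
    parser = AllowlistSanitizer(allowed)
    parser.feed(html_text)
    parser.close()  # flush any text still held by the parser
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = '<a href="/x" onclick="evil()">link</a><blink>hi</blink>'
    print(sanitize(dirty, {"a": {"href"}}))
```

Here the a element keeps href but loses onclick, and the blink tags are removed while their text survives.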



go-humanize - Go Humans! (formatters for units to human friendly sizes)


Just a few functions to help humanize times and sizes. Fetch it with go get github.com/dustin/go-humanize, import it as "github.com/dustin/go-humanize", and use it as humanize.
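
As a rough illustration of what "humanizing" a size means, the following Python sketch formats a byte count with SI (power-of-1000) units. It is not go-humanize's code, and its exact output format is an assumption made for this example; human_bytes is a hypothetical name.

```python
def human_bytes(n):
    """Format a byte count using SI (power-of-1000) units. Hypothetical helper."""
    units = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1000 or unit == units[-1]:
            break
        value /= 1000
    if unit == "B":
        return f"{int(value)} B"
    return f"{value:.1f} {unit}"


if __name__ == "__main__":
    print(human_bytes(512))
    print(human_bytes(82854982))
```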

commander.js - node.js command-line interfaces made easy


Options with commander are defined with the .option() method, which also serves as documentation for the options. Commander parses args and options from process.argv, leaving any args not consumed by options in the program.args array. Short flags may be passed as a single arg, for example -abc is equivalent to -a -b -c. Multi-word options such as "--template-engine" are camel-cased, becoming program.templateEngine etc.
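
The two conventions described here, combined short flags and camel-cased multi-word options, can be sketched in a few lines of Python. This illustrates the conventions only and is not commander.js code; expand_short_flags and camel_case_option are hypothetical names.

```python
import re


def expand_short_flags(argv):
    """Split combined short flags: '-abc' becomes '-a', '-b', '-c'."""
    expanded = []
    for arg in argv:
        # A single dash followed by two or more letters is treated as a flag cluster.
        if re.fullmatch(r"-[A-Za-z]{2,}", arg):
            expanded.extend("-" + ch for ch in arg[1:])
        else:
            expanded.append(arg)
    return expanded


def camel_case_option(flag):
    """Turn '--template-engine' into 'templateEngine'."""
    words = flag.lstrip("-").split("-")
    return words[0] + "".join(w.capitalize() for w in words[1:])


if __name__ == "__main__":
    print(expand_short_flags(["-abc", "--template-engine", "file.txt"]))
    print(camel_case_option("--template-engine"))
```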

toml - TOML parser for Golang with reflection.


This package passes all tests in toml-test for both the decoder and the encoder. This package works similarly to how the Go standard library handles XML and JSON. Namely, data is loaded into Go values via reflection.

goquery - A little like that j-thing, only in Go.


goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS Selector library cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), detach()) have been left off. Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the wiki for various options to do this.

colly - Fast and Elegant Scraping Framework for Gophers


Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

jsonparser - Alternative JSON parser for Go that does not require schema (so far fastest)


It does not require you to know the structure of the payload (e.g. create structs), and allows accessing fields by providing the path to them. It is up to 10 times faster than the standard encoding/json package (depending on payload size and usage) and allocates no memory. See benchmarks below. Originally I made this for a project that relies on a lot of 3rd party APIs that can be unpredictable and complex. I love simplicity and prefer to avoid external dependencies. encoding/json requires you to know your data structures exactly, or if you prefer to use map[string]interface{} instead, it will be very slow and hard to manage. I investigated what's on the market and found that most libraries are just wrappers around encoding/json. There are a few options with their own parsers (ffjson, easyjson), but they still require you to create data structures.
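
The path-style access described above can be pictured with a short Python sketch. It shows only the shape of the interface (give a path, get a value, declare no data structures); unlike jsonparser it decodes the whole payload first, so it says nothing about the performance claims. The get_path name and the sample payload are made up for this illustration.

```python
import json


def get_path(payload, *keys):
    """Follow a path of dict keys or list indexes into a JSON payload."""
    node = json.loads(payload)  # assumption: full decode, done here for simplicity
    for key in keys:
        node = node[key]
    return node


if __name__ == "__main__":
    sample = b'{"person": {"name": {"first": "Ada"}, "langs": ["go", "c"]}}'
    print(get_path(sample, "person", "name", "first"))
    print(get_path(sample, "person", "langs", 1))
```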

Datejs - A JavaScript Date Library


Datejs is an open source JavaScript Date library for parsing, formatting and processing. Comprehensive, yet simple, stealthy and fast. Datejs has passed all trials and is ready to strike. Datejs doesn’t just parse strings, it slices them cleanly in two.

HtmlCleaner - HTML parser in Java


HtmlCleaner is an HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed and unsuitable for further processing. HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those most web browsers use to create the Document Object Model. However, the user may provide a custom tag and rule set for tag filtering and balancing.

PEG.js - Parser Generator for JavaScript


PEG.js is a simple parser generator for JavaScript that produces fast parsers with excellent error reporting. You can use it to process complex data or computer languages and build transformers, interpreters, compilers and other tools easily. It integrates both lexical and syntactical analysis.

Beautiful Soup - Python HTML/XML parser


Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose urls match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."
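
To give a feel for the "find all the links" style of query, here is a minimal sketch that uses only Python's standard html.parser module rather than Beautiful Soup itself. LinkCollector and find_links are hypothetical names, and the sample markup is invented; note that it is deliberately left unclosed to show that a tolerant parser still finds the links.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags, optionally by class."""

    def __init__(self, css_class=None):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            if attr.get("href"):
                self.links.append(attr["href"])


def find_links(html_text, css_class=None):
    collector = LinkCollector(css_class)
    collector.feed(html_text)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<p><a href="http://foo.com/">one<a class="externalLink big" href="/two">two</p>'
    print(find_links(page))
    print(find_links(page, "externalLink"))
```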