mhtml-parser - A MHTML Parser

  •        25

A MHTML(.mht) file parser. Only tested in .mht converted by Word 2016.

https://github.com/zsxsoft/mhtml-parser#readme

Dependencies:

iconv-lite : ^0.4.13
linebyline : ^1.3.0
object-assign : ^4.0.1
quoted-printable : ^1.0.0

Tags
Implementation
License
Platform

   




Related Projects

Aspose.Words for Java Examples

  •    

This project contains example code for Aspose.Words for Java. Aspose.Words is a class library for generating, converting and rendering wordprocessing documents. Aspose.Words supports DOC, OOXML, RTF, HTML, MHTML, TXT, OpenDocument, PDF, XPS, EPUB, SWF, SVG, Image, printing ...

Aspose.Words for .NET Examples

  •    DotNet

This project contains example code for Aspose.Words for .NET. Aspose.Words is a class library for generating, converting and rendering wordprocessing documents. Aspose.Words supports DOC, OOXML, RTF, HTML, MHTML, TXT, OpenDocument, PDF, XPS, EPUB, SWF, SVG, Image, printing ...

jammit - Industrial Strength Asset Packaging for Rails

  •    Ruby

Jammit is an industrial-strength asset packaging library for Rails, providing both the CSS and JavaScript concatenation and compression that you'd expect, as well as ahead-of-time gzipping, built-in JavaScript template support, and optional Data-URI / MHTML image embedding. Jammit is released under the MIT license (see the LICENSE file).

parser-lib - Collection of parsers written in JavaScript

  •    Javascript

The ParserLib CSS parser is a CSS3 SAX-inspired parser written in JavaScript. It handles standard CSS syntax as well as validation (checking of property names and values) although it is not guaranteed to thoroughly validate all possible CSS properties.The CSS parser is built for a number of different JavaScript environments. The most recently released version of the parser can be found in the dist directory when you check out the repository; run npm run build to regenerate them from the latest sources.

http-parser - http request/response parser for c

  •    C

This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in performance HTTP applications. It does not make any syscalls nor allocations, it does not buffer data, it can be interrupted at anytime. Depending on your architecture, it only requires about 40 bytes of data per message stream (in a web server that is per connection).


ANLP - Another .NET Lexer Parser

  •    Silverlight

This project aims to have a lexer/parser working in Silverlight and help people to write their own grammar and make the lexer/parser available in Silverlight.

yacy_grid_parser - Parser Microservice for the YaCy Grid

  •    Java

The Parser is a microservices which can be deployed i.e. using Docker. When the Parser Component is started, it searches for a MCP and connects to it. By default the local host is searched for a MCP but you can configure one yourself. The Parser is able to read a WARC file and parses it's content. The content is analyzed, the plain text, links, images and more entities are extracted. The result is stored in a JSON Object. Calling the parser will generate a list of JSON Objects, each containing the analyzed content of one internet resource. The parser understands not only HTML but also a wide range of different document formats, including PDF, all OpenOffice and MS Office document formats and much more.

http-parser - HTTP request/response parser for python in C

  •    C

HTTP request/response parser for Python compatible with Python 2.x (>=2.6), Python 3 and Pypy. If possible a C parser based on http-parser from Ryan Dahl will be used. http-parser is under the MIT license.

pikkr - JSON parser which picks up values directly without performing tokenization in Rust

  •    Rust

Pikkr is a JSON parser which picks up values directly without performing tokenization in Rust. This JSON parser is implemented based on Y. Li, N. R. Katsipoulakis, B. Chandramouli, J. Goldstein, and D. Kossmann. Mison: a fast JSON parser for data analytics. In VLDB, 2017. This JSON parser performs well when there are a limited number of different JSON structural variants in a JSON data stream or JSON collection, and that is a common case in data analytics field.

typescript-eslint-parser - An ESLint custom parser which leverages TypeScript ESTree to allow for ESLint to lint TypeScript source code

  •    Javascript

An ESLint custom parser which leverages TypeScript ESTree to allow for ESLint to lint TypeScript source code. There is sometimes an incorrect assumption that the parser itself is what does everything necessary to facilitate the use of ESLint with TypeScript. In actuality, it is the combination of the parser and one or more plugins which allow you to maximize your usage of ESLint with TypeScript.

mercury-parser - 📜 Extract meaningful content from the chaos of a web page

  •    Javascript

Postlight's Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more. Mercury Parser powers the Mercury AMP Converter and Mercury Reader, a Chrome extension that removes ads and distractions, leaving only text and images for a beautiful reading view on any site.

Mathos Parser

  •    

Mathos Parser is a math parser, which allows to parse all kinds of expressions, add custom functions, variables, and operators, and define their behaviour.

tolerant-php-parser - An early-stage PHP parser designed for IDE usage scenarios.

  •    PHP

This is an early-stage PHP parser designed, from the beginning, for IDE usage scenarios (see Design Goals for more details). There is still a ton of work to be done, so at this point, this repo mostly serves as an experiment and the start of a conversation.After you've configured your machine, you can use the parser to generate and work with the Abstract Syntax Tree (AST) via a friendly API.

Neko HTML Parser - simple HTML scanner

  •    Java

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements. Automatically closes elements with optional end tags and can handle mismatched inline element tags.

TagSoup - HTML/XML parser for Haskell

  •    Haskell

TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

TagSoup - SAX-compliant parser in Java

  •    Java

TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

Arbica

  •    C++

Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.

peg - Peg, Parsing Expression Grammar, is an implementation of a Packrat parser generator.

  •    Go

Peg, Parsing Expression Grammar, is an implementation of a Packrat parser generator. A Packrat parser is a descent recursive parser capable of backtracking. The generated parser searches for the correct parsing of the input.

http-parser - http request/response parser for c

  •    C

This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in performance HTTP applications. It does not make any syscalls nor allocations, it does not buffer data, it can be interrupted at anytime. Depending on your architecture, it only requires about 40 bytes of data per message stream (in a web server that is per connection).When data is received on the socket execute the parser and check for errors.

pegjs - PEG.js: Parser generator for JavaScript

  •    Javascript

PEG.js is a simple parser generator for JavaScript that produces fast parsers with excellent error reporting. You can use it to process complex data or computer languages and build transformers, interpreters, compilers and other tools easily.Online version is the easiest way to generate a parser. Just enter your grammar, try parsing few inputs, and download generated parser code.