lark - A modern parsing library for Python, implementing Earley & LALR(1) and an easy interface

  •        594

Beginners: Lark is not just another parser. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs a parse-tree for you, without additional code on your part. Experts: Lark lets you choose between Earley and LALR(1), to trade-off power and speed. It also contains a CYK parser and experimental features such as a contextual-lexer.



Related Projects

lark - Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity

  •    Python

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity. Lark can parse all context-free languages. To put it simply, it means that it is capable of parsing almost any programming language out there, and to some degree most natural languages too.

PEGTL - Parsing Expression Grammar Template Library

  •    C++

The Parsing Expression Grammar Template Library (PEGTL) is a zero-dependency C++ header-only parser combinator library for creating parsers according to a Parsing Expression Grammar (PEG). Grammars are written as regular C++ code, created with template programming (not template meta programming), i.e. nested template instantiations that naturally correspond to the inductive definition of PEGs (and other parser-combinator approaches).

ResumeParser - Resume Parser using rule based approach. Developed using framework provided by GATE

  •    HTML

Parser that extracts information from any resume and converts into a structured .json format to be used by internal systems. The parser uses a rule-based approach that focuses on semantic rather than syntactic parsing. The parser can handle document types in .pdf, .txt, .doc and .docx (Microsoft word). In its current form, this application is a console based application. Parse uses the Engligh grammar engine provided by GATE through its ANNIE framework. The output is then transduced using the grammar rules and lists specifically written for resume parsing. The JAPE grammar defines a generic set of rules that complies with popular ways of resume writing. It takes Proper nouns from lists and applies them to rules to identify entities. Explore the source code and read about GATE for more details. Also, feel free to pose questions.

language - A fast PEG parser written in JavaScript with first class errors

  •    Objective-J

Language.js is an experimental new parser based on PEG (Parsing Expression Grammar), with the special addition of the "naughty OR" operator to handle errors in a unique new way. It makes use of memoization to achieve linear time parsing speed, and support for automatic cut placement is coming to maintain mostly constant space as well (for a discussion of cut operators see: The most unique addition Language.js makes to PEG is how it handles errors. No parse ever fails in Language.js, instead SyntaxErrorNodes are placed into the resultant tree. This makes it trivial to do things like write syntax highlighters that have live error reporting. This also means that Language.js is very competent at handling multiple errors (as opposed to aborting on the first one that is reached).

AngleSharp - The ultimate angle brackets parser library parsing HTML5, MathML, SVG and CSS to construct a DOM based on the official W3C specifications

  •    CSharp

AngleSharp is a .NET library that gives you the ability to parse angle bracket based hyper-texts like HTML, SVG, and MathML. XML without validation is also supported by the library. An important aspect of AngleSharp is that CSS can also be parsed. The included parser is built upon the official W3C specification. This produces a perfectly portable HTML5 DOM representation of the given source code and ensures compatibility with results in evergreen browsers. Also standard DOM features such as querySelector or querySelectorAll work for tree traversal.

eu.h8me.Parsing - (G)LR parsing/lexing library for .NET


Library implementing a regular expression based lexer and LR(1) based parser generator, using inline code to define lexer and grammar rules. The generated parser uses a GLR parsing engine to allow for ambiguous grammars.


  •    Python

The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. The pyparsing module provides a library of classes that client code uses to construct the grammar directly in Python code. The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operator definitions.

ClearParse Open Edition

  •    C

The fast, flexible parsing engine. Parse anything in 4 steps: (1) define a grammar, (2) load the grammar into ClearParse, (3) call the engine to parse the source, and (4) traverse the parsing tree. You can even change your grammar at run time.

peg - Peg, Parsing Expression Grammar, is an implementation of a Packrat parser generator.

  •    Go

Peg, Parsing Expression Grammar, is an implementation of a Packrat parser generator. A Packrat parser is a descent recursive parser capable of backtracking. The generated parser searches for the correct parsing of the input.

chevrotain - Parser Building Toolkit for JavaScript

  •    TypeScript

Chevrotain is a blazing fast and feature rich Parser Building Toolkit for JavaScript. It can be used to build parsers/compilers/interpreters for various use cases ranging from simple configuration files, to full fledged programing languages. A more in depth description of Chevrotain can be found in this great article on: Parsing in JavaScript: Tools and Libraries.

gofeed - Parse RSS and Atom feeds in Go

  •    Go

The gofeed library is a robust feed parser that supports parsing both RSS and Atom feeds. The universal gofeed.Parser will parse and convert all feed types into a hybrid gofeed.Feed model. You also have the option of parsing them into their respective atom.Feed and rss.Feed models using the feed specific atom.Parser or rss.Parser.It also provides support for parsing several popular predefined extension modules, including Dublin Core and Apple’s iTunes, as well as arbitrary extensions. See the Extensions section for more details.

rust-peg - Parsing Expression Grammar (PEG) parser generator for Rust

  •    Rust

This is a simple parser generator based on Parsing Expression Grammars. Please see the release notes for updates.

kotlin-argparser - Easy to use and concise yet powerful and robust command line argument parsing for Kotlin

  •    Kotlin

This is a library for parsing command-line arguments. It can parse both options and positional arguments. It aims to be easy to use and concise yet powerful and robust. An instance of MyArgs will represent the set of parsed arguments. Each option and positional argument is declared as a property that delegates through a delegate factory method on an instance of ArgParser.

neotoma - Erlang library and packrat parser-generator for parsing expression grammars.

  •    Erlang

Neotoma is a packrat parser-generator for Erlang for Parsing Expression Grammars (PEGs). It consists of a parsing-combinator library with memoization routines, a parser for PEGs, and a utility to generate parsers from PEGs. It is inspired by treetop, a Ruby library with similar aims, and parsec, the parser-combinator library for Haskell. Neotoma is licensed under the MIT License (see LICENSE).

TagSoup - HTML/XML parser for Haskell

  •    Haskell

TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

clap - A full featured, fast Command Line Argument Parser for Rust

  •    Rust

It is a simple-to-use, efficient, and full-featured library for parsing command line arguments and subcommands when writing console/terminal applications. clap is used to parse and validate the string of command line arguments provided by a user at runtime. You provide the list of valid possibilities, and clap handles the rest. This means you focus on your applications functionality, and less on the parsing and validating of arguments.

parboiled2 - A macro-based PEG parser generator for Scala 2.10+

  •    Scala

parboiled2 is a Scala 2.11+ library enabling lightweight and easy-to-use, yet powerful, fast and elegant parsing of arbitrary input text. It implements a macro-based parser generator for Parsing Expression Grammars (PEGs), which runs at compile time and translates a grammar rule definition (written in an internal Scala DSL) into corresponding JVM bytecode.PEGs are an alternative to Context-Free Grammars (CFGs) for formally specifying syntax, they make a good replacement for regular expressions and have some advantages over the "traditional" way of building parsers via CFGs (like not needing a separate lexer/scanner phase).

Detect.js - :mag: Library to detect browser, os and device based on the UserAgent String

  •    Javascript

❗️ NOTE: THIS PLUGIN IS NO LONGER MAINTAINED. If you encounter a bug then you're probably on your own. Try Bowser as an alternative. Note: Detect.js is a JavaScript library to detect platforms, versions, manufacturers and types based on the navigator.userAgent string. This code is based on, and modified from, the original work of Tobie Langel's UA-Parser: UA-Parser is subsequently a port of BrowserScope's user agent string parser.

megaparsec - Industrial-strength monadic parser combinator library

  •    Haskell

This is an industrial-strength monadic parser combinator library. Megaparsec is a feature-rich package that strikes a nice balance between speed, flexibility, and quality of parse errors. The project provides flexible solutions to satisfy common parsing needs. The section describes them shortly. If you're looking for comprehensive documentation, see the section about documentation.

pigeon - Command pigeon generates parsers in Go from a PEG grammar.

  •    Go

The pigeon command generates parsers based on a parsing expression grammar (PEG). Its grammar and syntax is inspired by the PEG.js project, while the implementation is loosely based on the parsing expression grammar for C# 3.0 article. It parses Unicode text encoded in UTF-8. See the godoc page for detailed usage. Also have a look at the Pigeon Wiki for additional information about Pigeon and PEG in general.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.