ANTLR - ANother Tool for Language Recognition

  •        1850

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day.

The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.



Related Projects

treetop - A Ruby-based parsing DSL based on parsing expression grammars.

  •    Ruby

Languages can be split into two components, their syntax and their semantics. It's your understanding of English syntax that tells you the stream of words "Sleep furiously green ideas colorless" is not a valid sentence. Semantics is deeper. Even if we rearrange the above sentence to be "Colorless green ideas sleep furiously", which is syntactically correct, it remains nonsensical on a semantic level. With Treetop, you'll be dealing with languages that are much simpler than English, but these basic concepts apply. Your programs will need to address both the syntax and the semantics of the languages they interpret. Treetop equips you with powerful tools for each of these two aspects of interpreter writing. You'll describe the syntax of your language with a parsing expression grammar. From this description, Treetop will generate a Ruby parser that transforms streams of characters written into your language into abstract syntax trees representing their structure. You'll then describe the semantics of your language in Ruby by defining methods on the syntax trees the parser generates.

sling - SLING - A natural language frame semantics parser

  •    C++

SLING is a parser for annotating text with frame semantic annotations. It is trained on an annotated corpus using Tensorflow and Dragnn.The parser is a general transition-based frame semantic parser using bi-directional LSTMs for input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. It is a jointly trained model using only the text tokens as input and the transition system has been designed to output frame graphs directly without any intervening symbolic representation.

language - A fast PEG parser written in JavaScript with first class errors

  •    Objective-J

Language.js is an experimental new parser based on PEG (Parsing Expression Grammar), with the special addition of the "naughty OR" operator to handle errors in a unique new way. It makes use of memoization to achieve linear time parsing speed, and support for automatic cut placement is coming to maintain mostly constant space as well (for a discussion of cut operators see: The most unique addition Language.js makes to PEG is how it handles errors. No parse ever fails in Language.js, instead SyntaxErrorNodes are placed into the resultant tree. This makes it trivial to do things like write syntax highlighters that have live error reporting. This also means that Language.js is very competent at handling multiple errors (as opposed to aborting on the first one that is reached).

Scalable Language API

  •    Java

Scalable Language API (SLAPI) The most comprehensive architecture for conversational natural-language applications including speech recognition/synthesis, semantics, amp; machine translation. Used on Android amp; other mobile app platforms.

duckling - Probabilistic parser

  •    Clojure

Duckling is a Clojure library that parses text into structured data: “the first Tuesday of October” => {:value "2014-10-07T00:00:00.000-07:00" :grain :day}<div class="doc-website-link"> <p>You can try it out at <a href=""></a></p></div>See our [blog post announcement]( for more context.Duckling is shipped with modules that parse temporal expressions i

zetasql - ZetaSQL - Analyzer Framework for SQL

  •    C++

ZetaSQL defines a language (grammar, types, data model, and semantics) as well as a parser and analyzer. It is not itself a database or query engine. Instead it is intended to be used by multiple engines wanting to provide consistent behavior for all semantic analysis, name resolution, type checking, implicit casting, etc. Specific query engines may not implement all features in the ZetaSQL language and may give errors if specific features are not supported. For example, engine A may not support any updates and engine B may not support analytic functions. Until all this code is released, we cannot provide any guarantees of API stability and cannot accept contributions. We will also be releasing more documentation over time, particular related to developing engines with this framework. Documentation on the language itself is fairly complete.

ANTLR C# grammar


This project will generate a C# 4.0 parser using ANTLR v3.2. Then you can add your own logic to do what you want. It generates CSharp target and has a reasonably small amount of code for the preprocessor in the lexar.

spaCy - 💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython

  •    Python

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now! Check out the new features here.

snips-nlu - Snips Python library to extract meaning from text

  •    Python

Snips NLU (Natural Language Understanding) is a Python library that allows to parse sentences written in natural language and extracts structured information. To find out how to use Snips NLU please refer to our documentation, it will provide you with a step-by-step guide on how to use and setup our library.

Yet Another Compiler Compiler Language

  •    Java

YACCL is a simple Java recursive descent parser, in the tradition of other RDP technologies such as ANTLR and JavaCC. It differs from these in a number of ways: it is smaller and simpler (though not necessarily faster).

Sprache - Tiny C# Monadic Parser Framework

  •    CSharp

Sprache is a simple, lightweight library for constructing parsers directly in C# code.It doesn't compete with "industrial strength" language workbenches - it fits somewhere in between regular expressions and a full-featured toolset like ANTLR.

EasyOCR - Java OCR 识别组件(基于Tesseract OCR 引擎)。能自动完成图片清理、识别 CAPTCHA 验证码图片内容的一体化工作。Java Image cleanup, OCR recognition component (based Tesseract OCR engine, automatically cleanup image and identification CAPTCHA verification code picture content)


EasyOCR is a Java language using OCR recognition engine (based Tesseract). By means of a few simple API, the Java language can be used to complete the picture content identification work. And integrated image cleanup, recognition CAPTCHA image, bill notes and other content integration efforts. EasyOCR engine supports plugin programming, ETD templates support, provide a graphical ETD template design tools (EasyTemplateDesigner GUI). EasyOCR not only provide services for consumers, but mainly oriented to provide localized development SDK integration with C/S, B/S and Android mobile terminal native integration projects.

Multi-Language Words Memorizer

  •    DotNet

Language learning tool. This .net application is designed for learning words and help foreign language learners by lots of automatic features.

Verilog 2005 parser

  •    Java

Verilog 2005 synthesizable subset parser built on ANTLR framework. Branch continues to evolve here:


  •    Java

JFlex is a lexical analyzer generator (also known as scanner generator). It is also a rewrite of the very useful tool JLex which was developed by Elliot Berk at Princeton University. JFlex is designed to work together with the LALR parser generator CUP by Scott Hudson, and the Java modification of Berkeley Yacc BYacc/J by Bob Jamison. It can also be used together with other parser generators like ANTLR or as a standalone tool.

grammars-v4 - Grammars written for ANTLR v4; expectation that the grammars are free of actions.

  •    ANTLR

This repository is a collection of Antlr4 grammars. The root directory name is the all-lowercase name of the language parsed by the grammar. For example, java, cpp, csharp, c, etc...

delta - DELTA is a deep learning based natural language and speech processing platform.

  •    Python

DELTA is a deep learning based end-to-end natural language and speech processing platform. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academia and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3. For details of DELTA, please refer to this paper.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.