
snips-nlu - Snips Python library to extract meaning from text

  •    Python

Snips NLU (Natural Language Understanding) is a Python library that parses sentences written in natural language and extracts structured information. To find out how to use Snips NLU, please refer to our documentation, which provides a step-by-step guide on how to set up and use the library.

MITIE - MITIE: library and tools for information extraction

  •    C++

This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection, as well as tools for training custom extractors and relation detectors. MITIE is built on top of dlib, a high-performance machine-learning library[1]. It makes use of several state-of-the-art techniques, including distributional word embeddings[2] and Structural Support Vector Machines[3]. MITIE offers several pre-trained models providing varying levels of support for English, Spanish, and German, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other languages, including Python, R, Java, C, and MATLAB, allow users to quickly integrate MITIE into their own applications.

Snorkel - A system for quickly generating training data with weak supervision

  •    Jupyter

Snorkel is a system for rapidly creating, modeling, and managing training data, currently focused on accelerating the development of structured or "dark" data extraction applications for domains in which large labeled training sets are not available or easy to obtain.

Today's state-of-the-art machine learning models require massive labeled training sets, which usually do not exist for real-world applications. Instead, Snorkel is based around the new data programming paradigm, in which the developer focuses on writing a set of labeling functions, which are just scripts that programmatically label data. The resulting labels are noisy, but Snorkel automatically models this process (learning, essentially, which labeling functions are more accurate than others) and then uses this to train an end model (for example, a deep neural network in TensorFlow).
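To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use the Snorkel API: the function names, the spouse-relation task, and the label constants are all hypothetical, and a simple majority vote stands in for the model Snorkel learns over labeling-function accuracies.

```python
from collections import Counter

# Hypothetical label values for an illustrative "are these two people spouses?" task.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_contains_married(text):
    """Vote POSITIVE if the text mentions 'married', otherwise abstain."""
    return POSITIVE if "married" in text.lower() else ABSTAIN


def lf_mentions_spouse_word(text):
    """Vote POSITIVE if the text mentions 'wife' or 'husband', otherwise abstain."""
    lowered = text.lower()
    return POSITIVE if ("wife" in lowered or "husband" in lowered) else ABSTAIN


def lf_contains_colleague(text):
    """Vote NEGATIVE if the text mentions 'colleague', otherwise abstain."""
    return NEGATIVE if "colleague" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_married, lf_mentions_spouse_word, lf_contains_colleague]


def majority_vote(text):
    """Combine the noisy votes by unweighted majority; abstain on ties or no votes.

    This is a stand-in assumption: Snorkel instead estimates how accurate each
    labeling function is and weights the votes accordingly.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [vote for vote in votes if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for sentence in [
        "Alice married Bob; her husband is a chef.",
        "Bob is a colleague of Carol.",
        "Dave lives in Paris.",
    ]:
        print(sentence, "->", majority_vote(sentence))
```

The labels produced this way would then serve as (noisy) training data for an end model.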

dt - DNS tool - display information about your domain

  •    Go

DNS tool that displays information about your domain. Feedback, issues, and PRs are welcome.

ChemDataExtractor - Automatically extract chemical information from scientific documents

  •    Python

ChemDataExtractor is a toolkit for extracting chemical information from the scientific literature.

abbreviation-extraction - Python3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs

  •    Python

This is a Python3 implementation of the Schwartz-Hearst algorithm for identifying abbreviations and their corresponding definitions in free text[1]. I have taken the liberty of taking Vincent's code, simplifying it a little, refactoring it for Python 3, and adding some tests.
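As a rough illustration of how the matching step of the Schwartz-Hearst algorithm works, the sketch below handles only the "long form (SHORT)" pattern. It is a simplified sketch, not this package's code or API: the function names are hypothetical, the short-form validity check is reduced to a few basic conditions, and the candidate window of min(|A| + 5, |A| * 2) words follows my reading of the published algorithm.

```python
import re


def find_best_long_form(short_form, candidate):
    """Match short-form characters right to left against the candidate text.

    The first character of the short form must match the start of a word.
    Returns the shortest matching long form, or None if there is no match.
    """
    s_index = len(short_form) - 1
    l_index = len(candidate) - 1
    while s_index >= 0:
        ch = short_form[s_index].lower()
        if not ch.isalnum():
            s_index -= 1
            continue
        # Move left until the character matches; for the first short-form
        # character, also require that the match sits at the start of a word.
        while (l_index >= 0 and candidate[l_index].lower() != ch) or (
            s_index == 0 and l_index > 0 and candidate[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
        s_index -= 1
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]


def is_valid_short_form(short_form):
    """Simplified check: 2 to 10 characters, starts alphanumeric, has a letter."""
    return (
        2 <= len(short_form) <= 10
        and short_form[0].isalnum()
        and any(c.isalpha() for c in short_form)
    )


def extract_pairs(sentence):
    """Return {short form: long form} for 'long form (SHORT)' patterns."""
    pairs = {}
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = match.group(1).strip()
        if not is_valid_short_form(short_form):
            continue
        words = sentence[: match.start()].split()
        # Only look at a bounded window of words before the parenthesis.
        window = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words[-window:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form:
            pairs[short_form] = long_form
    return pairs


if __name__ == "__main__":
    print(extract_pairs("We trained a support vector machine (SVM) on the data."))
    # {'SVM': 'support vector machine'}
```

The full algorithm also handles the reverse "SHORT (long form)" pattern and applies further filters to the long form, which this sketch omits.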

dig-etl-engine - Download DIG to run on your laptop or server.


myDIG is a tool for building pipelines that crawl the web, extract information, build a knowledge graph (KG) from the extractions, and provide an easy-to-use interface to query the KG. The project web page is DIG. You can install myDIG on a laptop or server and use it to build a domain-specific search application for any corpus of web pages, CSV, JSON, and a variety of other files.
