abbreviation-extraction - Python3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs


This is a Python 3 implementation of the Schwartz-Hearst algorithm for identifying abbreviations and their corresponding definitions in free text[1]. I have taken the liberty of simplifying Vincent's code a little, refactoring it for Python 3, and adding some tests.

https://github.com/philgooch/abbreviation-extraction
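
To illustrate the idea behind the algorithm, here is a minimal sketch of its matching step. It is not this package's API: the function names find_best_long_form and extract_pairs are hypothetical, and the short-form validity checks and the candidate window are simplified assumptions. The sketch looks for a short form in parentheses, then scans the preceding words from right to left, matching each character of the short form in order; the first character of the short form must fall at the start of a word.

```python
import re


def find_best_long_form(short_form, candidate):
    """Match short-form characters right to left against the candidate text.

    Returns the shortest trailing span of `candidate` that covers every
    alphanumeric character of `short_form`, or None if there is no match.
    """
    s_index = len(short_form) - 1
    l_index = len(candidate) - 1
    while s_index >= 0:
        ch = short_form[s_index].lower()
        if not ch.isalnum():
            s_index -= 1
            continue
        # Move left until the character matches. For the first character of
        # the short form, also require that it starts a word.
        while (l_index >= 0 and candidate[l_index].lower() != ch) or (
            s_index == 0 and l_index > 0 and candidate[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
        s_index -= 1
    # Extend back to the beginning of the word containing the first match.
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]


def extract_pairs(sentence):
    """Return {short_form: long_form} for 'long form (SF)' patterns."""
    pairs = {}
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = match.group(1).strip()
        # Simplified validity check (assumption): 2-10 characters, starts
        # with a letter or digit, and contains at least one letter.
        if not (
            2 <= len(short_form) <= 10
            and short_form[0].isalnum()
            and any(c.isalpha() for c in short_form)
        ):
            continue
        # Limit the search window to a few words before the parenthesis.
        words = sentence[: match.start()].split()
        max_words = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form:
            pairs[short_form] = long_form
    return pairs


print(extract_pairs("We trained a hidden Markov model (HMM) on the corpus."))
```

The sketch handles only the "long form (short form)" order within a single sentence; a fuller implementation would also need to consider the reverse order and sentence splitting.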


Related Projects

rake-nltk - Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

  •    Python

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text. If you see a stopwords error, the NLTK stopwords corpus has not been downloaded; it can be fetched with NLTK's downloader.
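
The frequency and co-occurrence scoring described above can be sketched in a few lines. This is an illustrative sketch, not rake-nltk's API: the function names candidate_phrases and rake_scores are hypothetical, and the tiny STOPWORDS set is an assumption standing in for a real stopword list. Text is split into candidate phrases at stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny stand-in stopword list (assumption); real use needs a fuller list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text):
    """Split text into runs of non-stopwords, breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs sorted from highest to lowest score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word appears
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


text = "Keyword extraction is a task in text mining. Automatic keyword extraction is useful."
for phrase, score in rake_scores(text):
    print(f"{score:4.1f}  {phrase}")
```

Words that tend to occur inside longer phrases get a higher degree-to-frequency ratio, which is why multi-word phrases usually rank above isolated words.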

Gate - General Architecture for Text Engineering

  •    Java

GATE excels at text analysis of all shapes and sizes. It provides support for diverse language processing tasks such as parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. It provides support to measure, evaluate, model and persist the data structure. It can analyze text or speech. It has built-in support for machine learning and also supports different machine learning implementations via plugins.

RAKE - A Python implementation of the Rapid Automatic Keyword Extraction

  •    Python

A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons. The source code is released under the MIT License.

meyda - Audio feature extraction for JavaScript.

  •    Javascript

Meyda is a JavaScript audio feature extraction library. Meyda supports both offline feature extraction and real-time feature extraction using the Web Audio API. We wrote a paper about it, which is available here. Please see the documentation for setup and usage instructions.


flashtext - Extract Keywords from sentence or Replace keywords in sentences.

  •    Python

This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Documentation can be found at FlashText Read the Docs.

MITIE - Library and tools for information extraction

  •    C++

This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors. MITIE is built on top of dlib, a high-performance machine-learning library[1]. MITIE makes use of several state-of-the-art techniques including the use of distributional word embeddings[2] and Structural Support Vector Machines[3]. MITIE offers several pre-trained models providing varying levels of support for English, Spanish, and German, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other languages including Python, R, Java, C, and MATLAB allow users to quickly integrate MITIE into their own applications.

Trainable Relation Extraction framework

  •    Java

T-Rex (Trainable Relation Extraction) is a highly configurable, machine-learning-based framework for information extraction from text, which includes tools for document classification, entity extraction and relation extraction.

prose - A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction

  •    Go

prose is a Go library for text processing (primarily English at the moment) that supports tokenization, part-of-speech tagging, named-entity extraction, and more. The library's functionality is split into subpackages designed for modular use. See the GoDoc documentation for more information.

TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more

  •    Python

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

nlp-with-ruby - Practical Natural Language Processing done in Ruby.

  •    Ruby

This curated list comprises awesome resources, libraries, information sources about computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics, HLT (Human Language Technology) and can be brought in conjunction with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other related disciplines. This list comes from our day to day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.

iepy - Information Extraction in Python

  •    Python

IEPY is an open source tool for Information Extraction focused on Relation Extraction. For example, given a sentence about John von Neumann's birth, IEPY's task is to identify "John von Neumann" and "December 28, 1903" as the subject and object entities of the "was born in" relation.

openie-standalone - Quality information extraction at web scale.

  •    Scala

This project contains the principal Open Information Extraction (Open IE) system from the University of Washington (UW). An Open IE system runs over sentences and creates extractions that represent relations in text. For example, consider the following sentence. We would not want to extract that (Barack Obama, was born, in Kenya) alone because this is not true. However, if we have the condition as well, we can have a correct extraction.

JWebPro: A Java Web Processing Toolkit

  •    Java

JWebPro is a Java tool that can interact with Google search and then process the returned Web documents in a couple of ways. The outputs can serve as inputs for NLP, IR, information extraction, Web mining, and online social network extraction/analysis applications.

treat - Natural language processing framework for Ruby.

  •    Ruby

Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition. Learn more by taking a quick tour or by reading the manual. I am actively seeking developers who can help maintain and expand this project. You can find a list of ideas for contributing to the project here.

tsfresh - Automatic extraction of relevant features from time series

  •    Jupyter

tsfresh stands for "Time Series Feature extraction based on scalable hypothesis tests". The package contains many feature extraction methods and a robust feature selection algorithm.

NRE - Neural Relation Extraction, including CNN, PCNN, CNN+ATT, PCNN+ATT

  •    C++

Neural relation extraction aims to extract relations from plain text with neural models, which have become the state-of-the-art methods for relation extraction. In this project, we provide our implementations of CNN [Zeng et al., 2014] and PCNN [Zeng et al., 2015] and their extended versions with a sentence-level attention scheme [Lin et al., 2016]. Pre-trained word vectors are learned from the New York Times Annotated Corpus (LDC Data LDC2008T19), which should be obtained from LDC (https://catalog.ldc.upenn.edu/LDC2008T19).

OpenNRE - Neural Relation Extraction implemented in TensorFlow

  •    Python

An open-source framework for neural relation extraction. It is a TensorFlow-based framework for easily building relation extraction models. We divide the relation extraction pipeline into four parts: embedding, encoder, selector and classifier. For each part we have implemented several methods.