
pytextrank - Python implementation of TextRank for text document NLP parsing and summarization

  •    Jupyter

Python implementation of TextRank, based on the Mihalcea 2004 paper. The results produced by this implementation are intended more for use as feature vectors in machine learning, not as academic paper summaries.

node-summary - Node module that summarizes text using a naive summarization algorithm

  •    Javascript

Summarizes text using a naive summarization algorithm, based off of the Python implementation by shlomibabluki. And now with UTF8 support, thanks to xissy.

sotawhat - Returns latest research results by crawling arxiv papers and summarizing abstracts

  •    Python

This script runs using Python 3. First, install the required packages. This script only requires nltk and PyEnchant.

prose - :book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction

  •    Go

prose is a Go library for text processing (primarily English at the moment) that supports tokenization, part-of-speech tagging, named-entity extraction, and more. The library's functionality is split into subpackages designed for modular use. See the GoDoc documentation for more information.




headlines - Automatically generate headlines to short articles

  •    Jupyter

It is assumed that you already have training and test data. The data is made up of many examples (I'm using 684K examples). Each example consists of the text from the start of the article, which I call the description (or desc), and the text of the original headline (or head). The texts should already be tokenized, with the tokens separated by spaces. Once you have the data ready, save it in a Python pickle file as a tuple: (heads, descs, keywords), where heads is a list of all the head strings and descs is a list of all the article strings, in the same order and of the same length as heads. I ignore the keywords information, so you can place None there.
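
The data layout described above can be sketched as follows. This is a minimal illustration and not code from the headlines repo: the helper names `save_dataset` and `load_dataset`, the file name `tokens.pkl`, and the toy sentences are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def save_dataset(path, heads, descs, keywords=None):
    """Pickle a (heads, descs, keywords) tuple; keywords may be None."""
    if len(heads) != len(descs):
        raise ValueError("heads and descs must have the same length")
    with open(path, "wb") as f:
        pickle.dump((heads, descs, keywords), f)


def load_dataset(path):
    """Read the tuple back and unpack it into its three parts."""
    with open(path, "rb") as f:
        heads, descs, keywords = pickle.load(f)
    return heads, descs, keywords


# Toy, already tokenized examples: tokens are separated by single spaces.
heads = ["stocks rally as rates hold", "local team wins final"]
descs = [
    "stocks rose on tuesday after the central bank kept rates unchanged .",
    "the local team won the championship final on sunday night .",
]

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tokens.pkl")
save_dataset(path, heads, descs, keywords=None)
loaded_heads, loaded_descs, loaded_keywords = load_dataset(path)
```

Checking the lengths before writing catches misaligned lists early, since each headline is paired with its description by position.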

TextTeaser - Automatic Summarization Algorithm

  •    Scala

TextTeaser is an automatic summarization algorithm that combines the power of natural language processing and machine learning to produce good results. It can provide the gist of an article and better previews in news readers.

sum - js utility for summarizing large bodies of text using a basic sentence relevance ranking algorithm

  •    Javascript

A simple function for summarizing text, e.g. for automatically determining the sentences that are most relevant to the context of the corpus. This library depends on underscore, underscore.string, and porter-stemmer. To run the tests, open /tests/browser/specrunner.html in your favourite browser.

rouge - A Javascript implementation of the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) evaluation metric for summaries

  •    Javascript

ROUGE is something of a standard metric for evaluating the performance of auto-summarization algorithms. However, with the exception of MEAD (which is written in Perl. Yes. Perl.), requesting a copy of ROUGE to work with requires one to navigate a barely functional webpage, fill out forms, and sign a legal release somewhere along the way. These hurdles definitely exist for good reason, but they get irritating when all one wishes to do is benchmark an algorithm. Running the tests should give you many lines of colorful text in your CLI. Naturally, you'll need to have Mocha installed, but you knew that already.


a41124835ed0 - Course materials for O'Reilly Media "Get Started with NLP in Python" -- see private Slack room for more details

  •    Jupyter

This repo includes the notebooks, source data, and other materials for: Get Started with Natural Language Processing in Python.

tldr - Text summarizer for golang using LexRank

  •    Go

tldr is a golang package that summarizes a text automatically using the LexRank algorithm. There are two main steps in LexRank: weighing and ranking. tldr includes two weighing algorithms (Jaccard coefficient and Hamming distance) and two ranking algorithms (PageRank and centrality). The default settings use Hamming distance and PageRank.
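
The two steps can be illustrated with a small sketch. It is written in Python rather than Go and is not tldr's code or API. It assumes the Jaccard coefficient for weighing (tldr's stated default is Hamming distance) and a plain power-iteration PageRank for ranking. The function names, the whitespace tokenizer, and the damping factor of 0.85 are all assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a square weight matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each sentence j passes on its score in proportion to edge weight.
            incoming = sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, count=1):
    """Return the `count` highest-ranked sentences in their original order."""
    if not sentences:
        return []
    n = len(sentences)
    token_sets = [set(s.lower().split()) for s in sentences]
    # Step 1, weighing: pairwise sentence similarity (no self-loops).
    weights = [
        [jaccard(token_sets[i], token_sets[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Step 2, ranking: score sentences on the similarity graph.
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares words with many other sentences collects score from all of them, so it ranks above a sentence that overlaps with none.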

StatsBase.jl - Basic statistics for Julia

  •    Julia

StatsBase.jl is a Julia package that provides basic support for statistics. Particularly, it implements a variety of statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.

node-pullquoter - Automatically pull interesting quotes out of an article.

  •    CoffeeScript

Automatically pull interesting quotes out of an article. Well, until now a human being had to spend several moments choosing which quotes to feature. This node module uses basic text summarization techniques to find interesting sentences to use as pull quotes automatically.

flip - 🎲 Fast, Lightweight library for Information and Probability

  •    Scala

Sketch is a probabilistic data structure that quickly measures the probability density of a data stream of real-valued random variables, using limited memory and no prior knowledge. Simply put, Sketch is a special histogram in which the width of each bin is adaptively adjusted to the input data stream, unlike conventional histograms, which require the user to specify the width and start/end point of the bins. It follows changes in the probability distribution and adapts to sudden or incremental concept drift. Also, two or more Sketches can be combined in a monadic way. This is what we call the probability monad in functional programming. Sketch is a better alternative to kernel density estimation and histograms in most cases. Here is an example of how Sketch estimates the density using a dataset sampled from the standard normal distribution.

technical-articles - Technical Pieces collected in practices

  •    Go

This repo collects some technical summaries from my daily work.

text-summarization-and-visualization-using-watson-studio - Can we quickly summarize & visualize text to get the details about the unstructured data? Yes we can! Please review this code pattern for all the steps involved to quickly summarize & visualize the data

  •    Jupyter

We will demonstrate a methodology to summarize and visualize text using Watson Studio. Text summarization is the process of creating a short and coherent version of a longer document. There are two methods of summarizing text: extractive and abstractive summarization. We will focus on extractive summarization, which involves selecting phrases and sentences from the source document to make up the new summary. Techniques involve ranking the relevance of phrases in order to choose only those most relevant to the meaning of the source. We will also demonstrate different methods of visualizing the data, which can help provide a quick peek at the data. Some of the advantages of text summarization are listed below. Summaries reduce reading time. When researching documents, summaries make the selection process easier. Text summarization improves the effectiveness of indexing. Text summarization algorithms are less biased than human summarizers. Personalized summaries are useful in question-answering systems as they provide personalized information. Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of texts they are able to process.
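
Extractive summarization as described above can be sketched in a few lines. This is a toy illustration and not the code pattern's Watson Studio notebook. It assumes a simple relevance score (average corpus frequency of a sentence's non-stopword tokens), a tiny hand-picked stopword list, and a regex sentence splitter. All names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "was"}


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in source order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

Because the output contains only sentences copied from the input, the summary is extractive by construction. An abstractive method would generate new wording instead.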

pythonrouge - Python wrapper for evaluating summarization quality by ROUGE package

  •    Perl

This is a Python wrapper for ROUGE, the summarization evaluation toolkit. With this implementation, you can evaluate various types of ROUGE metrics. You can evaluate your system summaries against reference summaries directly; it is not necessary to create an XML file as with the general ROUGE package. However, you can also compute ROUGE scores in the standard way if you have saved system summaries and reference summaries in specific directories. In document summarization research, the recall or F-measure of ROUGE metrics is used in most cases, so for convenience you can choose recall, F-measure, or both from the ROUGE evaluation results.
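
To make recall and F-measure concrete, here is a minimal ROUGE-N sketch based on n-gram overlap. It is not pythonrouge's API and does not reproduce the official ROUGE script, which offers options such as stemming and stopword removal. The function name `rouge_n` and the whitespace tokenizer are assumptions.

```python
from collections import Counter


def rouge_n(system, reference, n=1):
    """ROUGE-N recall, precision and F-measure from clipped n-gram overlap."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    sys_counts = ngrams(system)
    ref_counts = ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((sys_counts & ref_counts).values())
    sys_total = sum(sys_counts.values())
    ref_total = sum(ref_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / sys_total if sys_total else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"recall": recall, "precision": precision, "f_measure": f_measure}


scores = rouge_n("the cat sat on the mat", "the cat was on the mat", n=1)
```

Recall divides the overlap by the number of n-grams in the reference, precision divides it by the number in the system summary, and the F-measure is their harmonic mean.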