rake-nltk - Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

  •        191

RAKE short for Rapid Automatic Keyword Extraction algorithm, is a domain independent keyword extraction algorithm which tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurance with other words in the text. If you see a stopwords error, it means that you do not have the corpus stopwords downloaded from NLTK. You can download it using command below.




Related Projects

RAKE - A python implementation of the Rapid Automatic Keyword Extraction

  •    Python

A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons. The source code is released under the MIT License.

TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more

  •    Python

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

text-analytics-with-python - Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer

  •    Python

Derive useful insights from your data using Python. Learn the techniques related to natural language processing and text analytics, and gain the skills to know which technique is best suited to solve a particular problem. A structured and comprehensive approach is followed in this book so that readers with little or no experience do not find themselves overwhelmed. You will start with the basics of natural language and Python and move on to advanced analytical and machine learning concepts. You will look at each technique and algorithm with both a bird's eye view to understand how it can be used as well as with a microscopic view to understand the mathematical concepts and to implement them to solve your own problems.

nltk-trainer - Train NLTK objects with zero code

  •    Python

Train NLTK objects with zero code

nltk - NLTK Source

  •    Python

NLTK Source

flashtext - Extract Keywords from sentence or Replace keywords in sentences.

  •    Python

This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Documentation can be found at FlashText Read the Docs.

nltk_book - NLTK Book

  •    Python


WikiQuiz - Generates a quiz for a Wikipedia page using parts of speech and text chunking.

  •    CSS

ie. if the answer is '1960s', show '1950s' as another option. Ignoring the less text heavy parts of a Wikipedia page.

ann-writer - An artificial machine learning program that attempts to impersonate the writing style of any given text training set

  •    Python

Machine learning that utilises sk-learn, numpy and nltk in an attempt to generate text in the style of any given training data. Written in python3. Dataset has to contain more words than the training range (default = 3).

treat - Natural language processing framework for Ruby.

  •    Ruby

Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm- agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition. Learn more by taking a quick tour or by reading the manual. I am actively seeking developers that can help maintain and expand this project. You can find a list of ideas for contributing to the project here.

BlingFire - A lightning fast Finite State machine and REgular expression manipulation library.

  •    C++

Hi, we are a team at Microsoft called Bling (Beyond Language Understanding), we help Bing be smarter. Here we wanted to share with all of you our FInite State machine and REgular expression manipulation library (FIRE). We use Fire for many linguistic operations inside Bing such as Tokenization, Multi-word expression matching, Unknown word-guessing, Stemming / Lemmatization just to mention a few. Bling Fire Tokenizer is a tokenizer designed for fast-speed and quality tokenization of Natural Language text. It mostly follows the tokenization logic of NLTK, except hyphenated words are split and a few errors are fixed.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

Content-Based Cross-Site Web Data Mining


Content-Based Cross-Site Mining (CCM) of Web Data Records algorithm combines techniques of extracting data records based on the structure of documents (HTML tags) with an analysis of the semantics of the content for better data record extraction

WebHarvest - web data extraction tool

  •    Java

Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.

tidytext - Text mining using dplyr, ggplot2, and other tidy tools :sparkles::page_facing_up::sparkles::page_facing_up::sparkles:

  •    R

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like dplyr, broom, tidyr and ggplot2. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. Check out our book to learn more about text mining using tidy data principles. This function uses the tokenizers package to separate each line into words. The default tokenizing is for words, but other options include characters, n-grams, sentences, lines, paragraphs, or separation around a regex pattern.

sumy - Module for automatic summarization of text documents and HTML pages.

  •    Python

Sumy contains command line utility for quick summarization of documents. Or you can use sumy like a library in your project. Create file sumy_example.py (don't name it sumy.py) with the code below to test it.


  •    Java

GraphSpider is a pattern matcher which searches parsed text in phrase-structure tree or dependency graph format for syntactic structures matching a set of patterns in MPL, a regexp-like pattern language. Applications: information extraction, text mining.


  •    Python

A text mining system for extraction of protein-protein interactions from biomedical text.