
WikiQuiz - Generates a quiz for a Wikipedia page using parts of speech and text chunking.

  •    CSS

For example, if the answer is '1960s', the quiz shows '1950s' as another option. It ignores the less text-heavy parts of a Wikipedia page.
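The idea of offering a nearby decade as a wrong option can be sketched in a few lines. This is only an illustration and not WikiQuiz's actual code: the function name `decade_distractors` and the choice to alternate earlier and later decades are assumptions made for the example.

```python
import re


def decade_distractors(answer, count=3):
    """Return neighbouring decades for an answer such as '1960s'.

    Hypothetical helper for illustration; answers that are not decades
    produce an empty list.
    """
    match = re.fullmatch(r"(\d{3})0s", answer)
    if match is None:
        return []
    base = int(match.group(1)) * 10

    # Alternate earlier and later decades: -10, +10, -20, +20, ...
    offsets = []
    step = 1
    while len(offsets) < count:
        offsets.append(-10 * step)
        if len(offsets) < count:
            offsets.append(10 * step)
        step += 1
    return [f"{base + offset}s" for offset in offsets]


if __name__ == "__main__":
    print(decade_distractors("1960s"))
```

A real quiz generator would also need distractors for names, places, and other answer types, which this sketch does not attempt.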

TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more

  •    Python

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying you have probably heard by now, given the huge interest building up around Big Data, Machine Learning, Artificial Intelligence, and Deep Learning in the recent past. Besides this, data scientists have been described as having "The sexiest job in the 21st Century", which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

text-analytics-with-python - Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer

  •    Python

Derive useful insights from your data using Python. Learn the techniques related to natural language processing and text analytics, and gain the skills to know which technique is best suited to solve a particular problem. A structured and comprehensive approach is followed in this book so that readers with little or no experience do not find themselves overwhelmed. You will start with the basics of natural language and Python and move on to advanced analytical and machine learning concepts. You will look at each technique and algorithm with both a bird's eye view to understand how it can be used as well as with a microscopic view to understand the mathematical concepts and to implement them to solve your own problems.

gitsuggest - A tool to suggest github repositories based on the repositories you have shown interest in

  •    Python

A tool to suggest github repositories based on the repositories you have shown interest in. One quick way to become a better programmer is by reading code written by smart people. Github makes finding such code/repositories easy. At the end of the day we all are interested in our own specific areas and we express this interest by “starring” repositories and/or “following” people who contribute to such repositories.

rake-nltk - Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

  •    Python

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text. If you see a stopwords error, it means that you do not have the NLTK stopwords corpus downloaded. You can download it by running nltk.download('stopwords') in Python.
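The scoring idea described above (word frequency combined with co-occurrence) can be sketched without any dependencies. This is a simplified illustration and not the rake-nltk implementation: the tiny `STOPWORDS` set, the regex-based splitting, and the function names are assumptions made for the example, whereas rake-nltk relies on NLTK's tokenizers and stopword corpus.

```python
import re
from collections import defaultdict

# A tiny illustrative stopword list (assumption); rake-nltk uses NLTK's corpus.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to",
             "with", "by", "its", "on", "are"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Score each phrase as the sum of degree(word) / frequency(word)."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Co-occurrence degree: phrase length, counting the word itself.
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = ("Rapid automatic keyword extraction is a method "
              "for the extraction of keywords.")
    for phrase, score in rake_scores(sample):
        print(f"{score:5.1f}  {phrase}")
```

Longer phrases built from words that rarely appear on their own score highest, which is why multi-word key phrases tend to rise to the top.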

cltk - The Classical Language Toolkit

  •    Python

The docs are at docs.cltk.org. CLTK supports Python version 3.6. The software only runs on POSIX-compliant operating systems (Linux, Mac OS X, FreeBSD, etc.).

punkt-segmenter - Ruby port of the NLTK Punkt sentence segmentation algorithm

  •    Ruby

This code is a ruby 1.9.x port of the Punkt sentence tokenizer algorithm implemented by the NLTK Project (http://www.nltk.org/). Punkt is a language-independent, unsupervised approach to sentence boundary detection. It is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified. I simply did the ruby port and some API changes.

pyEssayAnalyser - An essay Analyser & Summariser, using Flask for the API and NLTK for the language processing

  •    Javascript

An essay Analyser & Summariser, using Flask for the API and NLTK for the language processing. Head over to in your browser, you should see the Hello World greetings.


  •    Javascript

An ongoing attempt at tying together various ML techniques for trending topic and sentiment analysis, coupled with some experimental Python async coding, a distributed architecture, EventSource and lots of Docker goodness. I needed a readily available corpus for doing text analytics and sentiment analysis, so I decided to make one from my RSS feeds.

pygermanet - GermaNet API for Python

  •    Python

GermaNet API for Python. Copyright (c) 23 March, 2014 Will Roberts <wildwilhelm@gmail.com>.

presswork - Text generation workbench, starting with Markov Chains

  •    Python

So far, it's all about Markov Chains. Here's a great visual explanation of Markov Chains. Given a bunch of text, model it, and generate "probable" new sentences. I'd like to add other tools to the toolkit, building off of this foundation.
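The "model it, and generate probable new sentences" step can be sketched as a first-order, word-level Markov chain. This is a minimal illustration and not presswork's own code: the function names `build_model` and `generate` and the whitespace tokenization are assumptions made for the example.

```python
import random
from collections import defaultdict


def build_model(text):
    """Map each word to the list of words observed right after it."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate(model, start, max_words=10, seed=None):
    """Walk the chain from `start`, picking each next word at random.

    Followers are stored with repeats, so a uniform choice from the list
    is weighted by how often each follower was observed.
    """
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = model.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the rug"
    print(generate(build_model(corpus), "the", max_words=8, seed=1))
```

Conditioning on the previous two or three words instead of one usually gives more coherent output, at the cost of needing more training text.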

stocksight - Stock analyzer and predictor using Elasticsearch, Twitter, News headlines and Python natural language processing and sentiment analysis

  •    Python

Stock analyzer and stock predictor using Elasticsearch, Twitter, News headlines and Python natural language processing and sentiment analysis. How much do emotions on Twitter and news headlines affect a stock's price? Let's find out ... Edit config.py and modify the NLTK tokens required/ignored and the Twitter feeds you want to mine. Required NLTK tokens are keywords that must be in a tweet before it is added to Elasticsearch (whitelist). Ignored NLTK tokens are keywords that, if found in a tweet, prevent it from being added to Elasticsearch (blacklist).
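The required/ignored token rule can be sketched as a small filter. This is an illustration under stated assumptions, not stocksight's code: the function name `should_index`, the regex tokenization (stocksight uses NLTK), and the reading that at least one required token is enough are all hypothetical.

```python
import re


def should_index(tweet_text, required, ignored):
    """Decide whether a tweet passes the whitelist/blacklist rule.

    Assumptions for this sketch: tokens are lowercase alphanumeric runs,
    any ignored token rejects the tweet, and at least one required token
    must be present.
    """
    tokens = set(re.findall(r"[a-z0-9']+", tweet_text.lower()))
    if tokens & ignored:
        return False  # blacklist wins over whitelist
    return bool(tokens & required)


if __name__ == "__main__":
    required = {"tesla", "tsla"}
    ignored = {"giveaway"}
    for text in ("Tesla stock jumps after earnings",
                 "Tesla giveaway, click here",
                 "Nice weather today"):
        print(should_index(text, required, ignored), "-", text)
```

Checking the blacklist first means a tweet containing both kinds of keyword is dropped, which is the conservative choice for a noisy feed.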

vertikin - :eyeglasses: Platform to automatically detect what user might be interested in buying in near future

  •    Python

VertiKin is an e-commerce platform that allows the user to search through an online product inventory. It is also able to automatically detect what users might be interested in buying. VertiKin Mobile app learns from user inputs on the mobile device (we do not read passwords and private information, so the user can be assured of his or her security). User data is then sent to the VertiKin server and analyzed with natural language processing (NLP). NLP identifies key information, especially frequency, to predict potential product interests. If VertiKin identifies an interest, the server sends a GCM push notification to the user.

Natural-Language-Processing-in-Practice - Natural Language Processing in Practice [Video], by Packt Publishing

  •    Python

This is the code repository for Natural Language Processing in Practice [Video], published by Packt. It contains all the supporting project files necessary to work through the video course from start to finish. Natural Language Processing (NLP) offers powerful ways to interpret and act on spoken and written language. It can help you with tasks such as customer support enquiries and customer feedback analysis. As the quantity of data continues to grow at an incomprehensible rate, being able to understand and process data is becoming a key differentiator for competitive organizations.

TextClassificationApp - Building and Deploying A Serverless Text Classification Web App

  •    Jupyter

In this project, over a series of blog posts, I'll be building a document classification model (document classification is also known as text classification) and deploying the model as part of a web application to predict the topic of research papers from their abstracts. In the first blog post I will be working with the Scikit-learn library and an imbalanced dataset (corpus) that I will create from summaries of papers published on arXiv. The topic of each paper is already labeled as the category, which removes the need for me to label the dataset. The imbalance in the dataset will be caused by the imbalance in the number of samples in each of the categories we are trying to predict. Imbalanced data occurs quite frequently in classification problems and makes developing a good model more challenging. Oftentimes it is too expensive or not possible to get more data on the classes that have too few samples. Developing strategies for dealing with imbalanced data is therefore paramount for creating a good classification model. I will cover some of the basics of dealing with imbalanced data using the Imbalanced-Learn library, as well as building a Naive Bayes classifier and a Support Vector Machine using Scikit-learn. I will also cover the basics of term frequency-inverse document frequency and visualizing it using the Plotly library.
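Term frequency-inverse document frequency can be sketched in plain Python as follows. This is the textbook variant, tf = count / document length and idf = ln(N / df), written for illustration only. It is not the project's code, the function name `tf_idf` is hypothetical, and Scikit-learn's TfidfVectorizer applies idf smoothing and normalization by default, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Textbook weighting: tf = count / document length, idf = ln(N / df).
    Tokenization is a plain lowercase whitespace split (assumption).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    abstracts = [
        "neural networks learn",
        "neural networks generalize",
        "galaxies rotate",
    ]
    for doc_weights in tf_idf(abstracts):
        print({term: round(w, 3) for term, w in doc_weights.items()})
```

Terms shared by many documents get a low weight, while terms concentrated in one document get a high weight, which is what makes the representation useful for topic classification.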
