text-analytics-with-python - Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer

  •        522

Derive useful insights from your data using Python. Learn the techniques related to natural language processing and text analytics, and gain the skills to know which technique is best suited to solve a particular problem. A structured and comprehensive approach is followed in this book so that readers with little or no experience do not find themselves overwhelmed. You will start with the basics of natural language and Python and move on to advanced analytical and machine learning concepts. You will look at each technique and algorithm with both a bird's eye view to understand how it can be used as well as with a microscopic view to understand the mathematical concepts and to implement them to solve your own problems.

https://github.com/dipanjanS/text-analytics-with-python

Tags
Implementation
License
Platform

   




Related Projects

TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more

  •    Python

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

Obsei - Low code AI powered automation tool

  •    Python

Obsei is a low code AI powered automation tool. It can be used in various business flows like social listening, AI based alerting, brand image analysis, comparative study and more. It consist of Observer, Analyzer and Informer. Observer observes the platform like Twitter, Facebook, App Stores, Google reviews, Amazon reviews, News, Website etc and feed that information. Analyzer performs text analysis like classification, sentiment, translation, PII on the analyzed data. Informer sends it to ticketing system, data store, dataframe etc for further action and analysis.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

transformers - 🤗Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX

  •    Python

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone. 🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

gensim - Topic Modelling for Humans

  •    Python

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.


spacy-streamlit - 👑 spaCy building blocks and visualizers for Streamlit apps

  •    Python

This package contains utilities for visualizing spaCy models and building interactive spaCy-powered apps with Streamlit. It includes various building blocks you can use in your own Streamlit app, like visualizers for syntactic dependencies, named entities, text classification, semantic similarity via word vectors, token attributes, and more. The package includes building blocks that call into Streamlit and set up all the required elements for you. You can either use the individual components directly and combine them with other elements in your app, or call the visualize function to embed the whole visualizer.

doccano - Open source annotation tool for machine learning practitioners.

  •    Python

doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence to sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours. You can try the annotation demo.

nlp-with-ruby - Practical Natural Language Processing done in Ruby.

  •    Ruby

This curated list comprises awesome resources, libraries, information sources about computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics, HLT (Human Language Technology) and can be brought in conjunction with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other related disciplines. This list comes from our day to day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.

decaNLP - The Natural Language Decathlon: A Multitask Challenge for NLP

  •    Python

The Natural Language Decathlon is a multitask challenge that spans ten tasks: question answering (SQuAD), machine translation (IWSLT), summarization (CNN/DM), natural language inference (MNLI), sentiment analysis (SST), semantic role labeling(QA‑SRL), zero-shot relation extraction (QA‑ZRE), goal-oriented dialogue (WOZ, semantic parsing (WikiSQL), and commonsense reasoning (MWSC). Each task is cast as question answering, which makes it possible to use our new Multitask Question Answering Network (MQAN). This model jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. For a more thorough introduction to decaNLP and the tasks, see the main website, our blog post, or the paper. While the research direction associated with this repository focused on multitask learning, the framework itself is designed in a way that should make single-task training, transfer learning, and zero-shot evaluation simple. Similarly, the paper focused on multitask learning as a form of question answering, but this framework can be easily adapted for different approached to single-task or multitask learning.

pytextrank - Python implementation of TextRank for text document NLP parsing and summarization

  •    Jupyter

Python implementation of TextRank, based on the Mihalcea 2004 paper. The results produced by this implementation are intended more for use as feature vectors in machine learning, not as academic paper summaries.

CoreNLP - Stanford CoreNLP: A Java suite of core NLP tools.

  •    Java

Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities. It provides the foundational building blocks for higher level text understanding applications.

snips-nlu - Snips Python library to extract meaning from text

  •    Python

Snips NLU (Natural Language Understanding) is a Python library that allows to parse sentences written in natural language and extracts structured information. To find out how to use Snips NLU please refer to our documentation, it will provide you with a step-by-step guide on how to use and setup our library.

OpenNLP - Machine learning based toolkit for the processing of natural language text

  •    Java

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.

delta - DELTA is a deep learning based natural language and speech processing platform.

  •    Python

DELTA is a deep learning based end-to-end natural language and speech processing platform. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academia and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3. For details of DELTA, please refer to this paper.

sense2vec - 🦆 Use NLP to go beyond vanilla word2vec

  •    C++

sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. For an interactive example of the technology, see our sense2vec demo that lets you explore semantic similarities across all Reddit comments of 2015. This library is a simple Python/Cython implementation for loading and querying sense2vec models. While it's best used in combination with spaCy, the sense2vec library itself is very lightweight and can also be used as a standalone module. See below for usage details.

Stanza - A Python NLP Library for Many Human Languages

  •    Python

Stanza is a Python NLP Library for Many Human Languages. It contains support for running various accurate natural language processing tools on 60+ languages and for accessing the Java Stanford CoreNLP software from Python. A new collection of biomedical and clinical English model packages are now available, offering seamless experience for syntactic analysis and named entity recognition (NER) from biomedical literature text and clinical notes.

sling - SLING - A natural language frame semantics parser

  •    C++

SLING is a parser for annotating text with frame semantic annotations. It is trained on an annotated corpus using Tensorflow and Dragnn.The parser is a general transition-based frame semantic parser using bi-directional LSTMs for input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. It is a jointly trained model using only the text tokens as input and the transition system has been designed to output frame graphs directly without any intervening symbolic representation.

pynlpl - PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing

  •    Python

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotatation). The library is a divided into several packages and modules. It works on Python 2.7, as well as Python 3.

pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization

  •    Python

It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern. This example trains a classifier on adjectives mined from Twitter using Python 3. First, tweets that contain hashtag #win or #fail are collected. For example: "$20 tip off a sweet little old lady today #win". The word part-of-speech tags are then parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled WIN or FAIL. The classifier uses the vectors to learn which other tweets look more like WIN or more like FAIL.

Mallet - MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text

  •    Java

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.