lda - LDA topic modeling for node.js


Latent Dirichlet allocation (LDA) topic modeling in JavaScript for node.js. LDA is a machine learning algorithm that extracts topics and their related keywords from a collection of documents. In LDA, a document may contain several different topics, each with its own set of related terms. The algorithm uses a probabilistic model to extract the specified number of topics and their related keywords. For example, a document may contain topics that could be classified as beach-related and weather-related. The beach topic may contain related words such as sand, ocean, and water; similarly, the weather topic may contain related words such as sun, temperature, and clouds.

https://github.com/primaryobjects/lda

Dependencies:

stem-porter : *
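
Example usage: the snippet below is a minimal sketch, assuming the package is installed from npm as lda and exports a single function of the form lda(documents, topicCount, termsPerTopic) that returns an array of topics, each topic being a list of term/probability pairs; see the repository README for the authoritative API.

```javascript
// Minimal usage sketch for the lda package.
// Assumed API: lda(documents, topicCount, termsPerTopic) -> array of topics,
// where each topic is an array of { term, probability } objects.
var lda = require('lda');

// A small corpus mixing beach-related and weather-related sentences.
var text = 'The sand was warm and the ocean water was calm. ' +
           'We walked along the beach and collected shells. ' +
           'The sun came out and the temperature climbed all afternoon. ' +
           'Clouds rolled in before the evening rain.';

// Split the text into individual sentence "documents".
var documents = text.match(/[^.!?]+[.!?]+/g);

// Ask for 2 topics with 5 terms each.
var result = lda(documents, 2, 5);

// Print the top terms for each detected topic.
result.forEach(function (topic, i) {
  var terms = topic.map(function (t) { return t.term; }).join(', ');
  console.log('Topic ' + (i + 1) + ': ' + terms);
});
```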

Related Projects

text2vec - Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

  •    R

text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP). To learn how to use this package, see text2vec.org and the package vignettes. See also the text2vec articles on my blog.

gensim - Topic Modelling for Humans

  •    Python

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Yahoo_LDA - Yahoo!'s topic modelling framework using Latent Dirichlet Allocation

  •    C++

Yahoo!'s topic modelling framework using Latent Dirichlet Allocation

GibbsLDA++: A C/C++ Gibbs Sampling LDA

  •    C++

GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation (LDA) using Gibbs Sampling for parameter estimation and inference. GibbsLDA++ is fast and is designed to analyze hidden/latent topic structures of large-scale (text) data collections.


lectures - Oxford Deep NLP 2017 course

This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford. This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high level themes forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course the practical implementation of such models on CPU and GPU hardware is also discussed.

spaCy - 💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython

  •    Python

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now! Check out the new features here.

nlp-with-ruby - Practical Natural Language Processing done in Ruby.

  •    Ruby

This curated list comprises awesome resources, libraries, and information sources about computational processing of texts in human languages with the Ruby programming language. The field is often referred to as NLP, Computational Linguistics, or HLT (Human Language Technology), and it intersects with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction, and other related disciplines. This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.

spark-nlp - Natural Language Understanding Library for Apache Spark.

  •    Jupyter

John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. This library has been uploaded to the spark-packages repository: https://spark-packages.org/package/JohnSnowLabs/spark-nlp

pynlpl - PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing

  •    Python

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation). The library is divided into several packages and modules. It works on Python 2.7, as well as Python 3.

thinc - 🔮 spaCy's Machine Learning library for NLP in Python

  •    Python

Thinc is the machine learning library powering spaCy. It features a battle-tested linear model designed for large sparse learning problems, and a flexible neural network model under development for spaCy v2.0. Thinc is a practical toolkit for implementing models that follow the "Embed, encode, attend, predict" architecture. It's designed to be easy to install, efficient for CPU usage and optimised for NLP and deep learning with text – in particular, hierarchically structured input and variable-length sequences.

knowceans

  •    Java

Utility classes ranging from maps to a search engine to random samplers

shravanmn-Yahoo_LDA

  •    C++

Yahoo!'s topic modelling framework using Latent Dirichlet Allocation

opencog - A framework for integrated Artificial Intelligence & Artificial General Intelligence (AGI)

  •    Scheme

OpenCog is a framework for developing AI systems, especially appropriate for integrative multi-algorithm systems and artificial general intelligence systems. Though much work remains to be done, it currently contains a functional core framework, and a number of cognitive agents at varying levels of completion, some already displaying interesting and useful functionalities alone and in combination. With the exception of MOSES and the CogServer, all of the above are in active development, are half-baked, poorly documented, mis-designed, subject to experimentation, and generally in need of love and attention. This is where experimentation and integration are taking place, and, like any laboratory, things are a bit fluid and chaotic.

Mycroft - an Artificial intelligence for everyone

  •    Python

Mycroft is an artificial intelligence for everyone. It uses open software to process natural language, determine your intent, and take action. It can integrate a host of professional functions: control scenes to conserve power, or grant office access with your voice. It can control all of your media and devices with the sound of your voice: adjust your thermostat, turn on your lights, water your lawn, play your favorite movie, and a lot more.

Apache Mahout - Scalable machine learning library

  •    Java

Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

NeuronBlocks - NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego

  •    Python

NeuronBlocks is an NLP deep learning modeling toolkit that helps engineers and researchers build end-to-end pipelines for neural network model training for NLP tasks. The main goal of this toolkit is to minimize the development cost of building NLP deep neural network models, covering both the training and inference stages. NeuronBlocks consists of two major components: Block Zoo and Model Zoo.

LDAvis - R package for web-based interactive topic model visualization.

  •    JavaScript

R package for interactive topic model visualization. LDAvis is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.