- 87

GuidedLDA OR SeededLDA implements latent Dirichlet allocation (LDA) using collapsed Gibbs sampling. GuidedLDA can be guided by setting some seed words per topic. Which will make the topics converge in that direction. You can read more about guidedlda in the documentation.

https://github.com/vi3k6i5/GuidedLDATags | topic-modeling guided-topic-modeling machine-learning data-science guidedlda seededlda |

Implementation | Python |

License | Mozilla |

Platform | Windows Linux |

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

gensim topic-modeling information-retrieval machine-learning natural-language-processing nlp data-science data-mining word2vec word-embeddings text-summarization neural-network document-similarity word-similarity fasttextA Toolkit for Industrial Topic Modeling

topic-modeling topic-models lda sentence-lda twe nlpLightLDA is a distributed system for large scale topic modeling. It implements a distributed sampler that enables very large data sizes and models. LightLDA improves sampling throughput and convergence speed via a fast O(1) metropolis-Hastings algorithm, and allows small cluster to tackle very large data and model sizes through model scheduling and data parallelism architecture. LightLDA is implemented with C++ for performance consideration.

R package for interactive topic model visualization. LDAvis is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.

topic-modeling r visualization text-miningSnorkel is a system for rapidly creating, modeling, and managing training data, currently focused on accelerating the development of structured or "dark" data extraction applications for domains in which large labeled training sets are not available or easy to obtain. <BR><BR> Today's state-of-the-art machine learning models require massive labeled training sets--which usually do not exist for real-world applications. Instead, Snorkel is based around the new data programming paradigm, in which the developer focuses on writing a set of labeling functions, which are just scripts that programmatically label data. The resulting labels are noisy, but Snorkel automatically models this process—learning, essentially, which labeling functions are more accurate than others—and then uses this to train an end model (for example, a deep neural network in TensorFlow).

machine-learning ai information-extraction weak-supervision training-data dark-data training-sets modeling data-modelingEdward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilistic models, ranging from classical hierarchical models on small data sets to complex deep probabilistic models on large data sets. Edward fuses three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming. Edward is built on top of TensorFlow. It enables features such as computational graphs, distributed training, CPU/GPU integration, automatic differentiation, and visualization with TensorBoard.

bayesian-methods deep-learning machine-learning data-science tensorflow neural-networks statistics probabilistic-programmingOwl is an emerging numerical library for scientific computing and engineering. The library is developed in the OCaml language and inherits all its powerful features such as static type checking, powerful module system, and superior runtime efficiency. Owl allows you to write succinct type-safe numerical applications in functional language without sacrificing performance, significantly reduces the cost from prototype to production use. Owl's documentation contains a lot of learning materials to help you start. The full documentation consists of two parts: Tutorial Book and API Reference. Both are perfectly synchronised with the code in the repository by the automatic building system. You can access both parts with the following link.

matrix linear-algebra ndarray statistical-functions topic-modeling regression maths gsl plotting sparse-linear-systems scientific-computing numerical-calculations statistics mcmc optimization autograd algorithmic-differentation automatic-differentiation machine-learning neural-networktext2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP). To learn how to use this package, see text2vec.org and the package vignettes. See also the text2vec articles on my blog.

word2vec text-mining natural-language-processing glove vectorization topic-modeling word-embeddings latent-dirichlet-allocationCompared to a classical approach, using a Recurrent Neural Networks (RNN) with Long Short-Term Memory cells (LSTMs) require no or almost no feature engineering. Data can be fed directly into the neural network who acts like a black box, modeling the problem correctly. Other research on the activity recognition dataset can use a big amount of feature engineering, which is rather a signal processing approach combined with classical data science techniques. The approach here is rather very simple in terms of how much was the data preprocessed. Let's use Google's neat Deep Learning library, TensorFlow, demonstrating the usage of an LSTM, a type of Artificial Neural Network that can process sequential data / time series.

machine-learning deep-learning lstm human-activity-recognition neural-network rnn recurrent-neural-networks tensorflow activity-recognitionWelcome to "Bayesian Modelling in Python" - a tutorial for those interested in learning how to apply bayesian modelling techniques in python (PYMC3). This tutorial doesn't aim to be a bayesian statistics tutorial - but rather a programming cookbook for those who understand the fundamental of bayesian statistics and want to learn how to build bayesian models using python. The tutorial sections and topics can be seen below. Statistics is a topic that never resonated with me throughout university. The frequentist techniques that we were taught (p-values etc) felt contrived and ultimately I turned my back on statistics as a topic that I wasn't interested in.

bayesian-statistics tutorial pymcPlease cite our JMLR paper [bibtex]. Some parts of the package were created as part of other publications. If you use these parts, please cite the relevant work appropriately. An overview of all mlr related publications can be found here.

machine-learning data-science tuning cran r-package predictive-modeling classification regression statistics r survival-analysis imbalance-correction tutorial mlr learners hyperparameters-optimization feature-selection multilabel-classification clustering stackingSkater is a unified framework to enable Model Interpretation for all forms of model to help one build an Interpretable machine learning system often needed for real world use-cases(** we are actively working towards to enabling faithful interpretability for all forms models). It is an open source python library designed to demystify the learned structures of a black box model both globally(inference on the basis of a complete data set) and locally(inference about an individual prediction). The project was started as a research idea to find ways to enable better interpretability(preferably human interpretability) to predictive "black boxes" both for researchers and practioners. The project is still in beta phase.

ml predictive-modeling machine-learning modeling-tools model-interpretation blackbox datascience model-explanation explanation-system deep-learning deep-neural-networks attribution lstm-neural-networks cnn-classificationOrange is a component-based data mining software. It includes a range of data visualization, exploration, preprocessing and modeling techniques. It supports . interactive data analysis workflows with a large toolbox.

data-mining data-science machine-learning data-visualization text-miningThe Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. It supports over 40 programming languages.

notebook analytics data-visualization data-analytics data-discovery data-sciencePyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling.

pytorch machine-learning bayesian webppl inference probabilistic-programming probabilistic-graphical-models bayesian-inference variational-inference uber"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

machine-learning deep-learning text-analytics classification clustering natural-language-processing computer-vision data-science spacy nltk scikit-learn prophet time-series-analysis convolutional-neural-networks tensorflow keras statsmodels pandas deep-neural-networksHyperTools is designed to facilitate dimensionality reduction-based visual explorations of high-dimensional data. The basic pipeline is to feed in a high-dimensional dataset (or a series of high-dimensional datasets) and, in a single function call, reduce the dimensionality of the dataset(s) and create a plot. The package is built atop many familiar friends, including matplotlib, scikit-learn and seaborn. Our package was recently featured on Kaggle's No Free Hunch blog. For a general overview, you may find this talk useful (given as part of the MIND Summer School at Dartmouth). Check the repo of Jupyter notebooks from the HyperTools paper.

data-visualization high-dimensional-data topic-modeling text-vectorization data-wrangling visualization time-seriesWe are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem, that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

machine-learning open-science science citizen-scientists collaboration opendata datasetsThis repository was initially created to submit machine learning assignments for Siraj Raval's online machine learning course. The purpose of the course was to learn how to implement the most common machine learning algorithms from scratch (without using machine learning libraries such as tensorflow, PyTorch, scikit-learn, etc). Although that course has ended now, I am continuing to learn data science and machine learning from other sources such as Coursera, online blogs, and attending machine learning lectures at University of Toronto. Sticking to the theme of implementing machine learning algortihms from scratch, I will continue to post detailed notebooks in python here as I learn more.

machine-learning statistical-concepts siraj-raval machine-learning-algorithms machine-learning-from-scratchPyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.Note: Running pip install pymc will install PyMC 2.3, not PyMC3, from PyPI.

statistical-analysis bayesian-inference mcmc variational-inference theano probabilistic-programming bayesian
We have large collection of open source products. Follow the tags from
Tag Cloud >>

Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
**Add Projects.**