ISLE - This repository provides code for SVD and Importance sampling-based algorithms for large scale topic modeling


We built this project on Ubuntu 16.04 LTS with gcc 5.4; other Linux versions with gcc 5+ may also work. The build should generate two executables, ISLETrain and ISLEInfer, in the directory.

https://github.com/Microsoft/ISLE


   




Related Projects

Accord.NET - Machine learning, Computer vision, Statistics and general scientific computing for .NET

  •    CSharp

The Accord.NET project provides machine learning, statistics, artificial intelligence, computer vision and image processing methods to .NET. It can be used on Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux or mobile.

owl - Owl is an OCaml library for scientific and engineering computing.

  •    OCaml

Owl is an emerging numerical library for scientific computing and engineering. The library is developed in the OCaml language and inherits its features, such as static type checking, a powerful module system, and superior runtime efficiency. Owl allows you to write succinct, type-safe numerical applications in a functional language without sacrificing performance, and significantly reduces the cost of moving from prototype to production use. Owl's documentation contains a lot of learning material to help you start. The full documentation consists of two parts: the Tutorial Book and the API Reference. Both are kept synchronised with the code in the repository by the automatic build system.

hackermath - Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way

  •    Jupyter

Math literacy, including proficiency in Linear Algebra and Statistics, is a must for anyone pursuing a career in data science. The goal of this workshop is to introduce some key concepts from these domains that get used repeatedly in data science applications. Our approach is what we call the “Hacker’s way”: instead of going back to formulae and proofs, we teach the concepts by writing code and applying them in practical applications. Concepts don’t stick if their usage is never taught. The focus will be on depth rather than breadth. Three areas are chosen: Hypothesis Testing, Supervised Learning and Unsupervised Learning. They will be covered in sufficient depth, with 50% of the time spent on the concepts and 50% spent coding them.

Smile - Statistical Machine Intelligence & Learning Engine

  •    Java

Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

Mallet - MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text

  •    Java

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.


lightlda - Scalable, fast, and lightweight system for large-scale topic modeling

  •    C++

LightLDA is a distributed system for large-scale topic modeling. It implements a distributed sampler that enables very large data sizes and models. LightLDA improves sampling throughput and convergence speed via a fast O(1) Metropolis-Hastings algorithm, and allows a small cluster to tackle very large data and model sizes through model scheduling and a data-parallel architecture. LightLDA is implemented in C++ for performance.

Dambach Linear Algebra Framework


The Dambach Linear Algebra Framework is a general-purpose linear algebra framework for .NET. The main goal is to enable ordinary programmers (who do not have a math degree) to use linear algebra methods to solve everyday problems.

tensorflow_cookbook - Code for Tensorflow Machine Learning Cookbook

  •    Jupyter

This chapter introduces the main objects and concepts in TensorFlow. We also show how to access the data for the rest of the book and provide additional resources for learning about TensorFlow. Having established the basic objects and methods in TensorFlow, we then establish the components that make up TensorFlow algorithms. We start by introducing computational graphs, and then move to loss functions and backpropagation. We end by creating a simple classifier and then show an example of evaluating regression and classification algorithms.

gensim - Topic Modelling for Humans

  •    Python

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Modular toolkit for Data Processing MDP

  •    Python

The Modular toolkit for Data Processing (MDP) is a Python data processing framework. From the user's perspective, MDP is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures. From the scientific developer's perspective, MDP is a modular framework, which can easily be expanded. The implementation of new algorithms is easy and intuitive. The new i

interpret - Fit interpretable models. Explain blackbox machine learning.

  •    C++

Historically, the most intelligible models were not very accurate, and the most accurate models were not intelligible. Microsoft Research has developed an algorithm called the Explainable Boosting Machine (EBM) which has both high accuracy and intelligibility. EBM uses modern machine learning techniques like bagging and boosting to breathe new life into traditional GAMs (Generalized Additive Models). This makes them as accurate as random forests and gradient boosted trees, and also enhances their intelligibility and editability. In addition to EBM, InterpretML also supports methods like LIME, SHAP, linear models, partial dependence, decision trees and rule lists. The package makes it easy to compare and contrast models to find the best one for your needs.

ladder - Ladder network is a deep learning algorithm that combines supervised and unsupervised learning

  •    Python

This is an implementation of the Ladder Network in TensorFlow. The Ladder network is a deep learning algorithm that combines supervised and unsupervised learning. It was introduced in the paper Semi-Supervised Learning with Ladder Networks by A Rasmus, H Valpola, M Honkala, M Berglund, and T Raiko.

opencog - A framework for integrated Artificial Intelligence & Artificial General Intelligence (AGI)

  •    Scheme

OpenCog is a framework for developing AI systems, especially appropriate for integrative multi-algorithm systems and artificial general intelligence systems. Though much work remains to be done, it currently contains a functional core framework and a number of cognitive agents at varying levels of completion, some already displaying interesting and useful functionalities alone and in combination. With the exception of MOSES and the CogServer, all of the above are in active development, are half-baked, poorly documented, mis-designed, subject to experimentation, and generally in need of love and attention. This is where experimentation and integration are taking place, and, like any laboratory, things are a bit fluid and chaotic.

OpenUnReID - PyTorch open-source toolbox for unsupervised or domain adaptive object re-ID.

  •    Python

OpenUnReID is an open-source PyTorch-based codebase for both unsupervised learning (USL) and unsupervised domain adaptation (UDA) in the context of object re-ID tasks. It provides strong baselines and multiple state-of-the-art methods with highly refactored code for both pseudo-label-based and domain-translation-based frameworks. It works with Python >=3.5 and PyTorch >=1.1. We are actively updating this repo, and more methods will be supported soon. Contributions are welcome.

Machine Learning for .NET

  •    

Machine Learning Library for .NET. Initial inclusions will be binary and multi-class classification as well as standard clustering algorithms.

vectorious - High performance linear algebra.

  •    Javascript

A high performance linear algebra library, written in JavaScript and optimized with C++ bindings to BLAS. The documentation is located in the wiki section of this repository.

deepLearningBook-Notes - Notes on the Deep Learning book from Ian Goodfellow, Yoshua Bengio and Aaron Courville (2016)

  •    Jupyter

This content is part of a series following chapter 2, on linear algebra, of the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings and Python code on mathematical theories and is constructed as my understanding of these concepts. I'd like to introduce a series of blog posts and their corresponding Python Notebooks gathering notes on the Deep Learning Book from Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). The aim of these notebooks is to help beginners and advanced beginners grasp the linear algebra concepts underlying deep learning and machine learning. Acquiring these skills can boost your ability to understand and apply various data science algorithms. In my opinion, it is one of the bedrocks of machine learning, deep learning and data science.





