
tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

  •    Python

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT automates the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

TransmogrifAI - TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Spark with minimal hand tuning

  •    Scala

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library written in Scala that runs on top of Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time.

featuretools - automated feature engineering

  •    Python

Featuretools is a Python library for automated feature engineering. Its README demonstrates Deep Feature Synthesis (DFS), an automated feature engineering method, applied to a multi-table dataset consisting of timestamped customer transactions.
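To make the multi-table idea concrete, here is a minimal sketch in plain Python of one ingredient of this kind of feature engineering: aggregating a child table (transactions) up to its parent (customers). It does not use the Featuretools API. The table contents, the `aggregate_features` helper, and the feature naming scheme are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a parent table (customers) and a child table (transactions).
customers = [{"customer_id": 1}, {"customer_id": 2}]
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

# Aggregation functions applied to each parent's child rows.
AGGREGATIONS = {"SUM": sum, "MEAN": mean, "MAX": max, "COUNT": len}


def aggregate_features(parents, children, key, value, child_name):
    """Build one feature row per parent by aggregating its children's values."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])

    features = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        row = {key: parent[key]}
        for name, func in AGGREGATIONS.items():
            column = f"{name}({child_name}.{value})"
            # A parent with no children gets None, except COUNT, which is 0.
            row[column] = func(values) if (values or name == "COUNT") else None
        features.append(row)
    return features


feature_matrix = aggregate_features(
    customers, transactions, key="customer_id", value="amount",
    child_name="transactions",
)
for row in feature_matrix:
    print(row)
```

A real tool would also stack such aggregations across several levels of related tables and respect timestamps; this sketch covers a single parent-child relationship only.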

auto_ml - Automated machine learning for analytics & production

  •    Python

auto_ml is designed for production. Its example includes serializing and loading the trained model, then getting predictions on single dictionaries, which is roughly the process you would follow to deploy the trained model. All of these projects are ready for production: they all have prediction times in the 1 millisecond range for a single prediction, and can be serialized to disk and loaded into a new environment after training.
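The deployment flow described here (train, serialize, load elsewhere, predict on a single dictionary) can be sketched without any ML library. The following is not auto_ml code: `MeanByCategoryModel` is a hypothetical stand-in model, and JSON is used for serialization purely to keep the example self-contained.

```python
import json


class MeanByCategoryModel:
    """Hypothetical stand-in model: predicts the mean target seen per category."""

    def __init__(self, feature, means=None, default=0.0):
        self.feature = feature
        self.means = means or {}
        self.default = default

    def fit(self, rows, targets):
        sums, counts = {}, {}
        for row, target in zip(rows, targets):
            category = row[self.feature]
            sums[category] = sums.get(category, 0.0) + target
            counts[category] = counts.get(category, 0) + 1
        self.means = {c: sums[c] / counts[c] for c in sums}
        self.default = sum(targets) / len(targets)  # fallback for unseen categories
        return self

    def predict(self, row):
        """Predict for a single dictionary of features."""
        return self.means.get(row.get(self.feature), self.default)

    def to_json(self):
        return json.dumps(
            {"feature": self.feature, "means": self.means, "default": self.default}
        )

    @classmethod
    def from_json(cls, payload):
        state = json.loads(payload)
        return cls(state["feature"], state["means"], state["default"])


# Training side: fit, then serialize (in practice the string would be written to disk).
model = MeanByCategoryModel("city").fit(
    [{"city": "A"}, {"city": "A"}, {"city": "B"}], [1.0, 3.0, 10.0]
)
serialized = model.to_json()

# Serving side: load the model and predict on one dictionary at a time.
restored = MeanByCategoryModel.from_json(serialized)
prediction = restored.predict({"city": "A"})
print(prediction)
```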




deltapy - DeltaPy - Tabular Data Augmentation (by @firmai)

  •    Jupyter

Tabular augmentation is a new experimental space that makes use of novel and traditional data generation and synthesisation techniques to improve model prediction success. It is in essence a process of modular feature engineering and observation engineering while emphasising the order of augmentation to achieve the best predicted outcome from a given information set. DeltaPy was created with finance applications in mind, but it can be broadly applied to any data-rich environment.

open-solution-home-credit - Open solution to the Home Credit Default Risk challenge 🏡

  •    Python

This is an open solution to the Home Credit Default Risk challenge 🏡. In this open source solution you will find references to neptune.ml, a platform that is free for community users and that we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as a plain Python script 🐍.


featran - A Scala feature transformation library for data science and machine learning

  •    Scala

Featran, also known as Featran77 or F77 (get it?), is a Scala library for feature transformation. It aims to simplify the time-consuming task of feature engineering in data science and machine learning processes. It supports various collection types for feature extraction and output formats for feature representation. We can implement this in a naive way using reduce and map.
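As a rough illustration of the reduce-and-map pattern mentioned above, the sketch below scales values to the range [0, 1]: a reduce step collects the minimum and maximum, and a map step rescales each value. This is plain Python rather than Featran's Scala API, and the choice of min-max scaling as the transformation is an assumption made for the example.

```python
from functools import reduce

# Hypothetical input column.
values = [2.0, 4.0, 10.0]


def combine(summary, x):
    """Reduce step: fold one value into the running (min, max) summary."""
    lo, hi = summary
    return (min(lo, x), max(hi, x))


lo, hi = reduce(combine, values, (float("inf"), float("-inf")))

# Map step: rescale each value using the summary from the reduce step.
span = hi - lo
scaled = list(map(lambda x: (x - lo) / span if span else 0.0, values))

print(scaled)  # [0.0, 0.25, 1.0]
```

The approach is naive in that it needs two passes over the data and handles one feature at a time.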

protr - Comprehensive toolkit for generating various numerical features of protein sequences

  •    R

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. Nan Xiao, Dong-Sheng Cao, Min-Feng Zhu, and Qing-Song Xu. (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 31 (11), 1857-1859.

feng - feng - feature engineering for machine-learning champions

  •    Python

feng is a Python module for smoothly engineering features from your Pandas DataFrame so that you can win that Kaggle competition. We spent most of our efforts in feature engineering.

cortana-intelligence-customer360 - This repository contains instructions and code to deploy a customer 360 profile solution on Azure stack using the Cortana Intelligence Suite

  •    Python

The Customer 360 solution provides you a scalable way to build a customer profile enriched by machine learning. It also allows you to uniformly access and operate on data across disparate data sources (while minimizing raw data movement) and leverage the power of Microsoft R Server for scalable modelling and accurate predictions. Ingestion and Pre-processing: Ingest, prepare, and aggregate live user activity data.

home-credit-default-risk - Default risk prediction for Home Credit competition - Fast, scalable and maintainable SQL-based feature engineering pipeline

  •    Python

This is code I built for the Home Credit default risk competition on Kaggle. It should be seen more as an ML engineering achievement than as a top-of-the-line data science prediction model. First of all, due to time constraints this is not a top scorer: the first-ranked entry scored 0.80570 AUC (499 submissions), while this one scored 0.78212 AUC (12 submissions).

lambdo - A column-oriented approach to feature engineering

  •    Python

Lambdo is a workflow engine which significantly simplifies the analysis process by unifying feature engineering and machine learning operations. A Lambdo data analysis workflow does not distinguish between them: any node can be treated either as a feature or as a prediction, and both can be trained.

bytehub - ByteHub: making feature stores simple

  •    Python

An easy-to-use feature store. A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.
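The idea of keeping raw data and transformed features side by side can be sketched with a small in-memory class. This is not the ByteHub API: the `FeatureStore` class, its method names, and the temperature example are all hypothetical.

```python
class FeatureStore:
    """Hypothetical in-memory store for raw features and derived transforms."""

    def __init__(self):
        self._raw = {}         # feature name -> list of stored values
        self._transforms = {}  # feature name -> (source feature name, function)

    def save(self, name, values):
        """Store raw values under a feature name."""
        self._raw[name] = list(values)

    def register_transform(self, name, source, func):
        """Define a feature computed element-wise from another feature."""
        self._transforms[name] = (source, func)

    def load(self, name):
        """Return values for a raw feature, or compute a transformed one."""
        if name in self._raw:
            return list(self._raw[name])
        source, func = self._transforms[name]
        return [func(value) for value in self.load(source)]


store = FeatureStore()
store.save("temp_celsius", [0.0, 100.0])
store.register_transform(
    "temp_fahrenheit", "temp_celsius", lambda c: c * 9 / 5 + 32
)

# Both raw and transformed features are read through the same call.
training_column = store.load("temp_fahrenheit")
print(training_column)  # [32.0, 212.0]
```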

PubMed-Best-Match - Machine-learning based pipeline relying on LambdaMART currently used in PubMed for relevance (Best Match) searches

  •    Python

This repository exposes the offline research for developing the new Best Match algorithm and model computation. For a live implementation, we recommend taking a look at Solr and its LTR plugin. A full solution consists of an information retrieval system to fetch articles matching the query and an implementation of LambdaMART to rerank the results. While this repository focuses on training a ranking model as implemented in the Best Match sort order of PubMed, we provide sample data to simulate the fetching steps.
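The two-stage design described above (fetch matching articles, then rerank them) can be outlined as follows. This is only a sketch of the control flow: the documents are made up, `fetch` is a naive term match rather than a real retrieval system, and `score` is a simple term-overlap stand-in, not LambdaMART or the model from this repository.

```python
# Hypothetical document collection.
docs = {
    "d1": "feature engineering for protein sequences",
    "d2": "protein structure prediction",
    "d3": "automated feature engineering",
}


def fetch(query, documents):
    """Stage 1: return ids of documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    return [
        doc_id
        for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    ]


def score(query, text):
    """Stand-in for a trained ranking model: fraction of query terms present."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def search(query, documents):
    """Stage 2: rerank the fetched candidates by descending model score."""
    candidates = fetch(query, documents)
    return sorted(
        candidates, key=lambda doc_id: score(query, documents[doc_id]), reverse=True
    )


results = search("protein feature", docs)
print(results)
```

Keeping the two stages separate means the expensive ranking model only has to score the candidates that the cheaper fetching step returns.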

go-featureprocessing - Fast, simple sklearn-like feature processing for Go

  •    Go

The code generator produces a new struct, as well as benchmarks and tests using google/gofuzz. This transformer can be serialized and de-serialized by standard Go routines. The serialized transformer is easy to read, update, and integrate with other tools.

the-building-data-genome-project - A collection of non-residential buildings for performance analysis and algorithm benchmarking

  •    Jupyter

It is an open data set from 507 non-residential buildings that includes hourly whole-building electrical meter data for one year. Each of the buildings has metadata such as floor area, weather, and primary use type. This data set can be used to benchmark various statistical learning algorithms and other data science techniques. It can also be used simply as a teaching or learning tool to practice dealing with measured performance data from large numbers of non-residential buildings. The project's charts illustrate the breakdown of the buildings according to location, building industry, sub-industry, and primary use type. Clayton Miller, Forrest Meggers, The Building Data Genome Project: An open, public data set from non-residential building electrical meters, Energy Procedia, Volume 122, September 2017, Pages 439-444, ISSN 1876-6102, https://doi.org/10.1016/j.egypro.2017.07.400.





