Displaying 1 to 20 from 23 results

tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

  •    Python

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

TransmogrifAI - TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Spark with minimal hand tuning

  •    Scala

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library written in Scala that runs on top of Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time. Skip to Quick Start and Documentation.

featuretools - automated feature engineering

  •    Python

Featuretools is a python library for automated feature engineering. See the documentation for more information. Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

auto_ml - Automated machine learning for analytics & production

  •    Python

auto_ml is designed for production. Here's an example that includes serializing and loading the trained model, then getting predictions on single dictionaries, roughly the process you'd likely follow to deploy the trained model. All of these projects are ready for production. These projects all have prediction time in the 1 millisecond range for a single prediction, and are able to be serialized to disk and loaded into a new environment after training.

deltapy - DeltaPy - Tabular Data Augmentation (by @firmai)

  •    Jupyter

Animated investment research at Sov.ai, sponsoring open source initiatives. Tabular augmentation is a new experimental space that makes use of novel and traditional data generation and synthesisation techniques to improve model prediction success. It is in essence a process of modular feature engineering and observation engineering while emphasising the order of augmentation to achieve the best predicted outcome from a given information set. DeltaPy was created with finance applications in mind, but it can be broadly applied to any data-rich environment.

evalml - EvalML is an AutoML library written in python.

  •    Python

EvalML is an AutoML library which builds, optimizes, and evaluates machine learning pipelines using domain-specific objective functions.

open_source_demos - A collection of demos showcasing automated feature engineering and machine learning in diverse use cases

  •    Jupyter

This repository consists of a series of demos that leverage EvalML, Featuretools, Woodwork, and Compose. The demos rely on a different subset of these libraries and to various levels of complexity. Building an accurate machine learning model requires several important steps. One of the most complex and time consuming is extracting information through features. Finding the right features is a crucial component of both interpreting the dataset as a whole as well as building a model with great predictive power. Another core component of any machine learning process is selecting the right estimator to use for the problem at hand. By combining the best features with the most accurate estimator and its corresponding hyperparameters, we can build a machine learning model that can generalize well to unknown data. Just as the process of feature engineering is made simple by Featuretools, we have made automated machine learning easy to implement using EvalML.

open-solution-home-credit - Open solution to the Home Credit Default Risk challenge :house_with_garden:

  •    Python

This is an open solution to the Home Credit Default Risk challenge 🏡. In this open source solution you will find references to the neptune.ml. It is free platform for community Users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as plain Python script 🐍.

featran - A Scala feature transformation library for data science and machine learning

  •    Scala

Featran, also known as Featran77 or F77 (get it?), is a Scala library for feature transformation. It aims to simplify the time consuming task of feature engineering in data science and machine learning processes. It supports various collection types for feature extraction and output formats for feature representation.We can implement this in a naive way using reduce and map.

protr - Comprehensive toolkit for generating various numerical features of protein sequences

  •    R

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042> (PDF). Nan Xiao, Dong-Sheng Cao, Min-Feng Zhu, and Qing-Song Xu. (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 31 (11), 1857-1859.

feng - feng - feature engineering for machine-learning champions

  •    Python

feng is a Python module for smoothly engineering features from your Pandas DataFrame so that you can win that Kaggle competition. We spent most of our efforts in feature engineering.

cortana-intelligence-customer360 - This repository contains instructions and code to deploy a customer 360 profile solution on Azure stack using the Cortana Intelligence Suite

  •    Python

The Customer 360 solution provides you a scalable way to build a customer profile enriched by machine learning. It also allows you to uniformly access and operate on data across disparate data sources (while minimizing raw data movement) and leverage the power of Microsoft R Server for scalable modelling and accurate predictions. Ingestion and Pre-processing: Ingest, prepare, and aggregate live user activity data.

home-credit-default-risk - Default risk prediction for Home Credit competition - Fast, scalable and maintainable SQL-based feature engineering pipeline

  •    Python

This is code I built for the Home Credit default risk competition on Kaggle. This should be seen more as an ML engineering achievement than a data science top of the line prediction model. First of all, due to time constraints this is not a top scorer. First rank was 0.80570 AUC (499 submissions), this is 0.78212 AUC (12 submissions).

lambdo - A column-oriented approach to feature engineering

  •    Python

Lambdo is a workflow engine which significantly simplifies the analysis process by unifying feature engineering and machine learning operations. Lambdo data analysis workflow does not distinguish between them and any node can be treated either as a feature or as prediction, and both of them can be trained.

bytehub - ByteHub: making feature stores simple

  •    Python

An easy-to-use feature store. A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.