tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

  •    Python

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT automates the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.
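TPOT's real API wraps all of this behind a scikit-learn-style estimator; the stdlib-only toy below (every name is hypothetical, and the scoring function is a stand-in for cross-validated accuracy) only sketches the underlying idea of evolving a population of candidate pipelines by selection and mutation.

```python
import random

# Toy search space: each "pipeline" is (scaler, model, max_depth).
SCALERS = ["none", "standard", "minmax"]
MODELS = ["logistic", "tree", "knn"]

def score(pipeline):
    # Stand-in for cross-validated accuracy; a real tool would fit and
    # evaluate each candidate pipeline on the data.
    scaler, model, depth = pipeline
    return (scaler == "standard") + (model == "tree") + 1.0 / (1 + abs(depth - 5))

def mutate(pipeline):
    # Change exactly one component of the pipeline at random.
    scaler, model, depth = pipeline
    choice = random.randrange(3)
    if choice == 0:
        scaler = random.choice(SCALERS)
    elif choice == 1:
        model = random.choice(MODELS)
    else:
        depth = max(1, depth + random.choice([-1, 1]))
    return (scaler, model, depth)

def evolve(generations=50, pop_size=20, seed=0):
    random.seed(seed)
    pop = [(random.choice(SCALERS), random.choice(MODELS), random.randrange(1, 10))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        survivors = pop[:pop_size // 2]                 # selection (elitist)
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=score)

best = evolve()
```

The same loop, with real cross-validation as the fitness function plus crossover between pipelines, is the essence of the genetic-programming search TPOT performs.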

xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more

  •    C++

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples. XGBoost has been developed and used by a group of active community members, and your help is very valuable to make the package better for everyone.
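The core GBDT idea that XGBoost implements at scale can be sketched in a few lines of plain Python: fit a sequence of weak learners, each to the residuals left by the ensemble so far, and add a shrunken copy of each to the model. This toy handles one numeric feature with squared error; it is not XGBoost's actual algorithm (no regularization, no second-order gradients, no parallelism):

```python
# Toy gradient boosting for squared error: each round fits a depth-1
# regression "stump" to the current residuals, then adds a shrunken
# copy of it to the ensemble.

def fit_stump(xs, residuals):
    # Try every midpoint between consecutive (sorted) points as a split;
    # predict the mean residual on each side; keep the lowest-SSE split.
    best = None
    for i in range(1, len(xs)):
        thr = (xs[i - 1] + xs[i]) / 2
        left = [r for x, r in zip(xs, residuals) if x < thr]
        right = [r for x, r in zip(xs, residuals) if x >= thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x < thr else rmean

def boost(xs, ys, rounds=50, lr=0.3):
    stumps = []
    predict = lambda x: sum(lr * s(x) for s in stumps)
    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]   # a step function
model = boost(xs, ys)      # model(5) converges toward 1, model(0) stays 0
```

Each boosting round shrinks the remaining residual by the learning rate, which is why the ensemble converges to the step function geometrically.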

benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.)

  •    R

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from the author's interest in some particular business applications. Note: while a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is updated when there are significant changes in implementations or when new implementations become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of the repo.
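The "expanding the categoricals into dummy variables/one-hot encoding" step mentioned above is what pushes p toward ~1K: a categorical column of cardinality k becomes k indicator columns. A minimal stdlib sketch:

```python
# One-hot ("dummy variable") expansion of a categorical column:
# each distinct category becomes its own 0/1 indicator column.

def one_hot(values):
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1       # exactly one indicator is set per row
        rows.append(row)
    return categories, rows

cats, matrix = one_hot(["red", "green", "red", "blue"])
# cats   -> ['blue', 'green', 'red']
# matrix -> [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

In practice pandas' `get_dummies` or scikit-learn's `OneHotEncoder` do this (usually producing sparse output for high-cardinality columns), but the column-count arithmetic is the same.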

auto_ml - Automated machine learning for analytics & production

  •    Python

auto_ml is designed for production. The repo includes an example that serializes and loads the trained model, then gets predictions on single dictionaries, roughly the process you'd likely follow to deploy a trained model. These projects are production-ready: they have prediction times in the 1-millisecond range for a single prediction, and can be serialized to disk and loaded into a new environment after training.
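auto_ml's own API differs; the stdlib sketch below (all names hypothetical, with a deliberately trivial stand-in for a trained model) only illustrates the deployment flow the paragraph describes: train, serialize, load in the serving environment, then predict on a single dictionary.

```python
import pickle

# A deliberately trivial stand-in for a trained model: its "parameters"
# are just a feature name and a threshold, kept in a plain dict so they
# serialize cleanly.
model = {"feature": "income", "threshold": 50000}

def predict(model, row):
    # row is a single dictionary, e.g. {"age": 42, "income": 55000}
    return 1 if row[model["feature"]] >= model["threshold"] else 0

blob = pickle.dumps(model)     # serialize after training
loaded = pickle.loads(blob)    # later, in the serving environment

pred = predict(loaded, {"age": 42, "income": 55000})   # -> 1
```

With a real estimator you would typically pickle (or joblib-dump) the fitted object to disk instead of an in-memory blob, but the train/serialize/load/predict-per-dict shape is the same.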

automl-gs - Provide an input CSV and a target field to predict, generate a model + code to run it.

  •    Python

Give automl-gs an input CSV file and a target field you want to predict, and get back a trained, high-performing machine learning or deep learning model plus native Python code pipelines that let you integrate the model into any prediction workflow. No black box: you can see exactly how the data is processed and how the model is constructed, and you can make tweaks as necessary. Unlike Microsoft's NNI, Uber's Ludwig, and TPOT, automl-gs offers a zero-code/model-definition interface for getting an optimized model and data-transformation pipeline in multiple popular ML/DL frameworks, with minimal Python dependencies (pandas + scikit-learn + your framework of choice). automl-gs is designed for citizen data scientists and engineers without a deep statistical background, under the philosophy that you don't need to know modern data preprocessing and machine learning engineering techniques to create a powerful prediction workflow.

mars - Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions

  •    Python

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and many other libraries. More details about installing Mars can be found in the installation section of the Mars documentation.

open-solution-home-credit - Open solution to the Home Credit Default Risk challenge :house_with_garden:

  •    Python

This is an open solution to the Home Credit Default Risk challenge 🏡. In this open-source solution you will find references to neptune.ml, a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution; you may run it as a plain Python script 🐍.

tgboost - Tiny Gradient Boosting Tree

  •    Java

TGBoost is a tiny implementation of a gradient boosting tree, based on XGBoost's scoring function and SLIQ's efficient tree-building algorithm. TGBoost builds the tree level-wise as in SLIQ (by constructing an attribute list and a class list). Currently, TGBoost supports parallel learning on a single machine, with speed and memory consumption comparable to XGBoost. The two libraries handle missing values differently: XGBoost learns a default direction (left or right) for rows with a missing value, while TGBoost enumerates sending them to the left child, the right child, or a dedicated missing-value child, then chooses the best option. TGBoost therefore uses a ternary tree.
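TGBoost itself is written in Java; the toy Python sketch below (all names hypothetical) illustrates the ternary idea: at a fixed split threshold, score sending the rows with missing values to the left child, the right child, or a dedicated missing-value child, and keep the cheapest option (here scored by squared error around each child's mean).

```python
# Toy ternary split: rows whose feature is missing (None) can go to the
# left child, the right child, or a dedicated "missing" child; we score
# each routing by total squared error and keep the best.

def sse(ys):
    # Squared error of predicting the mean for a group of targets.
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_missing_route(rows, threshold):
    # rows: list of (feature_value_or_None, target)
    left = [y for x, y in rows if x is not None and x < threshold]
    right = [y for x, y in rows if x is not None and x >= threshold]
    missing = [y for x, y in rows if x is None]
    options = {
        "left": sse(left + missing) + sse(right),
        "right": sse(left) + sse(right + missing),
        "own_child": sse(left) + sse(right) + sse(missing),
    }
    return min(options, key=options.get)

rows = [(1.0, 0.0), (2.0, 0.0), (8.0, 1.0), (9.0, 1.0), (None, 0.5)]
route = best_missing_route(rows, threshold=5.0)   # -> "own_child"
```

Here the missing row's target (0.5) fits neither child's mean, so the dedicated missing-value child wins; XGBoost's learned default direction corresponds to comparing only the first two options.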

fast_retraining - Show how to perform fast retraining with LightGBM in different business cases

  •    Jupyter

In this repo we compare two of the fastest boosted decision tree libraries: XGBoost and LightGBM. We evaluate them across datasets from several domains and of different sizes. On July 25, 2017, we published a blog post, Lessons Learned From Benchmarking Fast Machine Learning Algorithms, evaluating both libraries and discussing the benchmark results.

fashion - The Fashion-MNIST dataset and machine learning models.

  •    R

Training machine learning models on the Fashion-MNIST dataset. Fashion-MNIST is a dataset of 70,000 images (60k training and 10k test) of clothing items, such as shirts, pants and shoes. Each example is a 28x28 grayscale image, associated with a label from one of 10 classes.

dextra-mindef-2015 - My solution for Dextra Data Science Challenge #44 (Singapore Ministry of Defense) https://challenges

  •    Python

Note: a few people have asked me for the challenge's data source. Unfortunately, I am not authorized to release it publicly; if you need it, please send the request to Mindef or Dextra.sg rather than to me. Only native XGBoost was recorded, since it simply dominated everything.

kaggle-for-fun - All my submissions for Kaggle contests that I have participated in or plan to participate in

  •    Python

All my submissions for Kaggle contests that I have participated in or plan to participate in. I will probably write everything in Python (using scikit-learn or similar libraries), but occasionally I might also use R or Haskell where I can.

minimal-datascience - This repository contains all the code and dataset used in my blog series: Minimal Data Science

  •    Python

My goal for this minimal data science blog series is not only to share and tutorialize, but also to keep personal notes while learning and working as a Data Scientist. I look forward to receiving any feedback from you. Chapter 1: Classify StarCraft 2 players with Python, Pandas and Scikit-learn.

IntroCSExperiencePrediction - A supervised learning project using eXtreme Gradient Boosting Trees

  •    Jupyter

This project adds a predictive model for understanding the dynamics of gender in intro CS at Berkeley for the years 2014 through 2015. This work builds on previous research done in fulfillment of a Computer Science Education Ph.D., HipHopathy, A Socio-Curricular Study of Introductory Computer Science. The dataset used in this project is not available for mass consumption, as it contains sensitive, personally identifiable student data. To generate this dataset, I created a survey that includes the following attributes for each data point, rated on a Likert scale of 1 to 5, with 1 for strongly disagree and 5 for strongly agree. Some items have yes/no answers.

zoltar - Common library for serving TensorFlow, XGBoost and scikit-learn models in production.

  •    Java

Zoltar is a common library for serving TensorFlow, XGBoost and scikit-learn models in production. See Zoltar docs for details. Copyright 2018 Spotify AB.

Apartment-Interest-Prediction - Predict people's interest in renting specific NYC apartments

  •    Jupyter

Predict people's interest in renting specific apartments. The challenge combines structured data, geolocation, time data, free text and images. This solution features gradient boosted trees (XGBoost and LightGBM) and does not use stacking, due to lack of time.

Arch-Data-Science - Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision

  •    Shell

Welcome to my repo for building Data Science, Machine Learning, Computer Vision, Natural Language Processing and Deep Learning packages from source. My data science environment runs in an LXC container, so TensorFlow's build system, Bazel, must be built with its auto-sandboxing disabled.

mli-resources - Machine Learning Interpretability Resources

  •    Jupyter

Machine learning algorithms can create more accurate models than linear models, but any increase in accuracy over more traditional, better-understood, and more easily explainable techniques is of little practical value for those who must explain their models to regulators or customers. For many decades, the models created by machine learning algorithms were generally taken to be black boxes. However, a recent flurry of research has introduced credible techniques for interpreting complex, machine-learned models. The materials presented here illustrate applications or adaptations of these techniques for practicing data scientists. Want to contribute your own examples? Just make a pull request.
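One simple, credible technique of this kind is permutation importance: shuffle one feature column at a time and measure how much the model's accuracy drops. A stdlib-only sketch with a hypothetical stand-in model:

```python
import random

# Permutation importance: shuffle one column at a time and measure the
# drop in accuracy; features the model relies on show a drop, features
# it ignores show none.

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, col, seed=0):
    rng = random.Random(seed)
    shuffled_col = [r[col] for r in rows]
    rng.shuffle(shuffled_col)
    shuffled_rows = [list(r) for r in rows]
    for r, v in zip(shuffled_rows, shuffled_col):
        r[col] = v
    return accuracy(model, rows, labels) - accuracy(model, shuffled_rows, labels)

# A stand-in "black box" that only ever reads column 0.
model = lambda row: 1 if row[0] > 0.5 else 0
rows = [[0.9, 5], [0.1, 5], [0.8, 2], [0.2, 9]]
labels = [1, 0, 1, 0]

drop0 = permutation_importance(model, rows, labels, col=0)
drop1 = permutation_importance(model, rows, labels, col=1)  # 0.0: column 1 is never read
```

In practice you would average the drop over many shuffles (scikit-learn's `permutation_importance` does this), but even a single shuffle exposes which inputs a black-box model actually uses.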