
Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15).

data-science machine-learning scikit-learn data-analysis pandas jupyter-notebook course linear-regression logistic-regression model-evaluation naive-bayes natural-language-processing decision-trees ensemble-learning clustering regular-expressions web-scraping data-visualization data-cleaning

Stacked ensembles are simple in theory: you combine the predictions of smaller models and feed those into another model. In practice, however, implementing them can be a major headache. Xcessiv holds your hand through all the implementation details of creating and optimizing stacked ensembles, so you're free to fully define only the things you care about.
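As a rough illustration of the stacking idea described above, here is a minimal sketch using scikit-learn's built-in StackingClassifier rather than Xcessiv itself; the dataset and model choices are arbitrary:

```python
# A minimal stacked ensemble: base-model predictions become the
# input features of a second-level (meta) model.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # combines the base predictions
    cv=5,  # out-of-fold predictions train the final estimator, avoiding leakage
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 2))
```

Tools like Xcessiv automate the bookkeeping around this pattern (cross-validated meta-features, hyperparameter search) rather than changing the core idea.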

machine-learning ensemble-learning stacked-ensembles scikit-learn data-science hyperparameter-optimization automated-machine-learning

New to MLJ? Start here. Want to integrate an existing machine learning model into the MLJ framework? Start here.

data-science machine-learning statistics pipeline clustering julia pipelines regression tuning classification ensemble-learning predictive-modeling tuning-parameters stacking

AutoGluon automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on text, image, and tabular data. Announcement for previous users: the AutoGluon codebase has been modularized into namespace packages, which means you now only need the dependencies relevant to your prediction task of interest. For example, you can now work with tabular data without having to install the dependencies required for AutoGluon's computer vision tasks (and vice versa). Unfortunately, this improvement required a minor API change (e.g., instead of from autogluon import TabularPrediction, you should now do from autogluon.tabular import TabularPredictor) for all versions newer than v0.0.15. Documentation and tutorials under the old API may still be viewed for version 0.0.15, which is the last released version under the old API.

data-science machine-learning natural-language-processing computer-vision deep-learning mxnet scikit-learn tabular-data pytorch hyperparameter-optimization image-classification ensemble-learning object-detection transfer-learning structured-data gluon automl automated-machine-learning neural-architecture-search autogluon

Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processing model outputs, and evaluating model performance. It supports various time series learning tasks, including forecasting and anomaly detection for both univariate and multivariate time series. This library aims to provide engineers and researchers a one-stop solution to rapidly develop models for their specific time series needs, and benchmark them across multiple time series datasets. The table below provides a visual overview of how Merlion's key features compare to other libraries for time series anomaly detection and/or forecasting.

benchmarking machine-learning time-series forecasting ensemble-learning automl anomaly-detection

Instructions for how to install the necessary software for this tutorial are available here. Data for the tutorial can be downloaded by running ./data/get-data.sh (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values, as in Random Forest and Stochastic GBM), so computation time is linear in the number of features. Other algorithms, such as GLMs, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).
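To make the n << p point concrete, here is a small sketch (synthetic data, arbitrary alpha) of a regularized GLM fit on a wide matrix, where the Lasso penalty drives most coefficients to exactly zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 100 observations, 5000 features: far more columns than rows (n << p).
X, y = make_regression(n_samples=100, n_features=5000,
                       n_informative=10, noise=1.0, random_state=0)

# L1 regularization keeps the fit well-posed despite p >> n and
# zeroes out most coefficients.
model = Lasso(alpha=1.0, max_iter=10000).fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))
print(n_selected)  # typically a small fraction of the 5000 features
```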

machine-learning deep-learning random-forest gradient-boosting-machine tutorial data-science ensemble-learning r

ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framework to build memory-efficient, maximally parallelized ensemble networks in as few lines of code as possible. ML-Ensemble is thread-safe as long as its base learners are, and can fall back on memory-mapped multiprocessing for memory-neutral process-based concurrency. For tutorials and full documentation, visit the project website.

ensemble-learning machine-learning ensemble learners stacking stack ensembles

AutoMLPipeline is a package that makes it trivial to create complex ML pipeline structures using simple expressions. It leverages Julia's built-in macro programming features to symbolically process and manipulate pipeline expressions, making it easy to discover optimal structures for machine learning regression and classification. Just note that + has higher priority than |>, so if you are not sure, enclose the operations inside parentheses.

data-science machine-learning data-mining pipeline julia classification ensemble-learning data-mining-algorithms symbolic-expressions automl stacking chaining machine-learning-models pipeline-optimization pipeline-structure scikitlearn-wrapper symbolic-pipeline

enpls offers an algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions. See the vignette (or open with vignette("enpls") in R) for a quick-start guide.

machine-learning ensemble-learning outlier-detection partial-least-squares-regression dimensionality-reduction chemometrics

A work-in-progress machine learning library written in J, with various algorithm implementations, including MLPClassifiers, MLPRegressors, Mixture Models, K-Means, KNN, RBF-Network, and Self-organizing Maps. Models can be serialized to text files, with a mixture of text and binary packing. The size of the serialized file depends on the size of the model, but will probably range from 10 MB upwards for NN models (including convnets and rec-nets).

machine-learning convolutional-neural-networks j deep-learning gaussian-mixture-models gaussian-processes self-organizing-map principal-component-analysis k-means hierarchical-clustering lstm ensemble-learning learning rbm restricted-boltzmann-machines multilayer-perceptron-network knn-classifier clustering

This is a scikit-learn implementation of the DeepSuperLearner algorithm, a deep ensemble method for classification problems. For details, please refer to the paper Deep Super Learner: A Deep Ensemble for Classification Problems by Steven Young, Tamer Abdou, and Ayse Bener (https://arxiv.org/abs/1803.02323).

machine-learning ensemble-learning classification-algorithm

The subsemble package is an R implementation of the Subsemble algorithm. Subsemble is a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a unique form of k-fold cross-validation to output a prediction function that combines the subset-specific fits. An oracle result provides a theoretical performance guarantee for Subsemble. Stephanie Sapp, Mark J. van der Laan & John Canny. Subsemble: An ensemble method for combining subset-specific algorithm fits. Journal of Applied Statistics, 41(6):1247-1259, 2014.
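A simplified sketch of the Subsemble idea in plain scikit-learn (not the subsemble package itself; the real algorithm builds the metalearner's training set with a specialized k-fold cross-validation scheme, which this sketch omits for brevity):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# 1. Partition the rows into disjoint subsets.
rng = np.random.RandomState(0)
subsets = np.array_split(rng.permutation(len(X)), 3)

# 2. Fit one base learner per subset.
fits = [
    DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[rows], y[rows])
    for rows in subsets
]

# 3. Combine the subset-specific fits with a metalearner.
#    (Subsemble proper uses k-fold CV here to avoid training-set leakage.)
meta_X = np.column_stack([f.predict_proba(X)[:, 1] for f in fits])
meta = LogisticRegression().fit(meta_X, y)
print(round(meta.score(meta_X, y), 2))
```

Because each base learner only ever touches its own subset, the fits in step 2 can run in parallel, which is what makes the method attractive for large datasets.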

ensemble ensemble-learning cross-validation machine-learning machine-learning-algorithms r big-data

Massive On-line Analysis (MOA) is an environment for massive data mining. MOA provides a framework for data stream mining and includes tools for evaluation and a collection of machine learning algorithms. It is related to the WEKA project and is also written in Java, while scaling to more demanding problems.

moa datastream classification ensemble-learning random-forest concept-drift decision-trees ensembles

survtmle is an R package designed to use targeted minimum loss-based estimation (TMLE) to compute covariate-adjusted marginal cumulative incidence estimates in right-censored survival settings with and without competing risks. The estimates can leverage ensemble machine learning via the SuperLearner package. If you encounter any bugs or have any specific feature requests, please file an issue.

survival-analysis tmle competing-risks ensemble-learning

A random forest library in Python, compatible with scikit-learn.

data-science machine-learning random-forest scikit-learn machine-learning-algorithms regression pandas classification ensemble-learning decision-tree