MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly-scalable predictive and analytical models for large image and text datasets.MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.
Its features include :
Tags | machine-learning spark cntk pyspark azure microsoft-machine-learning microsoft ml |
Implementation | Scala |
License | MIT |
Platform | OS-Independent |
This repository contains walkthroughs, templates and documentation related to Machine Learning & Data Science services and platforms on Azure. Services and platforms include Data Science Virtual Machine, Azure ML, HDInsight, Microsoft R Server, SQL-Server, Azure Data Lake etc.There are also materials from tutorials we have delivered at KDD, Strata etc., using the above services and platforms.
John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment. This library has been uploaded to the spark-packages repository https://spark-packages.org/package/JohnSnowLabs/spark-nlp .
nlp nlu natural-language-processing natural-language-understanding spark spark-ml pyspark machine-learning named-entity-recognition sentiment-analysis lemmatizer spell-checker tokenizer entity-extraction stemmer part-of-speech-tagger annotation-frameworkThe Microsoft Cognitive Toolkit is a free, easy-to-use, open-source, commercial-grade toolkit that trains deep learning algorithms to learn like the human brain. It is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph.
deep-learning neural-networks artificial-intelligence.NET for Apache Spark provides high performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.
spark analytics bigdata spark-streaming spark-sql machine-learning fsharp dotnet-core dotnet-standard streaming apache-spark tpcds tpch azure hdinsight databricks emr microsoftConnectTheDots.io is an open source project created by Microsoft to help you get tiny devices connected to Microsoft Azure IoT and to implement great IoT solutions taking advantage of Microsoft Azure advanced analytic services such as Azure Stream Analytics and Azure Machine Learning.The project is built with the assumption that the sensors get the raw data and format it into a JSON string. That string is then sent to Azure IoT Hub, from which a Web app gathers the data and displays it as a chart. Optional other functions of the Azure cloud include detecting and displaying alerts and averages, however this is not required.
This repository is aimed to provide simple and ready-to-use tutorials for CNTK. Each tutorial has a source code. Deep Learning is of great interest these days - there's a need for rapid and optimized implementations of the algorithms and deep architectures. Microsoft Cognitive Toolkit (CNTK) is designed to provide a free and fast-and-easy platform for facilitating the deep learning architecture design and implementation. CNTK demonstrated to be superior compared to the famous TensorFlow in performance (Benchmarking State-of-the-Art Deep Learning Software Tools: report, paper).
deep-learning tutorial microsoft frameworkThis is a collection of IPython notebook/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the Python language. If Python is not your language, and it is R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if your are interested in being introduced to some basic Data Science Engineering, you might find these series of tutorials interesting. There we explain different concepts and applications using Python and R.
spark pyspark data-analysis mllib ipython-notebook notebook ipython data-science machine-learning big-data bigdataTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library written in Scala that runs on top of Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time. Skip to Quick Start and Documentation.
ml automl transformations estimators dsl pipelines machine-learning salesforce einstein features feature-engineering spark sparkml ai automated-machine-learning transmogrification transmogrify structured-data transformersNew: check out our hands-on tutorial.Photon Machine Learning (Photon ML) is a machine learning library based upon Apache Spark originally developed by the LinkedIn Machine Learning Algorithms team.
The content and code on this repo is intended for computer science instruction as a collaboration with Microsoft developer advocates and Faculty / Students under the MIT license. Please check back regularly for updated versions. This repo provides technical resources to help students and faculty learn about Azure and teach others. The content covers cross-platform scenarios in AI and machine learning, data science, web development, mobile app dev, internet of things, and devops.
This project is not actively maintained anymore please see Seldon Core. Seldon Server is a machine learning platform that helps your data science team deploy models into production.
machine-learning deep-learning deployment kubernetes docker microservices spark kafka kafka-streams tensorflow cloud aws gcp azure seldon recommender-system recommendation-engine predictionThis project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is of n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015 reflecting the state of ML implementations at that time, this repo is being updated if I see significant changes in implementations or new implementations have become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.
machine-learning data-science r gradient-boosting-machine random-forest deep-learning xgboost h2o sparkLike my work? I am Principal Consultant at Data Syndrome, a consultancy offering assistance and training with building full-stack analytics products, applications and systems. Find us on the web at datasyndrome.com. There is now a video course using code from chapter 8, Realtime Predictive Analytics with Kafka, PySpark, Spark MLlib and Spark Streaming. Check it out now at datasyndrome.com/video.
data-syndrome data data-science analytics apache-spark apache-kafka kafka spark predictive-analytics machine-learning machine-learning-algorithms airflow python-3 python3 amazon-ec2 agile-data agile-data-science vagrant amazon-web-servicesUnity Machine Learning Agents (ML-Agents) is an open-source Unity plugin that enables games and simulations to serve as environments for training intelligent agents. Agents can be trained using reinforcement learning, imitation learning, neuroevolution, or other machine learning methods through a simple-to-use Python API. We also provide implementations (based on TensorFlow) of state-of-the-art algorithms to enable game developers and hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be used for multiple purposes, including controlling NPC behavior (in a variety of settings such as multi-agent and adversarial), automated testing of game builds and evaluating different game design decisions pre-release. ML-Agents is mutually beneficial for both game developers and AI researchers as it provides a central platform where advances in AI can be evaluated on Unity’s rich environments and then made accessible to the wider research and game developer communities. For more information, in addition to installation and usage instructions, see our documentation home. If you have used a version of ML-Agents prior to v0.3, we strongly recommend our guide on migrating to v0.3.
reinforcement-learning unity3d deep-learning unity deep-reinforcement-learning neural-networksA curated, but probably biased and incomplete, list of awesome machine learning interpretability resources. If you want to contribute to this list (and please do!) read over the contribution guidelines, send a pull request, or contact me @jpatrickhall.
fairness xai interpretability iml fatml accountability transparency machine-learning data-science data-mining r awesome awesome-list machine-learning-interpretability interpretable-machine-learning interpretable-ml interpretable-ai interpretable-deep-learning explainable-mlThe Accord.NET project provides machine learning, statistics, artificial intelligence, computer vision and image processing methods to .NET. It can be used on Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux or mobile.
machine-learning framework c-sharp nuget visual-studio statistics unity3d neural-network support-vector-machines computer-vision image-processing ffmpegThis project provides a set of PHP client libraries that make it easy to access Microsoft Azure tables, blobs, queues, service bus (queues and topics), service runtime and service management APIs. For documentation on how to host PHP applications on Microsoft Azure, please see the Microsoft Azure PHP Developer Center.The recommended way to resolve dependencies is to install them using the Composer package manager.
azure microsoft-azure-sdk sdkI just built out v2 of this project that now gives you analytics info from your models, and is production-ready. machineJS is an amazing research project that clearly proved there's a hunger for automated machine learning. auto_ml tackles this exact same goal, but with more features, cleaner code, and the ability to be copy/pasted into production.
machine-learning data-science machine-learning-library machine-learning-algorithms ml data-scientists javascript-library scikit-learn kaggle numerai automated-machine-learning automl auto-ml neuralnet neural-network algorithms random-forest svm naive-bayes bagging optimization brainjs date-night sklearn ensemble data-formatting js xgboost scikit-neuralnetwork knn k-nearest-neighbors gridsearch gridsearchcv grid-search randomizedsearchcv preprocessing data-formatter kaggle-competitionThis project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
dmtk multiverso lightgbm microsoft machine-learningThe Oryx open source project provides infrastructure for lambda-architecture applications on top of Spark, Spark Streaming and Kafka. On this, it provides further support for real-time, large scale machine learning, and end-to-end applications of this support for common machine learning use cases, like recommendations, clustering, classification and regression.
lambda lambda-architecture oryx apache-spark machine-learning kafka classification clustering
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.