MMLSpark - Microsoft Machine Learning for Apache Spark


MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly scalable predictive and analytical models for large image and text datasets. MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.

python_mozetl - ETL jobs for Firefox Telemetry


This repository is a collection of ETL jobs for Firefox Telemetry. Jobs committed to python_mozetl can be scheduled via Airflow or ATMO. We provide a testing suite and code review, which make your job more maintainable. Centralizing our jobs in one repository allows for code reuse and easier collaboration.

PySparkGeoAnalysis - Interactive Workshop on GeoAnalysis using PySpark


This workshop will introduce you to Apache Spark via the exciting domain of Geospatial Analysis.

data-analytics-machine-learning-big-data - Slides, code and more for my class: Data Analytics and Machine Learning on Big Data


If you want to install and run everything locally, here are the best tutorials I've found for getting Python and Spark running on your computer. To visualize the decision trees in Jupyter, you will need to install Graphviz as well as its Python package.
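
Since the decision-tree visualization depends on two separate installs (the Graphviz system package and a Python package), a quick check before opening Jupyter can save some confusion. The snippet below is only a minimal sketch using the Python standard library. The helper name `check_graphviz_setup` is hypothetical, and it assumes the Python package is the one importable as `graphviz`, which the class materials do not state explicitly.

```python
import importlib.util
import shutil


def check_graphviz_setup():
    """Report whether the Graphviz binary and a Python binding are visible."""
    return {
        # `dot` is the layout program shipped with the Graphviz system package.
        "dot_executable": shutil.which("dot") is not None,
        # Assumption: the Python package is the one importable as `graphviz`.
        "python_package": importlib.util.find_spec("graphviz") is not None,
    }


if __name__ == "__main__":
    for name, found in check_graphviz_setup().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either entry reports `missing`, the corresponding install step is probably still outstanding, or the tool is not on the `PATH` of the environment Jupyter runs in.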