
spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, using the Python language. If Python is not your language and you prefer R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if you are interested in an introduction to basic Data Science Engineering, you might find this series of tutorials interesting; there we explain different concepts and applications using Python and R.

data-analytics-machine-learning-big-data - Slides, code and more for my class: Data Analytics and Machine Learning on Big Data

  •    Jupyter

If you want to install and run everything on your computer, here are the best tutorials I've found for getting Python and Spark running locally. To visualize the decision trees in Jupyter, you will also need to install Graphviz along with its Python package.
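Graphviz renders trees from a DOT description. As a minimal illustration of the format the Graphviz tooling consumes (the tree, feature names, and helper function here are invented for the example, not taken from any real model), this sketch builds the DOT source for a tiny two-level decision tree by hand:

```python
# Build the DOT source for a tiny, hypothetical decision tree.
# The node labels and thresholds are illustrative only; the graphviz
# Python package would render exactly this kind of text.

def tree_to_dot(nodes, edges):
    """Assemble a Graphviz DOT digraph from node labels and parent->child edges."""
    lines = ["digraph DecisionTree {"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id} [label="{label}", shape=box];')
    for parent, child, branch in edges:
        lines.append(f'    {parent} -> {child} [label="{branch}"];')
    lines.append("}")
    return "\n".join(lines)

nodes = {
    "n0": "age <= 30?",
    "n1": "buys = yes",
    "n2": "buys = no",
}
edges = [("n0", "n1", "true"), ("n0", "n2", "false")]

dot_source = tree_to_dot(nodes, edges)
print(dot_source)
```

With the graphviz Python package installed (and the Graphviz binaries on your PATH), `graphviz.Source(dot_source)` would display this tree inline in a Jupyter cell.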

dllib - dllib is a distributed deep learning library running on Apache Spark

  •    CSS

dllib is a distributed deep learning framework running on Apache Spark. See the documentation for more detail. dllib is designed to be simple and easy to use for Spark users. Since dllib exposes exactly the same interface as MLlib algorithms, MLlib libraries can be used for feature engineering or transformation.
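The "same interface as MLlib" claim refers to MLlib's Estimator/Transformer pattern: an estimator's `fit()` returns a model that exposes `transform()`, so feature-engineering stages and learners compose uniformly. A plain-Python sketch of that pattern (the class names, column parameters, and list-of-dicts "dataset" are illustrative stand-ins, not pyspark APIs):

```python
# Sketch of the MLlib-style Estimator/Transformer contract, without Spark.

class StandardScaler:                      # Estimator: fit() -> Model
    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col

    def fit(self, dataset):
        values = [row[self.input_col] for row in dataset]
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
        return StandardScalerModel(self.input_col, self.output_col, mean, std)

class StandardScalerModel:                 # Transformer: transform() -> dataset
    def __init__(self, input_col, output_col, mean, std):
        self.input_col, self.output_col = input_col, output_col
        self.mean, self.std = mean, std

    def transform(self, dataset):
        # Append a standardized copy of the input column to every row.
        return [{**row, self.output_col: (row[self.input_col] - self.mean) / self.std}
                for row in dataset]

data = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = StandardScaler("x", "x_scaled").fit(data)   # estimator -> model
scaled = model.transform(data)                      # model -> transformed data
print(scaled)
```

Because every stage follows the same `fit`/`transform` contract, a deep learning estimator can slot into a pipeline after MLlib feature transformers, which is the design point the dllib description is making.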

mlfeature - Feature engineering toolkit for Spark MLlib.

  •    Scala

VarianceSelector is a simple baseline approach to feature selection. It removes all features whose variance does not meet a given threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples. NULLs, NaNs, and out-of-bounds values are placed into a special bucket.
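The selection rule above can be sketched in a few lines of plain Python (the function name, the `threshold` parameter, and the NaN "bucket" representation are assumptions for illustration, not the mlfeature API, which is Scala):

```python
import math

# Minimal sketch of variance-threshold feature selection: drop every
# column whose variance does not exceed `threshold` (the default of 0.0
# drops exactly the zero-variance columns), and route NaN entries into a
# separate bucket so they do not pollute the variance computation.

def variance_select(rows, threshold=0.0):
    """Return (kept_column_indices, nan_bucket) for a list of feature rows."""
    n_cols = len(rows[0])
    kept, nan_bucket = [], []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        nan_bucket.extend((i, j) for i, v in enumerate(col) if math.isnan(v))
        clean = [v for v in col if not math.isnan(v)]
        mean = sum(clean) / len(clean)
        variance = sum((v - mean) ** 2 for v in clean) / len(clean)
        if variance > threshold:
            kept.append(j)
    return kept, nan_bucket

rows = [
    [1.0, 0.0, 5.0],
    [2.0, 0.0, float("nan")],
    [3.0, 0.0, 7.0],
]
kept, special = variance_select(rows)
print(kept, special)   # the constant middle column is dropped; one NaN is bucketed
```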

Play-Spark-Scala

  •    Scala

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to write, and an optimized engine that supports general computation graphs. It also supports a rich set of higher-level tools, including Shark (Hive on Spark), MLlib for machine learning, GraphX for graph processing, and Spark Streaming. This is a Spark application built with Play 2.2.0, though it can be built with any Play version. One thing to keep in mind is that the Akka version must be compatible with both Spark and Play, so check the Akka versions bundled with the Spark and Play releases you use.
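One way to express that constraint in an sbt build is to force a single Akka version via `dependencyOverrides`. This is a hedged sketch, not the repo's actual build file: the Spark, Play, and Akka version numbers below are placeholders, and you must replace them with the Akka line actually bundled by the Spark and Play releases you depend on.

```scala
// build.sbt sketch (sbt 0.13 era): pin Akka so Spark and Play agree.
// All version numbers here are illustrative placeholders.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "0.9.1",
  "com.typesafe.play" %% "play"      % "2.2.0"
)

// Force one Akka version across the whole dependency graph; it must be
// acceptable to both the Spark and Play artifacts above.
dependencyOverrides += "com.typesafe.akka" %% "akka-actor" % "2.2.3"
```

Running `sbt dependencyTree` (or inspecting the resolved dependencies) after such a change is a quick way to confirm that only one `akka-actor` version remains on the classpath.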