pyspark-notebook - Pyspark Notebook With Docker

Run the container with docker-compose. It keeps your arguments and settings in a single file and runs everything together in an isolated environment. A minimal compose file is sketched below.

http://blog.prabeeshk.com/blog/2015/06/19/pyspark-notebook-with-docker/
https://github.com/prabeesh/pyspark-notebook
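
As a rough illustration, a compose file for this image might look like the following. The image name, port, and mount path are assumptions based on the linked repository, not taken from it verbatim.

```yaml
# Hypothetical docker-compose.yml; image name, port, and volume path
# are assumptions based on the linked repo, not verified against it.
version: "2"
services:
  notebook:
    image: prabeesh/pyspark-notebook
    ports:
      - "8888:8888"            # Jupyter's default HTTP port
    volumes:
      - ./notebooks:/notebooks # persist notebooks outside the container
```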


Related Projects

spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, using the Python language. If Python is not your language, and it is R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if you are interested in being introduced to some basic Data Science Engineering, you might find this series of tutorials interesting. There we explain different concepts and applications using Python and R.
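
For a sense of the level the notebooks start at, here is a minimal PySpark sketch of the kind of basics they cover (illustrative only, not taken from the notebooks):

```python
# Build an RDD, apply transformations, and collect the result.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-basics")
rdd = sc.parallelize(range(10))
even_squares = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())  # [0, 4, 16, 36, 64]
sc.stop()
```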

Optimus - Agile Data Science Workflows made easy with Python and Spark.

  •    Python

Optimus is the missing framework to profile, clean, process and do ML in a distributed fashion using Apache Spark (PySpark). You can go to the 10 minutes to Optimus notebook, where you can find the basics to start working.
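
A rough sketch in Optimus's documented style; the file name and the specific cleaning calls are assumptions for illustration, not a verbatim example from the project:

```python
# Hedged sketch of Optimus usage; file name and cleaning steps are assumed.
from optimus import Optimus

op = Optimus()                           # starts (or reuses) a Spark session
df = op.load.csv("data.csv")             # load a CSV as a Spark DataFrame
df = df.cols.lower("*").cols.trim("*")   # column-wise cleaning via `cols`
df.table()                               # print a formatted sample
```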

sparkmagic - Jupyter magics and kernels for working with remote Spark clusters

  •    Python

Sparkmagic is a set of tools for interactively working with remote Spark clusters through Livy, a Spark REST server, in Jupyter notebooks. The Sparkmagic project includes a set of magics for interactively running Spark code in multiple languages, as well as kernels that you can use to turn Jupyter into an integrated Spark environment. There are two ways to use sparkmagic; head over to the examples section for a demonstration of both modes of execution.
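
In the IPython-kernel mode, the flow is roughly the following. This is a hedged sketch: the Livy endpoint and session must first be configured through the %manage_spark widget, and the data path is invented.

```python
# Hedged sketch of sparkmagic's IPython-kernel mode.
%load_ext sparkmagic.magics   # load the magics into a plain IPython kernel
%manage_spark                 # widget for adding a Livy endpoint and session

# In a later cell, once a session exists, code runs on the remote cluster:
%%spark
df = spark.read.json("/data/events.json")   # path is an assumption
df.count()
```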

spark-nlp - Natural Language Understanding Library for Apache Spark.

  •    Jupyter

John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. The library has been uploaded to the spark-packages repository (https://spark-packages.org/package/JohnSnowLabs/spark-nlp).
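
A minimal sketch using the library's pretrained-pipeline entry point; the pipeline name is one commonly shown in the project's documentation, and the sample sentence is invented:

```python
# Hedged sketch of Spark-NLP's pretrained pipelines.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()                         # Spark session with the package
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
result = pipeline.annotate("John Snow Labs builds NLP tools on Spark.")
print(result["token"])                           # tokens from the annotation dict
```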

Agile_Data_Code_2 - Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

  •    Jupyter

Like my work? I am Principal Consultant at Data Syndrome, a consultancy offering assistance and training with building full-stack analytics products, applications and systems. Find us on the web at datasyndrome.com. There is now a video course using code from chapter 8, Realtime Predictive Analytics with Kafka, PySpark, Spark MLlib and Spark Streaming. Check it out now at datasyndrome.com/video.


docker-stacks - Ready-to-run Docker images containing Jupyter applications

  •    Dockerfile

Jupyter Docker Stacks are a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools. The example below may help you get started if you have Docker installed, know which Docker image you want to use, and want to launch a single Jupyter Notebook server in a container.
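
For instance, the PySpark stack can be launched with a single command. The volume mount is an optional convenience that assumes the default /home/jovyan working directory these images use:

```sh
# Launch a single Jupyter Notebook server from the PySpark stack image;
# copy the tokenized URL from the container log into your browser.
docker run -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/pyspark-notebook
```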

pipeline - PipelineAI: Real-Time Enterprise AI Platform

  •    HTML

Each model is built into a separate Docker image with the appropriate Python, C++, and Java/Scala runtime libraries for training or prediction. Using the same Docker image from local laptop to production avoids dependency surprises.

jupyterhub-deploy-docker - Reference deployment of JupyterHub with docker

  •    Python

jupyterhub-deploy-docker provides a reference deployment of JupyterHub, a multi-user Jupyter Notebook environment, on a single host using Docker. This deployment is NOT intended for a production environment; it is a reference implementation that does not meet traditional requirements for availability or scalability.

MMLSpark - Microsoft Machine Learning for Apache Spark

  •    Scala

MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly scalable predictive and analytical models for large image and text datasets. MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.
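
A hedged sketch of MMLSpark's high-level training API in the style of that era's README (the session would be launched with something like pyspark --packages Azure:mmlspark:0.13); the dataset path and label column are assumptions:

```python
# Hedged MMLSpark sketch; dataset path and column name are assumptions.
from mmlspark import TrainClassifier
from pyspark.ml.classification import LogisticRegression

train_df = spark.read.parquet("train.parquet")   # any labeled DataFrame
model = TrainClassifier(model=LogisticRegression(),
                        labelCol="label").fit(train_df)
```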

spark-movie-lens - An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

  •    Jupyter

This Apache Spark tutorial will guide you step by step through using the MovieLens dataset to build a movie recommender with collaborative filtering, based on Spark's Alternating Least Squares implementation. It is organised in two parts. The first is about getting and parsing movies and ratings data into Spark RDDs. The second is about building and using the recommender and persisting it for later use in our on-line recommender system. This tutorial can be used independently to build a movie recommender model based on the MovieLens dataset. Most of the code in the first part, about how to use ALS with the public MovieLens dataset, comes from my solution to one of the exercises proposed in CS100.1x Introduction to Big Data with Apache Spark by Anthony D. Joseph on edX, which has also been publicly available since 2014 at Spark Summit. Starting from there, I've made minor modifications to use a larger dataset, added code for storing and reloading the model for later use, and finally built a web service using Flask.
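
The core training step boils down to something like the following condensed sketch; the ::-separated, headerless file layout matches the larger MovieLens releases, and the hyperparameters are illustrative, not the tutorial's exact values:

```python
# Condensed ALS sketch with Spark MLlib; hyperparameters are illustrative.
from pyspark.mllib.recommendation import ALS, Rating

ratings = (sc.textFile("ratings.dat")
             .map(lambda line: line.split("::"))   # user::movie::rating::ts
             .map(lambda f: Rating(int(f[0]), int(f[1]), float(f[2]))))
model = ALS.train(ratings, rank=8, iterations=10, lambda_=0.1, seed=5)
user_movie = ratings.map(lambda r: (r.user, r.product))
predictions = model.predictAll(user_movie)         # RDD of predicted Ratings
```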

dockerspawner - Spawns JupyterHub single user servers in Docker containers

  •    Python

DockerSpawner enables JupyterHub to spawn single user notebook servers in Docker containers. JupyterHub 0.7 or above is required, which also means Python 3.3 or above.
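
Enabling it is a small jupyterhub_config.py fragment, roughly as below; the image is an arbitrary choice, and note that releases of that era named the image option container_image:

```python
# Minimal jupyterhub_config.py fragment for DockerSpawner (a sketch).
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.image = 'jupyter/base-notebook'   # older releases: container_image
c.DockerSpawner.network_name = 'jupyterhub'       # containers must reach the hub
```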

deepschool.io - Deep Learning tutorials in jupyter notebooks.

  •    Jupyter

See here for installing on Windows. Refer to this Dockerfile and this for information on how the Docker image was built.

jupyter-notify - A Jupyter Notebook magic for browser notifications of cell completion

  •    Python

This package provides a Jupyter notebook cell magic %%notify that notifies the user upon completion of a potentially long-running cell via a browser push notification. Use cases include long-running machine learning models, grid searches, or Spark computations. This magic allows you to navigate away to other work (or even another Mac desktop entirely) and still get a notification when your cell completes. Clicking on the body of the notification will bring you directly to the browser window and tab with the notebook, even if you're on a different desktop (clicking the "Close" button in the notification will keep you where you are). The extension has currently been tested in Chrome (Version: 58.0.3029) and Firefox (Version: 53.0.3).
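
Usage follows the package's documented pattern: load the extension once, then put the magic at the top of any long-running cell. The sleep below is just a stand-in for real work:

```python
# Load the extension once per notebook session.
%load_ext jupyternotify

# In a later cell, %%notify fires a browser notification on completion.
%%notify
import time
time.sleep(600)   # stand-in for a long model fit or Spark job
```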

CarND-TensorFlow-Lab - TensorFlow Lab for Self-Driving Car ND

  •    Jupyter

We've prepared a Jupyter notebook that will guide you through the process of creating a single layer neural network in TensorFlow. If you don't have Docker already, download and install Docker from here.
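
For context, a single-layer network in the TensorFlow 1.x API of that era looks roughly like this; it is a hedged sketch of the lab's subject, not the notebook's actual code:

```python
# Hedged TF 1.x sketch of a single-layer softmax network; shapes assume
# flattened 28x28 images with 10 classes, typical of such labs.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])        # batch of flattened images
W = tf.Variable(tf.truncated_normal([784, 10]))    # weights
b = tf.Variable(tf.zeros([10]))                    # biases
logits = tf.matmul(x, W) + b
probabilities = tf.nn.softmax(logits)              # class probabilities
```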

binderhub - Deterministically build docker images from a git repository + commit

  •    Python

BinderHub allows you to BUILD and REGISTER a Docker image from a GitHub repository, then CONNECT it with JupyterHub, giving you a public address where users can interact with the code and environment inside a live JupyterHub instance. You can select a specific branch name, commit, or tag to serve. BinderHub is created using Python, Kubernetes, Tornado, and Traitlets, so it should be a familiar technical foundation for Jupyter developers.

book - Deep Learning 101 with PaddlePaddle

  •    HTML

This book you are reading is interactive: each chapter can run as a Jupyter Notebook. We packed this book, Jupyter, PaddlePaddle, and all dependencies into a Docker image, so you don't need to install anything except Docker. If you are using Windows, please follow this installation guide. If you are on a Mac, please follow this. For various Linux distros, please refer to https://www.docker.com. If you are using Windows or Mac, you might want to give Docker more memory and CPUs/cores.
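
Once Docker is in place, starting the book is typically a single command; the image name here follows the project's Docker Hub naming and should be treated as an assumption:

```sh
# Run the book's container, then open http://localhost:8888 in a browser;
# the paddlepaddle/book image name is an assumption.
docker run -p 8888:8888 paddlepaddle/book
```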

CADL - Course materials/Homework materials for the FREE MOOC course on "Creative Applications of Deep Learning w/ Tensorflow" #CADL

  •    Jupyter

This repository contains lecture transcripts and homework assignments as Jupyter Notebooks for the first of three Kadenze Academy courses on Creative Applications of Deep Learning w/ Tensorflow. It also contains a Python package with all the code developed during all three courses. The first course makes heavy use of Jupyter Notebook, which will be necessary for submitting the homework and interacting with the guided session notebooks I will provide for each assignment. Follow along with this guide and we'll see how to obtain all of the necessary libraries that we'll be using. By the end of this, you'll have installed Jupyter Notebook, NumPy, SciPy, and Matplotlib. While many of these libraries aren't necessary for the Deep Learning we'll get to in later lectures, they are incredibly useful for manipulating data on your computer, preparing data for learning, and exploring results.

AIND-Recognizer

  •    Jupyter

A template notebook is provided as asl_recognizer.ipynb. The notebook is a combination tutorial and submission document; some of the codebase and some of your implementation will be external to it. For submission, complete the Submission sections of each part. This will include running your implementations in code notebook cells, answering analysis questions, and passing the unit tests provided in the codebase and called out in the notebook. Launching Jupyter Notebook with this file will open the notebook in your browser, which is where you will directly edit and run your code. Follow the instructions in the notebook for completing the project.

dashboards - Jupyter Dashboards Layout Extension

  •    Jupyter

The dashboards layout extension is an add-on for Jupyter Notebook. It lets you arrange your notebook outputs (text, plots, widgets, ...) in grid- or report-like layouts. It saves information about your layouts in your notebook document. Other people with the extension can open your notebook and view your layouts. For a sample of what's possible with the dashboard layout extension, have a look at the demo dashboard-notebooks in this repository.

HELK - The Incredible HELK

  •    Shell

A Hunting ELK (Elasticsearch, Logstash, Kibana) with advanced analytic capabilities. At the end of the HELK installation, you will see output with the information you need to access the primary HELK components. Remember that the default username and password for the HELK are helk:hunting.