Displaying 1 to 10 from 10 results

DataflowJavaSDK - Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines

  •    Java

Google Cloud Dataflow SDK for Java is a distribution of Apache Beam designed to simplify usage of Apache Beam on Google Cloud Dataflow service. This artifact includes the parent POM for other Dataflow SDK artifacts.

spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython notebook/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the Python language. If Python is not your language, and it is R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if your are interested in being introduced to some basic Data Science Engineering, you might find these series of tutorials interesting. There we explain different concepts and applications using Python and R.

pachyderm - Reproducible Data Science at Scale!

  •    Go

Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you. Install Pachyderm locally or deploy on AWS/GCE/Azure in about 5 minutes.

Cascalog - Data processing on Hadoop

  •    Clojure

Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools.




DataScienceVM - Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

  •    HTML

The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft’s Azure cloud built specifically for doing data science. It has many popular data science and other tools pre-installed and pre-configured to jump-start building intelligent applications for advanced analytics. It is available on Windows Server 2016, Windows Server 2012, and on Linux. We offer Linux edition of the DSVM in either Ubuntu 16.04 LTS or on OpenLogic 7.2 CentOS-based Linux distributions. You can try the Data Science VM for free for 30 days (with $200 credits) with a free Azure Trial. The Linux (Ubuntu-based) DSVM also provides a test drive through a button on the product page. The Test Drive will provide full access to you own instance of the VM with just a free Microsoft account (No Azure subscription or CC needed).On this repo, we will feature tools, tips and extensions (see below) to the Data Science VM. We invite the DSVM user community to contribute any useful tools or scripts, extensions you may have written to enhance the user experience on the DSVM.

spark-r-notebooks - R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the R language. If your are interested in being introduced to some basic Data Science Engineering concepts and applications, you might find these series of tutorials interesting. There we explain different concepts and applications using Python and R. Additionally, if you are interested in using Python with Spark, you can have a look at our pySpark notebooks.

warp - Convert and analyze large data sets at light speed, on Mac and iOS.

  •    Swift

Warp allows you to convert and analyze (very) large databases with ease at the speed of light. In Warp, you work on a small subset of the data, after which Warp repeats your actions on the entire dataset. Unlike most data analysis apps, you do not have to type any codes in Warp. Effortlessly juggle around data between files and databases by simply dragging-and-dropping! Load CSV files into MySQL or transfer a PostgreSQL table to a RethinkDB table by just dragging one to the other.

countly-sdk-js - Countly Product Analytics SDK for Icenium and Phonegap

  •    Java

Questions? Visit http://community.count.ly. Countly is an innovative, real-time, open source mobile analytics and push notifications platform. It collects data from mobile devices, and visualizes this information to analyze mobile application usage and end-user behavior. There are two parts of Countly: the server that collects and analyzes data, and mobile SDK that sends this data. Both parts are open source with different licensing terms.


data-science-live-book - An open source book to learn data science, data analysis and machine learning, suitable for all ages!

  •    TeX

This book is now available at Amazon in [Kindle]( Link: http://a.co/d/dIj1XwD) Black & White and color 📗 🚀. Most of the written R code can be used in real scenarios! I worked on the funModeling R package at the same time, so it is used many times in the book.





We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.