DataScienceR - a curated list of R tutorials for Data Science, NLP and Machine Learning

  •    R

This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks. Curated list of Python tutorials for Data Science, NLP and Machine Learning.

Skater - Python Library for Model Interpretation/Explanations

  •    Python

Skater is a unified framework that enables model interpretation for all forms of models, to help build the interpretable machine learning systems often needed for real-world use cases (we are actively working towards enabling faithful interpretability for all forms of models). It is an open-source Python library designed to demystify the learned structures of a black-box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction). The project started as a research idea to find ways to enable better interpretability (preferably human interpretability) of predictive "black boxes" for both researchers and practitioners. The project is still in beta.

AIDL-Series - Series of Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Application, etc


Series of Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Application, etc. 💫 Artificial Intelligence and Deep Learning in Practice: Machine Learning volume | TensorFlow volume

Vegas - The missing MatPlotLib for Scala + Spark

  •    Scala

Vegas aims to be the missing MatPlotLib for the Scala and Spark world. Vegas wraps around Vega-Lite but provides syntax more familiar (and type checked) for use within Scala. A plot can be rendered into a pop-up window; the project documentation gives more detail on controlling how and where Vegas renders.

modin - Modin: Speed up your Pandas workflows by changing a single line of code

  •    Python

Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. To use Modin, you do not need to know how many cores your system has and you do not need to specify how to distribute the data. In fact, you can continue using your previous pandas notebooks while experiencing a considerable speedup from Modin, even on a single machine. Once you’ve changed your import statement, you’re ready to use Modin just like you would pandas.
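The one-line change described above can be sketched as follows. This is a minimal sketch and not part of Modin: the helper `choose_backend` and its `is_installed` parameter are hypothetical names introduced here. It picks `modin.pandas` when Modin can be found and falls back to plain `pandas` otherwise, so the same script can run on either.

```python
import importlib.util


def _is_installed(package: str) -> bool:
    """Report whether a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


def choose_backend(is_installed=_is_installed) -> str:
    """Return the module name to import as `pd` (hypothetical helper)."""
    # Modin exposes its pandas-compatible API under `modin.pandas`.
    return "modin.pandas" if is_installed("modin") else "pandas"


# Typical use (left as comments because it needs pandas or Modin installed):
#   import importlib
#   pd = importlib.import_module(choose_backend())
#   df = pd.DataFrame({"a": [1, 2, 3]})
```

In most scripts the direct form is enough: replace `import pandas as pd` with `import modin.pandas as pd` and leave the rest of the code unchanged.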

jupyter-notify - A Jupyter Notebook magic for browser notifications of cell completion

  •    Python

This package provides a Jupyter notebook cell magic %%notify that notifies the user upon completion of a potentially long-running cell via a browser push notification. Use cases include long-running machine learning models, grid searches, or Spark computations. This magic allows you to navigate away to other work (or even another Mac desktop entirely) and still get a notification when your cell completes. Clicking on the body of the notification will bring you directly to the browser window and tab with the notebook, even if you're on a different desktop (clicking the "Close" button in the notification will keep you where you are). The extension has currently been tested in Chrome (Version: 58.0.3029) and Firefox (Version: 53.0.3).

knyfe - knyfe is a python utility for rapid exploration of datasets.

  •    Python

knyfe is a Python utility for rapid exploration of datasets. Use it when you have some kind of dataset and you want to get a feel for how it is composed, run some simple tests on it, or prepare it for further processing. The great thing about knyfe is that you don't have to know much about how your dataset is designed. You shouldn't have to remember which variable resides in which column of your data matrix or how your structs are nested. Just get shit done.

krangl - krangl is a {K}otlin DSL for data w{rangl}ing

  •    Kotlin

krangl is a {K}otlin library for data w{rangl}ing. By implementing a grammar of data manipulation using a modern functional-style API, it lets you filter, transform, aggregate and reshape tabular data. krangl is heavily inspired by the amazing dplyr for R. It is written in Kotlin and excels in Kotlin, but also emphasizes good Java interop. It mimics the API of dplyr while carefully adding more typed constructs where possible.

kravis - A {K}otlin g{ra}mmar for data {vis}ualization

  •    Kotlin

Visualizing tabular and relational data is at the core of data science. kravis implements a grammar to create a wide range of plots using a standardized set of verbs. You can use JitPack with Maven or Gradle to build the latest snapshot as a dependency in your project.

docker-ocaml-jupyter-datascience - Dockerfiles for data science in OCaml on Jupyter

  •    Jupyter

A ready-to-use environment of Jupyter (IPython notebook) and OCaml Jupyter (OCaml kernel) with libraries for data science and machine learning. The first step is to launch a Jupyter server.

ocaml-jupyter - An OCaml kernel for Jupyter (IPython) notebook

  •    Jupyter

An OCaml kernel for Jupyter notebook. This provides an OCaml REPL with a great user interface, including markdown/HTML documentation, LaTeX formulas rendered by MathJax, and image embedding.

predict-opioid-prescribers - A pattern focusing on how to use scikit-learn and Python in Watson Studio to predict opioid prescribers based on a 2014 Kaggle dataset

  •    Jupyter

Data Science Experience is now Watson Studio. Although some images in this code pattern may show the service as Data Science Experience, the steps and processes will still work.

my-awesome-AI-bookmarks - Curated list of my reads, implementations and core concepts of Artificial Intelligence, Deep Learning, Machine Learning by best folk in the world


Curated list of my reads, implementations and core concepts of Artificial Intelligence, Deep Learning and Machine Learning by the best folks in the world. Have anything in mind that you think is awesome and would fit in this list? Feel free to send a pull request.

apparate - Make your libraries magically appear in Databricks.

  •    Python

Make your libraries magically appear in Databricks. When our team started setting up CI/CD for the various packages we maintain, we encountered some difficulties integrating Jenkins with Databricks.

data-refinery - Data transformation

  •    Python

The main goal of the library is to perform a transformation over a data event. It supports a variety of functions typically used in machine learning and AI. Development follows a functional style that avoids side effects in transformations.
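The functional, side-effect-free style mentioned above can be illustrated with a small sketch. It does not use data-refinery's actual API; `compose`, `rename_field` and `add_field` are hypothetical helpers written only for this example. Each step returns a new dictionary instead of modifying the incoming event.

```python
from functools import reduce


def compose(*steps):
    """Chain transformation steps left to right into one function."""
    def run(event):
        return reduce(lambda current, step: step(current), steps, event)
    return run


def rename_field(old, new):
    """Build a step that returns a copy of the event with one key renamed."""
    def step(event):
        return {(new if key == old else key): value for key, value in event.items()}
    return step


def add_field(name, compute):
    """Build a step that returns a copy of the event with a derived field."""
    def step(event):
        return {**event, name: compute(event)}
    return step


# Hypothetical pipeline: rename a field, then derive a total from two fields.
pipeline = compose(
    rename_field("qty", "quantity"),
    add_field("total", lambda e: e["quantity"] * e["unit_price"]),
)

event = {"qty": 3, "unit_price": 4}
result = pipeline(event)  # `event` itself is left untouched
```

Because no step mutates its input, the original event can be reused or replayed through a different pipeline without defensive copying.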

php-ml-datasets - Public datasets prepared for machine learning tasks with php-ml


Public datasets prepared for machine learning tasks with php-ml. Bike-sharing systems are a new generation of traditional bike rentals in which the whole process of membership, rental and return has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental and health issues.

lumberjack - The not-a-pipe operator that logs

  •    R

A function composition operator ('pipe') and extensible framework that allows for easy logging of changes in data. To log changes in data, you need to attach a logger, and use the lumberjack operator %>>%.
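The idea of a pipe that logs can be sketched in Python, using `>>` as a stand-in for the R operator. This is not lumberjack's API: the `Tracked` class and its `log` attribute are hypothetical, and the "logger" here only records whether each step changed the data.

```python
class Tracked:
    """Carries a value through a pipeline and logs each step's effect."""

    def __init__(self, value, log=()):
        self.value = value
        self.log = tuple(log)

    def __rshift__(self, step):
        # Apply the step, then record its name and whether the data changed.
        new_value = step(self.value)
        entry = (step.__name__, new_value != self.value)
        return Tracked(new_value, self.log + (entry,))


def drop_missing(values):
    """Remove None entries from a list."""
    return [v for v in values if v is not None]


def sort_ascending(values):
    """Return the values in ascending order."""
    return sorted(values)


result = Tracked([1, None, 3]) >> drop_missing >> sort_ascending
```

Here `result.value` holds the processed data and `result.log` holds one entry per step, so a step that leaves the data unchanged is still visible in the log.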

ggstatsplot - Collection of functions to enhance ggplot2 plots with results from statistical tests.

  •    HTML

ggstatsplot is an extension of the ggplot2 package for creating graphics that include details from statistical tests in the plots themselves. It is targeted primarily at the behavioral sciences community and aims to produce information-rich plots with one line of code. In a typical exploratory data analysis workflow, data visualization and statistical modelling are two different phases: visualization informs modelling, and modelling in turn can suggest a different visualization method, and so on. The central idea of ggstatsplot is simple: combine these two phases into one in the form of graphics with statistical details, which makes data exploration simpler and faster. Currently, it supports only the most common types of statistical tests (parametric, nonparametric, and robust versions of t-test, ANOVA, and correlation analyses, contingency table analysis, and regression analyses).