This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks. Curated list of Python tutorials for Data Science, NLP and Machine Learning.
datascience data-science r text-mining

Skater is a unified framework for model interpretation across all forms of models, helping you build the interpretable machine learning systems often needed for real-world use cases (we are actively working towards enabling faithful interpretability for all forms of models). It is an open source Python library designed to demystify the learned structures of a black box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction). The project started as a research idea to find ways to bring better interpretability (preferably human interpretability) to predictive "black boxes", both for researchers and practitioners. The project is still in beta phase.
ml predictive-modeling machine-learning modeling-tools model-interpretation blackbox datascience model-explanation explanation-system deep-learning deep-neural-networks attribution lstm-neural-networks cnn-classification

Series on Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Application, etc. 💫 Artificial Intelligence and Deep Learning in Practice: Machine Learning volume | TensorFlow volume
datascience machinelearning deeplearning neural-network natural-language-processing artificial-intelligence

Vegas aims to be the missing Matplotlib for the Scala and Spark world. Vegas wraps around Vega-Lite but provides syntax that is more familiar (and type checked) for use within Scala. A plot can be rendered into a pop-up window, and Vegas offers control over how and where it renders.
plotting datascience

Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. To use Modin, you do not need to know how many cores your system has and you do not need to specify how to distribute the data. In fact, you can continue using your previous pandas notebooks while experiencing a considerable speedup from Modin, even on a single machine. Once you’ve changed your import statement, you’re ready to use Modin just like you would pandas.
dataframe pandas ray distributed datascience pandas-on-ray modin sql

This package provides a Jupyter notebook cell magic %%notify that notifies the user upon completion of a potentially long-running cell via a browser push notification. Use cases include long-running machine learning models, grid searches, or Spark computations. This magic allows you to navigate away to other work (or even another Mac desktop entirely) and still get a notification when your cell completes. Clicking on the body of the notification will bring you directly to the browser window and tab with the notebook, even if you're on a different desktop (clicking the "Close" button in the notification will keep you where you are). The extension has currently been tested in Chrome (Version: 58.0.3029) and Firefox (Version: 53.0.3).
datascience

HMAC timing attacks with statistical analysis
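For readers unfamiliar with the idea, here is a minimal sketch of the leak that an HMAC timing attack exploits. It is not taken from the project itself: the key, the message, and the `naive_equal` helper are hypothetical, and the sketch counts inspected bytes as a deterministic stand-in for the noisy wall-clock timings a real attack would have to analyze statistically.

```python
import hashlib
import hmac
from typing import Tuple


def naive_equal(a: bytes, b: bytes) -> Tuple[bool, int]:
    """Early-exit comparison (hypothetical helper).

    Returns the result plus the number of bytes inspected, which stands in
    for the running time an attacker would measure.
    """
    if len(a) != len(b):
        return False, 0
    inspected = 0
    for x, y in zip(a, b):
        inspected += 1
        if x != y:
            return False, inspected
    return True, inspected


# Hypothetical key and message, for illustration only.
key = b"hypothetical-secret-key"
message = b"amount=100&to=alice"
expected = hmac.new(key, message, hashlib.sha256).digest()

# A guess that is wrong in every byte, and one whose first 4 bytes are right.
wrong_everywhere = bytes(b ^ 0xFF for b in expected)
right_prefix = expected[:4] + wrong_everywhere[4:]

_, work_a = naive_equal(expected, wrong_everywhere)
_, work_b = naive_equal(expected, right_prefix)
print(work_a, work_b)  # 1 5: more correct leading bytes means more work

# The standard-library comparison is designed not to leak the mismatch position.
safe = hmac.compare_digest(expected, right_prefix)
print(safe)  # False
```

The amount of work done by the naive comparison grows with the number of correct leading bytes, which is the signal an attacker tries to recover from timing measurements. `hmac.compare_digest` avoids the early exit.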
data datascience data-science security statistics hacking

knyfe is a Python utility for rapid exploration of datasets. Use it when you have some kind of dataset and you want to get a feel for how it is composed, run some simple tests on it, or prepare it for further processing. The great thing about knyfe is that you don't have to know much about how your dataset is designed. You shouldn't have to remember which variable resides in which column of your data matrix or how your structs are nested. Just get shit done.
dataset datascience

Techniques for Scraping the Web in Python
aws aws-lambda beautifulsoup scraping step-functions serverless jupyter-notebook datascience

krangl is a {K}otlin library for data w{rangl}ing. By implementing a grammar of data manipulation using a modern functional-style API, it lets you filter, transform, aggregate and reshape tabular data. krangl is heavily inspired by the amazing dplyr for R. krangl is written in Kotlin and excels in Kotlin, but also emphasizes good Java interop. It mimics the API of dplyr, while carefully adding more typed constructs where possible.
kotlin data-mining sql dsl datascience

Visualizing tabular and relational data is at the core of data science. kravis implements a grammar to create a wide range of plots using a standardized set of verbs. You can also use JitPack with Maven or Gradle to build the latest snapshot as a dependency in your project.
kotlin datascience krangl dplyr

A ready-to-use environment of Jupyter (IPython notebook) and OCaml Jupyter (OCaml kernel) with libraries for data science and machine learning.
jupyter-notebook docker dockerfile datascience machine-learning dataanalysis ocaml functional-programming

An OCaml kernel for Jupyter notebook. It provides an OCaml REPL with a rich user interface, including markdown/HTML documentation, LaTeX formulas rendered by MathJax, and image embedding.
ocaml functional-programming jupyter-kernels machine-learning datascience dataanalysis jupyter-notebook jupyter ocaml-kernel ocaml-repl

Data Science Experience is now Watson Studio. Although some images in this code pattern may show the service as Data Science Experience, the steps and processes will still work.
datascience machinelearning python3 scikitlearn-machine-learning pandas pixiedust dsx ibmcode call-for-code

Curated list of my reads, implementations and core concepts of Artificial Intelligence, Deep Learning and Machine Learning by the best folks in the world. Have anything in mind that you think is awesome and would fit in this list? Feel free to send a pull request.
artificial-intelligence deeplearning datascience tensorflow pytorch blogs mathamatics awesome-list deep-learning neural-network awesome list machine-learning algorithms

Make your libraries magically appear in Databricks. When our team started setting up CI/CD for the various packages we maintain, we encountered some difficulties integrating Jenkins with Databricks.
datascience

The main goal of the library is to perform a transformation over a data event. It supports a variety of functions typically used in machine learning and AI. Development follows a functional style, avoiding side effects in transformations.
machine-learning etl etl-pipeline data-science data datascience

Public datasets prepared for machine learning tasks with php-ml. Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike from one position and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental and health issues.
machine-learning dataset data regression classification datascience

A function composition operator ('pipe') and extensible framework that allows for easy logging of changes in data. To log changes in data, you need to attach a logger, and use the lumberjack operator %>>%.
r logging daff datascience reproducible-research

ggstatsplot is an extension of the ggplot2 package for creating graphics with details from statistical tests included in the plots themselves. It is targeted primarily at the behavioral sciences community and provides one-line code to produce information-rich plots. In a typical exploratory data analysis workflow, data visualization and statistical modelling are two different phases: visualization informs modelling, and modelling in turn can suggest a different visualization method, and so on. The central idea of ggstatsplot is simple: combine these two phases into one in the form of graphics with statistical details, which makes data exploration simpler and faster. Currently, it supports only the most common types of statistical tests (parametric, nonparametric, and robust versions of t-test, anova, and correlation analyses, contingency table analysis, and regression analyses).
ggplot-extension statistical-tests dataviz r statistical-analysis statistical-inference data visualization datascience violin-plot vignette badge parametric robust plot