This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks. Curated list of Python tutorials for Data Science, NLP and Machine Learning.
datascience data-science r text-mining

Skater is a unified framework to enable model interpretation for all forms of models, helping one build the interpretable machine learning systems often needed for real-world use cases (we are actively working towards enabling faithful interpretability for all forms of models). It is an open source Python library designed to demystify the learned structures of a black-box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction). The project was started as a research idea to find ways to enable better interpretability (preferably human interpretability) of predictive "black boxes" for both researchers and practitioners. The project is still in beta phase.
ml predictive-modeling machine-learning modeling-tools model-interpretation blackbox datascience model-explanation explanation-system deep-learning deep-neural-networks attribution lstm-neural-networks cnn-classification

CleverCSV provides a drop-in replacement for the Python csv package with improved dialect detection for messy CSV files. It also provides a handy command line tool that can standardize a messy file or generate Python code to import it.
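To make "dialect detection" concrete, here is a minimal sketch using only the standard library `csv` module, which CleverCSV is described as a drop-in replacement for. It does not call CleverCSV itself; the sample data and the helper name `read_with_detected_dialect` are assumptions made for illustration.

```python
import csv
import io

# A small "messy" sample: semicolon-delimited instead of comma-delimited.
# (Illustrative data, not taken from the CleverCSV documentation.)
SAMPLE = "name;city;age\nAda;London;36\nAlan;Wilmslow;41\n"


def read_with_detected_dialect(text):
    """Detect the dialect of `text`, then parse it into a list of rows."""
    # Restrict the candidate delimiters to keep the detection predictable.
    dialect = csv.Sniffer().sniff(text, delimiters=";,\t")
    reader = csv.reader(io.StringIO(text), dialect)
    return dialect.delimiter, list(reader)


delimiter, rows = read_with_detected_dialect(SAMPLE)
print(delimiter)
print(rows[0])
```

The standard sniffer relies on simple frequency heuristics, which is the step the description says CleverCSV improves on for messy files.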
csv-converter data-science data-mining csv csv-files python-library python3 datascience csv-format csv-reading csv-parser csv-reader csv-export csv-import csv-parsing

Animated Investment Management Research at Sov.ai, sponsoring open source AI, machine learning, and data science initiatives. Have a look at the newly started FirmAI Medium publication, where experts in AI for business write about their topics of interest.
data-science machine-learning example jupyter-notebook datascience practical-machine-learning firmai

A series on Artificial Intelligence and Deep Learning, including mathematics fundamentals, Python practice, NLP applications, etc. 💫 Artificial Intelligence and Deep Learning in Practice: Machine Learning volume | TensorFlow volume
datascience machinelearning deeplearning neural-network natural-language-processing artificial-intelligence

Vegas aims to be the missing Matplotlib for the Scala and Spark world. Vegas wraps around Vega-Lite but provides syntax more familiar (and type-checked) for use within Scala. Plots can be rendered into a pop-up window, and Vegas offers control over how and where it renders.
plotting datascience

A curated list of applied business machine learning (BML) and business data science (BDS) examples and libraries. The code in this repository is in Python (primarily using Jupyter notebooks) unless otherwise stated. The catalogue is inspired by awesome-machine-learning.
machine-learning jupyter example jupyter-notebook datascience practical-machine-learning business-machine-learning

Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. To use Modin, you do not need to know how many cores your system has and you do not need to specify how to distribute the data. In fact, you can continue using your previous pandas notebooks while experiencing a considerable speedup from Modin, even on a single machine. Once you’ve changed your import statement, you’re ready to use Modin just like you would pandas.
dataframe pandas ray distributed datascience pandas-on-ray modin sql

tech.ml.dataset is a Clojure library for data processing and machine learning. Datasets are currently in-memory column-wise databases, and parsing from a file or input stream is supported. Supported input sources are raw or gzipped csv/tsv, xls, xlsx, json, and sequences of maps. SQL bindings are provided as a separate library. Data size in memory is minimized (primitive arrays), datetime types are often converted to an integer representation, and strings are loaded into string tables. These features together dramatically decrease the working set size in memory. Because data is stored in columnar fashion, column-wise operations on the dataset are very fast.
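The storage ideas named above (string tables, integer-encoded dates, primitive arrays) can be illustrated with a short Python sketch. This is not tech.ml.dataset's Clojure implementation or API; the function names and sample data are hypothetical and only show the general technique.

```python
from array import array
from datetime import date


def encode_string_column(values):
    """Encode strings as a string table plus a primitive array of integer codes."""
    table = []          # unique strings, in first-seen order
    index = {}          # string -> position in `table`
    codes = array("i")  # one small integer per row
    for value in values:
        if value not in index:
            index[value] = len(table)
            table.append(value)
        codes.append(index[value])
    return table, codes


def decode_string_column(table, codes):
    """Rebuild the original list of strings from the table and the codes."""
    return [table[code] for code in codes]


def encode_date_column(dates):
    """Store dates as day ordinals in a primitive array of 64-bit integers."""
    return array("q", (d.toordinal() for d in dates))


cities = ["Paris", "Lyon", "Paris", "Paris", "Lyon"]
table, codes = encode_string_column(cities)
days = encode_date_column([date(2020, 1, 1), date(2020, 1, 2)])
print(table, list(codes), list(days))
```

Each repeated string is stored once, and every row holds only a fixed-width integer, which is the kind of saving the description attributes to string tables and primitive arrays.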
machine-learning csv xlsx datascience dataset dataframe etl-pipeline

This package provides a Jupyter notebook cell magic %%notify that notifies the user upon completion of a potentially long-running cell via a browser push notification. Use cases include long-running machine learning models, grid searches, or Spark computations. This magic allows you to navigate away to other work (or even another Mac desktop entirely) and still get a notification when your cell completes. Clicking on the body of the notification will bring you directly to the browser window and tab with the notebook, even if you're on a different desktop (clicking the "Close" button in the notification will keep you where you are). The extension has currently been tested in Chrome (Version: 58.0.3029) and Firefox (Version: 53.0.3).
datascience

Melusine is a high-level Python library for email classification and feature extraction, capable of running on top of Scikit-Learn, TensorFlow 2 and Keras. Integrated models run with TensorFlow 2.2. It is developed with a focus on emails written in French. Melusine is compatible with Python >= 3.6.
emails datascience nlp-machine-learning

HMAC timing attacks with statistical analysis
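The entry above gives only a title, so the following is a hedged sketch of the general idea rather than that project's code. It assumes the usual setting: a verifier that compares HMAC tags byte by byte and returns at the first mismatch can leak, through response time, how many leading bytes were correct. The standard library's `hmac.compare_digest` is the constant-time alternative. The key, message, and function names are illustrative assumptions.

```python
import hashlib
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on the first differing byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def verify(key: bytes, message: bytes, tag: bytes, compare=hmac.compare_digest) -> bool:
    """Recompute the HMAC-SHA256 tag and compare it with the supplied one."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return compare(expected, tag)


KEY = b"demo-key"    # illustrative key, not a real secret
MESSAGE = b"hello"
good_tag = hmac.new(KEY, MESSAGE, hashlib.sha256).digest()
bad_tag = bytes([good_tag[0] ^ 1]) + good_tag[1:]  # flip one bit in the first byte

print(verify(KEY, MESSAGE, good_tag))
print(verify(KEY, MESSAGE, bad_tag, compare=naive_equal))
```

Both comparison functions return the same answers; they differ only in how their running time depends on the input, and that timing signal is what a statistical analysis would try to measure.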
data datascience data-science security statistics hacking

knyfe is a Python utility for rapid exploration of datasets. Use it when you have some kind of dataset and you want to get a feel for how it is composed, run some simple tests on it, or prepare it for further processing. The great thing about knyfe is that you don't have to know much about how your dataset is designed. You shouldn't have to remember which variable resides in which column of your data matrix or how your structs are nested. Just get shit done.
dataset datascience

Techniques for Scraping the Web in Python
aws aws-lambda beautifulsoup scraping step-functions serverless jupyter-notebook datascience

krangl is a {K}otlin library for data w{rangl}ing. By implementing a grammar of data manipulation using a modern functional-style API, it allows you to filter, transform, aggregate and reshape tabular data. krangl is heavily inspired by the amazing dplyr for R. krangl is written in Kotlin and excels in Kotlin, but also emphasizes good Java interop. It mimics the API of dplyr, while carefully adding more typed constructs where possible.
kotlin data-mining sql dsl datascience

Visualizing tabular and relational data is the core of data science. kravis implements a grammar to create a wide range of plots using a standardized set of verbs. You can also use JitPack with Maven or Gradle to build the latest snapshot as a dependency in your project.
kotlin datascience krangl dplyr

A ready-to-use environment of Jupyter (IPython notebook) and OCaml Jupyter (OCaml kernel) with libraries for data science and machine learning. First, launch a Jupyter server as follows.
jupyter-notebook docker dockerfile datascience machine-learning dataanalysis ocaml functional-programming

An OCaml kernel for Jupyter notebook. This provides an OCaml REPL with a rich user interface, including Markdown/HTML documentation, LaTeX formulas rendered by MathJax, and image embedding.
ocaml functional-programming jupyter-kernels machine-learning datascience dataanalysis jupyter-notebook jupyter ocaml-kernel ocaml-repl

Data Science Experience is now Watson Studio. Although some images in this code pattern may show the service as Data Science Experience, the steps and processes will still work.
datascience machinelearning python3 scikitlearn-machine-learning pandas pixiedust dsx ibmcode call-for-code

Curated list of my reads, implementations and core concepts of Artificial Intelligence, Deep Learning and Machine Learning by the best folks in the world. Have anything in mind that you think is awesome and would fit in this list? Feel free to send a pull request.
artificial-intelligence deeplearning datascience tensorflow pytorch blogs mathamatics awesome-list deep-learning neural-network awesome list machine-learning algorithms