Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
deep-learning tensorflow theano neural-networks machine-learning data-science

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.
machine-learning data-science automl automation scikit-learn hyperparameter-optimization model-selection parameter-tuning automated-machine-learning random-forest gradient-boosting feature-engineering xgboost genetic-programming

Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers. Prophet is open source software released by Facebook's Core Data Science team. It is available for download on CRAN and PyPI.
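The additive structure described above can be illustrated with a toy sketch. This is not Prophet's API or its fitting procedure; every function name, parameter, and constant below is hypothetical and hand-picked, and the seasonal terms are single sine waves chosen only to show how the components add up.

```python
import math

def trend(t, intercept=10.0, slope=0.5, changepoint=100, slope_after=0.1):
    """Piecewise-linear (hence non-linear overall) trend with one changepoint."""
    if t <= changepoint:
        return intercept + slope * t
    return intercept + slope * changepoint + slope_after * (t - changepoint)

def seasonal(t, period, amplitude):
    """A single sine wave standing in for one seasonal component."""
    return amplitude * math.sin(2 * math.pi * t / period)

def additive_forecast(t, holidays=frozenset(), holiday_effect=5.0):
    """Sum of trend, weekly seasonality, yearly seasonality, and a holiday bump.

    t is a day index; holidays is a set of day indices (assumed inputs).
    """
    holiday = holiday_effect if t in holidays else 0.0
    return trend(t) + seasonal(t, 7, 2.0) + seasonal(t, 365.25, 4.0) + holiday

if __name__ == "__main__":
    for day in (0, 50, 150):
        print(day, round(additive_forecast(day, holidays={50}), 3))
```

Because the components are simply summed, each one can be inspected or replaced on its own, which is the property the description relies on.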
forecasting time-series data-science data-analysis

This is code and text behind the R for Data Science book.
book bookdown data-science r

The Knowledge Repository project is focused on facilitating the sharing of knowledge between data scientists and other technical roles using data formats and tools that make sense in these professions. It provides various data stores (and utilities to manage them) for "knowledge posts", with a particular focus on notebooks (R Markdown and Jupyter / IPython Notebook) to better promote reproducible research. Check out this Medium Post for the inspiration for the project.
data data-science knowledge data-analysis

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects. Each repository will (usually) correspond to one of the blog posts on my web site.
machine-learning data-analysis data-science ipython-notebook evolutionary-algorithm

Google Cloud Dataflow SDK for Java is a distribution of Apache Beam designed to simplify usage of Apache Beam on Google Cloud Dataflow service. This artifact includes the parent POM for other Dataflow SDK artifacts.
google-cloud-dataflow data-science data-analysis data-mining big-data data-processing

IPython Notebook(s) demonstrating deep learning functionality. IPython Notebook(s) demonstrating scikit-learn functionality.
machine-learning deep-learning data-science big-data aws tensorflow theano caffe scikit-learn kaggle spark mapreduce hadoop matplotlib pandas numpy scipy keras

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data at which pandas excels. Our approach adopts the Common Data Model for self-describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.
scientific-computing netcdf numpy data-science pandas dataframes data-analysis pydata

Built on top of Plotly.js, React, and Flask, Dash ties modern UI elements like dropdowns, sliders, and graphs directly to your analytical Python code. Here’s a 43-line example of a Dash App that ties a Dropdown to a D3.js Plotly Graph. As the user selects a value in the Dropdown, the application code dynamically exports data from Google Finance into a Pandas DataFrame.
dash plotly data-visualization data-science gui-framework flask react finance bioinformatics technical-computing charting plotly-dash web-app

Each record is projected into zero or more bins of a nominal plotting grid shape, based on a specified glyph. Reductions are computed for each bin, compressing the potentially large dataset into a much smaller aggregate array.
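The binning-and-reduction step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation or API: the function names are hypothetical, the glyph is assumed to be a single point (so each record lands in zero or one bin), and the reduction is a simple count.

```python
def bin_index(value, lo, hi, n_bins):
    """Return the bin for value in [lo, hi), or None if it falls outside."""
    if not (lo <= value < hi):
        return None
    index = int((value - lo) / (hi - lo) * n_bins)
    # Guard against floating-point rounding pushing the index to n_bins.
    return min(index, n_bins - 1)

def aggregate_counts(records, x_range, y_range, width, height):
    """Project (x, y) records onto a width x height grid and count per bin."""
    grid = [[0] * width for _ in range(height)]
    for x, y in records:
        col = bin_index(x, x_range[0], x_range[1], width)
        row = bin_index(y, y_range[0], y_range[1], height)
        if col is None or row is None:
            continue  # record projects into zero bins
        grid[row][col] += 1
    return grid

if __name__ == "__main__":
    points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.5, 0.5)]
    print(aggregate_counts(points, (0.0, 1.0), (0.0, 1.0), 2, 2))
```

However many records go in, the output has only width x height cells, which is the sense in which the dataset is compressed into a much smaller aggregate array.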
visualization data-science plotting heatmap rasterization

This is a collection of IPython/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the Python language. If Python is not your language, and it is R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if you are interested in being introduced to some basic Data Science Engineering, you might find this series of tutorials interesting. There we explain different concepts and applications using Python and R.
spark pyspark data-analysis mllib ipython-notebook notebook ipython data-science machine-learning big-data bigdata

Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you. Install Pachyderm locally or deploy on AWS/GCE/Azure in about 5 minutes.
pachyderm docker analytics big-data containers distributed-systems kubernetes data-science data-analysis

This lightweight tool helps you get a sense of your application's schema, as well as any outliers to that schema. It is particularly useful when you inherit a codebase with a data dump and want to quickly learn how the data is structured. It is also useful for finding rare keys. Also featured on the official MongoDB blog.
mongodb mongo data-science nosql nosql-analytics

"Time Series Feature extraction based on scalable hypothesis tests". The package contains many feature extraction methods and a robust feature selection algorithm.
data-science feature-extraction time-series

I learned Python by hacking first, and getting serious later. I wanted to do this with Machine Learning. If this is your style, join me in getting a bit ahead of yourself. I suggest you get your feet wet ASAP. You'll boost your confidence.
machine-learning data-science scikit-learn ipython-notebook deep-learning jupyter-notebook courses learning learning-by-doing diy

boltons should be builtins. Full and extensive docs are available on Read The Docs. See what's new by checking the CHANGELOG.
utilities cache data-structures statistics data-science file queue json recursive

Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting.
visualization analysis bokeh matplotlib interactive data-science exploratory-data-analysis pandas

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
gensim topic-modeling information-retrieval machine-learning natural-language-processing nlp data-science data-mining word2vec word-embeddings text-summarization neural-network document-similarity word-similarity fasttext

Unified and efficient Machine Learning since 1999. Buildbot: http://buildbot.shogun-toolbox.org/waterfall.
machine-learning data-science artificial-intelligence swig c-plus-plus cmake