Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers. Prophet is open source software released by Facebook's Core Data Science team. It is available for download on CRAN and PyPI.
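As a rough sketch of what "additive model" means here, the forecast can be written as a sum of components. The symbol names below are an assumption taken from common notation for this kind of model, not from the description above:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
% g(t): non-linear trend
% s(t): periodic seasonality (yearly and weekly)
% h(t): holiday effects
% \varepsilon_t: error term not captured by the other components
```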
forecasting time-series data-science data-analysis
The Knowledge Repository project is focused on facilitating the sharing of knowledge between data scientists and other technical roles using data formats and tools that make sense in these professions. It provides various data stores (and utilities to manage them) for "knowledge posts", with a particular focus on notebooks (R Markdown and Jupyter / IPython Notebook) to better promote reproducible research. Check out this Medium Post for the inspiration for the project.
data data-science knowledge data-analysis
This is a repository of teaching materials, code, and data for my data analysis and machine learning projects. Each repository will (usually) correspond to one of the blog posts on my web site.
machine-learning data-analysis data-science ipython-notebook evolutionary-algorithm
Google Cloud Dataflow SDK for Java is a distribution of Apache Beam designed to simplify usage of Apache Beam on the Google Cloud Dataflow service. This artifact includes the parent POM for other Dataflow SDK artifacts.
google-cloud-dataflow data-science data-analysis data-mining big-data data-processing
Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application.
druid data-visualization dashboards data data-analysis sql-editor
xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data at which pandas excels. Our approach adopts the Common Data Model for self-describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.
scientific-computing netcdf numpy data-science pandas dataframes data-analysis pydata
This is a collection of IPython/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, using the Python language. If your language is R rather than Python, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if you are interested in an introduction to some basic Data Science Engineering, you might find this series of tutorials interesting. There we explain different concepts and applications using Python and R.
spark pyspark data-analysis mllib ipython-notebook notebook ipython data-science machine-learning big-data bigdata
Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you. Install Pachyderm locally or deploy on AWS/GCE/Azure in about 5 minutes.
pachyderm docker analytics big-data containers distributed-systems kubernetes data-science data-analysis
plotnine is an implementation of a grammar of graphics in Python, based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot. Plotting with a grammar is powerful: it makes custom (and otherwise complex) plots easy to think about and then create, while the simple plots remain simple.
plotting grammar graphics data-analysis
scikit-learn is a Python module for machine learning built on top of SciPy. It provides simple and efficient tools for data mining and data analysis. It supports automatic classification, clustering, model selection, preprocessing, and a lot more.
machine-learning data-mining data-analysis classification
Dex: The data explorer is a data visualization tool written in Java/JavaFX capable of powerful ETL and data visualization. There are 2 main ways to install Dex.
data-science data-visualization visualization data-analysis data-mining javafx d3 dataviz datavis datavisualization d3js
Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15).
data-science machine-learning scikit-learn data-analysis pandas jupyter-notebook course linear-regression logistic-regression model-evaluation naive-bayes natural-language-processing decision-trees ensemble-learning clustering regular-expressions web-scraping data-visualization data-cleaning
This is the list of published articles on medium.com 🇬🇧, habr.com 🇷🇺, and jqr.com 🇨🇳. Icons are clickable. Also, links to Kaggle Kernels (in English) are given. This way one can reproduce everything without installing a single package. Assignments will be announced each week. Meanwhile, you can practice with demo versions. Solutions will be discussed in the upcoming run of the course.
machine-learning data-analysis data-science pandas algorithms numpy scipy matplotlib seaborn plotly scikit-learn kaggle-inclass vowpal-wabbit ipynb docker math
A website for GitHub users to generate an analysis of their GitHub data (contributions, commits, languages, repos), which helps in building a better resume. Attention: most of the pages now support English 😁😁😁, including the GitHub data analysis page.
github-contributions github-commits github contribute-languages github-analysis contributions resume resume-template data-analysis react reac
Python for Data Analysis, 2nd Edition (2017): Chinese translation notes
python-for-data-analysis jupyter-notebook chinese-translation data-analysis pandas
A curated list of awesome R packages and tools. Inspired by awesome-machine-learning. Packages change the way you use R.
r data-science data-analysis awesome list awesome-list
Fed up with a ton of tutorials but no easy way to find exercises, I decided to create a repo just with exercises to practice pandas. Don't get me wrong, tutorials are great resources, but to learn is to do. So unless you practice, you won't learn. My suggestion is that you learn a topic in a tutorial or video and then do exercises. Learn one more topic and do exercises. If you got the answer wrong, don't go directly to the solution with code.
pandas exercise practice tutorial data-analysis
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. Binary installers for the latest released version are available at the Python Package Index and on conda.
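A minimal sketch of what working with "labeled" data looks like in pandas, assuming pandas is installed. The city names and numbers are made-up illustration data, not taken from the project:

```python
import pandas as pd

# Two Series indexed by label (hypothetical example data).
sales_2023 = pd.Series({"Lisbon": 10, "Oslo": 20, "Quito": 30})
sales_2024 = pd.Series({"Oslo": 25, "Quito": 35, "Rabat": 5})

# Arithmetic aligns on labels, not positions.
# Labels present on only one side produce NaN.
total = sales_2023 + sales_2024

# Series.add with fill_value treats a missing label as 0 instead.
total_filled = sales_2023.add(sales_2024, fill_value=0)

print(total)
print(total_filled)
```

The point of the example is that the two Series do not need the same order or the same set of labels; pandas matches values by label before combining them.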
data-analysis pandas flexible alignment
A distributed crawler for Weibo, built with Celery and Requests.
weibospider data-analysis python3 distributed-crawler weibo sina
xLearn is a high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which has become very common in Internet services such as online advertising and recommender systems. If you are a user of liblinear, libfm, or libffm, xLearn is now another, better choice. xLearn is developed in high-performance C++ code with careful design and optimizations. Our system is designed to maximize CPU and memory utilization, provide cache-aware computation, and support lock-free learning. By combining these insights, xLearn is 5x-13x faster compared to similar systems.
machine-learning statistics data-science data-analysis