open-data-bikes-analysis - Open Data Bikes Sharing Stations Analysis

  •        8

Analyze bikes sharing station data from Bordeaux and Lyon Open Data (French cities). Use the Python 3 programming language in Jupyter notebooks and the following libraries: pandas, numpy, seaborn, matplotlib, scikit-learn, xgboost.

https://github.com/Oslandia/open-data-bikes-analysis

Tags
Implementation
License
Platform

   




Related Projects

spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython notebook/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the Python language. If Python is not your language, and it is R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if your are interested in being introduced to some basic Data Science Engineering, you might find these series of tutorials interesting. There we explain different concepts and applications using Python and R.

Data-Analysis-and-Machine-Learning-Projects - Repository of teaching materials, code, and data for my data analysis and machine learning projects

  •    Jupyter

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects.Each repository will (usually) correspond to one of the blog posts on my web site.

OpenML - Open Machine Learning

  •    CSS

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem, that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

data-science-your-way - Ways of doing Data Science Engineering and Machine Learning in R and Python

  •    Jupyter

These series of tutorials on Data Science engineering will try to compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as posible for job market or to start personal project.

holoviews - Stop plotting your data - annotate your data and let it visualize itself.

  •    Python

Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting.


common-workflow-language - Repository for the CWL standards

  •    Python

The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. CWL is designed to meet the needs of data-intensive science, such as Bioinformatics, Medical Imaging, Astronomy, Physics, and Chemistry. CWL is developed by a multi-vendor working group consisting of organizations and individuals aiming to enable scientists to share data analysis workflows. The CWL project is maintained on Github and we follow the Open-Stand.org principles for collaborative open standards development. Legally, CWL is a member project of Software Freedom Conservancy and is formally managed by the elected CWL leadership team, however every-day project decisions are made by the CWL community which is open for participation by anyone.

fma - FMA: A Dataset For Music Analysis

  •    Jupyter

Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2. The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads. Below the abstract from the paper.

pandas-videos - Jupyter notebook and datasets from the pandas Q&A video series

  •    Jupyter

Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas.

pycm - Multi-class confusion matrix library in Python

  •    Python

PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix, and a proper tool for post-classification model evaluation that supports most classes and overall statistics parameters. PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists that need a broad array of metrics for predictive models and an accurate evaluation of large variety of classifiers. threshold is added in version 0.9 for real value prediction.

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth

  •    Python

Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.Prophet is open source software released by Facebook's Core Data Science team. It is available for download on CRAN and PyPI.

Data-Analysis - Data Analysis Using Python

  •    Jupyter

Data Analysis Using Python and a little R. A place to share my code and reports for various data science projects.

pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data

  •    Python

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. Binary installers for the latest released version are available at the Python package index and on conda.

datascience-box - Data Science Course in a Box

  •    HTML

This introductory data science course that is our (working) answer to these questions. The courses focuses on data acquisition and wrangling, exploratory data analysis, data visualization, and effective communication and approaching statistics from a model-based, instead of an inference-based, perspective. A heavy emphasis is placed on a consitent syntax (with tools from the tidyverse), reproducibility (with R Markdown) and version control and collaboration (with git/GitHub). We help ease the learning curve by avoiding local installation and supplementing out-of-class learning with interactive tools (like learnr tutorials). By the end of the semester teams of students work on fully reproducible data analysis projects on data they acquired, answering questions they care about. This repository serves as a "data science course in a box" containing all materials required to teach (or learn from) the course described above.

openmct - A web based mission control framework.

  •    Javascript

Open MCT (Open Mission Control Technologies) is a next-generation mission control framework for visualization of data on desktop and mobile devices. It is developed at NASA's Ames Research Center, and is being used by NASA for data analysis of spacecraft missions, as well as planning and operation of experimental rover systems. As a generalizable and open source framework, Open MCT could be used as the basis for building applications for planning, operation, and analysis of any systems producing telemetry data. Try Open MCT now with our live demo.

Data-Science-45min-Intros - Ipython notebook presentations for getting starting with basic programming, statistics and machine learning techniques

  •    Jupyter

Every week*, our data science team @Gnip (aka @TwitterBoulder) gets together for about 50 minutes to learn something. While these started as opportunities to collectively "raise the tide" on common stumbling blocks in data munging and analysis tasks, they have since grown to machine learning, statistics, and general programming topics. Anything that will help us do our jobs better is fair game.

LSTM-Sentiment-Analysis - Sentiment Analysis with LSTMs in Tensorflow

  •    Jupyter

This repository contains the iPython notebook and training data to accompany the O'Reilly tutorial on sentiment analysis with LSTMs in Tensorflow. See the original tutorial to run this code in a pre-built environment on O'Reilly's servers with cell-by-cell guidance, or run these files on your own machine. There is also another file called Pre-Trained LSTM.ipynb which allows you to input your own text, and see the output of the trained network. Before running the notebook, you'll first need to download all data we'll be using. This data is located in the models.tar.gz and training_data.tar.gz tarballs. We will extract these into the same directory as Oriole LSTM.ipynb. As always, the first step is to clone the repository.

xarray - N-D labeled arrays and datasets in Python

  •    Python

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data for which pandas excels. Our approach adopts the Common Data Model for self- describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.