IPython Notebook(s) demonstrating deep learning functionality.IPython Notebook(s) demonstrating scikit-learn functionality.
machine-learning deep-learning data-science big-data aws tensorflow theano caffe scikit-learn kaggle spark mapreduce hadoop matplotlib pandas numpy scipy kerasxarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data for which pandas excels. Our approach adopts the Common Data Model for self- describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.
scientific-computing netcdf numpy data-science pandas dataframes data-analysis pydataThis repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks. Run the code using the Jupyter notebooks available in this repository's notebooks directory.
scikit-learn numpy jupyter-notebook matplotlib pandasStop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting.
visualization analysis bokeh matplotlib interactive data-science exploratory-data-analysis pandasCourse materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15).
data-science machine-learning scikit-learn data-analysis pandas jupyter-notebook course linear-regression logistic-regression model-evaluation naive-bayes natural-language-processing decision-trees ensemble-learning clustering regular-expressions web-scraping data-visualization data-cleaningThis is the list of published articles on medium.com 🇬🇧, habr.com 🇷🇺, and jqr.com 🇨🇳. Icons are clickable. Also, links to Kaggle Kernels (in English) are given. This way one can reproduce everything without installing a single package. Assignments will be announced each week. Meanwhile, you can pratice with demo versions. Solutions will be discussed in the upcoming run of the course.
machine-learning data-analysis data-science pandas algorithms numpy scipy matplotlib seaborn plotly scikit-learn kaggle-inclass vowpal-wabbit ipynb docker math利用Python进行数据分析 第二版 (2017) 中文翻译笔记
python-for-data-analysis jupyter-notebook chinese-translation data-analysis pandasFed up with a ton of tutorials but no easy way to find exercises I decided to create a repo just with exercises to practice pandas. Don't get me wrong, tutorials are great resources, but to learn is to do. So unless you practice you won't learn. My suggestion is that you learn a topic in a tutorial or video and then do exercises. Learn one more topic and do exercises. If you got the answer wrong, don't go directly to the solution with code.
pandas exercise practice tutorial data-analysispandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. Binary installers for the latest released version are available at the Python package index and on conda.
data-analysis pandas flexible alignmentThis rep is a growing list of Python cheat sheets, tailored for Data Science. If you want to install a package individually, go into the corresponding <package-name>.md file for instructions on how to install.
numpy python-cheat-sheets data-science pandas scikit-learnInspired by 100 Numpy exerises, here are 100* short puzzles for testing your knowledge of pandas' power. Since pandas is a large library with many different specialist features and functions, these excercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the excerises here are straightforward in that the solutions require no more than a few lines of code (in pandas or NumPy - don't go using pure Python!). Choosing the right methods and following best practices is the underlying goal.
pandas numpy data-analysis"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.
machine-learning deep-learning text-analytics classification clustering natural-language-processing computer-vision data-science spacy nltk scikit-learn prophet time-series-analysis convolutional-neural-networks tensorflow keras statsmodels pandas deep-neural-networksWeld is a language and runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a common intermediate representation, and optimizing across each framework. Modern analytics applications combine multiple functions from different libraries and frameworks to build complex workflows. Even though individual functions can achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. Weld’s take on solving this problem is to lazily build up a computation for the entire workflow, and then optimizing and evaluating it only when a result is needed.
stanford data analytics machine-learning code-generation performance llvm pandasNOTE: For the latest stable README.md ensure you are on the main branch. Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
anaconda gpu arrow machine-learning-algorithms h2o cuda pandas python-api mapd gpu-dataframe rapids cudf🤗Datasets also provides access to +15 evaluation metrics and is designed to let the community easily add and share new datasets and evaluation metrics. 🤗Datasets originated from a fork of the awesome TensorFlow Datasets and the HuggingFace team want to deeply thank the TensorFlow Datasets team for building this amazing library. More details on the differences between 🤗Datasets and tfds can be found in the section Main differences between 🤗Datasets and tfds.
nlp natural-language-processing computer-vision metrics tensorflow numpy evaluation pandas pytorch datasetsPandasGUI is a GUI for viewing, plotting and analyzing Pandas DataFrames. Issues, feedback and pull requests are welcome.
gui viewer pandas dataframeLux is a Python library that facilitate fast and easy data exploration by automating the visualization and data analysis process. By simply printing out a dataframe in a Jupyter notebook, Lux recommends a set of visualizations highlighting interesting trends and patterns in the dataset. Visualizations are displayed via an interactive widget that enables users to quickly browse through large collections of visualizations and make sense of their data. Here is a 1-min video introducing Lux, and slides from a more extended talk.
visualization data-science jupyter exploratory-data-analysis pandas visualization-toolsMars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and many other libraries. More details about installing Mars can be found at installation section in Mars document.
machine-learning tensorflow numpy scikit-learn pandas pytorch xgboost lightgbm tensor dask ray dataframe statsmodels joblibIbis is a toolbox to bridge the gap between local Python environments, remote storage, execution systems like Hadoop components (HDFS, Impala, Hive, Spark) and SQL databases. Its goal is to simplify analytical workflows and make you more productive. Learn more about using the library at http://ibis-project.org.
hadoop impala pandas hdfs ibisUp to date remote data access for pandas, works for multiple versions of pandas. As of v0.6.0 Yahoo!, Google Options, Google Quotes and EDGAR have been immediately deprecated due to large changes in their API and no stable replacement.
html data-analysis data dataset stock-data finance financial-data pydata pandas
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.