pydata-notebook - Chinese translation notes for Python for Data Analysis, 2nd Edition (2017)


Chinese translation notes for Python for Data Analysis, 2nd Edition (2017)

https://github.com/BrambleXu/pydata-notebook

Related Projects

pandas-videos - Jupyter notebook and datasets from the pandas Q&A video series

  •    Jupyter

Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas.

spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, using the Python language. If Python is not your language and R is, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if you are interested in an introduction to some basic Data Science Engineering, you might find this series of tutorials interesting. There we explain different concepts and applications using Python and R.

100-pandas-puzzles - 100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)

  •    Jupyter

Inspired by 100 Numpy exercises, here are 100* short puzzles for testing your knowledge of pandas' power. Since pandas is a large library with many different specialist features and functions, these exercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the exercises here are straightforward in that the solutions require no more than a few lines of code (in pandas or NumPy - don't go using pure Python!). Choosing the right methods and following best practices is the underlying goal.
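
As a rough illustration of the fundamentals named above (cleaning, indexing, grouping, aggregating), here is a minimal pandas sketch. The DataFrame contents and column names are invented for illustration and are not taken from the puzzle set.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "dog", "snake"],
        "age": [2.5, 3.0, None, 7.0, 4.5],
        "visits": [1, 3, 2, 1, 2],
    }
)

# Cleaning: fill a missing age with the mean age of the same animal.
group_mean_age = df.groupby("animal")["age"].transform("mean")
df["age"] = df["age"].fillna(group_mean_age)

# Indexing: select rows with a boolean mask and columns by label.
frequent = df.loc[df["visits"] > 1, ["animal", "age"]]

# Grouping and aggregating: mean age and total visits per animal.
summary = df.groupby("animal").agg(
    mean_age=("age", "mean"),
    total_visits=("visits", "sum"),
)

print(frequent)
print(summary)
```

Each step stays within vectorised pandas calls instead of Python loops, which is the habit the puzzles are meant to build.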

pandas-cookbook - Recipes for using Python's pandas library

  •    Jupyter

pandas is a Python library for doing data analysis. It's really fast and lets you do exploratory work incredibly quickly. The goal of this cookbook is to give you some concrete examples for getting started with pandas. The docs are really comprehensive. However, I've often had people tell me that they have some trouble getting started, so these are examples with real-world data, and all the bugs and weirdness that entails.


python-for-data-analysis - An introduction to data science using Python and Pandas with Jupyter notebooks

  •    Jupyter

Course in data science. Learn to analyze data of all types using the Python programming language. No programming experience is necessary. Note: O'Reilly Media titles are free to UCSD affiliates with Safari Books Online.

Data-Analysis-and-Machine-Learning-Projects - Repository of teaching materials, code, and data for my data analysis and machine learning projects

  •    Jupyter

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects. Each repository will (usually) correspond to one of the blog posts on my web site.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying you have probably heard by now, given the huge recent interest in Big Data, Machine Learning, Artificial Intelligence, and Deep Learning. Data scientists have also been described as having "The sexiest job in the 21st Century", which makes it all the more worthwhile to build valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming, given the vast amount of resources on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. Its more than 500 pages help readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

xarray - N-D labeled arrays and datasets in Python

  •    Python

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data at which pandas excels. Our approach adopts the Common Data Model for self-describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.

LSTM-Sentiment-Analysis - Sentiment Analysis with LSTMs in Tensorflow

  •    Jupyter

This repository contains the IPython notebook and training data to accompany the O'Reilly tutorial on sentiment analysis with LSTMs in TensorFlow. See the original tutorial to run this code in a pre-built environment on O'Reilly's servers with cell-by-cell guidance, or run these files on your own machine. There is also another file called Pre-Trained LSTM.ipynb which allows you to input your own text and see the output of the trained network. Before running the notebook, you'll first need to download all the data we'll be using. This data is located in the models.tar.gz and training_data.tar.gz tarballs. We will extract these into the same directory as Oriole LSTM.ipynb. As always, the first step is to clone the repository.

data-science-your-way - Ways of doing Data Science Engineering and Machine Learning in R and Python

  •    Jupyter

This series of tutorials on Data Science engineering will try to compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as possible for the job market or to start a personal project.

rcloud - Collaborative data analysis and visualization

  •    Javascript

RCloud is an environment for collaboratively creating and sharing data analysis scripts. RCloud lets you mix analysis code in R, HTML5, Markdown, Python, and others. Much like Jupyter notebooks, Beaker notebook, Apache Zeppelin, Sage, and Mathematica, RCloud provides a notebook interface that lets you easily record a session and annotate it with text, equations, and supporting images. It also lets you easily browse and search other users' notebooks. You can comment on notebooks, fork them, star them, and use them as function calls in your own notebooks.

statistical-analysis-python-tutorial - Statistical Data Analysis in Python

  •    HTML

Chris Fonnesbeck is an Assistant Professor in the Department of Biostatistics at the Vanderbilt University School of Medicine. He specializes in computational statistics, Bayesian methods, meta-analysis, and applied decision analysis. He originally hails from Vancouver, BC and received his Ph.D. from the University of Georgia. This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects. Much of the work involved in analyzing data resides in importing, cleaning and transforming data in preparation for analysis. Therefore, the first half of the course is comprised of a 2-part overview of basic and intermediate Pandas usage that will show how to effectively manipulate datasets in memory. This includes tasks like indexing, alignment, join/merge methods, date/time types, and handling of missing data. Next, we will cover plotting and visualization using Pandas and Matplotlib, focusing on creating effective visual representations of your data, while avoiding common pitfalls. Finally, participants will be introduced to methods for statistical data modeling using some of the advanced functions in Numpy, Scipy and Pandas. This will include fitting your data to probability distributions, estimating relationships among variables using linear and non-linear models, and a brief introduction to bootstrapping methods. Each section of the tutorial will involve hands-on manipulation and analysis of sample datasets, to be provided to attendees in advance.
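
The bootstrapping idea mentioned at the end of the tutorial outline can be sketched briefly. This is a minimal percentile-bootstrap sketch that uses only the Python standard library rather than the NumPy/SciPy/pandas stack the tutorial itself uses; the function name, sample values, and parameters are hypothetical and not taken from the tutorial.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    n = len(data)
    # Resample with replacement, recompute the mean each time, then sort.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read the interval endpoints off the sorted resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, invented for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"approx. 95% CI = ({low:.2f}, {high:.2f})")
```

The interval comes from the empirical distribution of resampled means, so no assumption about the shape of the underlying distribution is needed.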

Data-Analysis - Data Analysis Using Python

  •    Jupyter

Data Analysis Using Python and a little R. A place to share my code and reports for various data science projects.

geonotebook - A Jupyter notebook extension for geospatial visualization and analysis

  •    Python

GeoNotebook is an application that provides a client/server environment with interactive visualization and analysis capabilities using Jupyter, GeoJS and other open source tools. It is jointly developed by Kitware and NASA Ames. Documentation for GeoNotebook can be found at http://geonotebook.readthedocs.io.

BDA_py_demos - Bayesian Data Analysis demos for Python

  •    Jupyter

This repository contains some Python demos for the book Bayesian Data Analysis, 3rd ed by Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin (BDA3). The IPython notebooks can be run interactively in the browser.

alphalens - Performance analysis of predictive (alpha) stock factors

  •    Jupyter

Alphalens is a Python library for performance analysis of predictive (alpha) stock factors. Alphalens works great with the Zipline open source backtesting library, and with Pyfolio, which provides performance and risk analysis of financial portfolios. Check out the example notebooks for more on how to read and use the factor tear sheet.

pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data

  •    Python

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. Binary installers for the latest released version are available at the Python package index and on conda.

PythonDataScienceHandbook - Python Data Science Handbook: full text in Jupyter Notebooks

  •    Jupyter

This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks. Run the code using the Jupyter notebooks available in this repository's notebooks directory.

pydata-book - Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

  •    Jupyter

If you are reading the 1st Edition (published in 2012), please find the reorganized book materials on the 1st-edition branch. The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.