- 56

weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more. See this notebook to see examples of other calculations, including grouped calculations.

https://github.com/jsvine/weightedcalcsTags | pandas statistics |

Implementation | Python |

License | MIT |

Platform | Windows Linux |

Chris Fonnesbeck is an Assistant Professor in the Department of Biostatistics at the Vanderbilt University School of Medicine. He specializes in computational statistics, Bayesian methods, meta-analysis, and applied decision analysis. He originally hails from Vancouver, BC and received his Ph.D. from the University of Georgia. This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects. Much of the work involved in analyzing data resides in importing, cleaning and transforming data in preparation for analysis. Therefore, the first half of the course is comprised of a 2-part overview of basic and intermediate Pandas usage that will show how to effectively manipulate datasets in memory. This includes tasks like indexing, alignment, join/merge methods, date/time types, and handling of missing data. Next, we will cover plotting and visualization using Pandas and Matplotlib, focusing on creating effective visual representations of your data, while avoiding common pitfalls. Finally, participants will be introduced to methods for statistical data modeling using some of the advanced functions in Numpy, Scipy and Pandas. This will include fitting your data to probability distributions, estimating relationships among variables using linear and non-linear models, and a brief introduction to bootstrapping methods. Each section of the tutorial will involve hands-on manipulation and analysis of sample datasets, to be provided to attendees in advance.

Zipline is a Pythonic algorithmic trading library. It is an event-driven system that supports both backtesting and live-trading. Zipline is currently used in production as the backtesting and live-trading engine powering Quantopian -- a free, community-centered, hosted platform for building and executing trading strategies.Note: Installing Zipline via pip is slightly more involved than the average Python package. Simply running pip install zipline will likely fail if you've never installed any scientific Python packages before.

algorithmic-trading trading machine-learning stock-analysisModin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. To use Modin, you do not need to know how many cores your system has and you do not need to specify how to distribute the data. In fact, you can continue using your previous pandas notebooks while experiencing a considerable speedup from Modin, even on a single machine. Once you’ve changed your import statement, you’re ready to use Modin just like you would pandas.

dataframe pandas ray distributed datascience pandas-on-ray modin sqlxarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data for which pandas excels. Our approach adopts the Common Data Model for self- describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.

scientific-computing netcdf numpy data-science pandas dataframes data-analysis pydataInspired by 100 Numpy exerises, here are 100* short puzzles for testing your knowledge of pandas' power. Since pandas is a large library with many different specialist features and functions, these excercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the excerises here are straightforward in that the solutions require no more than a few lines of code (in pandas or NumPy - don't go using pure Python!). Choosing the right methods and following best practices is the underlying goal.

pandas numpy data-analysisUp to date remote data access for pandas, works for multiple versions of pandas. As of v0.6.0 Yahoo!, Google Options, Google Quotes and EDGAR have been immediately deprecated due to large changes in their API and no stable replacement.

html data-analysis data dataset stock-data finance financial-data pydata pandasDjango REST Pandas (DRP) provides a simple way to generate and serve pandas DataFrames via the Django REST Framework. The resulting API can serve up CSV (and a number of other formats) for consumption by a client-side visualization tool like d3.js. The design philosophy of DRP enforces a strict separation between data and presentation. This keeps the implementation simple, but also has the nice side effect of making it trivial to provide the source data for your visualizations. This capability can often be leveraged by sending users to the same URL that your visualization code uses internally to load the data.

dataviz pandas django rest-api django-rest-framework spreadsheet chart excel csvSeaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.Online documentation is available at seaborn.pydata.org. Installation requires numpy, scipy, pandas, and matplotlib. Some functions will optionally use statsmodels if it is installed.

data-visualization visualization statisticsThis is a free open source project for software tools in financial economics. We develop code for research notebooks which are executable scripts capable of statistical computations, as well as, collection of raw data in real-time. This serves to verify theoretical ideas and practical methods interactively. Economic and financial data, both historical and the most current.

jupyter-notebook pandas federal-reserve gdp inflation income housing equities bonds fx gold time-series econometrics statistics asset-pricing finance interest-rates economics employmentReview the computer requirements on hardware needed for the bootcamp. Completing the pre-work is essential to obtaining the foundational knowledge necessary to succeed in the Metis data science bootcamp. Each student should expect to spend 60+ hours of tutorials to become familiar with software installation, editors, command line, Python (numpy, pandas, etc.), linear algebra and statistics.

Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (109) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted).

dataframe bigdata tabular-data visualization memory-mapped-file hdf5Practice and tutorial-style notebooks covering wide variety of machine learning techniques

numpy statistics pandas matplotlib regression scikit-learn classification principal-component-analysis clustering decision-trees random-forest dimensionality-reduction neural-network deep-learning artificial-intelligence data-science machine-learning k-nearest-neighbours naive-bayespandas is a Python library for doing data analysis. It's really fast and lets you do exploratory work incredibly quickly. The goal of this cookbook is to give you some concrete examples for getting started with pandas. The docs are really comprehensive. However, I've often had people tell me that they have some trouble getting started, so these are examples with real-world data, and all the bugs and weirdness that entails.

Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas.

data-science jupyter-notebook pandas tutorial data-analysis data-cleaningEasy pipelines for pandas DataFrames. Some pipeline stages require scikit-learn; they will simply not be loaded if scikit-learn is not found on the system, and pdpipe will issue a warning. To use them you must also install scikit-learn.

pandas pandas-dataframe pipeline data data-science dataframe dataframesRepository to store sample python programs for python learning

pandas pandas-dataframe pandas-tutorial numpy numpy-arrays numpy-tutorial python-tutorial python-tutorials python-pandas jupyter-notebook jupyter jupyter-notebooks jupyter-tutorialBlaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar interface to query data living in other data storage systems. We point blaze to a simple dataset in a foreign database (PostgreSQL). Instantly we see results as we would see them in a Pandas DataFrame.

A collection of notebooks behind my series on writing idiomatic pandas.

Fed up with a ton of tutorials but no easy way to find exercises I decided to create a repo just with exercises to practice pandas. Don't get me wrong, tutorials are great resources, but to learn is to do. So unless you practice you won't learn. My suggestion is that you learn a topic in a tutorial or video and then do exercises. Learn one more topic and do exercises. If you got the answer wrong, don't go directly to the solution with code.

pandas exercise practice tutorial data-analysispandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. Binary installers for the latest released version are available at the Python package index and on conda.

data-analysis pandas flexible alignment
We have large collection of open source products. Follow the tags from
Tag Cloud >>

Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
**Add Projects.**