xarray - N-D labeled arrays and datasets in Python

  •        172

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data for which pandas excels. Our approach adopts the Common Data Model for self- describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.

http://xarray.pydata.org
https://github.com/pydata/xarray

Tags
Implementation
License
Platform

   




Related Projects

Zipline - A Pythonic Algorithmic Trading Library

  •    Python

Zipline is a Pythonic algorithmic trading library. It is an event-driven system that supports both backtesting and live-trading. Zipline is currently used in production as the backtesting and live-trading engine powering Quantopian -- a free, community-centered, hosted platform for building and executing trading strategies.Note: Installing Zipline via pip is slightly more involved than the average Python package. Simply running pip install zipline will likely fail if you've never installed any scientific Python packages before.

pandas-datareader - Extract data from a wide range of Internet sources into a pandas DataFrame.

  •    HTML

Up to date remote data access for pandas, works for multiple versions of pandas. As of v0.6.0 Yahoo!, Google Options, Google Quotes and EDGAR have been immediately deprecated due to large changes in their API and no stable replacement.

eland - Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

  •    Python

Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API. Where possible the package uses existing Python APIs and data structures to make it easy to switch between numpy, pandas, scikit-learn to their Elasticsearch powered equivalents. In general, the data resides in Elasticsearch and not in memory, which allows Eland to access large datasets stored in Elasticsearch.

pdpipe - Easy pipelines for pandas DataFrames.

  •    Python

Easy pipelines for pandas DataFrames. Some pipeline stages require scikit-learn; they will simply not be loaded if scikit-learn is not found on the system, and pdpipe will issue a warning. To use them you must also install scikit-learn.

statistical-analysis-python-tutorial - Statistical Data Analysis in Python

  •    HTML

Chris Fonnesbeck is an Assistant Professor in the Department of Biostatistics at the Vanderbilt University School of Medicine. He specializes in computational statistics, Bayesian methods, meta-analysis, and applied decision analysis. He originally hails from Vancouver, BC and received his Ph.D. from the University of Georgia. This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects. Much of the work involved in analyzing data resides in importing, cleaning and transforming data in preparation for analysis. Therefore, the first half of the course is comprised of a 2-part overview of basic and intermediate Pandas usage that will show how to effectively manipulate datasets in memory. This includes tasks like indexing, alignment, join/merge methods, date/time types, and handling of missing data. Next, we will cover plotting and visualization using Pandas and Matplotlib, focusing on creating effective visual representations of your data, while avoiding common pitfalls. Finally, participants will be introduced to methods for statistical data modeling using some of the advanced functions in Numpy, Scipy and Pandas. This will include fitting your data to probability distributions, estimating relationships among variables using linear and non-linear models, and a brief introduction to bootstrapping methods. Each section of the tutorial will involve hands-on manipulation and analysis of sample datasets, to be provided to attendees in advance.


NumPy - fundamental package for scientific computing with Python

  •    Python

NumPy is the fundamental package needed for scientific computing with Python. Numerical Python adds a fast and sophisticated N-dimensional array facility to the Python language. NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

stumpy - STUMPY is a powerful and scalable Python library for modern time series analysis

  •    Python

Whether you are an academic, data scientist, software developer, or time series enthusiast, STUMPY is straightforward to install and our goal is to allow you to get to your time series insights faster. See documentation for more information. Please see our API documentation for a complete list of available functions and see our informative tutorials for more comprehensive example use cases. Below, you will find code snippets that quickly demonstrate how to use STUMPY.

100-pandas-puzzles - 100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)

  •    Jupyter

Inspired by 100 Numpy exerises, here are 100* short puzzles for testing your knowledge of pandas' power. Since pandas is a large library with many different specialist features and functions, these excercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the excerises here are straightforward in that the solutions require no more than a few lines of code (in pandas or NumPy - don't go using pure Python!). Choosing the right methods and following best practices is the underlying goal.

data-science-with-ruby - Practical Data Science with Ruby based tools.

  •    Ruby

Data Science is a new "sexy" buzzword without specific meaning but often used to substitute Statistics, Scientific Computing, Text and Data Mining and Visualization, Machine Learning, Data Processing and Warehousing as well as Retrieval Algorithms of any kind. This curated list comprises awesome tutorials, libraries, information sources about various Data Science applications using the Ruby programming language.

pandapy - PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x faster (by @firmai)

  •    Python

Animated investment research at Sov.ai, sponsoring open source initiatives. PandaPy software, similar to the original Pandas project, is developed to improve the usability of python for finance. Structured datatypes are designed to be able to mimic ‘structs’ in the C language, and share a similar memory layout. PandaPy currently houses more than 30 functions. Structured NumPy are meant for interfacing with C code and for low-level manipulation of structured buffers, for example for interpreting binary blobs. For these purposes they support specialized features such as subarrays, nested datatypes, and unions, and allow control over the memory layout of the structure.

pycon-2019-tutorial - Data Science Best Practices with pandas

  •    Jupyter

This tutorial was presented by Kevin Markham at PyCon on May 2, 2019. Watch the complete tutorial video on YouTube. The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library makes it challenging to discover the best way to accomplish any given task.

datacleaner - A Python tool that automatically cleans data sets and readies them for analysis.

  •    Python

A Python tool that automatically cleans data sets and readies them for analysis.datacleaner works with data in pandas DataFrames.

blaze - NumPy and Pandas interface to Big Data

  •    Python

Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar interface to query data living in other data storage systems. We point blaze to a simple dataset in a foreign database (PostgreSQL). Instantly we see results as we would see them in a Pandas DataFrame.

cugraph - cuGraph - RAPIDS Graph Analytics Library

  •    Jupyter

The RAPIDS cuGraph library is a collection of GPU accelerated graph algorithms that process data found in GPU DataFrames. The vision of cuGraph is to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks. To realize that vision, cuGraph operates, at the Python layer, on GPU DataFrames, thereby allowing for seamless passing of data between ETL tasks in cuDF and machine learning tasks in cuML. Data scientists familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, users familiar with NetworkX will quickly recognize the NetworkX-like API provided in cuGraph, with the goal to allow existing code to be ported with minimal effort into RAPIDS. For users familiar with C++/CUDA and graph structures, a C++ API is also provided. However, there is less type and structure checking at the C++ layer. For more project details, see rapids.ai.

reflow - A language and runtime for distributed, incremental data processing in the cloud

  •    Go

Reflow is a system for incremental data processing in the cloud. Reflow enables scientists and engineers to compose existing tools (packaged in Docker images) using ordinary programming constructs. Reflow then evaluates these programs in a cloud environment, transparently parallelizing work and memoizing results. Reflow was created at GRAIL to manage our NGS (next generation sequencing) bioinformatics workloads on AWS, but has also been used for many other applications, including model training and ad-hoc data analyses. Reflow thus allows scientists and engineers to write straightforward programs and then have them transparently executed in a cloud environment. Programs are automatically parallelized and distributed across multiple machines, and redundant computations (even across runs and users) are eliminated by its memoization cache. Reflow evaluates its programs incrementally: whenever the input data or program changes, only those outputs that depend on the changed data or code are recomputed.

django-rest-pandas - 📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i

  •    Python

Django REST Pandas (DRP) provides a simple way to generate and serve pandas DataFrames via the Django REST Framework. The resulting API can serve up CSV (and a number of other formats) for consumption by a client-side visualization tool like d3.js. The design philosophy of DRP enforces a strict separation between data and presentation. This keeps the implementation simple, but also has the nice side effect of making it trivial to provide the source data for your visualizations. This capability can often be leveraged by sending users to the same URL that your visualization code uses internally to load the data.

pandas-videos - Jupyter notebook and datasets from the pandas Q&A video series

  •    Jupyter

Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.