xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data for which pandas excels. Our approach adopts the Common Data Model for self- describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.
http://xarray.pydata.orgTags | scientific-computing netcdf numpy data-science pandas dataframes data-analysis pydata |
Implementation | Python |
License | Apache |
Platform | Windows Linux |
Zipline is a Pythonic algorithmic trading library. It is an event-driven system that supports both backtesting and live-trading. Zipline is currently used in production as the backtesting and live-trading engine powering Quantopian -- a free, community-centered, hosted platform for building and executing trading strategies.Note: Installing Zipline via pip is slightly more involved than the average Python package. Simply running pip install zipline will likely fail if you've never installed any scientific Python packages before.
algorithmic-trading trading machine-learning stock-analysisUp to date remote data access for pandas, works for multiple versions of pandas. As of v0.6.0 Yahoo!, Google Options, Google Quotes and EDGAR have been immediately deprecated due to large changes in their API and no stable replacement.
html data-analysis data dataset stock-data finance financial-data pydata pandasEland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API. Where possible the package uses existing Python APIs and data structures to make it easy to switch between numpy, pandas, scikit-learn to their Elasticsearch powered equivalents. In general, the data resides in Elasticsearch and not in memory, which allows Eland to access large datasets stored in Elasticsearch.
elasticsearch machine-learning big-data etl scikit-learn pandas lightgbm data-analysis dataframe dataframes time-series-forecasting elandEasy pipelines for pandas DataFrames. Some pipeline stages require scikit-learn; they will simply not be loaded if scikit-learn is not found on the system, and pdpipe will issue a warning. To use them you must also install scikit-learn.
pandas pandas-dataframe pipeline data data-science dataframe dataframesChris Fonnesbeck is an Assistant Professor in the Department of Biostatistics at the Vanderbilt University School of Medicine. He specializes in computational statistics, Bayesian methods, meta-analysis, and applied decision analysis. He originally hails from Vancouver, BC and received his Ph.D. from the University of Georgia. This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects. Much of the work involved in analyzing data resides in importing, cleaning and transforming data in preparation for analysis. Therefore, the first half of the course is comprised of a 2-part overview of basic and intermediate Pandas usage that will show how to effectively manipulate datasets in memory. This includes tasks like indexing, alignment, join/merge methods, date/time types, and handling of missing data. Next, we will cover plotting and visualization using Pandas and Matplotlib, focusing on creating effective visual representations of your data, while avoiding common pitfalls. Finally, participants will be introduced to methods for statistical data modeling using some of the advanced functions in Numpy, Scipy and Pandas. This will include fitting your data to probability distributions, estimating relationships among variables using linear and non-linear models, and a brief introduction to bootstrapping methods. Each section of the tutorial will involve hands-on manipulation and analysis of sample datasets, to be provided to attendees in advance.
利用Python进行数据分析 第二版 (2017) 中文翻译笔记
python-for-data-analysis jupyter-notebook chinese-translation data-analysis pandasNumPy is the fundamental package needed for scientific computing with Python. Numerical Python adds a fast and sophisticated N-dimensional array facility to the Python language. NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
scientific scientific-computing mathematics n-arrayIPython Notebook(s) demonstrating deep learning functionality.IPython Notebook(s) demonstrating scikit-learn functionality.
machine-learning deep-learning data-science big-data aws tensorflow theano caffe scikit-learn kaggle spark mapreduce hadoop matplotlib pandas numpy scipy kerasWhether you are an academic, data scientist, software developer, or time series enthusiast, STUMPY is straightforward to install and our goal is to allow you to get to your time series insights faster. See documentation for more information. Please see our API documentation for a complete list of available functions and see our informative tutorials for more comprehensive example use cases. Below, you will find code snippets that quickly demonstrate how to use STUMPY.
data-science pattern-matching pydata dask numba motif-discovery time-series-analysis anomaly-detection time-series-data-mining matrix-profile time-series-segmentationInspired by 100 Numpy exerises, here are 100* short puzzles for testing your knowledge of pandas' power. Since pandas is a large library with many different specialist features and functions, these excercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the excerises here are straightforward in that the solutions require no more than a few lines of code (in pandas or NumPy - don't go using pure Python!). Choosing the right methods and following best practices is the underlying goal.
pandas numpy data-analysisData Science is a new "sexy" buzzword without specific meaning but often used to substitute Statistics, Scientific Computing, Text and Data Mining and Visualization, Machine Learning, Data Processing and Warehousing as well as Retrieval Algorithms of any kind. This curated list comprises awesome tutorials, libraries, information sources about various Data Science applications using the Ruby programming language.
data-science data-visualization data-analysis data-mining data-analytics visualization awesome awesome-list list rubydatascienceAnimated investment research at Sov.ai, sponsoring open source initiatives. PandaPy software, similar to the original Pandas project, is developed to improve the usability of python for finance. Structured datatypes are designed to be able to mimic ‘structs’ in the C language, and share a similar memory layout. PandaPy currently houses more than 30 functions. Structured NumPy are meant for interfacing with C code and for low-level manipulation of structured buffers, for example for interpreting binary blobs. For these purposes they support specialized features such as subarrays, nested datatypes, and unions, and allow control over the memory layout of the structure.
finance data-science machine-learning numpy pandas data-structures arrays structured-data algorithmic-tradingThe JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ. Implemented in TypeScript, used in JavaScript ES5+ or TypeScript.
data-wrangling data-forge data data-analysis nodejs pandas visualization data-visualization data-management data-manipulation data-munging data-cleaning data-cleansing csv json data-science data-clensingThis tutorial was presented by Kevin Markham at PyCon on May 2, 2019. Watch the complete tutorial video on YouTube. The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library makes it challenging to discover the best way to accomplish any given task.
data-science tutorial pandas vizualisationA Python tool that automatically cleans data sets and readies them for analysis.datacleaner works with data in pandas DataFrames.
data-science machine-learning automationBlaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar interface to query data living in other data storage systems. We point blaze to a simple dataset in a foreign database (PostgreSQL). Instantly we see results as we would see them in a Pandas DataFrame.
The RAPIDS cuGraph library is a collection of GPU accelerated graph algorithms that process data found in GPU DataFrames. The vision of cuGraph is to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks. To realize that vision, cuGraph operates, at the Python layer, on GPU DataFrames, thereby allowing for seamless passing of data between ETL tasks in cuDF and machine learning tasks in cuML. Data scientists familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, users familiar with NetworkX will quickly recognize the NetworkX-like API provided in cuGraph, with the goal to allow existing code to be ported with minimal effort into RAPIDS. For users familiar with C++/CUDA and graph structures, a C++ API is also provided. However, there is less type and structure checking at the C++ layer. For more project details, see rapids.ai.
Reflow is a system for incremental data processing in the cloud. Reflow enables scientists and engineers to compose existing tools (packaged in Docker images) using ordinary programming constructs. Reflow then evaluates these programs in a cloud environment, transparently parallelizing work and memoizing results. Reflow was created at GRAIL to manage our NGS (next generation sequencing) bioinformatics workloads on AWS, but has also been used for many other applications, including model training and ad-hoc data analyses. Reflow thus allows scientists and engineers to write straightforward programs and then have them transparently executed in a cloud environment. Programs are automatically parallelized and distributed across multiple machines, and redundant computations (even across runs and users) are eliminated by its memoization cache. Reflow evaluates its programs incrementally: whenever the input data or program changes, only those outputs that depend on the changed data or code are recomputed.
bioinformatics-pipeline scientific-computing language runtime data-science cloud-computing aws analysis-pipelineDjango REST Pandas (DRP) provides a simple way to generate and serve pandas DataFrames via the Django REST Framework. The resulting API can serve up CSV (and a number of other formats) for consumption by a client-side visualization tool like d3.js. The design philosophy of DRP enforces a strict separation between data and presentation. This keeps the implementation simple, but also has the nice side effect of making it trivial to provide the source data for your visualizations. This capability can often be leveraged by sending users to the same URL that your visualization code uses internally to load the data.
dataviz pandas django rest-api django-rest-framework spreadsheet chart excel csvRead about the series, and view all of the videos on one page: Easier data analysis in Python with pandas.
data-science jupyter-notebook pandas tutorial data-analysis data-cleaning
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.