Pandas - Powerful Python Data Analysis Toolkit


A flexible, powerful data analysis and manipulation library for Python that provides labeled data structures similar to R data.frame objects, along with statistical functions. Its group-by engine supports split-apply-combine operations for aggregating or transforming data sets. Other features include high-performance merging and joining of data sets, time-series functionality, hierarchical axis indexing, and more.

http://pandas.pydata.org/
https://github.com/pydata/pandas
http://code.google.com/p/pandas
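
The features listed in the description can be sketched in a few lines. This is a minimal illustration, not taken from the pandas documentation: the data, column names, and variable names are all invented for the example, and it assumes a reasonably recent pandas release.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "product": ["a", "a", "b", "b"],
    "units": [10, 20, 30, 40],
})
prices = pd.DataFrame({"product": ["a", "b"], "price": [2.0, 5.0]})

# Merging/joining: attach a price to each sales row via the shared "product" column.
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# Split-apply-combine: split rows by region, sum revenue, combine into one Series.
revenue_by_region = merged.groupby("region")["revenue"].sum()

# Hierarchical axis indexing: a two-level (region, product) index.
by_region_product = merged.set_index(["region", "product"])["units"].sort_index()
north_b_units = by_region_product.loc[("north", "b")]

# Time series: four daily values downsampled to two-day totals.
daily = pd.Series(
    [1, 2, 3, 4],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)
two_day_totals = daily.resample("2D").sum()

print(revenue_by_region)
print(north_b_units)
print(two_day_totals)
```

Each step maps to one feature named above: `merge` for joining, `groupby` for split-apply-combine, a tuple lookup on a two-level index for hierarchical indexing, and `resample` for time series.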





Related Projects

EventQL - The database for large-scale event analytics


EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs fast SQL and MapReduce queries. Its features include automatic partitioning, columnar storage, standard SQL support, scaling to petabytes, support for time-series and relational data, fast range scans, and more.

blaze - NumPy and Pandas interface to Big Data



Seaborn - Statistical data visualization using matplotlib


Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Online documentation is available at seaborn.pydata.org. Installation requires numpy, scipy, pandas, and matplotlib. Some functions will optionally use statsmodels if it is installed.

Zipline - A Pythonic Algorithmic Trading Library


Zipline is a Pythonic algorithmic trading library. It is an event-driven system that supports both backtesting and live-trading. Zipline is currently used in production as the backtesting and live-trading engine powering Quantopian, a free, community-centered, hosted platform for building and executing trading strategies. Note: Installing Zipline via pip is slightly more involved than the average Python package. Simply running pip install zipline will likely fail if you've never installed any scientific Python packages before.

Pandas-Reference-Card - A reference card for Pandas data analysis toolkit, http://pandas.pydata.org





RecSQL


RecSQL is a hack that allows one to load table-like data records into an in-memory sqlite database for quick and dirty analysis via SQL. The SQLarray class has additional SQL functions such as sqrt or histogram defined. SQL tables can always be returned as numpy record arrays so that data can be easily handled in other packages such as numpy or plotted via matplotlib. Selections produce new SQLarray objects.
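
The general idea described here (load records into an in-memory SQLite database, add a custom SQL function, query with SQL) can be sketched with Python's standard sqlite3 module alone. This is not RecSQL's API and does not use its SQLarray class; the table name, the sample records, and the function name `py_sqrt` are hypothetical.

```python
import math
import sqlite3

# Hypothetical table-like records, invented for illustration.
records = [("a", 1.0), ("b", 4.0), ("c", 9.0)]

# An in-memory database exists only for the lifetime of the connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, value REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?)", records)

# Register a Python callable as a one-argument SQL function.
# The name "py_sqrt" is an assumption chosen for this sketch.
conn.create_function("py_sqrt", 1, math.sqrt)

# A selection with the custom function applied to each matching row.
rows = conn.execute(
    "SELECT name, py_sqrt(value) FROM data WHERE value > 1 ORDER BY name"
).fetchall()
conn.close()

print(rows)
```

The query result is a list of tuples, which could be converted to a numpy record array for further processing, in the spirit of the description above.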

orbeckst-RecSQL


RecSQL is a hack that allows one to load table-like data records into an in-memory sqlite database for quick and dirty analysis via SQL. The SQLarray class has additional SQL functions such as sqrt or histogram defined. SQL tables can always be returned as numpy record arrays so that data can be easily handled in other packages such as numpy or plotted via matplotlib. Selections produce new SQLarray objects.

Cronos


This is a complete time series analysis package written in C#. It provides a number of tools for data manipulation, and supports a range of different models, including ARMA and GARCH models. A plugin framework allows developers to create their own custom models and transforms.

fudge - Fudge is a project template for data analysis in Python with pandas.



ipython_notebook_installation_notes


Notes, instructions, and formulae for installing a data analysis environment on Ubuntu 12.04 with IPython Notebook, pandas, SciPy, NumPy, etc.

nitime - Timeseries analysis for neuroscience data



microstructure


MicroStructure is a module that, through the use of pandas, allows for the easy analysis of market microstructure. It imports and steps through order data, reconstructs the order book, and then runs package-supplied and user-supplied algorithms on the data, producing pandas DataFrames containing the results at tick resolution. This is primarily a vessel for implementing the algorithms discussed in the literature on market microstructure. I hope to add algorithms as time goes on.

Numpy-Pandas - Notes and code for Numpy and Pandas



load_analysis - matplotlib graph plotting + numpy math analysis of data



Sigfeaturespy - underdeveloped data analysis project utilizing matplotlib and numpy



mysql-for-excel - MySQL for Excel is an Excel Add-In that is installed and accessed from within the MS Excel Data tab, offering a wizard-like interface arranged in an elegant yet simple way to help users browse MySQL Schemas, Tables, Views, and Procedures and perform data operations against them using MS Excel as the vehicle to drive the data in and out of MySQL Databases


MySQL for Excel is an Excel Add-In that is installed and accessed from within MS Excel’s Data tab, offering a wizard-like interface arranged in an elegant yet simple way to help users browse MySQL Schemas, Tables, Views, and Procedures and perform data operations against them using MS Excel as the vehicle to drive the data in and out of MySQL Databases. Copyright (c) 2012, 2016, Oracle and/or its affiliates. All rights reserved. MySQL for Excel has been designed to be a simple and friendly tool for data analysts who want to harness the power of MS Excel to work with MySQL data without worrying about the technical details involved in reaching the data they want. This boosts productivity so they can focus on data analysis and manipulation.

ruby-data-analysis - Data analysis software for statistics written in ruby



data-timeseries-java - Repository for streaming and batch samples of timeseries data


These samples are designed to be used as demonstrators and are not intended for production usage. Time-series data is relevant in many sectors and industries, with applications ranging from financial data to the Internet of Things and instrumentation data from virtual machines. Java 1.8 is required for these samples.

data_science


This repo is focused exclusively on my adventure learning data science and the tools and techniques necessary to perform data-science-related tasks. This includes, but is not limited to: Python, SQLite, pandas, NumPy, SciPy, GraphLab's SFrame, d3, and whatever else I come across that I'm supposed to learn!