NimData - DataFrame API written in Nim, enabling fast out-of-core data processing

DataFrame API written in Nim, enabling fast out-of-core data processing. NimData is inspired by frameworks like Pandas/Spark/Flink/Thrill, and sits between Pandas on one side and Spark/Flink/Thrill on the other. Like Pandas, NimData is currently non-distributed, but it shares the type-safe, lazy API of Spark/Flink/Thrill. Thanks to Nim, it enables elegant out-of-core processing at native speed.
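
The lazy, out-of-core style mentioned above can be illustrated with a minimal sketch. It is written in Python with generators rather than Nim, and it does not use NimData's actual API: the names `lazy_map`, `lazy_filter` and `count` are hypothetical and chosen only for illustration. Transformations build a pipeline without touching the data, and only the final action pulls rows through one at a time, so the full dataset never has to fit in memory.

```python
from typing import Callable, Iterable, Iterator


def lazy_map(rows: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    """Hypothetical transformation: nothing runs until the result is consumed."""
    for row in rows:
        yield fn(row)


def lazy_filter(rows: Iterable[int], pred: Callable[[int], bool]) -> Iterator[int]:
    """Hypothetical transformation: keeps only rows for which pred is true."""
    for row in rows:
        if pred(row):
            yield row


def count(rows: Iterable[int]) -> int:
    """Hypothetical action: consumes the pipeline one row at a time."""
    return sum(1 for _ in rows)


# A generator stands in for a data source too large to load at once.
source = (i for i in range(1_000_000))

# Building the pipeline is cheap; no row has been processed yet.
pipeline = lazy_filter(lazy_map(source, lambda x: x * 2), lambda x: x % 3 == 0)

# Only the action triggers the computation.
result = count(pipeline)
```

Because each stage yields rows on demand, memory use stays constant regardless of how many rows the source produces.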



Related Projects

Marketstore - DataFrame Server for Financial Timeseries Data

  •    Go

MarketStore is a database server optimized for financial timeseries data. You can think of it as an extensible DataFrame service that is accessible from anywhere in your system, with higher scalability. It is designed from the ground up to address scalability issues around handling large amounts of financial market data used in algorithmic trading backtesting, charting, and analyzing price history with data spanning many years, including tick-level data for all US equities or the rapidly growing cryptocurrency space. If you are struggling with managing lots of HDF5 files, this is a perfect solution to your problem.

Nim - Nim is a compiled, garbage-collected systems programming language with a design that focuses on efficiency, expressiveness, and elegance (in that order of priority)

  •    Nim

This repository contains the Nim compiler, Nim's stdlib, tools and documentation. For more information about Nim, including downloads and documentation for the latest release, check out Nim's website. More platforms are supported, however they are not tested regularly and they may not be as stable as the above-listed platforms.

jester - A sinatra-like web framework for Nim.

  •    Nim

The sinatra-like web framework for Nim. Jester provides a DSL for quickly creating web applications in Nim. Note: Jester requires Nim version 0.15.0.

nimx - GUI library

  •    Nim

Cross-platform GUI framework in Nim. Nimx officially supports Linux, MacOS, Windows, Android, iOS, Javascript (with Nim JS backend) and Asm.js (with Nim C backend and Emscripten).


  •    Scala

This is a package for DataFrame-based graphs on top of Apache Spark. Users can write highly expressive queries by leveraging the DataFrame API, combined with a new API for motif finding. The user also benefits from DataFrame performance optimizations within the Spark SQL engine. To compile this project, run build/sbt assembly from the project home directory. This will also run the Scala unit tests.

modin - Modin: Speed up your Pandas workflows by changing a single line of code

  •    Python

Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. To use Modin, you do not need to know how many cores your system has and you do not need to specify how to distribute the data. In fact, you can continue using your previous pandas notebooks while experiencing a considerable speedup from Modin, even on a single machine. Once you’ve changed your import statement, you’re ready to use Modin just like you would pandas.

nimble - Package manager for the Nim programming language.

  •    Nim

Nimble is a beta-grade package manager for the Nim programming language. Interested in learning how to create a package? Skip directly to that section here.

nimkernel - A small kernel written in Nim

  •    Nimrod

This is a small 32-bit (i586) kernel written using the Nim programming language. I have been wanting to do this for a while, but it wasn't until people in the #nim IRC channel inquired about Nim OS dev and the rustboot kernel inspired me that I finally did it.

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU, OpenCL and embedded devices

  •    Nim

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, CUDA and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

Aporia - IDE/Advanced text editor mainly focusing on support for the Nim programming language.

  •    Nim

Aporia is an IDE for the Nim programming language. Aporia uses GTK as the default toolkit, and gtksourceview for the text editor component. The method by which Aporia can be installed depends on your platform. The following installation instructions are valid as of version 0.4.1 of Aporia.

gota - Gota: DataFrames and data wrangling in Go (Golang)

  •    Go

This is an implementation of DataFrames, Series and data wrangling methods for the Go programming language. The API is still in flux, so use at your own risk. The term DataFrame typically refers to a tabular dataset that can be viewed as a two-dimensional table. Often the columns of this dataset refer to a list of features, while the rows represent a number of measurements. As data in the real world is not perfect, DataFrame supports non-measurements or NaN elements.
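
To make the DataFrame idea above concrete, here is a minimal sketch of a two-dimensional table whose columns are features, whose rows are measurements, and whose missing values are stored as NaN. It is written in plain Python and is not Gota's API; the column names, the sample values and the helpers `num_rows`, `row` and `column_mean` are all hypothetical.

```python
import math

# A tiny table stored as named columns of equal length (hypothetical data).
# math.nan marks a measurement that is missing.
table = {
    "height_cm": [170.0, math.nan, 182.5],
    "weight_kg": [65.0, 80.0, math.nan],
}


def num_rows(tbl):
    """Number of measurements (rows) in the table."""
    return len(next(iter(tbl.values())))


def row(tbl, index):
    """One measurement as a mapping from feature name to value."""
    return {name: col[index] for name, col in tbl.items()}


def column_mean(tbl, name):
    """Mean of one feature, skipping NaN entries."""
    values = [v for v in tbl[name] if not math.isnan(v)]
    return sum(values) / len(values) if values else math.nan
```

Skipping NaN entries in `column_mean` is one possible policy for imperfect data; a real library may offer other choices, such as dropping whole rows or filling in defaults.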

pygdf - GPU Data Frame

  •    Jupyter

PyGDF implements the Python interface to access and manipulate the GPU Dataframe of the GPU Open Analytics Initiative (GOAI). We aim to provide a simple interface that is similar to the Pandas dataframe and hides the details of GPU programming.

pdpipe - Easy pipelines for pandas DataFrames.

  •    Python

Easy pipelines for pandas DataFrames. Some pipeline stages require scikit-learn; they will simply not be loaded if scikit-learn is not found on the system, and pdpipe will issue a warning. To use them you must also install scikit-learn.
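
The optional-dependency behaviour described above (skip some stages and warn when a library is missing) can be sketched as follows. This is not pdpipe's actual code: `load_optional`, `STAGE_REQUIREMENTS` and `available_stages` are hypothetical names, and the stage registry is invented for illustration. Only standard-library calls are used.

```python
import importlib
import warnings


def load_optional(module_name):
    """Return the imported module, or None (with a warning) if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(f"{module_name} not found; dependent stages are disabled.")
        return None


# Hypothetical registry: stage name -> module it depends on (None = no dependency).
STAGE_REQUIREMENTS = {"drop_columns": None, "scale": "sklearn"}


def available_stages(requirements=STAGE_REQUIREMENTS):
    """Names of the stages whose dependency, if any, can be imported."""
    return [
        name
        for name, dep in requirements.items()
        if dep is None or load_optional(dep) is not None
    ]
```

Calling `available_stages()` on a system without scikit-learn would, under these assumptions, return only the dependency-free stages and emit one warning.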

shc - The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink

  •    Scala

The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink. With it, users can operate HBase with Spark-SQL at the DataFrame and DataSet level. With the DataFrame and DataSet support, the library leverages all the optimization techniques in Catalyst, and achieves data locality, partition pruning, predicate pushdown, Scanning and BulkGet, etc.

datafusion - SQL Query Execution against Apache Arrow, in Rust

  •    Rust

DataFusion is a SQL parser, planner, and query execution library for Rust. A DataFrame API is also provided. DataFusion can be used as a crate dependency in your project to add SQL support for custom data sources.

spry - A Smalltalk and Rebol inspired language implemented as an AST interpreter in Nim

  •    Nim

Included is a VagrantFile for Ubuntu 16.04. Just do vagrant up and vagrant ssh into it to find spry installed. Test with ispry - the "interactive spry" REPL. The following should work on Ubuntu/Debian; adapt accordingly for other distros.

pipelines - a language for scripting data flow

  •    Nim

Pipelines is a language and runtime for crafting massively parallel pipelines. Unlike other languages for defining data flow, the Pipeline language requires implementation of components to be defined separately in the Python scripting language. This allows the details of implementations to be separated from the structure of the pipeline, while providing access to thousands of active libraries for machine learning, data analysis and processing. Skip to Getting Started to install the Pipeline compiler. Running the Pipeline document would safely execute each component of the pipeline in parallel and output the expected result.

Data Frame Loader


A simple C# API for loading tabular dataframes into a Microsoft SQL Server database, using only a small number of tables to represent any kind of dataframe.

Mobius - C# and F# language binding and extensions to Apache Spark

  •    CSharp

Mobius provides a C# language binding to Apache Spark, enabling the implementation of Spark driver programs and data processing operations in the languages supported by the .NET framework, like C# or F#. For more code samples, refer to the Mobius\examples directory or the Mobius\csharp\Samples directory.

pandas-datareader - Extract data from a wide range of Internet sources into a pandas DataFrame.

  •    HTML

Up-to-date remote data access for pandas; works with multiple versions of pandas. As of v0.6.0, Yahoo!, Google Options, Google Quotes and EDGAR have been immediately deprecated due to large changes in their APIs and the lack of a stable replacement.

