turbodbc - Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface

  •        209

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. Its primary target audience are data scientist that use databases for which no efficient native Python drivers are available. For maximum compatibility, turbodbc complies with the Python Database API Specification 2.0 (PEP 249). For maximum performance, turbodbc offers built-in NumPy and Apache Arrow support and internally relies on batched data transfer instead of single-record communication as other popular ODBC modules do.




Related Projects

SQL Relay - Database Connection Pool library with API available in all programming languages

  •    C++

SQL Relay is a persistent database connection pooling, proxying and load balancing system for Unix and Linux supporting ODBC, and all major databases. It has APIs for C, C++, ODBC, Perl, Perl-DBI, Python, Python-DB, Zope, PHP, Ruby, Ruby-DBI, Java, TCL and Erlang, drop-in replacement libraries for MySQL and PostgreSQL clients.

QuestDB - Fast relational time-series database

  •    Java

QuestDB is an open-source NewSQL relational database designed to process time-series data, faster. QuestDB ingests data via HTTP, PostgresSQL wire protocol, Influx line protocol or directly from Java. Reading data is done using SQL via HTTP, PostgreSQL wire protocol or via Java API. The whole database and console fits in a 3.5Mb package.

pygdf - GPU Data Frame

  •    Jupyter

PyGDF implements the Python interface to access and manipulate the GPU Dataframe of GPU Open Analytics Initialive (GOAI). We aim to provide a simple interface that similar to the Pandas dataframe and hide the details of GPU programming.

VoltDB - Fast Scalable SQL DBMS with ACID

  •    Java

VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high volume data sources. VoltDB provides the ability to capture, store and process incoming data at millions of read/write operations per second. And VoltDB’s relational model opens that data to be analyzed in real-time, using familiar Business Intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.

sqitch - Sane database change management

  •    Perl

Sqitch is a database change management application. It currently supports PostgreSQL 8.4+, SQLite 3.7.11+, MySQL 5.0+, Oracle 10g+, Firebird 2.0+, Vertica 6.0+, Exasol 6.0+ and Snowflake. Sqitch is not integrated with any framework, ORM, or platform. Rather, it is a standalone change management system with no opinions about your database engine, application framework, or your development environment.

EventQL - The database for large-scale event analytics

  •    C++

EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries. Its features include Automatic partitioning, Columnar storage, Standard SQL support, Scales to petabytes, Timeseries and relational data, Fast range scans and lot more.

palladium - Framework for setting up predictive analytics services

  •    Python

Palladium provides means to easily set up predictive analytics services as web services. It is a pluggable framework for developing real-world machine learning solutions. It provides generic implementations for things commonly needed in machine learning, such as dataset loading, model training with parameter search, a web service, and persistence capabilities, allowing you to concentrate on the core task of developing an accurate machine learning model. Having a well-tested core framework that is used for a number of different services can lead to a reduction of costs during development and maintenance due to harmonization of different services being based on the same code base and identical processes. Palladium has a web service overhead of a few milliseconds only, making it possible to set up services with low response times. A configuration file lets you conveniently tie together existing components with components that you developed. As an example, if what you want to do is to develop a model where you load a dataset from a CSV file or an SQL database, and train an SVM classifier to predict one of the rows in the data given the others, and then find out about your model's accuracy, then that's what Palladium allows you to do without writing a single line of code. However, it is also possible to independently integrate own solutions.

Infobright - The Database for Analytics

  •    C++

Infobright combines a columnar database with our Knowledge Grid architecture to deliver a self-managing, self-tuning database optimized for analytics. Infobright eliminates the need to create indexes, partition data, or do any manual tuning to achieve fast response for queries and reports.

explorer - Data Explorer by Keen IO - point-and-click interface for analyzing and visualizing event data

  •    Javascript

Check out the demo here. The Keen IO Explorer is an open source point-and-click interface for querying and visualizing your event data. It's maintained by the team at Keen IO. If you haven’t done so already, login to Keen IO to create a project for your app. You'll need a Keen IO account to create a project. The Project ID and API Keys are available on the Project Overview page. You will need these for the next steps.

xarray - N-D labeled arrays and datasets in Python

  •    Python

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data for which pandas excels. Our approach adopts the Common Data Model for self- describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.

GeoMesa - Suite of tools for working with big geo-spatial data in a distributed fashion

  •    Scala

GeoMesa is an open-source, distributed, spatio-temporal database built on a number of distributed cloud data storage systems, including Accumulo, HBase, Cassandra, and Kafka. Leveraging a highly parallelized indexing strategy, GeoMesa aims to provide as much of the spatial querying and data manipulation to Accumulo as PostGIS does to Postgres.

InfiniDB - Scale-up analytics database engine for data warehousing and business intelligence

  •    C++

InfiniDB Community Edition is a scale-up, column-oriented database for data warehousing, analytics, business intelligence and read-intensive applications. InfiniDB's data warehouse columnar engine is multi-terabyte capable and accessed via MySQL.

numerical-linear-algebra - Free online textbook of Jupyter notebooks for fast

  •    Jupyter

This course was taught in the University of San Francisco's Masters of Science in Analytics program, summer 2017 (for graduate students studying to become data scientists). The course is taught in Python with Jupyter Notebooks, using libraries such as Scikit-Learn and Numpy for most lessons, as well as Numba (a library that compiles Python to C for faster performance) and PyTorch (an alternative to Numpy for the GPU) in a few lessons. Accompanying the notebooks is a playlist of lecture videos, available on YouTube. If you are ever confused by a lecture or it goes too quickly, check out the beginning of the next video, where I review concepts from the previous lecture, often explaining things from a new perspective or with different illustrations, and answer questions.

sandman2 - Automatically generate a RESTful API service for your legacy database. No code required!

  •    Python

sandman2 automagically generates a RESTful API service from your existing database, without requiring you to write a line of code. Simply point sandman2 to your database, add salt for seasoning, and voila!, a fully RESTful API service with hypermedia support starts running, ready to accept HTTP requests. This is a big deal. It means every single database you interact with, from the SQLite database that houses your web browser's data up to your production PostgreSQL server can be endowed with a REST API and accessed programatically, using any number of HTTP client libraries available in every language. sandman2 frees your data.

Hypertable - A high performance, scalable, distributed storage and processing system for structured

  •    C++

Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.

Database Fishing Tool

  •    Perl

DaFT is a frontend to any database that can be connected via ODBC. It provides list views of tables and columns, table view of data, SQL editor, CSV,HTML,SyLK,XLS data format export, and functions for copying data between two ODBC databases.

Agile_Data_Code_2 - Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

  •    Jupyter

Like my work? I am Principal Consultant at Data Syndrome, a consultancy offering assistance and training with building full-stack analytics products, applications and systems. Find us on the web at datasyndrome.com. There is now a video course using code from chapter 8, Realtime Predictive Analytics with Kafka, PySpark, Spark MLlib and Spark Streaming. Check it out now at datasyndrome.com/video.

ClickHouse - Columnar DBMS and Real Time Analytics

  •    C++

ClickHouse is an open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries. It is Linearly Scalable, Blazing Fast, Highly Reliable, Fault Tolerant, Data compression, Real time query processing, Web analytics, Vectorized query execution, Local and distributed joins. It can process hundreds of millions to more than a billion rows and tens of gigabytes of data per single server per second.

Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.