JanusGraph - Distributed graph database

  •    Java

JanusGraph is a highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database that can support thousands of concurrent users, complex traversals, and analytic graph queries.

tera - An Internet-Scale Database.

  •    C++

Copyright 2015, Baidu, Inc. Tera is a collection of many sparse, distributed, multidimensional tables. Each table is indexed by a row key, a column key, and a timestamp; each value in the table is an uninterpreted array of bytes.
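
The data model described above (row key, column key, and timestamp mapping to raw bytes) can be pictured with a small in-memory sketch. This is not Tera's API: the `SparseTable` class, its `put`/`get` methods, and the sample keys are hypothetical names chosen only for illustration, and the sketch ignores distribution entirely.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for one sparse table; this is not Tera's API.
Cell = Tuple[bytes, bytes, int]  # (row key, column key, timestamp)


class SparseTable:
    def __init__(self) -> None:
        # Only cells that were actually written are stored, hence "sparse".
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: bytes, column: bytes, timestamp: int, value: bytes) -> None:
        # The value is kept as opaque bytes; the table never interprets it.
        self._cells[(row, column, timestamp)] = value

    def get(self, row: bytes, column: bytes,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the value at `timestamp`, or the newest version if it is omitted."""
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        versions = [ts for (r, c, ts) in self._cells if r == row and c == column]
        if not versions:
            return None
        return self._cells[(row, column, max(versions))]


table = SparseTable()
table.put(b"com.example/index", b"contents", 1, b"<html>v1</html>")
table.put(b"com.example/index", b"contents", 2, b"<html>v2</html>")
```

Calling `table.get(b"com.example/index", b"contents")` without a timestamp returns the newest version, while passing `1` returns the older one; a row that was never written simply yields `None`.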

OpenTSDB - A scalable, distributed Time Series Database.

  •    Java

OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

Elasticsearch-Exporter - A small script to export data from one Elasticsearch cluster into another.

  •    Javascript

A command line script to import/export data from ElasticSearch to various other storage systems. This is a brand new implementation with lots of bugs and way too little time to test everything for one lonely developer, so please consider this beta at best and provide feedback, bug reports and maybe even patches.

stream-reactor - Streaming reference architecture for ETL with Kafka and Kafka-Connect

  •    Scala

Lenses offers SQL (for data browsing and Kafka Streams), Kafka Connect connector management, cluster monitoring and more. A collection of components to build a real time ingestion pipeline.

Kundera - JPA 1.0 ORM library for the Cassandra/HBase/MongoDB databases.

  •    Java

A JPA 2.0 compliant Object-Datastore Mapping Library for NoSQL Datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. Currently it supports Cassandra, MongoDB, HBase and relational databases.

Gimel - PayPal's Big Data Processing Framework

  •    Scala

Gimel provides a unified Data API to access data from any storage, such as HDFS, GS, Alluxio, HBase, Aerospike, BigQuery, Druid, Elastic, Teradata, Oracle, MySQL, etc.

hbase-rdd - Spark RDD to read and write from HBase

  •    Scala

This project lets you connect Apache Spark to HBase. Currently it is compiled with Scala 2.10 and 2.11, using the versions of Spark and HBase available on CDH5.5. Version 0.6.0 of this project works on CDH5.3, version 0.4.0 works on CDH5.1 and version 0.2.2-SNAPSHOT works on CDH5.0. Other combinations of versions may be made available in the future. This guide assumes you are using SBT. Similar tools such as Maven or Leiningen should work as well, with minor differences.

node-hbase-client - Asynchronous HBase client for Node.js, pure JavaScript implementation.

  •    Javascript

Asynchronous HBase client for Node.js, pure JavaScript implementation.

hbase-docker - HBase running in Docker

  •    Shell

This configuration builds a Docker container that runs HBase (with embedded ZooKeeper) on files stored inside the container. The approach requires editing the local server's /etc/hosts file to add an entry for the container hostname, because HBase uses hostnames to pass connection data back out of the container (from its internal ZooKeeper).
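
As a rough illustration of the /etc/hosts step, the sketch below builds the line to add and appends it to hosts-file text only if the hostname is not already mapped. The IP address `172.17.0.2` and the hostname `hbase-docker` are assumed placeholder values, not taken from the project; the helper names are hypothetical, and the code works on a string rather than touching the real file.

```python
def hosts_entry(ip: str, hostname: str) -> str:
    """Format one /etc/hosts line mapping `hostname` to `ip`."""
    return f"{ip}\t{hostname}"


def has_entry(hosts_text: str, hostname: str) -> bool:
    """Return True if any non-comment line in `hosts_text` already names `hostname`."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are hostnames and aliases.
        if hostname in fields[1:]:
            return True
    return False


def with_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return `hosts_text` with the mapping appended, unless the hostname is already present."""
    if has_entry(hosts_text, hostname):
        return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + hosts_entry(ip, hostname) + "\n"


# Assumed values, for illustration only.
example = with_entry("127.0.0.1\tlocalhost\n", "172.17.0.2", "hbase-docker")
```

Running `with_entry` a second time on its own output leaves the text unchanged, so the step is safe to repeat. Writing the result back to /etc/hosts normally requires administrator privileges and is left out here.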

cbass - adding "simple" to HBase

  •    Clojure

In this example we are just muting "packing" and "unpacking", relying on custom serialization being done before calling cbass, so the data is a byte array. Deserialization is done after the value is returned from cbass, since it will just return a byte array in this case (i.e. the identity function for both). Notice "pluto": it has no columns, which is also fine.
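
The idea of muting packing and unpacking with identity functions can be sketched independently of cbass. The code below is a conceptual Python stand-in, not cbass's Clojure API: `ByteStore`, `store`, `find`, and the sample rows are hypothetical, and a dictionary takes the place of an HBase table.

```python
import json
from typing import Any, Callable, Dict, Optional


def identity(value: Any) -> Any:
    """Return the value unchanged; used to mute packing and unpacking."""
    return value


class ByteStore:
    """Hypothetical in-memory store standing in for an HBase table; not cbass's API."""

    def __init__(self,
                 pack: Callable[[Any], Any] = identity,
                 unpack: Callable[[Any], Any] = identity) -> None:
        self._pack = pack
        self._unpack = unpack
        self._rows: Dict[str, Dict[str, Any]] = {}

    def store(self, row_key: str, columns: Dict[str, Any]) -> None:
        # With identity packing, values are stored exactly as the caller passed them.
        self._rows[row_key] = {name: self._pack(v) for name, v in columns.items()}

    def find(self, row_key: str) -> Optional[Dict[str, Any]]:
        row = self._rows.get(row_key)
        if row is None:
            return None
        return {name: self._unpack(v) for name, v in row.items()}


galaxy = ByteStore()  # both functions default to identity

# The caller serializes before storing ...
payload = json.dumps({"age": "4.5 billion years"}).encode("utf-8")
galaxy.store("earth", {"facts": payload})
galaxy.store("pluto", {})  # a row with no columns is also fine

# ... and deserializes after reading, because the store hands back raw bytes.
raw = galaxy.find("earth")["facts"]
facts = json.loads(raw.decode("utf-8"))
```

Here the store never inspects the payload, so any serialization format works as long as the caller applies it consistently on both sides.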

hbase-mr-pof - A proof of concept prototype of new HBase + Hadoop Map Reduce integration

  •    Scala

A proof of concept prototype of new HBase + Hadoop Map Reduce integration

ansible-cloudera-hadoop - Ansible playbook to deploy Cloudera Hadoop components to the cluster

  •    Shell

The playbook follows the official Cloudera guides and is intended primarily for production deployment. High availability for HDFS and YARN is implemented when a sufficient number of hosts is configured. Alternatively, all of the components can be deployed on a single host. You only need to place the hostname(s) in the appropriate group in the hosts file, and the required services will be set up.