JanusGraph - Distributed graph database

  •    Java

JanusGraph is a highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database that can support thousands of concurrent users, complex traversals, and analytic graph queries.

Gaffer - A large-scale entity and relation database supporting aggregation of properties

  •    Java

Gaffer is a graph database framework. It supports storing very large graphs with rich properties on their nodes and edges. Several storage options are available, including Accumulo, HBase and Parquet. It is designed to be as flexible, scalable and extensible as possible, allowing for rapid prototyping and transition to production systems.
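The Gaffer heading mentions aggregation of properties. The idea is that when the same edge is added more than once, its properties are merged instead of being stored as duplicates. The following Python sketch shows that idea in memory. It is not Gaffer's API. The `AggregatingGraph` class, the edge key of (source, destination, group), and the merge rules (sum for `count`, maximum for `last_seen`) are all hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Assumed edge identity: (source vertex, destination vertex, edge group).
EdgeKey = Tuple[str, str, str]

# Hypothetical per-property aggregation rules: counts are summed and
# "last_seen" timestamps keep the latest value.
AGGREGATORS: Dict[str, Callable[[Any, Any], Any]] = {
    "count": lambda a, b: a + b,
    "last_seen": max,
}


class AggregatingGraph:
    """In-memory edge store that merges the properties of repeated edges."""

    def __init__(self, aggregators: Dict[str, Callable[[Any, Any], Any]]) -> None:
        self._aggregators = aggregators
        self._edges: Dict[EdgeKey, Dict[str, Any]] = {}

    def add_edge(self, source: str, destination: str, group: str,
                 properties: Dict[str, Any]) -> None:
        key = (source, destination, group)
        existing = self._edges.get(key)
        if existing is None:
            # First time this edge is seen: store a copy of its properties.
            self._edges[key] = dict(properties)
            return
        for name, value in properties.items():
            if name in existing:
                # Combine the old and new values with this property's aggregator.
                existing[name] = self._aggregators[name](existing[name], value)
            else:
                existing[name] = value

    def get_edge(self, source: str, destination: str,
                 group: str) -> Optional[Dict[str, Any]]:
        return self._edges.get((source, destination, group))


graph = AggregatingGraph(AGGREGATORS)
graph.add_edge("A", "B", "calls", {"count": 1, "last_seen": 100})
graph.add_edge("A", "B", "calls", {"count": 2, "last_seen": 90})
merged = graph.get_edge("A", "B", "calls")  # {"count": 3, "last_seen": 100}
```

Each property needs an aggregator here. Otherwise a repeated edge raises a `KeyError`, which makes a missing rule show up early.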

tera - An Internet-Scale Database.

  •    C++

Copyright 2015, Baidu, Inc. Tera is a collection of sparse, distributed, multidimensional tables. Each table is indexed by a row key, a column key, and a timestamp; each value in the table is an uninterpreted array of bytes.
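This data model can be treated as a map from (row key, column key, timestamp) to bytes. The Python sketch below is a hypothetical in-memory illustration of that addressing scheme, not Tera's API. The `SparseTable` class, its methods, and the example keys are assumptions.

```python
from typing import Dict, Optional, Tuple

# A cell is addressed by (row key, column key, timestamp).
CellKey = Tuple[str, str, int]


class SparseTable:
    """Hypothetical in-memory model: only cells that are written take up space."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Values are stored as uninterpreted bytes; the table never decodes them.
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        # Without a timestamp, return the newest version of the cell, if any.
        versions = [ts for (r, c, ts) in self._cells if r == row and c == column]
        if not versions:
            return None
        return self._cells[(row, column, max(versions))]


table = SparseTable()
table.put("com.example/index", "contents:html", 1, b"<html>v1")
table.put("com.example/index", "contents:html", 2, b"<html>v2")
latest = table.get("com.example/index", "contents:html")    # b"<html>v2"
first = table.get("com.example/index", "contents:html", 1)  # b"<html>v1"
```

For clarity, the lookup scans every cell. A real distributed store would index cells instead of scanning them.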

OpenTSDB - A scalable, distributed Time Series Database.

  •    Java

OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.
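In OpenTSDB, a data point consists of a metric name, a timestamp, a numeric value, and a set of tag key/value pairs. The Python sketch below stores such points in memory and sums them per timestamp, optionally filtering by tags. This shows how tags let many series share one metric. It is not OpenTSDB's HBase schema or query API. `DataPoint`, `MetricStore`, `query_sum`, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataPoint:
    metric: str           # e.g. "sys.cpu.user"
    timestamp: int        # Unix epoch seconds
    value: float
    tags: Dict[str, str]  # e.g. {"host": "web01"}


class MetricStore:
    """Hypothetical in-memory store; not OpenTSDB's HBase schema or API."""

    def __init__(self) -> None:
        self._by_metric: Dict[str, List[DataPoint]] = defaultdict(list)

    def put(self, point: DataPoint) -> None:
        # Index points by metric name so a query only scans one metric.
        self._by_metric[point.metric].append(point)

    def query_sum(self, metric: str, start: int, end: int,
                  tags: Optional[Dict[str, str]] = None) -> Dict[int, float]:
        """Sum matching values per timestamp over the inclusive range [start, end]."""
        wanted = tags or {}
        totals: Dict[int, float] = defaultdict(float)
        for p in self._by_metric.get(metric, []):
            if start <= p.timestamp <= end and all(
                    p.tags.get(k) == v for k, v in wanted.items()):
                totals[p.timestamp] += p.value
        return dict(sorted(totals.items()))


store = MetricStore()
store.put(DataPoint("sys.cpu.user", 1000, 10.0, {"host": "web01"}))
store.put(DataPoint("sys.cpu.user", 1000, 5.0, {"host": "web02"}))
store.put(DataPoint("sys.cpu.user", 1060, 7.5, {"host": "web01"}))
all_hosts = store.query_sum("sys.cpu.user", 1000, 1060)  # {1000: 15.0, 1060: 7.5}
web02 = store.query_sum("sys.cpu.user", 1000, 1060, {"host": "web02"})  # {1000: 5.0}
```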

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

Elasticsearch-Exporter - A small script to export data from one Elasticsearch cluster into another.

  •    JavaScript

A command-line script to import/export data from Elasticsearch to various other storage systems. This is a brand-new implementation with lots of bugs, and one lone developer has far too little time to test everything, so please consider it beta at best and provide feedback, bug reports and maybe even patches.

stream-reactor - Streaming reference architecture for ETL with Kafka and Kafka Connect

  •    Scala

Lenses offers SQL (for data browsing and Kafka Streams), management of Kafka Connect connectors, cluster monitoring and more. stream-reactor is a collection of components for building a real-time ingestion pipeline.

Kundera - JPA 2.0 ORM library for the Cassandra/HBase/MongoDB databases.

  •    Java

A JPA 2.0-compliant object-datastore mapping library for NoSQL datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. It currently supports Cassandra, MongoDB, HBase and relational databases.

Gimel - PayPal's Big Data Processing Framework

  •    Scala

Gimel provides a unified data API for accessing data in any storage system, such as HDFS, GS, Alluxio, HBase, Aerospike, BigQuery, Druid, Elastic, Teradata, Oracle and MySQL.

WeDataSphere - WeDataSphere is a financial-grade, one-stop, open-source suite for big data platforms

Its components include DataSphere Studio, Linkis, Scriptis, Qualitis, Schedulis and Exchangis. DataSphere Studio is positioned as a data application development portal that covers the entire data application development process in a closed loop. With a unified UI and a workflow-style, graphical drag-and-drop development experience, it supports the full lifecycle of data application development: data import, desensitization and cleaning, data analysis, data mining, quality inspection, visualization, scheduling and data output.

hbase-rdd - Spark RDD to read from and write to HBase

  •    Scala

This project lets you connect Apache Spark to HBase. It is currently compiled with Scala 2.10 and 2.11, using the versions of Spark and HBase available on CDH5.5. Version 0.6.0 of this project works on CDH5.3, version 0.4.0 works on CDH5.1, and version 0.2.2-SNAPSHOT works on CDH5.0. Other combinations of versions may be made available in the future. This guide assumes you are using SBT; similar tools such as Maven or Leiningen should also work, with minor differences.

node-hbase-client - Asynchronous HBase client for Node.js, pure JavaScript implementation.

  •    JavaScript

Asynchronous HBase client for Node.js, pure JavaScript implementation.

hbase-docker - HBase running in Docker

  •    Shell

This configuration builds a Docker container that runs HBase (with an embedded ZooKeeper) using files stored inside the container. The approach requires editing the local server's /etc/hosts file to add an entry for the container's hostname, because HBase uses hostnames to pass connection data back out of the container (from its internal ZooKeeper).
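The /etc/hosts step maps the container's hostname to its IP address so that the hostnames HBase hands back can be resolved on the host. The Python sketch below shows that edit as a pure function on the file's text. `add_hosts_entry` is hypothetical and not part of hbase-docker. The hostname `hbase-docker` and both IP addresses are placeholders. In practice the IP might be looked up with `docker inspect`, and writing the result back to /etc/hosts requires root.

```python
def add_hosts_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts_text with an "ip<TAB>hostname" line appended.

    Any existing line that maps the same hostname is dropped first (together
    with any other aliases on that line), so a restarted container's new IP wins.
    """
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when reading the fields of a line.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            continue
        kept.append(line)
    kept.append(f"{ip}\t{hostname}")
    return "\n".join(kept) + "\n"


hosts = "127.0.0.1\tlocalhost\n172.17.0.5\thbase-docker\n"
updated = add_hosts_entry(hosts, "172.17.0.2", "hbase-docker")
# updated == "127.0.0.1\tlocalhost\n172.17.0.2\thbase-docker\n"
```

Replacing a stale entry instead of appending a second one prevents the resolver from picking up an old container IP.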

cbass - adding "simple" to HBase

  •    Clojure

In this example, "packing" and "unpacking" are muted: custom serialization is done before calling cbass, so the data is already a byte array, and deserialization is done after the value is returned from cbass, which in this case simply returns a byte array (i.e., the identity function is used for both). Notice that "pluto" has no columns, which is also fine.
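The store in this example applies a pack function before writing each value and an unpack function after reading it. Passing the identity function for both means bytes pass through unchanged, and the caller does its own serialization. The Python sketch below shows that contract. It is not cbass's (Clojure) API. `RowStore`, `identity`, the JSON serialization, and the row contents are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Optional


def identity(value: Any) -> Any:
    """Return the value unchanged; used to "mute" packing and unpacking."""
    return value


class RowStore:
    """Hypothetical in-memory row store with pluggable pack/unpack hooks."""

    def __init__(self, pack: Callable[[Any], bytes],
                 unpack: Callable[[bytes], Any]) -> None:
        self._pack = pack
        self._unpack = unpack
        self._rows: Dict[str, Dict[str, bytes]] = {}

    def put(self, row: str, columns: Dict[str, Any]) -> None:
        # Each column value is packed before it is stored.
        self._rows[row] = {name: self._pack(v) for name, v in columns.items()}

    def get(self, row: str) -> Optional[Dict[str, Any]]:
        # Each stored value is unpacked on the way out; unknown rows give None.
        columns = self._rows.get(row)
        if columns is None:
            return None
        return {name: self._unpack(v) for name, v in columns.items()}


planets = RowStore(pack=identity, unpack=identity)
# The caller serializes before storing ...
planets.put("earth", {"info": json.dumps({"moons": 1}).encode("utf-8")})
planets.put("pluto", {})  # a row with no columns
# ... and deserializes after reading, since the store returns raw bytes.
raw = planets.get("earth")["info"]
earth_info = json.loads(raw.decode("utf-8"))
```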

hbase-mr-pof - A proof-of-concept prototype of a new HBase + Hadoop MapReduce integration

  •    Scala

A proof-of-concept prototype of a new HBase + Hadoop MapReduce integration.