HPCC Systems - Hadoop alternative

HPCC is a proven, battle-tested platform for manipulating, transforming, querying, and warehousing Big Data. It supports two types of cluster configuration. Thor is responsible for consuming vast amounts of data and for transforming, linking, and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.

http://hpccsystems.com/

Related Projects

Hadoop Common


Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop Common provides the common utilities that support the other Hadoop subprojects.

glow - Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce, Spark, Flink, Storm, etc.


Glow provides a library for easy parallel computation, either in local threads or distributed across clusters of machines. It is written in pure Go. The author is also working on a Go+LuaJIT system, https://github.com/chrislusf/gleam , which is more flexible and more performant.

RethinkDB - Distributed JSON database


RethinkDB is built to store JSON documents and to scale to multiple machines with very little effort. It has a pleasant query language that supports useful queries such as table joins and group-by, and it is easy to set up and learn. Its features include a JSON data model, distributed joins, subqueries, aggregation, atomic updates, and Hadoop-style map/reduce.

gleam - Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly


Gleam is a high-performance, efficient distributed execution system that is also simple, generic, flexible, and easy to customize. Gleam is built in Go, and user-defined computations can be written in Go, with Unix pipe tools, or as any streaming program.

Baum-Welch - A distributed, Hadoop based HMM trainer for Apache Mahout machine learning library.


A distributed, Hadoop based HMM trainer for Apache Mahout machine learning library.



cc - Cloud map and reduce for large machine learning jobs


Cloud map and reduce for large machine learning jobs

Apache Storm - Distributed and fault-tolerant realtime computation


Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
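To make "processing unbounded streams" concrete, here is a minimal in-process sketch of the spout/bolt idea behind a Storm topology, written as chained Python generators. It is not Storm's API (real topologies are built in Java or Clojure and run across a cluster); the function names `click_spout`, `normalize_bolt`, and `count_bolt` are hypothetical. The point it illustrates is that results are emitted continuously, one tuple at a time, rather than after the input ends as in a batch job.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Iterable, Iterator, Tuple

def click_spout(source: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for a spout: emits one tuple (a clicked URL) at a time.
    The source may be unbounded; nothing is buffered."""
    for url in source:
        yield url

def normalize_bolt(stream: Iterator[str]) -> Iterator[str]:
    """Stateless bolt: cleans up each URL and passes it downstream."""
    for url in stream:
        yield url.strip().lower()

def count_bolt(stream: Iterator[str]) -> Iterator[Tuple[str, int]]:
    """Stateful bolt: keeps a running count per URL and emits the updated
    count after every tuple instead of waiting for the input to end."""
    counts: Counter = Counter()
    for url in stream:
        counts[url] += 1
        yield url, counts[url]

if __name__ == "__main__":
    # cycle() makes the input endless; islice() only stops the demo after 6 updates.
    topology = count_bolt(normalize_bolt(click_spout(cycle(["/Home", "/about "]))))
    for url, count in islice(topology, 6):
        print(url, count)
```

Because every stage is lazy, the pipeline works on an infinite source; in a real cluster each bolt would run as many parallel tasks with tuples routed between machines.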

Apache Tajo - A big data warehouse system on Hadoop


Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

randhindi-PML


Parallel Machine Learning Library in Java. Currently, parallelization is limited to parameter searching using local threads, but a Hadoop map/reduce approach is planned.
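Parameter searching parallelizes well with local threads because each candidate setting can be scored independently. The sketch below shows that pattern with a thread pool; it is not PML's code (PML is Java), and the objective `toy_score` and its two parameters are hypothetical placeholders for "train and validate a model with these settings".

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Iterable, Tuple

Params = Tuple[float, float]  # hypothetical (learning_rate, regularization) pair

def toy_score(params: Params) -> float:
    """Hypothetical objective peaked at lr=0.1, reg=0.01.
    A real library would train and cross-validate a model here."""
    lr, reg = params
    return -((lr - 0.1) ** 2 + (reg - 0.01) ** 2)

def parallel_grid_search(score: Callable[[Params], float],
                         grid: Iterable[Params],
                         workers: int = 4) -> Tuple[Params, float]:
    """Scores every candidate on a pool of local threads and returns the best one."""
    candidates = list(grid)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so scores[i] belongs to candidates[i].
        scores = list(pool.map(score, candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]

if __name__ == "__main__":
    grid = product([0.01, 0.1, 1.0], [0.0, 0.01, 0.1])
    print(parallel_grid_search(toy_score, grid))
```

One caveat for this Python version: CPython threads only run truly in parallel when the scoring function releases the GIL (I/O, native numeric code). For pure-Python, CPU-bound scoring, `ProcessPoolExecutor` is the drop-in alternative. Java threads, as in PML, do not have this limitation.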

Phach-Thesis - distributed map/reduce system


Distributed map/reduce system

map-reduce-console - Browser-based map-reduce console to quickly prototype Hadoop jobs


Browser-based map-reduce console to quickly prototype Hadoop jobs

hadoop - Hadoop Distributed File System and MapReduce implementation


Hadoop Distributed File System and MapReduce implementation

multiverso - Parameter server framework for distributed machine learning


Multiverso is a parameter-server-based framework for training machine learning models on big data across many machines. It is currently a standard C++ library and provides a series of friendly programming interfaces. With such easy-to-use APIs, machine learning researchers and practitioners do not need to worry about system routine issues such as distributed model storage and operation, inter-process and inter-thread communication, multi-threading management, and so on. Instead, they are …
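The parameter-server idea can be sketched in a few lines: a server holds the shared model, workers pull the current parameters, compute an update on their own data shard, and push additive deltas back. The toy below runs everything in one process with threads and synchronous rounds; it is not Multiverso's C++ API, and `ToyParameterServer`, `pull`, `push`, and the one-parameter "model" (estimating a mean) are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

class ToyParameterServer:
    """Hypothetical in-process stand-in: stores shared parameters and applies
    additive updates from workers under a lock."""
    def __init__(self, size: int) -> None:
        self._params = [0.0] * size
        self._lock = threading.Lock()

    def pull(self) -> List[float]:
        with self._lock:
            return list(self._params)

    def push(self, delta: Sequence[float]) -> None:
        with self._lock:
            for i, d in enumerate(delta):
                self._params[i] += d

def worker_step(server: ToyParameterServer, shard: Sequence[float],
                w: float, lr: float, n_workers: int) -> None:
    # Gradient of 0.5 * mean((w - x)^2) over this worker's shard.
    grad = sum(w - x for x in shard) / len(shard)
    # Each worker pushes its share; the server sums the deltas.
    server.push([-lr * grad / n_workers])

def train(shards: List[Sequence[float]], rounds: int = 50, lr: float = 0.5) -> float:
    server = ToyParameterServer(size=1)
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for _ in range(rounds):
            w = server.pull()[0]  # all workers start the round from the same snapshot
            futures = [pool.submit(worker_step, server, s, w, lr, len(shards))
                       for s in shards]
            for f in futures:
                f.result()  # synchronous rounds: wait for every push
    return server.pull()[0]

if __name__ == "__main__":
    # Two equal-size shards with means 1 and 3; the global mean is 2.
    print(train([[0.0, 2.0], [2.0, 4.0]]))
```

Synchronous rounds keep the result deterministic for illustration; production parameter servers usually also support asynchronous updates, where workers push without waiting for each other.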

Jubatus - Framework and Library for Distributed Online Machine Learning


Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes the following functionality: an online machine learning library (classification, regression, recommendation via nearest neighbor search, graph mining, anomaly detection, and clustering); a feature vector converter (fv_converter) for data preprocessing and feature extraction; and a framework for distributed online machine learning with fault tolerance.

Samza - Distributed Stream Processing Framework


Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. It provides a very simple callback-based API for processing messages that should be familiar to anyone who has used Map/Reduce. Samza was originally developed at LinkedIn. It is currently used to process tracking data and service log data, and for data ingestion pipelines for realtime services.
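The callback style means the framework owns the loop and calls your code once per incoming message. Samza's low-level Java API expresses this as a `StreamTask` whose `process(envelope, collector, coordinator)` method is invoked for each message. The Python sketch below only mimics that shape in memory: `Envelope`, `Collector`, `PageViewCounterTask`, and the stream names are hypothetical, and the in-memory list stands in for a Kafka partition.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

@dataclass
class Envelope:
    """Hypothetical stand-in for an incoming message plus its metadata."""
    stream: str
    key: str
    message: Any

class Collector:
    """Records outgoing messages; a real system would send them to Kafka topics."""
    def __init__(self) -> None:
        self.sent: List[Tuple[str, str, Any]] = []

    def send(self, stream: str, key: str, message: Any) -> None:
        self.sent.append((stream, key, message))

class PageViewCounterTask:
    """Callback-style task: the framework calls process() once per message."""
    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def process(self, envelope: Envelope, collector: Collector) -> None:
        user = envelope.key
        self.counts[user] = self.counts.get(user, 0) + 1
        collector.send("page-view-counts", user, self.counts[user])

def run(task: PageViewCounterTask, messages: List[Envelope]) -> Collector:
    """Toy event loop: delivers each message to the task's callback in order."""
    collector = Collector()
    for env in messages:
        task.process(env, collector)
    return collector

if __name__ == "__main__":
    msgs = [Envelope("page-views", "alice", "/a"),
            Envelope("page-views", "bob", "/b"),
            Envelope("page-views", "alice", "/c")]
    print(run(PageViewCounterTask(), msgs).sent)
```

As with a Map/Reduce mapper, the task holds no loop of its own, which is what lets the framework handle partitioning, restarts, and scaling around it.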

hadoop-mr-examples - Polyglot Map Reduce Examples on Hadoop


Polyglot Map Reduce Examples on Hadoop

i2b2-hadoop - A Hadoop Map/Reduce implementation of processing and executing I2B2 CRC Queries


A Hadoop Map/Reduce implementation of processing and executing I2B2 CRC Queries

hello-hadoop - A simple Hadoop Map/Reduce Job (WordCount)


A simple Hadoop Map/Reduce Job (WordCount)
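WordCount is the standard first Map/Reduce job: the map phase emits a `(word, 1)` pair per word, the framework groups pairs by key (the shuffle), and the reduce phase sums each group. The single-process Python sketch below walks through those three phases; it is not the hello-hadoop project's code, and on a real cluster Hadoop would run the map and reduce functions on many nodes and perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emits (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Groups values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sums all counts for one word."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello world"]))
```

Words are compared case-sensitively, as in the classic example; lowercasing in `map_phase` would merge "Hello" and "hello".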

knowing-hadoop - Hadoop Map-Reduce Job for Knowing


Hadoop Map-Reduce Job for Knowing

hadoop-streaming - [ABANDONED] Support libraries for writing Hadoop Streaming-compatible map/reduce tasks


[ABANDONED] Support libraries for writing Hadoop Streaming-compatible map/reduce tasks.
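A Hadoop Streaming task is an ordinary program that reads lines on stdin and writes lines on stdout. By default each output line is a tab-separated key and value, and the reducer receives its input sorted by key, so it must detect key boundaries itself. The sketch below follows that convention without any support library; it is not the abandoned project's API, and the `user,bytes` input format is a hypothetical example (summing bytes transferred per user).

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Parses hypothetical 'user,bytes' records and emits 'user<TAB>bytes'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, nbytes = line.split(",", 1)
        yield f"{user}\t{int(nbytes)}"

def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Input arrives sorted by key, so consecutive lines with the same key form one group."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"

if __name__ == "__main__":
    # Local simulation of the Streaming contract:
    #   cat input.txt | python job.py map | sort | python job.py reduce
    stage = mapper if sys.argv[1:] == ["map"] else reducer
    for out in stage(sys.stdin):
        print(out)
```

The `map | sort | reduce` pipeline in the comment reproduces the framework's behavior on one machine, which is a common way to test Streaming jobs before submitting them to a cluster.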