s4 - Distributed Stream Computing Platform

  •        0

S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. S4 has been deployed in production systems at Yahoo to process thousands of search queries per second.




comments powered by Disqus

Related Projects

Cassandra - Scalable Distributed Database

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. Cassandra is suitable for applications that can't afford to lose data. Data is automatically replicated to multiple nodes for fault-tolerance.

Katta - Lucene and more in the cloud.

Katta is a scalable, failure tolerant, distributed, data storage for real time access. Katta serves large, replicated, indices as shards to serve high loads and very large data sets. These indices can be of different type. Currently implementations are available for Lucene and Hadoop mapfiles.

Hypertable - A high performance, scalable, distributed storage and processing system for structured

Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.

Akka - Build Concurrent and Scalable Applications

Akka is the platform for the next generation event-driven, scalable and fault-tolerant architectures on the JVM. It helps to write simpler correct concurrent applications using Actors, STM & Transactors. It could scale out on multi-core or multiple nodes using asynchronous message passing. For fault-tolerance it adopts the Let it crash or Embrace failure model to build applications that self-heals, systems that never stop.

Gearman - Application Framework to farm out work to other Machines

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel by doing load balancing. It also supports to call functions between languages. It is the nervous system for how distributed processing communicates. It is fault tolerant.

S-Space - A scalable software library for semantic spaces

The S-Space Package is a collection of algorithms for building Semantic Spaces as well as a highly-scalable library for designing new distributional semantics algorithms. Distributional algorithms process text corpora and represent the semantic for words as high dimensional feature vectors.

HBase - Hadoop database

HBase provides support to handle BigTable - billions of rows X millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable and runs on top of HDFS (Hadoop Distributed Filesystem). It features compression, in-memory operation per-column. Data could be replicated between the nodes. HBase is used in Facebook and Twitter.

Hyperdex - A Searchable Distributed Key-Value Store

HyperDex is a distributed, searchable key-value store. HyperDex provides a unique search primitive which enables searches over stored values. By design, HyperDex retains the performance of traditional key-value stores while enabling support for the search operation. It is fast, scalable, Consistent, Fault tolerant.

Project-voldemort - A distributed database, Clone of Amazon's Dynamo

Voldemort is a distributed key-value storage system. Data is automatically replicated over multiple servers. Data is automatically partitioned so each server contains only a subset of the total data. Server failure is handled transparently. It is used at LinkedIn for certain high-scalability storage problems where simple functional partitioning is not sufficient.

Infinispan - Key value NOSQL data store and data grid

Infinispan is an extremely scalable, highly available key/value NoSQL datastore and distributed data grid platform. The purpose of Infinispan is to expose a data structure that is highly concurrent, designed ground-up to make the most of modern multi-processor/multi-core architectures while at the same time providing distributed cache capabilities. Infinispan offers enterprise features such as efficient eviction algorithms to control memory usage as well as JTA compatibility.