ZooKeeper - Mandatory for distributed systems

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.

The name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper's name space is identified by a path. ZooKeeper supports the concept of watches. Clients can set a watch on a znodes. A watch will be triggered and removed when the znode changes. It supports replication.

In case of distributed system, there may be multiple servers interacting with each other. Few may need to share some common resources. Consider a scenario, where a application server wants to know the list of database servers, then it could communicate the Zookeeper server and fetch the required information.



http://hadoop.apache.org/zookeeper/


Bookmark and Share          2488



comments powered by Disqus


Related Products

Sqoop - Transfers data between Hadoop and Datastores

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.

Read more

Hadoop Common

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop common supports other Hadoop subprojects

Read more

HBase - Hadoop database

HBase provides support to handle BigTable - billions of rows X millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable and runs on top of HDFS (Hadoop Distributed Filesystem). It features compression, in-memory operation per-column. Data could be replicated between the nodes. HBase is used in Facebook and Twitter.

Read more

Service Repository - Service Directory for SOA Services

A distributed "no single point of failure" service directory of SOA services which allow clients to easily manage(find, use, publish, monitor, deploy) their services. By using it you increase the number of hot deployments, decrease server load on your services, increase availability of your service response times by using it's internal auto load balancing mechanism.

Read more

Katta - Lucene and more in the cloud.

Katta is a scalable, failure tolerant, distributed, data storage for real time access. Katta serves large, replicated, indices as shards to serve high loads and very large data sets. These indices can be of different type. Currently implementations are available for Lucene and Hadoop mapfiles.

Read more

HPCC System - Hadoop alternative

HPCC is a proven and battle-tested platform for manipulating, transforming, querying and data warehousing Big Data. It supports two type of configuration. Thor is responsible for consuming vast amounts of data, transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.

Read more

OpenDS - Next generation Directory Service

OpenDS is an community project, building a free and comprehensive next generation directory service. OpenDS is designed to address large deployments, to provide high performance, to be highly extensible, and to be easy to deploy, manage and monitor. The OpenDS directory service will ultimately include not just the Directory Server, but also other essential directory-related services like directory proxy, virtual directory, namespace distribution and data synchronization.

Read more

Kafka - A high-throughput distributed messaging system

Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) are a key ingredient in many of the social feature on the modern web. This data is typically handled by "logging" and ad hoc log aggregation solutions due to the throughput requirements. This kind of ad hoc solution is a viable solution to providing logging data to Hadoop.

Read more

VoltDB - Fast Scalable SQL DBMS with ACID

VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high volume data sources. VoltDB provides the ability to capture, store and process incoming data at millions of read/write operations per second. And VoltDB’s relational model opens that data to be analyzed in real-time, using familiar Business Intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.

Read more

SenseiDB - Search engine used in LinkedIn

Sensei is a distributed data system that was built to support many product initiatives at LinkedIn, including the real-time faceted search in LinkedIn Signal and the news feed and tabs on the Homepage. Sensei is both a search engine and a database. It is designed to query and navigate through documents that consist of unstructured text and well-formed and structured metadata.

Read more




Follow feeds Follow bestopensource on Twitter Follow bestopensource on Facebook

Enter your email address:

Delivered by FeedBurner



Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.