Storm - Distributed and fault-tolerant realtime computation

Storm is a distributed realtime computation system. It makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.

Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed.
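
As a rough illustration of what "repartitioning the streams between each stage" can mean, here is a toy sketch in plain Python. It does not use Storm's API; the function names (sentence_source, split_words, partition_by_key, run_topology) and the sample sentences are hypothetical, and the hash-based routing is only an assumption about one common way to partition a stream by key.

```python
from collections import Counter, defaultdict
from zlib import crc32


def sentence_source():
    """Stand-in for a stream source; a real stream would be unbounded."""
    yield from ["the cow jumped", "the moon", "cow moon cow"]


def split_words(sentences):
    """First stage: turn a stream of sentences into a stream of words."""
    for sentence in sentences:
        yield from sentence.split()


def partition_by_key(words, num_tasks):
    """Repartition step: pick a task by hashing the word itself,
    so every occurrence of a given word reaches the same task."""
    for word in words:
        yield crc32(word.encode("utf-8")) % num_tasks, word


def run_topology(num_tasks=2):
    """Wire the stages together and keep one counter per downstream task."""
    counters = defaultdict(Counter)
    stream = partition_by_key(split_words(sentence_source()), num_tasks)
    for task, word in stream:
        counters[task][word] += 1
    return dict(counters)


if __name__ == "__main__":
    for task, counts in sorted(run_topology().items()):
        print(task, dict(counts))
```

Because routing depends only on the word, each task holds the complete count for the words it owns, and the per-task results can be combined without double counting.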

http://storm-project.net/

Related Projects

storm-example-projects


Storm is a real-time computing system that supports fault tolerance, horizontal scalability, and guaranteed message processing with high performance. This library of sample projects exposes reusable bolts for real-time computation.

storm-on-dotcloud - Easily deploy Storm, the real-time computation system, on dotCloud


Easily deploy Storm, the real-time computation system, on dotCloud

snappydata - SnappyData: OLTP + OLAP Database built on Apache Spark


SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) in a single integrated cluster. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire XD (as an in-memory transactional store with scale-out SQL semantics).

Pinot - A realtime distributed OLAP datastore


Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

oryx - Simple real-time large-scale machine learning infrastructure.


The Oryx open source project provides simple, real-time, large-scale machine learning / predictive analytics infrastructure. It implements a few classes of algorithm commonly used in business applications: collaborative filtering / recommendation, classification / regression, and clustering. It can continuously build models from a stream of data at large scale using Apache Hadoop (http://hadoop.apache.org/).

Druid IO - Real Time Exploratory Analytics on Large Datasets


Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations. Druid can load both streaming and batch data.

LogEventsProcessing - real time log event processing using storm, kafka, logstash & cassandra


real time log event processing using storm, kafka, logstash & cassandra

storm


Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

protosoup - Python based distributed real-time computation system.


Python based distributed real-time computation system.

binos-yarn - deploy distributed real-time computation system on Hadoop-Yarn


deploy distributed real-time computation system on Hadoop-Yarn

disruptor - A distributed real-time computation platform in node.js.


A distributed real-time computation platform in node.js.

VoltDB - Fast Scalable SQL DBMS with ACID


VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high volume data sources. VoltDB provides the ability to capture, store and process incoming data at millions of read/write operations per second. And VoltDB’s relational model opens that data to be analyzed in real-time, using familiar Business Intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.

monasca-thresh - Monasca Thresholding Engine


Computes thresholds on metrics and publishes alarms to Kafka when they are exceeded. The current state is also saved in the MySQL database. Based on Apache Storm, a free and open distributed real-time computation system. Also uses Apache Kafka, a high-throughput distributed messaging system.

netdata - Get control of your servers. Simple. Effective. Awesome! https://my-netdata.io/


netdata is a system for distributed real-time performance and health monitoring. It provides unparalleled real-time insight into everything happening on the systems it runs on (including applications such as web and database servers), using modern interactive web dashboards. netdata is fast and efficient, designed to run permanently on all systems (physical and virtual servers, containers, IoT devices) without disrupting their core function.

yahoo-s4 - yahoo-s4 is a real-time distributed stream data processing system


yahoo-s4 is a real-time distributed stream data processing system

Kudu - Hadoop storage layer to enable fast analytics on fast data


Kudu is a storage system for tables of structured data. Kudu provides a combination of fast inserts/updates and efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer. As a new complement to HDFS and Apache HBase, Kudu gives architects the flexibility to address a wider variety of use cases without exotic workarounds.

real-time - Real-Time Google Analytics counter in the title bar


Real-Time Google Analytics counter in the title bar

SignalR - Incredibly simple real-time web for .NET


ASP.NET SignalR is a library for ASP.NET developers that makes it incredibly simple to add real-time web functionality to your applications. It's the ability to have your server-side code push content to the connected clients as it happens, in real-time. SignalR also provides a very simple, high-level API for doing server to client RPC (call JavaScript functions in your clients' browsers from server-side .NET code) in your ASP.NET application.