Open source distributed realtime computation system

©Cid Mirandahttps://www.flickr.com/photos/cidska/

Storm is a distributed, real-time computational framework that makes processing unbounded streams of data easy.

Storm can be integrated with your existing queuing and persistence technologies, consuming streams of data and processing/transforming these streams in many ways.

Topology

A Storm topology is a graph of computation where the nodes represent some individual computations and the edges represent the data being passed between nodes. We then feed data into this graph of computation in order to achieve some goal.

Tuple

The nodes in our topology send data between one another in the form of tuples. A tuple is an ordered list of values, where each value is assigned a name. A node can create and then (optionally) send tuples to any number of nodes in the graph. The process of sending a tuple to be handled by any number of nodes is called emitting a tuple.

Stream

A stream is an unbounded sequence of tuples between two nodes in the topology.

Spout

A spout is the source of a stream in the topology. Spouts normally read data from an external data source and emit tuples into the topology. Spouts can listen to message queues for incoming messages, listen to a database for changes, or listen to any other source of a feed of data.

Bolt

Unlike a spout, whose sole purpose is to listen to a stream of data, a bolt accepts a tuple from its input stream, performs some computation or transformation — filtering, aggregation, or a join, perhaps — on that tuple, and then optionally emits a new tuple (or tuples) to its output stream(s).

Stream grouping

Shuffle grouping

The stream between our spout and first bolt uses a shuffle grouping. A shuffle grouping is a type of stream grouping where tuples are emitted to instances of bolts at random

Fields grouping

The stream between our spout and first bolt uses a shuffle grouping. A shuffle grouping is a type of stream grouping where tuples are emitted to instances of bolts at random

The Storm cluster

A Storm cluster consists of two types of nodes: the master node and the worker nodes. A master node runs a daemon called Nimbus, and the worker nodes each run a daemon called a Supervisor. The master node can be thought of as the control center, this is where you’d run any of the commands — such as activate, deactivate, rebalance, or kill — available in a Storm cluster. The worker nodes are where the logic in the spouts and bolts is executed.

Another big part of a Storm cluster is Zookeeper. Storm relies on Apache Zookeeper1 for coordinating communication between Nimbus and the Supervisors. Any state needed to coordinate between Nimbus and the Supervisors is kept in Zookeeper. As a result, if Nimbus or a Supervisor goes down, once it comes back up it can recover state from Zookeeper, keeping the Storm cluster running as if nothing happened.

Programmer, musician, photographer. For a collaborative and inclusive world.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store