Apache Storm

Apache Storm
	Distributed and fault-tolerant realtime computation
Developer(s)	BackType, Twitter
Stable release	2.5.0 / 4 August 2023[1]
Repository	Storm Repository
Written in	Clojure & Java
Operating system	Cross-platform
Type	Distributed stream processing
License	Apache License 2.0
Website	storm.apache.org

Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz[2] and his team at BackType,[3] the project was open sourced after BackType was acquired by Twitter.[4] It uses custom-built "spouts" and "bolts" to define information sources and manipulations, allowing batch, distributed processing of streaming data. The initial release was on 17 September 2011.[5]

A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG), with spouts and bolts acting as the graph vertices. The edges of the graph are called streams and direct data from one node to another. Taken as a whole, the topology acts as a data transformation pipeline. At a superficial level, the general topology structure is similar to a MapReduce job, the main difference being that data is processed in real time rather than in individual batches. Additionally, a Storm topology runs indefinitely until it is killed, while a MapReduce job DAG must eventually end.[6]
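
The topology idea can be illustrated with a small, self-contained Python sketch. It is a toy model and not Storm's actual API: the names `Topology`, `set_spout`, `set_bolt`, `sentence_spout`, `split_bolt` and `CountBolt` are hypothetical, everything runs in one process, and the run is cut off after a fixed number of tuples, whereas a real topology is distributed and runs until killed.

```python
from collections import Counter, defaultdict
from itertools import islice

# Toy model only: all names here are hypothetical and are not Storm's API.

def sentence_spout():
    """Spout: an unbounded source that emits one-field tuples forever."""
    sentences = ["the cow jumped", "the moon rose"]
    i = 0
    while True:
        yield (sentences[i % len(sentences)],)
        i += 1

def split_bolt(tup):
    """Bolt: consume one tuple and emit zero or more new tuples."""
    (sentence,) = tup
    for word in sentence.split():
        yield (word,)

class CountBolt:
    """Bolt that keeps running state: a count per word seen so far."""
    def __init__(self):
        self.counts = Counter()

    def __call__(self, tup):
        (word,) = tup
        self.counts[word] += 1
        yield (word, self.counts[word])

class Topology:
    """Vertices are spouts and bolts; edges are streams between them."""
    def __init__(self):
        self.spouts = {}
        self.bolts = {}
        self.edges = defaultdict(list)  # source name -> downstream bolt names

    def set_spout(self, name, spout):
        self.spouts[name] = spout

    def set_bolt(self, name, bolt, source):
        self.bolts[name] = bolt
        self.edges[source].append(name)

    def _push(self, source, tup, sink):
        downstream = self.edges.get(source, [])
        if not downstream:
            # No outgoing stream: collect the tuple as final output.
            sink.append((source, tup))
        for name in downstream:
            for out in self.bolts[name](tup):
                self._push(name, out, sink)

    def run(self, max_tuples):
        """Process max_tuples tuples per spout, then stop (a simplification)."""
        sink = []
        for name, spout in self.spouts.items():
            for tup in islice(spout(), max_tuples):
                self._push(name, tup, sink)
        return sink

# Wire the DAG: sentences -> split -> count
topology = Topology()
count_bolt = CountBolt()
topology.set_spout("sentences", sentence_spout)
topology.set_bolt("split", split_bolt, source="sentences")
topology.set_bolt("count", count_bolt, source="split")
results = topology.run(max_tuples=3)
```

Each tuple is pushed through the graph as soon as the spout emits it, one at a time, which is the contrast with batch processing drawn above. The `max_tuples` cut-off exists only so that the sketch terminates.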

Storm became an Apache Top-Level Project in September 2014,[7] having been in incubation since September 2013.[8][9]

Development

Apache Storm is developed under the Apache License, which makes it available for most companies to use.[10] Git is used for version control and Atlassian JIRA for issue tracking, under the Apache Incubator program.

Major Releases[11]
Version	Release Date
2.5.0	4 August 2023
2.4.0	25 March 2022
2.3.0	27 September 2021
2.2.0	30 June 2020
2.1.0	6 September 2019
1.2.3	18 July 2019
2.0.0	30 May 2019
1.1.4	8 January 2019
1.2.2	4 June 2018
1.1.3	4 June 2018
1.0.7	3 May 2018
1.2.1	19 February 2018
1.2.0	15 February 2018
1.1.2	15 February 2018
1.0.6	14 February 2018
1.0.5	15 September 2017
1.1.1	1 August 2017
1.0.4	28 July 2017
1.1.0	29 March 2017
1.0.3	14 February 2017
0.10.2	14 September 2016
0.9.7	7 September 2016
1.0.2	10 August 2016
1.0.1	6 May 2016
0.10.1	5 May 2016
1.0.0	12 April 2016
0.10.0	5 November 2015
0.9.6	5 November 2015
0.9.5	4 June 2015
0.9.4	25 March 2015
0.9.3	25 November 2014
0.9.2	25 June 2014
0.9.1	10 February 2014
Historical (non-Apache) Version	Release Date
0.9.0	8 December 2013
0.8.2	11 January 2013
0.8.1	6 September 2012
0.8.0	2 August 2012
0.7.0	28 February 2012
0.6.0	15 December 2011
0.5.0	19 September 2011

Apache Storm architecture

An Apache Storm cluster comprises the following critical components:

Nodes: There are two types of nodes, master nodes and worker nodes. A master node runs a daemon called Nimbus, which assigns tasks to machines and monitors their performance. Each worker node runs a daemon called Supervisor, which starts and stops worker processes on its own node to execute the tasks that Nimbus has assigned to it. Storm does not track the state and health of the cluster by itself; it relies on ZooKeeper for this, and coordination between Nimbus and the Supervisors goes through ZooKeeper.
Components: Storm has three critical components: topology, stream, and spout. A topology is a network of spouts and bolts connected by streams. A stream is an unbounded sequence of tuples, and a spout is a source of streams: it converts incoming data into tuples and sends them to the bolts to be processed.[12]
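
The division of labour between the daemons can be sketched as a toy model in Python. This is not Storm's code or API. The class names, the in-memory `CoordinationStore` standing in for ZooKeeper, and the round-robin placement are all assumptions made for illustration; Storm's real scheduler and failure detection are considerably more involved.

```python
# Toy model only: class and method names are hypothetical, not Storm's API.

class CoordinationStore:
    """Stand-in for ZooKeeper: shared state that both daemons read and write."""
    def __init__(self):
        self.heartbeats = set()   # ids of supervisors currently reporting in
        self.assignments = {}     # supervisor id -> list of task ids

class Supervisor:
    """Worker-node daemon: runs whatever tasks are assigned to its own node."""
    def __init__(self, node_id, store):
        self.node_id = node_id
        self.store = store
        self.running = []

    def heartbeat(self):
        self.store.heartbeats.add(self.node_id)

    def sync(self):
        # Make local work match the assignment published in the store.
        self.running = list(self.store.assignments.get(self.node_id, []))

class Nimbus:
    """Master-node daemon: spreads tasks over the supervisors that are alive."""
    def __init__(self, store):
        self.store = store

    def assign(self, tasks):
        live = sorted(self.store.heartbeats)
        if not live:
            raise RuntimeError("no live supervisors")
        plan = {node: [] for node in live}
        for i, task in enumerate(tasks):
            plan[live[i % len(live)]].append(task)  # assumed: round-robin
        self.store.assignments = plan

store = CoordinationStore()
supervisors = [Supervisor(f"node-{n}", store) for n in (1, 2)]
for s in supervisors:
    s.heartbeat()

nimbus = Nimbus(store)
tasks = ["spout-0", "bolt-0", "bolt-1", "bolt-2"]
nimbus.assign(tasks)
for s in supervisors:
    s.sync()
before = {s.node_id: list(s.running) for s in supervisors}

# Simulate node-2 failing: its heartbeat disappears and Nimbus reassigns.
store.heartbeats.discard("node-2")
nimbus.assign(tasks)
for s in supervisors:
    s.sync()
after = {s.node_id: list(s.running) for s in supervisors}
```

In this sketch the two daemons never call each other. Nimbus publishes a plan to the shared store and each Supervisor reads only its own entry, which mirrors the coordinating role that ZooKeeper plays between them.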

Peer platforms

Storm is one of dozens of stream processing engines; for a more complete list, see Stream processing. On June 2, 2015, Twitter announced Heron,[13] which is API-compatible with Storm. Other comparable streaming data engines include Spark Streaming and Flink.[14]

References

[1] "Apache Storm 2.5.0 Released". Retrieved 4 August 2023.
[2] Marz, Nathan. "About Nathan Marz". Nathan Marz. Retrieved 28 March 2013.
[3] "BackType Website (defunct)". BackType. Retrieved 28 March 2013.
[4] "A Storm is coming: more details and plans for release". Engineering Blog. Twitter Inc. Retrieved 29 July 2015.
[5] "Storm Codebase". GitHub. Retrieved 8 February 2013.
[6] "Tutorial - Components of a Storm cluster". Documentation. Apache Storm. Retrieved 29 July 2015.
[7] "Apache Storm Graduates to a Top-Level Project".
[8] "Storm Project Incubation Status". Apache Software Foundation. Retrieved 29 October 2013.
[9] "Storm Proposal". Apache Software Foundation. Retrieved 29 October 2013.
[10] "Powered By Storm". Documentation. Apache Storm. Retrieved 29 July 2015.
[11] "Apache Storm". storm.apache.org. Retrieved 18 August 2017.
[12] "STREAM PROCESSING BIG DATA PROCESSING" (PDF).
[13] "Flying faster with Twitter Heron". Engineering Blog. Twitter Inc. Retrieved 3 June 2015.
[14] Chintapalli, Sanket; Dagit, Derek; Evans, Bobby; Farivar, Reza; Graves, Thomas; Holderbaugh, Mark; Liu, Zhuo; Nusbaum, Kyle; Patil, Kishorkumar; Peng, Boyang Jerry; Poulosky, Paul (May 2016). "Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming". 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE. pp. 1789–1792. doi:10.1109/IPDPSW.2016.138. ISBN 978-1-5090-3682-0. S2CID 2180634.

External links

Project Homepage

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
