No relevant resource is found in the selected language.

This site uses cookies. By continuing to browse the site you are agreeing to our use of cookies. Read our privacy policy>Search


To have a better experience, please upgrade your IE browser.


FusionInsight HD 6.5.0 Product Description 02

Rate and give feedback :
Huawei uses machine translation combined with human proofreading to translate this document to different languages in order to help you better understand the content of this document. Note: Even the most advanced machine translation cannot match the quality of professional translators. Huawei shall not bear any responsibility for translation accuracy and it is recommended that you refer to the English document (a link for which has been provided).


Basic Concept


Storm is a distributed, reliable, and fault-tolerant real-time computing system based on the open-source Apache Storm. It is used to process data streams of a massive scale on a real-time basis. Apache Storm applies to real-time analysis, continuous computation, and distributed Extract, Transform, and Load (ETL). Storm has the following features:

  • Wide applications
  • Scalable
  • Free from data loss
  • Fault-tolerant
  • Easy to construct and control
  • Multi-language support

Storm is a computing platform and provides Continuous Query Language (CQL) in the service layer to facilitate service implementation. CQL has the following features:

  • Easy to use: The CQL syntax is similar to the SQL syntax. Users who have basic knowledge of SQL can easily learn CQL and use CQL to develop services.
  • Rich functions: In addition to basic expressions provided by SQL, CQL provides functions, such as windows, filtering, and concurrency setting, for steam processing.
  • Easy to scale: CQL provides an extension interface to supporting increasingly complex service scenarios. Users can customize the input, output, serialization, and deserialization to meet specific service requirements.
  • Easy to debug: CQL provides detailed explanation of error codes, facilitating users to rectify faults.

The Storm service consists of the active and standby Nimbus processes, UI processes, and multiple Supervisor processes, as shown in Figure 2-109.

Figure 2-109 Storm architecture
Table 2-22 Storm modules




Control nodes of the Storm service. The Storm service in HA mode has one active Nimbus and one standby Nimbus.

  • The active Nimbus receives tasks submitted by clients and dispatches tasks to Supervisors. It also monitors status of other processes.
  • The standby Nimbus takes over services from the active Nimbus if the active Nimbus is faulty.


Monitors and receives tasks allocated by Nimbus, and start or stop Workers based on actual situation. Worker is a process running specific service logic. Each Worker is a JVM process.


Storm service monitoring interface, through which the system running topology can be viewed.


Provides distributed coordination services for processes of Storm. Active and standby Nimbus, Supervisors, and Workers are registered with ZooKeeper so that Nimbus can obtain the health status of each process.


The log viewing page of the Storm service processes is used to view the log information about worker processes.

  • Basic Concepts
    Table 2-23 Concepts




    A Tuple is an invariable Key-Value pair used to transfer data. Tuples are created and processed in distributed manner.


    A Stream is an unbounded sequence of Tuples.


    A Topology is a real-time application running on the Storm platform. It is a Directed Acyclic Graph (DAG) composed of components. A Topology can concurrently run on multiple machines. Each machine runs a part of the DAG. A Topology is similar to a MapReduce job. The difference is that Topology is a resident program. Once started, a Topology cannot stop unless it is manually terminated.


    A Spout is a source of Tuples. For example, a Spout may read data from a message queue, database, file system, or TCP connection and emit them as Tuples, which are processed by the next component.


    A Bolt is a component that receives data and executes specific logic, such as filtering or converting tuples, joining or aggregating streams, and performing statistics and result persistence.


    A Worker is a physical processing in running state in a Topology. Each Worker is a JVM process. Each Topology may be executed by multiple Workers. Each Worker executes a logic subset of the Topology.


    A Task is a Spout or Bolt thread of a Worker.

    Stream groupings

    A Stream grouping specifies the Tuple dispatching policies. It instructs the subsequent Bolt how to receive tuples. The supported policies include Shuffle Grouping, Fields Grouping, All Grouping, Global Grouping, Non Grouping, and Directed Grouping.

    Figure 2-110 shows a Topology (DAG) consisting of a Spout and Bolts. In Figure 2-110, a rectangle indicates a Spout or Bolt, the nodes in each rectangle indicate Tasks, and the lines between Tasks indicate Streams.

    Figure 2-110 Topology
  • Reliability

    Storm provides three levels of data reliability:

    • At most once: The processed data may be lost, but it cannot be processed repeatedly. This reliability level offers the highest throughput.
    • At least once: Data may be processed repeatedly to ensure reliable data transmission. If a response is not received within the specified time, the Spout resends the data for processing. This reliability level may slightly affect system performance.
    • Exactly-once: Data is successfully transmitted without loss or redundancy processing. This reliability level offers the poorest performance.

      Select the reliability level based on service requirements. For example, for the services requiring high data reliability, use the exactly-once level to ensure that data is processed only once. For the services insensitive to data loss, use other levels to improve system performance.

  • Fault Tolerance

    Storm is a fault-tolerant system that offers high availability. Table 2-24 describes the fault tolerance of the Storm components.

    Table 2-24 Fault tolerance



    Nimbus failed

    Nimbus is fail-fast and stateless. If the active Nimbus is faulty, the standby Nimbus takes over services immediately.

    Supervisor failed

    Supervisor is a background daemon of Workers. It is fail-fast and stateless. If a Supervisor is faulty, the Workers running on the node are not affected but cannot receive new tasks. The OMS can detect the fault of Supervisors and restart the processes.

    Worker failed

    If a Worker is faulty, the Supervisor will restart the Worker. If the Worker fails to start after multiple attempts, Nimbus will assign the task to another node.

    Node failed

    If a node is faulty, all the tasks being processed by the node time out and Nimbus will assign the tasks to another node for processing.

Open-source Features
  • Distributed Real-time Computing Framework

    In a Storm cluster, each machine supports the running of multiple work processes and each work process can create multiple threads. Each thread can execute multiple tasks. A task indicates concurrent data processing.

  • High Fault Tolerance

    During message processing, if a node or a process is faulty, the message processing unit can be redeployed.

  • Reliable Messages

    Data processing methods including At-Least Once, At-Most Once, and Exactly Once are supported.

  • Security Mechanism

    Storm provides Kerberos-based authentication and pluggable authorization mechanisms, supports SSL Storm UI and Log Viewer UI, and supports security integration with other big data platform components (such as ZooKeeper and HDFS).

  • Flexible Topology Definition and Deployment

    The Flux framework is used to define and deploy service topologies. If the service DAG is changed, users only need to modify YAML domain-specific language (DSL), but do not need to recompile and package service code.

  • Integration with External Components

    Storm supports integration with multiple external components such as Kafka, HDFS, HBase, Redis, and JDBC/RDBMS, implementing services that involve multiple data sources.

Relationship with Other Components

Storm provides a real-time distributed computing framework. It can obtain real-time messages from data sources (such as Kafka and TCP connection), perform high-throughput and low-latency real-time computing on a real-time platform, and export results to message queues or implement data persistence. Figure 2-111 shows the relationship between Storm and other components.

Figure 2-111 Relationship with other components
Relationship Between Storm and Streaming

Both Storm and Streaming use the open source Apache Storm kernel. However, the kernel version used by Storm is 1.0.2 whereas that used by Streaming is 0.10.0. Generally, if Streaming has been deployed and services are running on it, continue using Streaming after an upgrade. Storm is recommended in a new cluster.

New features of Storm 1.0.2:

  • Distributed cache: provides external resources required for CLI tool sharing and topology update without repacking and deploying the topology again.
  • Native Streaming Window API: provides windows0based APIs.

  • Resource scheduler: adds resources-based scheduler plug-ins so that users can specify the maximum resources that can be used when they define the topology. Additionally, users' resource quota can be specified during configuration to enable users' topology resource management.

  • State Management: provides a Bolt interface with the checkpoint mechanism. When an event fails, Storm automatically manages Bolt status and rectify the fault.

  • Message sampling and debugging: enables and disables topology-level or component-level debugging on the Storm UI and outputs the stream message sampling rate to specified logs.

  • Dynamic Worker analysis: collects Jstack and Heap logs of the Worker process on the Storm UI and restarts the Worker process.

  • Dynamic adjustment of the topology log levels: supports dynamic log modifications of running topologies by using the CLI and through the Storm UI.

  • Better performance: Storm performance is dramatically improved compared with previous versions. Despite the fact that topology performance greatly depends on case scenarios and external services. In most cases, performance is improved by 3 fold.
Updated: 2019-05-17

Document ID: EDOC1100074548

Views: 3064

Downloads: 35

Average rating:
This Document Applies to these Products
Related Documents
Related Version
Previous Next