FusionInsight HD 6.5.0 Product Description 02

Rolling Restart, Upgrade, and Patch

Rolling Restart

Rolling restart means that, after the software of a service or role instance in a cluster is updated or its configuration is modified, the affected instances are restarted without interrupting services.

A conventional restart (restarting all instances simultaneously) interrupts services. A rolling restart applies a restart policy suited to how each type of instance runs, so that services stay available. The trade-off is that a rolling restart takes longer and temporarily reduces the throughput and performance of the affected services.

NOTE:

Before performing a rolling restart on an instance, ensure that its internal and external interfaces remain compatible across the restart. If a major version update makes the interfaces incompatible, only a conventional restart can be performed.

  • Rolling restart policy for active and standby instances

    For roles that support high availability (HA), such as the HDFS NameNode, restart the standby instance first, manually trigger an active/standby switchover, and then restart the original active instance after the switchover. A sketch of this sequence follows this list.

  • Rolling restart policy for the Leader instance

    For roles whose instances form one Leader and multiple Followers, services are not interrupted when a single instance is restarted. Restart the instances one by one, and restart the Leader instance last.

  • Concurrent rolling restart policy for batch instances

    Within a role, m (m ≥ 1) instances are restarted concurrently per batch, in rolling mode, to keep services available. This policy applies to roles whose instances all perform the same function.

    For example, restarting a single HDFS ZKFC instance at a time does not interrupt the service, so this policy can be used with a concurrency of 1. A sketch of this batch loop also follows this list.

  • Dynamic policy

    During a RegionServer rolling restart, the number of instances restarted concurrently per batch is set dynamically based on the number of deployed RegionServer instances.

  • Concurrent rack rolling restart policy

    This policy applies to roles that support rack awareness (such as the HDFS DataNode) and whose instances are distributed across two or more racks, so that services are not interrupted while one rack is restarted. When a role meets these conditions, all of its instances in a rack are restarted concurrently, rack by rack.

    If a rack contains many instances, they are further divided into sub-batches based on the maximum concurrency configured in the rack policy.
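
The active/standby sequence can be illustrated with a short sketch. The hdfs haadmin -failover command below is the standard HDFS HA administration command; the restart_instance() helper and the nn1/nn2 service IDs are hypothetical stand-ins for whatever management tooling and HA configuration are actually in use:

```python
import subprocess

def restart_instance(instance_id):
    """Hypothetical helper: ask the cluster manager to restart one
    instance and block until it reports healthy again."""
    raise NotImplementedError  # depends on the management tooling in use

def rolling_restart_ha_namenodes(standby_id, active_id,
                                 active_svc="nn1", standby_svc="nn2"):
    # 1. Restart the standby NameNode first; the active keeps serving.
    restart_instance(standby_id)

    # 2. Manually trigger an active/standby switchover so the freshly
    #    restarted standby takes over as active.
    subprocess.run(
        ["hdfs", "haadmin", "-failover", active_svc, standby_svc],
        check=True,
    )

    # 3. Restart the former active instance, which is now the standby.
    restart_instance(active_id)
```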
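
The batch policy amounts to the loop sketched below, restarting m identical instances per batch; the restart_batch() and wait_healthy() callbacks are hypothetical placeholders for the cluster's management interface:

```python
def rolling_restart_in_batches(instances, restart_batch, wait_healthy, m=1):
    """Restart `instances` in rolling batches of m (m >= 1) at a time.

    With m=1 this matches the ZKFC example above: at most one instance
    is down at any moment, so the service as a whole stays available.
    """
    for i in range(0, len(instances), m):
        batch = instances[i:i + m]
        restart_batch(batch)   # restart up to m instances concurrently
        wait_healthy(batch)    # block until the batch is serving again
```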

Rolling Upgrade

Rolling upgrade is an online upgrade mode that minimizes the service interruption window during the upgrade process.

Components that support rolling upgrade continue to provide all or part of their services during the upgrade; services of components that do not support rolling upgrade are interrupted during the upgrade process. Compared with the offline upgrade mode, rolling upgrade keeps part of the services available while the product is upgraded.

For the rolling upgrade operations and precautions of each service, see the corresponding upgrade guide.

Rolling Patch

Rolling patch means that patches are installed on one or more components in a cluster without interrupting services, or with the service interruption window minimized.

Components in a cluster are divided into the following three types based on whether they support rolling patch:

  • Components that support rolling patch
  • Components that do not support rolling patch
  • Components of which some roles support rolling patch

For components that support rolling patch, all or part of their services (this differs between components) remain available during patch installation. For components that do not support rolling patch, services are interrupted during patch installation. For components of which only some roles support rolling patch, part of their services is interrupted during patch installation.

For the rolling patch operations and precautions of each service, see the corresponding patch guide.

Impact of Rolling Restart on the System

Both rolling upgrade and rolling patch depend on rolling restart. The impact of rolling restart on the system also applies to rolling upgrade and rolling patch.

The following table describes the impact of rolling restart on various services in 6.5.0 (KrbServer, LdapServer, and DBService are internal services of a cluster and are not described in the table).

Table 4-6 Services affected during the rolling restart

  • ZooKeeper
    Services not interrupted: ZooKeeper read and write services remain normal.
    Affected services: None.

  • HDFS
    Services not interrupted: An active/standby switchover is triggered for NameNodes. During the switchover, no active NameNode exists temporarily, so the system may report an alarm indicating that the HDFS service is unavailable and running read/write tasks may report exceptions; however, services are not interrupted.
    Affected services: None.

  • Yarn
    Services not interrupted: An active/standby switchover is triggered for ResourceManager nodes. Running tasks may report exceptions, but services are not interrupted.
    Affected services: None.

  • HBase
    Services not interrupted: HBase read and write services remain normal.
    Affected services: During the HMaster restart, real-time read/write services (excluding BulkLoad) remain normal, but the following operations are affected:
      - Creating a table (create)
      - Creating a namespace (create_namespace)
      - Disabling a table (disable, disable_all)
      - Recreating a table (truncate, truncate_preserve)
      - Moving a region (move)
      - Taking a region offline (unassign, close_region)
      - Merging regions (merge_region)
      - Splitting a region (split)
      - Enabling the balancer (balance_switch)
      - Disaster recovery operations (add_peer, remove_peer, enable_table_replication, disable_peer, show_peer_tableCFs, set_peer_tableCFs, enable_peer, disable_table_replication, set_clusterState_active, set_clusterState_standby)
      - Restoration (restore)
      - Querying the cluster status (status)

  • Spark/Spark2x
    Services not interrupted: Services other than those listed as affected are not impacted.
    Affected services:
      - While HBase is restarting, Spark on HBase tables cannot be created or deleted in Spark.
      - While HBase is restarting, an active/standby switchover is triggered for HMaster; during the switchover, the Spark on HBase function is unavailable.
      - If the Kafka advanced API is used, Spark reads/writes from/to Kafka may be interrupted during the rolling restart, and data may be lost.

  • Kafka
    Services not interrupted: Kafka read and write services remain normal.
    Affected services:
      - Topics and partitions cannot be added, deleted, or modified.
      - If acks is set to 1 or 0 on the Producer and a replica has not finished synchronizing within 30 minutes during the rolling restart, the next Broker is forcibly restarted. For a two-replica Partition whose replicas reside on the two consecutively restarted Brokers: if unclean.leader.election.enable is set to true on the server, data may be lost; if it is set to false, the Partition may have no leader for a period of time, until the second of the two Brokers finishes starting. (A producer-side mitigation is sketched after this table.)

  • SmallFS
    Services not interrupted: SmallFS read and write services remain normal.
    Affected services: None.

  • Redis
    Services not interrupted: Redis read and write services remain normal.
    Affected services: Redis clusters cannot be scaled out or in.

  • Hive
    Services not interrupted: Hive services remain normal during the rolling restart.
    Affected services: If the execution time of a running task exceeds the rolling restart timeout interval, the task may fail during the restart; retry the task if it fails.

  • Solr
    Services not interrupted: Solr read and write services remain normal.
    Affected services: None.

  • Storm/Streaming
    Services not interrupted: All Storm/Streaming services remain normal.
    Affected services: After the rolling restart is complete, worker processes may no longer be evenly distributed across the cluster topology. If necessary, run the storm rebalance command to rebalance the topology (see the rebalance sketch after this table).

  • Flume
    Services not interrupted: Service interruption and data loss can be avoided if the following conditions are met:
      - Active collection mode: cache persistence is used.
      - Passive collection mode: the client supports failover or load balancing.
      Before the rolling upgrade, sinks must be added to sink groups (each consisting of two or more sinks), which requires more resources.
    Affected services: None.

  • GraphBase
    Services not interrupted: Data can be imported in real time and in batches.
    Affected services:
      - REST APIs need to be authenticated again.
      - The connection to Gremlin_console needs to be re-established.
      - The connection to the Gremlin Java API needs to be re-established.
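
For the Kafka risk noted in Table 4-6, the usual producer-side mitigation is to require acknowledgement from all in-sync replicas rather than using acks=1 or 0. A minimal sketch, assuming the kafka-python client and an example topic named test (both illustrative, not part of the product):

```python
from kafka import KafkaProducer

# acks='all' makes each send wait until every in-sync replica has the
# record, so a Broker restarted mid-write does not silently lose data
# (with acks=0 or 1, data on a lagging replica can be lost, as
# described in Table 4-6).
producer = KafkaProducer(
    bootstrap_servers="broker1:9092,broker2:9092",
    acks="all",
    retries=5,  # retry sends that fail while a Broker is restarting
)

producer.send("test", b"payload")
producer.flush()
```

On the server side, keeping unclean.leader.election.enable set to false trades temporary partition unavailability for durability, as the table describes.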
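
Similarly, the storm rebalance command mentioned for Storm/Streaming can be driven from a small script. The topology name, wait time, and worker count below are illustrative; -w (wait seconds) and -n (new worker count) are standard options of the storm rebalance CLI:

```python
import subprocess

# Redistribute the workers of an (example) topology across the cluster:
# wait 10 seconds, then rebalance onto 4 workers.
subprocess.run(
    ["storm", "rebalance", "my-topology", "-w", "10", "-n", "4"],
    check=True,
)
```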