FusionInsight HD 6.5.0 Software Installation 02

Installation Scheme

Before installing the software, see the FusionInsight HD 6.5.0 Hardware Deployment and Networking Guide to learn about the detailed planning and design scheme.

The installation scheme of the FusionInsight HD system includes the Networking Scheme, Nodes Deployment Schemes, and Service Deployment Scheme.

Networking Scheme

The network of the FusionInsight HD system is divided into two planes: the service plane and the management plane. The two planes are deployed in physical isolation mode to ensure network security. The active and standby management nodes also support the configuration of external management network IP addresses, so users can manage the cluster through an external management network.

When dual-plane networking is adopted, each node in the cluster connects to both the management plane and the service plane, and therefore requires a management IP address and a service IP address. Each IP address is carried by a bond of two network interfaces, which connect to two access switches respectively.

  • It is recommended that GE bandwidth be adopted for the management plane of each node, GE bandwidth be adopted between the access switch and the aggregation switch on the management plane, and the stack bandwidth of the aggregation switch on the management plane be configured to 10GE.
  • It is recommended that 10GE bandwidth be adopted for the service plane of each node (such as MN1, CN3, and DN4 in Figure 1-2), 10GE bandwidth be adopted between the access switch and the aggregation switch on the service plane, and the stack bandwidth of the aggregation switch on the service plane be configured to 40GE.

Take Layer 2 networking as an example: Figure 1-2 shows the dual-plane isolation networking scheme. A, B, and C are the racks where the management nodes and control nodes are deployed; they are called basic racks. D is a rack added through linear expansion according to service requirements; it is called an extension rack.

Figure 1-2 Dual-plane isolation networking
NOTE:
  • When the single-plane networking is adopted, each node has only one IP address and the service plane and management plane are integrated.
  • Elements of the external management network connected to the active and standby management nodes (such as MN1 and MN2) are not included in this figure. You can configure an external management network address on both of the active and standby management nodes as required.
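
The addressing implication of the dual-plane networking described above can be illustrated with a short sketch: every node needs one management-plane IP address and one service-plane IP address, each carried by a bonded pair of network interfaces. The subnets and most node names below are hypothetical examples, not values mandated by this guide.

```python
# Minimal illustration only: hypothetical subnets and node names, not a FusionInsight tool.
# Each node is assigned one management IP and one service IP; in the real deployment
# each IP sits on a bond of two NICs cabled to two access switches.
from ipaddress import IPv4Network

MGMT_NET = IPv4Network("192.168.10.0/24")   # management plane (example subnet)
SVC_NET = IPv4Network("192.168.20.0/24")    # service plane (example subnet)

nodes = ["MN1", "MN2", "CN1", "CN2", "CN3", "DN1", "DN2"]  # illustrative node names

mgmt_ips, svc_ips = MGMT_NET.hosts(), SVC_NET.hosts()
plan = {
    node: {"management_ip": str(next(mgmt_ips)), "service_ip": str(next(svc_ips))}
    for node in nodes
}

for node, addrs in plan.items():
    print(node, addrs)
```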

Nodes Deployment Schemes

Plan the deployment based on the number of nodes in the FusionInsight HD cluster and the information in Table 1-1.

NOTE:

If you need to configure Federation and add n pairs of NameNodes besides the default pair, 2n additional control nodes need to be added. For details about Federation, see Service Operation Guide > Federation.

Table 1-1 FusionInsight HD deployment scheme

Deployment scheme 1: The management node, control node, and data node are deployed separately. This scheme requires at least eight nodes.
  • MN × 2 + CN × 11 + DN × n: (Recommended) Cluster with 2000 to 5000 data nodes.
  • MN × 2 + CN × 9 + DN × n: (Recommended) Cluster with 500 to 2000 data nodes.
  • MN × 2 + CN × 5 + DN × n: (Recommended) Cluster with 100 to 500 data nodes.
  • MN × 2 + CN × 3 + DN × n: (Recommended) Cluster with 30 to 100 data nodes.
Networking planning:
  • When the number of nodes in the cluster exceeds 200, the nodes are distributed to different subnets, and the subnets communicate with each other in Layer 3 interconnection mode using core switches. Each subnet has at most 200 nodes, and the node quantity should be balanced across subnets.
  • When the number of nodes in the cluster is below 200, the nodes are deployed in the same subnet and communicate with each other in Layer 2 interconnection mode using aggregation switches.

Deployment scheme 2: The management node and control node are deployed on the same nodes, and the data node is deployed independently.
  • (MN+CN) × 3 + DN × n: (Recommended) Cluster with 3 to 30 data nodes.
Networking planning: The nodes in the cluster are deployed in the same subnet and communicate with each other in Layer 2 interconnection mode using aggregation switches.

Deployment scheme 3: The management node, control node, and data node are all deployed on the same nodes.
  • Application scenario: Cluster with fewer than six nodes in total; this scheme requires at least three nodes.
NOTE:
This scenario is not recommended in a production or commercial environment:
  • If management, control, and data nodes are deployed on the same nodes, cluster performance and reliability are greatly affected.
  • If the number of nodes meets the requirements, it is recommended that data nodes be deployed independently.
  • If the number of nodes cannot meet the requirements for separate data node deployment but this scenario is still required, adopt the dual-plane networking so that management network traffic is isolated from service network traffic. This prevents excessive data volumes on the service plane from affecting the correct delivery of management operations.
Networking planning: The nodes in the cluster are deployed in the same subnet and communicate with each other in Layer 2 interconnection mode using aggregation switches.
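
For illustration, the mapping in Table 1-1 from cluster size to the recommended deployment scheme can be expressed as a short sketch. This is not part of the FusionInsight Configuration Planning Tool; the function name is hypothetical, the boundary values in the table overlap (for example, exactly 500 data nodes), and the sketch simply picks the larger scheme at a boundary. The Federation adjustment follows the note above: 2n additional control nodes for n additional NameNode pairs.

```python
# Hypothetical helper sketching the Table 1-1 recommendations; not an official tool.
def recommended_scheme(data_nodes: int, extra_namenode_pairs: int = 0) -> str:
    """Return the recommended MN/CN/DN layout for a given number of data nodes."""
    if data_nodes >= 2000:
        cn = 11
    elif data_nodes >= 500:
        cn = 9
    elif data_nodes >= 100:
        cn = 5
    elif data_nodes >= 30:
        cn = 3
    elif data_nodes >= 3:
        # Management and control nodes co-deployed: (MN+CN) x 3 + DN x n
        return f"(MN+CN) x 3 + DN x {data_nodes}"
    else:
        raise ValueError("See Table 1-1: very small clusters deploy MN, CN, and DN on the same nodes.")
    cn += 2 * extra_namenode_pairs  # Federation: 2n additional control nodes
    return f"MN x 2 + CN x {cn} + DN x {data_nodes}"

print(recommended_scheme(800))                          # MN x 2 + CN x 9 + DN x 800
print(recommended_scheme(150, extra_namenode_pairs=1))  # MN x 2 + CN x 7 + DN x 150
```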

Service Deployment Scheme

The FusionInsight HD system is composed of multiple services that follow a specified logical architecture. A service includes one or more roles, and a role can be deployed as one or more instances.

  • Service: the capability provided by a component in the cluster. Each component provides one service and is named after that service.
  • Role: an element of a service. A service contains one or more roles and is installed on hosts (servers) role by role.
  • Instance: what is formed when a service role is installed on a node. A service corresponds to one or more role instances.
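
As a reading aid only, the service/role/instance hierarchy above can be pictured with a few data classes. The classes and example values (HDFS, NameNode, CN1, CN2) are illustrative and are not an API of FusionInsight.

```python
# Illustrative model of the service -> role -> instance hierarchy; not FusionInsight code.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Instance:
    role: str   # role name, for example "NameNode"
    host: str   # node (server) the role instance is installed on

@dataclass
class Role:
    name: str
    instances: List[Instance] = field(default_factory=list)

@dataclass
class Service:
    name: str
    roles: List[Role] = field(default_factory=list)

# Installing the NameNode role of HDFS on two control nodes yields two instances.
hdfs = Service("HDFS", roles=[
    Role("NameNode", instances=[Instance("NameNode", "CN1"),
                                Instance("NameNode", "CN2")]),
])
print(hdfs)
```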

Different service roles are selected and deployed on servers during cluster installation. Table 1-2 lists the memory requirements and deployment principles of the service roles. Enter the service role information of each node in the FusionInsight Configuration Planning Tool to generate the configuration files required for cluster installation.

For services and roles in clusters of different scales, the default deployment topologies provided by each package of the FusionInsight Configuration Planning Tool prevail.

The dependency and association relationships between services in the cluster are as follows:

  • Service A depending on service B: If service A is deployed in the cluster, service B must be deployed in advance. Service B provides basic capabilities for service A. In the multi-service scenario, if multiple B services are deployed, you need to specify the service B instance on which service A depends.
  • Service A associated with service B: Service A exchanges data with service B during service running, and their deployment does not depend on each other. In the multi-service scenario, if multiple B services are deployed, you need to specify the service B instance associated with service A.
  • Role A and role B deployed on the same server: If role A is deployed in the cluster, role B must also be deployed, and role A and role B must be deployed on the same node.
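
The "depends on" relationship implies an installation order: a dependency must be deployed before any service that depends on it. The sketch below applies a topological sort to a small subset of the dependency column from Table 1-2; it is illustrative only and does not model associations or same-server role constraints.

```python
# Illustrative only: derive a valid installation order from "A depends on B".
from graphlib import TopologicalSorter

depends_on = {                      # subset of the Dependency column in Table 1-2
    "KrbServer": {"LdapServer"},
    "HDFS": {"ZooKeeper"},
    "Yarn": {"HDFS", "ZooKeeper"},
    "MapReduce": {"Yarn", "HDFS", "ZooKeeper"},
    "Hive": {"DBService", "MapReduce", "HDFS", "Yarn", "ZooKeeper"},
}

# static_order() lists every service after all of its dependencies,
# i.e. one possible deployment order.
print(list(TopologicalSorter(depends_on).static_order()))
```
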
NOTE:
  • Federation: Only one pair of NameNode and Zkfc can be installed during cluster installation. If multiple pairs of NameNodes and Zkfcs need to be deployed for HDFS federation, manually add them after the cluster is installed.
  • Multi-service: The multi-service feature allows multiple components of the same type to be installed in the same cluster to better resolve resource isolation or performance problems. Services that support this feature are listed in the following table.
  • Hybrid deployment: In the hybrid deployment of x86 and ARM servers, some services do not allow multiple instances of a role to be installed on nodes on different platforms. The following table describes the details.
Table 1-2 Memory requirements and deployment principles of service roles

Each entry below lists a service, whether it supports multi-service and hybrid deployment, its dependencies, and, for every role, the role name, minimum memory, and role deployment principle.

OMSServer (multi-service: No; hybrid deployment: No)
  • Dependency: -
  • OMSServer, minimum memory 10 GB: Deploy OMSServers on two management nodes in active/standby mode.

LdapServer (multi-service: No; hybrid deployment: No)
  • Dependency: -
  • SlapdServer (LS), minimum memory 1 GB:
    - OLdap service of OMS: Deploy LdapServers on two management nodes in active/standby mode.
    - LdapServer service: Deploy on at least two control nodes, with a maximum of ten deployments. LS instances are backup instances of the OLdap service.

KrbServer (multi-service: No; hybrid deployment: No)
  • Dependency: KrbServer depends on LdapServer. The KerberosServer and KerberosAdmin roles are deployed on the same server.
  • KerberosServer (KS), minimum memory 3 MB: Deploy KerberosServers on the same two control nodes as KerberosAdmins for load sharing.
  • KerberosAdmin (KA), minimum memory 2 MB: Deploy KerberosAdmins on the same two control nodes as KerberosServers for load sharing.

ZooKeeper (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: -
  • quorumpeer (QP), minimum memory 1 GB: Deploy three QPs on control nodes in each cluster. If expansion is required, ensure that the number of nodes containing QP remains an odd number. A maximum of nine instances is supported.

HDFS (multi-service: No; hybrid deployment: Yes)
  • Dependency: HDFS depends on ZooKeeper. The NameNode and Zkfc roles are deployed on the same server.
  • NameNode (NN), minimum memory 4 GB: Deploy NameNodes on two control nodes in active/standby mode.
  • ZooKeeper FailoverController (Zkfc), minimum memory 1 GB: Deploy Zkfcs on two control nodes in active/standby mode.
  • JournalNode (JN), minimum memory 4 GB: Deploy JournalNodes on at least three control nodes. Each node stores one copy of backup data. To keep three or more copies of backup data, deploy additional JournalNodes on control nodes or data nodes, ensuring that the total is an odd number.
  • DataNode (DN), minimum memory 4 GB: Deploy at least three DataNodes. You are advised to deploy DataNodes on data nodes.
  • Router, minimum memory 4 GB: Deploy Routers on at least three control nodes, and only in Federation scenarios. For details, see the section "Federation Configuration" in Service Operation Guide > Federation.
  • HttpFS, minimum memory 128 MB: Deploy at most ten HttpFSs, on the same nodes as DataNodes.

Yarn (multi-service: No; hybrid deployment: Yes)
  • Dependency: Yarn depends on HDFS and ZooKeeper.
  • ResourceManager (RM), minimum memory 2 GB: Deploy ResourceManagers on two control nodes in active/standby mode.
  • NodeManager (NM), minimum memory 2 GB: Deploy NodeManagers on data nodes. The number of NodeManagers must be consistent with the number of HDFS DataNodes.

MapReduce (multi-service: No; hybrid deployment: Yes)
  • Dependency: MapReduce depends on Yarn, HDFS, and ZooKeeper.
  • JobHistoryServer (JHS), minimum memory 2 GB:
    - In single-node deployment mode, deploy one JHS in each cluster on a control node. This mode is recommended to ensure compatibility with open source.
    - In HA mode, deploy JHSs on two control nodes in active/standby mode.

DBService (multi-service: Yes; hybrid deployment: No)
  • Dependency: -
  • DBServer, minimum memory 512 MB: Deploy DBServers on two control nodes in active/standby mode.

Hue (multi-service: No; hybrid deployment: No)
  • Dependency: Hue depends on DBService. NOTE: In the Federation scenario, the Hue component can connect to HttpFS for interface conversion; in this case, Hue also depends on the HttpFS role of HDFS.
  • NOTE: The multi-service feature is not supported, but multiple services, such as HBase and Hive, can be managed using Hue.
  • Hue, minimum memory 1 GB: Deploy Hues on two control nodes in active/standby mode.

Loader (multi-service: Yes; hybrid deployment: No)
  • Dependency: Loader depends on MapReduce, Yarn, DBService, HDFS, and ZooKeeper.
  • LoaderServer (LS), minimum memory 2 GB: Deploy Loaders on two control nodes in active/standby mode.

Spark (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: Spark depends on Yarn, Hive, HDFS, MapReduce, ZooKeeper, and DBService.
  • SparkResource (SR), no minimum memory (SR does not have an actual process and does not consume memory): Deploy SRs on at least two control nodes in non-active/standby mode.
  • JobHistory (JH), minimum memory 2 GB: Deploy JHs on two control nodes in non-active/standby mode.
  • JDBCServer (JS), minimum memory 2 GB: Deploy JSs on at least two control nodes. You can deploy JSs on multiple control nodes for load sharing.

Spark2x (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: Spark2x depends on Yarn, Hive, HDFS, MapReduce, ZooKeeper, and DBService.
  • SparkResource2 (SR2), no minimum memory (SR2 does not have an actual process and does not consume memory): Deploy SR2s on at least two control nodes or data nodes in non-active/standby mode.
  • JobHistory2 (JH2), minimum memory 2 GB: Deploy JH2s on two control nodes in non-active/standby mode.
  • JDBCServer2 (JS2), minimum memory 2 GB: Deploy JS2s on at least two control nodes. You can deploy JS2s on multiple control nodes for load sharing.

Hive (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: Hive depends on DBService, MapReduce, HDFS, Yarn, and ZooKeeper.
  • HiveServer (HS), minimum memory 4 GB: Deploy HSs on at least two control nodes. You can deploy HSs on multiple control nodes for load sharing.
  • MetaStore (MS), minimum memory 2 GB: Deploy MSs on at least two control nodes. You can deploy MSs on multiple control nodes for load sharing.
  • WebHCat, minimum memory 2 GB: Deploy WebHCats on at least one control node. You can deploy WebHCats on multiple control nodes for load sharing.

HBase (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: HBase depends on HDFS, ZooKeeper, and Yarn.
  • HMaster (HM), minimum memory 1 GB: Deploy HMs on two control nodes in active/standby mode.
  • RegionServer (RS), minimum memory 6 GB: Deploy RSs on data nodes. The number of RSs must be consistent with the number of HDFS DataNodes.
  • ThriftServer (TS), minimum memory 1 GB: Deploy TSs on three control nodes in each cluster. If the delay when a TS accesses HBase is too long to meet user requirements, you can deploy additional TSs on control nodes or data nodes.

FTP-Server (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: FTP-Server depends on HDFS and ZooKeeper.
  • FTP-Server, minimum memory 1 GB: Each instance provides 16 concurrent channels by default. If more concurrent channels are required, deploy multiple instances. You are advised not to deploy FTP-Servers on control nodes or on nodes where DataNodes reside; when FTP-Servers are deployed on nodes where DataNodes reside, data may become unbalanced.

Flume (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: Flume depends on HDFS and ZooKeeper.
  • Flume, minimum memory 1 GB: You are advised to deploy Flume and the DataNode on different nodes. If they are deployed on the same node, data imbalance may occur.
  • MonitorServer, minimum memory 128 MB: Deploy MonitorServers on two control nodes in non-active/standby mode.

Kafka (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: Kafka depends on ZooKeeper. The KafkaUI role also depends on the Broker role.
  • Broker, minimum memory 6 GB: Deploy Brokers on at least two data nodes. If the data volume generated each day exceeds 2 TB, you are advised to deploy additional Brokers on control nodes.
  • KafkaUI, no minimum memory specified: Deploy KafkaUIs on two control nodes.

Metadata (multi-service: Yes; hybrid deployment: N/A)
  • Dependency: Metadata depends on DBService.
  • Metadata, minimum memory 512 MB: Deploy only one Metadata on one control node in each cluster.

Oozie (multi-service: Yes; hybrid deployment: No)
  • Dependency: Oozie depends on DBService, Yarn, HDFS, and MapReduce.
  • oozie, minimum memory 1 GB: Deploy oozies on two control nodes in non-active/standby mode.

Solr (multi-service: Yes; hybrid deployment: No)
  • Dependency: Solr depends on ZooKeeper. NOTE: When Solr data is stored on HDFS, Solr also depends on HDFS. If you choose to store Solr index data on HDFS, deploy only one SolrServer instance (including SolrServerAdmin) on each node.
  • SolrServerN (N = 1-5), minimum memory 2 GB: Each node supports five instances. You are advised to configure more than three nodes, with instances evenly distributed.
    - It is recommended that Solr data be stored on HDFS preferentially and that three Solr instances be deployed on each node.
    - For a node whose real-time indexing speed is greater than 2 MB/s, it is recommended that Solr data be stored on local disks, that five Solr instances be deployed on the node, and that a disk be mounted to each Solr instance independently.
    - Compared with storage on a local disk, the performance of storage on HDFS decreases by 30% to 50%.
  • SolrServerAdmin, minimum memory 2 GB: Deploy SolrServerAdmins on two data nodes in non-active/standby mode.
  • HBaseIndexer, minimum memory 512 MB: Deploy one HBaseIndexer on each node where a SolrServer instance resides. HBaseIndexer depends on HBase, HDFS, and ZooKeeper, and is required only when the hbase-indexer function is used.

Elasticsearch (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: Elasticsearch depends on ZooKeeper.
  • EsMaster, recommended memory 31 GB: Deploy on control nodes. Install an odd number (at least three) of EsMaster instances.
  • EsNode1 to EsNode9, recommended memory 31 GB (the -Xms and -Xmx values must be consistent): Deploy on data nodes. Install at least two EsNode1 instances; EsNode2 to EsNode9 are optional.
  • EsClient, recommended memory 31 GB: You are advised to deploy EsClients on data nodes to share the load when the cluster is large or there are many service requests. Determine the number of EsClients to install as required.

SmallFS (multi-service: No; hybrid deployment: No)
  • Dependency: SmallFS depends on MapReduce, Yarn, HDFS, and ZooKeeper.
  • FGCServer, minimum memory 6 GB: Deploy FGCServers on two control nodes in active/standby mode.

Flink (multi-service: No; hybrid deployment: Yes)
  • Dependency: Flink depends on HDFS, Yarn, and ZooKeeper.
  • FlinkResource, no minimum memory (FlinkResource does not have an actual process and does not consume memory): Deploy FlinkResource on data nodes. The number of FlinkResources must be consistent with the number of Yarn NodeManagers.

Storm (multi-service: Yes; hybrid deployment: Yes)
  • Dependency: Nimbus and UI depend on ZooKeeper; Logviewer and Supervisor have no dependencies.
  • Logviewer, minimum memory 1 GB: Deploy a Logviewer on each node where a Supervisor is deployed.
  • Nimbus, minimum memory 1 GB: Deploy Nimbuses on two control nodes in active/standby mode. This role is associated with UI.
  • UI, minimum memory 1 GB: Deploy UIs on two control nodes. This role is associated with Nimbus, which means that UI is deployed on every node where Nimbus is deployed.
  • Supervisor, minimum memory 1 GB: Deploy Supervisors on at least one data node. If a large amount of computing capability is required, you are advised to deploy multiple Supervisors on independent nodes. Supervisors manage Workers, which occupy a large amount of resources; the number of Workers and their memory can be configured.
    NOTE: The number of Supervisors to be deployed can be calculated using the following formula, where the number of topologies and the number of Workers in each topology are planned by the customer and the number of Workers configured for each Supervisor is five by default:
    Number of Supervisors = (Number of topologies x Number of Workers in each topology) / Number of Workers configured for each Supervisor
    For example, 10 topologies with 4 Workers each require 10 x 4 / 5 = 8 Supervisors at the default setting.

Redis (multi-service: No; hybrid deployment: No)
  • Dependency: Redis depends on DBService.
  • Redis_1 to Redis_32, minimum memory 1 GB: In single-master mode, deploy Redis on at least one data node. If Redis clusters are required, deploy them on at least three data nodes.

GraphBase (multi-service: Yes; hybrid deployment: No)
  • Dependency: GraphBase depends on HDFS, HBase, Spark, Yarn, ZooKeeper, MapReduce, Kafka, DBService, KrbServer, LdapServer, and Elasticsearch.
  • LoadBalancer, minimum memory 8 GB: Deploy LoadBalancers on two control nodes in active/standby mode.
  • GraphServer, minimum memory 32 GB: Deploy GraphServers based on the GraphServer service volume. You can deploy GraphServers on multiple nodes, or on the nodes where NodeManager resides, for load sharing. To reduce resource competition, you are advised to deploy GraphServer independently.

Elk (multi-service: No; hybrid deployment: No)
  • Dependency: Elk depends on HDFS and Yarn.
  • ElkServer, minimum memory 16 GB: Deploy ElkServers on at least three data nodes where the HDFS DataNode resides.
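
As a rough cross-check of the figures in Table 1-2, the minimum memory of the roles planned for a single node can simply be added up. The dictionary below copies a few values from the table; the function and the example placement are hypothetical, and the FusionInsight Configuration Planning Tool remains the authoritative sizing source.

```python
# Rough sanity check only; values copied from Table 1-2, role placement is hypothetical.
MIN_MEMORY_GB = {
    "NameNode": 4, "Zkfc": 1, "JournalNode": 4, "DataNode": 4,
    "ResourceManager": 2, "NodeManager": 2, "quorumpeer": 1,
    "RegionServer": 6, "Broker": 6, "HiveServer": 4, "MetaStore": 2,
}

def node_min_memory(roles) -> int:
    """Sum the Table 1-2 minimum memory (GB) for the roles placed on one node."""
    return sum(MIN_MEMORY_GB[role] for role in roles)

# Example: a data node carrying DataNode, NodeManager, and RegionServer.
print(node_min_memory(["DataNode", "NodeManager", "RegionServer"]))  # 12
```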

Role Deployment Suggestions for Large-Scale Clusters

For large-scale clusters with more than 2000 nodes, the following role deployment principles are recommended:

  • Deploy five quorumpeer roles.
  • You can deploy multiple instances of roles that support multi-instance deployment, based on the service load.
  • In large-scale clusters, you can add up to 10 SlapdServer instances. Deploy each SlapdServer on an independent node.
  • If federation is required for NameNode expansion, add control nodes and deploy NameNodes independently. The number of added control nodes is two times the number of added NameNode pairs.
  • Add control nodes for HiveServers based on service requirements. You are advised to deploy HiveServers independently.