FusionInsight HD 6.5.0 Software Installation 02

Configuration Requirements

HUAWEI FusionInsight HD supports universal x86 servers (offered by Huawei and other vendors) or Huawei TaiShan ARM servers. Enterprises can select servers as required.

  • Nodes installed in the same batch must use servers of the same architecture (x86 or ARM). Servers of a different architecture can be added during capacity expansion.
  • The server side and the client can be installed on servers of different architectures (x86 or ARM).

Rack Requirements

Nodes in the cluster need to be deployed on racks. Comply with the following rules when you plan racks.

  • The active and standby management nodes need to be deployed on different racks.
  • Control nodes that have active and standby services need to be deployed on different racks.
  • If HDFS is used as the underlying storage system, ensure that the number of DataNodes deployed on each rack is almost the same.
Figure 2-1 Rack deployment example
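The rack rules above can be sanity-checked during planning. The following Python sketch is only an illustration under assumed inputs: the rack plan, node names, and role labels are hypothetical and not part of any FusionInsight interface; the script simply encodes the three rules listed above.

```python
from collections import Counter

# Hypothetical rack plan: node name -> (rack, role). These names and roles are
# illustrative only; they are not a FusionInsight interface.
plan = {
    "mgmt-01": ("rack1", "management-active"),
    "mgmt-02": ("rack2", "management-standby"),
    "ctrl-01": ("rack1", "control-active"),
    "ctrl-02": ("rack2", "control-standby"),
    "data-01": ("rack1", "datanode"),
    "data-02": ("rack2", "datanode"),
    "data-03": ("rack1", "datanode"),
    "data-04": ("rack2", "datanode"),
}

def racks_of(role):
    return {rack for rack, r in plan.values() if r == role}

# Rule 1: active and standby management nodes are on different racks.
assert racks_of("management-active").isdisjoint(racks_of("management-standby"))

# Rule 2: active and standby control nodes are on different racks.
assert racks_of("control-active").isdisjoint(racks_of("control-standby"))

# Rule 3: the number of DataNodes on each rack is almost the same
# (here: the counts differ by at most one).
datanodes = Counter(rack for rack, r in plan.values() if r == "datanode")
assert max(datanodes.values()) - min(datanodes.values()) <= 1

print("Rack plan satisfies the deployment rules.")
```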

Node Hardware Requirements

Ensure that each server meets the requirements listed in Table 2-1.

Table 2-1 Minimum requirements for hardware

CPU

  • x86:
    • Minimum configuration: dual-socket four-core Intel processors.
    • Recommended configuration: dual-socket eight-core Intel processors.
    • Recommended configuration for clusters with more than 2000 nodes: dual-socket ten-core Intel processors for management nodes and control nodes.
  • Huawei TaiShan ARM server: dual-socket 32-core 1616 processors or higher.

Bit-mode

64 bits

Memory

  • At least 64 GB. Plan the memory size based on the actual service deployment to meet service running requirements. For details, see Service Deployment Scheme.
  • Clusters with more than 100 nodes: 256 GB to 512 GB of memory is recommended for a single management node or control node.

NIC

  • Two GE electrical ports are used on the management plane and the ports are bonded.
  • Two 10GE optical ports are used on the service plane and the ports are bonded.
  • Two GE electrical ports are used on the external management network and the ports are bonded.
    NOTE:
    • The methods of naming NICs on the x86 platform and the ARM platform are different if the NICs are not bonded. For example, a NIC on the x86 platform is named eth1, while the corresponding NIC on the ARM platform is named enahisic2i1. This document uses NIC names on the x86 platform as examples. For NICs on the ARM platform, replace the names as required.
    • If the systemd patch is installed in the OS, use the default NIC name, for example, enahisic2i1, and do not change it. Otherwise, FusionInsight services become unavailable.
    • For details about the bond configuration type, see Network Planning > Cluster Networking > Host Port Configuration in FusionInsight HD 6.5.0 Hardware Deployment and Networking Guide.

RAID configuration

The non-OS disks of the management node and control node are used to store FusionInsight HD metadata, and the non-OS disks of the data node are used to store FusionInsight HD Hadoop data. To ensure reliability, the recommended RAID configurations on each node are as follows:

  • Management node: Both the OS disk and each non-OS disk use an exclusive RAID 1 group.
  • Control node: Both the OS disk and each non-OS disk use an exclusive RAID 1 group.
  • Data node: The OS disk uses an exclusive RAID 1 group; RAID 0 or no RAID is configured for the non-OS disks.
NOTE:
  • If no RAID is configured for non-OS disks in scenarios where data disks are attached to a RAID controller card, you need to configure the JBOD mode for the disks.
  • You are advised to configure RAID 5 for Kafka, Elasticsearch, and Solr data disks.
  • If you use previously used disks for the installation, configure RAID and format the disks before the installation.

Number of disks

The disk quantity indicates the number of disks that can be detected by the OS after RAID groups are configured. For example, if a node has six disks and three RAID 1 groups are configured, the disk quantity of the node is 3.

If not all services are installed, you can deduct the metadata disks or data disks required by the services that are not installed. For details about the number of metadata disks or data disks required by each service, see Preparing OS.

  • All nodes are deployed separately:
    • Management node: Each node has three disks (one OS disk and two metadata disks, that is, 2+4 physical disks).
    • Control node: Each node has six disks (one OS disk and five metadata disks, that is, 2+10 physical disks).
    • Data node: Each node has two physical disks that form a RAID 1 group and are used as the OS disk. The number of data disks is configured based on the capacity plan in each scenario.
  • Management nodes and control nodes are deployed together, and data nodes are deployed separately:
    • Management node and control node: Each node has eight disks (one OS disk and seven metadata disks, that is, 2+14 physical disks).
    • Data node: Each node has two physical disks that form a RAID 1 group and are used as the OS disk. The number of data disks is configured based on the capacity plan in each scenario.
  • All nodes are deployed together: The number of disks is configured based on the capacity plan in each scenario.
  • Disk quantity planning: Plan the disk quantity based on the total service volume.

    Example:

    The total amount of raw data is 1 PB, of which 10 TB is hot data to be stored on SSDs. The size of a SATA disk is 4 TB and that of an SSD is 500 GB. Data is stored in HDFS with three copies, and the tiered storage policy is ONE_SSD. The minimum number of required SSDs is 10 TB/500 GB x 3 = 60, and the minimum number of required SATA disks is 1 PB/4 TB x 3 = 750 (see the sketch below).
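The example above can be reproduced with a short calculation. The following Python sketch is a minimal illustration of the arithmetic only; it assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) so that the results match the figures in the example, and all input values are taken from the example.

```python
import math

# Reproduces the sizing arithmetic from the example above.
# Decimal units are assumed (1 TB = 1000 GB, 1 PB = 1000 TB);
# substitute your own figures when planning.
raw_data_tb  = 1000   # 1 PB of raw data
hot_data_tb  = 10     # 10 TB of hot data to be stored on SSDs
sata_disk_tb = 4      # capacity of one SATA disk
ssd_disk_tb  = 0.5    # capacity of one SSD (500 GB)
replicas     = 3      # HDFS stores three copies

min_ssds  = math.ceil(hot_data_tb / ssd_disk_tb) * replicas   # 10 TB / 500 GB x 3 = 60
min_satas = math.ceil(raw_data_tb / sata_disk_tb) * replicas  # 1 PB / 4 TB x 3 = 750

print("Minimum SSDs:", min_ssds)         # 60
print("Minimum SATA disks:", min_satas)  # 750
```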

NOTE:
  • It is recommended that all OS disks and metadata disks employ SAS disks and data disks employ SATA disks. If SSDs are configured, you are advised to use them for control nodes and management nodes.
  • If the Flume role of the Flume service is deployed on an independent node and uses the file channel, one OS disk and at least one data disk (not smaller than 300 GB, RAID 1) are required.
  • If the Supervisor role of Storm is deployed on a separate node, one OS disk and one Storm data disk (not smaller than 50 GB, RAID 0) are required.
  • If the Solr service is installed and the number of Solr service nodes exceeds 50, it is recommended that ZooKeeper metadata disks use SSDs.
  • Install SolrServerAdmin on two data nodes and add a data disk to each of the two nodes.
  • If Solr index data is stored in HDFS, no local disk space is occupied.
  • Each Redis instance corresponds to a disk partition. (The number of Redis instances is the number of CPU cores minus 2.) To ensure cluster performance, it is recommended that no more than five Redis partitions be allocated on each disk, as shown in the sketch after this note.
  • Data nodes can be deployed on pure SSDs or on both SSDs and hard disks. The disk type is selected based on the HDFS data storage policy.
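For the Redis note above, the minimum number of disks needed to hold the Redis partitions on a node can be estimated as follows. This is a planning sketch based only on the two rules stated in the note (instances = CPU cores minus 2, at most five partitions per disk); the function name and the 16-core example are illustrative and not part of FusionInsight.

```python
import math

def redis_disks_needed(cpu_cores: int, max_partitions_per_disk: int = 5) -> int:
    """Estimate how many disks the Redis partitions need on one node.

    Follows the note above: one disk partition per Redis instance, the number
    of instances is the CPU core count minus 2, and no more than five Redis
    partitions should be placed on a single disk.
    """
    instances = max(cpu_cores - 2, 0)
    return math.ceil(instances / max_partitions_per_disk)

# Illustrative example: a 16-core node runs 14 Redis instances,
# which should be spread across at least 3 disks.
print(redis_disks_needed(16))  # 3
```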

Disk space

For details about how to configure RAID, see How to Configure the Capacity of RAID1. If the disk capacity is 600 GB, the disk can be rebuilt within 4 hours after a disk fault occurs. You can flexibly configure the disk capacity based on the actual requirements.

  • Management node: The space of the OS disk is greater than or equal to 600 GB, and the space of each non-OS disk is greater than or equal to 600 GB.
  • Control node: The space of the OS disk is greater than or equal to 600 GB, and the space of each non-OS disk is greater than or equal to 600 GB.
  • Data node: The space of the OS disk is greater than or equal to 600 GB, and the space of each non-OS disk is greater than or equal to 600 GB.
NOTE:
  • If the disk capacity is not 600 GB, configure the disk capacity when performing Preparations for Installation > Preparing OS. For details, see How to Configure Disk Partitions When the Disk Capacity Is Insufficient.
  • If there are fewer than 2000 nodes, the capacity must be greater than or equal to 600 GB. If there are more than 2000 nodes, the capacity must be greater than or equal to 1 TB.

Disk type

You can use either of the following local storage types for big data storage based on site requirements:

  • SSDs provide high read and write performance. However, SSDs have smaller capacities and a higher unit storage cost than common mechanical hard disks.

    Because NVMe SSDs do not support RAID 1, use SAS SSDs as the OS disks and metadata disks on management nodes and control nodes.

  • Mechanical hard disks include SATA and SAS disks. They are the main storage type used by HDFS for data storage.
    NOTE:
    • Mechanical hard disks are the main storage type on the big data platform; SSDs can be used for high-speed storage.
    • HDFS data nodes can use different disk types for data storage based on service requirements. For details, see Service Operation Guide > HDFS > Configuring HDFS Data Storage Policies. Disks must be planned for data nodes before the HDFS tiered storage function is used. Multiple disk types can be configured for a data node.
    • When using tiered storage, you are advised to configure multiple disk types and deploy heterogeneous disks (for example, 12 SSDs) on separate nodes. Additionally, plan the disk quantity and capacity based on service requirements.
    NOTICE:

    Recommendations on disk selection for management nodes and control nodes:

    • OS: SAS disks or SSDs are recommended.
    • Metadata: SAS disks or SSDs are recommended for clusters with fewer than 1500 nodes, and SSDs are recommended for clusters with more than 1500 nodes.

Based on the preceding requirements, hardware configuration suggestions are provided for the following common FusionInsight solution scenarios:

Offline Processing Scenario

Real-Time Stream Processing Scenario

Interactive Query Scenario

Real-Time Retrieval Scenario

Hybrid Scenario

Switch Configuration Requirements

Table 2-2 Recommended configuration

Access switch on the management plane (GE switch)

  • Layer 3 GE switch
  • 48 x 10/100/1000Base-T ports
  • 4 x 10GE SFP+ ports
  • 1 x expansion slot for 4 x 40GE QSFP+ interface modules
  • Switching capacity: 680 Gbit/s/6.8 Tbit/s or higher
  • Packet forwarding capability: 420 Mpps or higher

Number of switches: Number of nodes x 3/(Number of GE ports on an access switch – 2)

Calculation method: Number of access switches on the management plane = Number of nodes x 3 (two management plane GE ports and one BMC port on each node)/46 (48 GE ports on an access switch – 2 stack ports). The calculation result is rounded up, and the minimum value is 2 to avoid a single point of failure.

Aggregation switch on the management plane (10GE switch)

  • 48 x 10GE SFP+ ports
  • 2 x 40GE QSFP+ ports
  • 1 x expansion slot for 4 x 40GE QSFP+ interface modules
  • Packet forwarding capability: 1080 Mpps
  • Switching capacity: 2.56 Tbit/s/23.04 Tbit/s

Number of switches: Number of access switches on the management plane x 2/(Number of 10GE ports on an aggregation switch – 2)

Calculation method: Number of aggregation switches on the management plane = Number of access switches on the management plane x 2/46 (48 10GE ports on an aggregation switch – 2 stack ports). The calculation result is rounded up, and the minimum value is 2 to avoid a single point of failure.

Access switch on the service plane (10GE switch)

  • 48 x 10GE SFP+ ports
  • 2 x 40GE QSFP+ ports
  • 1 x expansion slot for 4 x 40GE QSFP+ interface modules
  • Packet forwarding capability: 1080 Mpps
  • Switching capacity: 2.56 Tbit/s/23.04 Tbit/s

Number of switches: Number of nodes x 2/((Number of 10GE ports on an access switch – Number of stack ports) x (Access/aggregation convergence ratio/(1 + Access/aggregation convergence ratio)))

Calculation method: With an access/aggregation convergence ratio of 3 (3:1), Number of access switches on the service plane = Number of nodes x 2/(46 (48 10GE ports on an access switch – 2 stack ports) x 0.75). The calculation result is rounded up, and the minimum value is 2 to avoid a single point of failure.

Aggregation switch on the service plane (10GE switch)

  • 48 x 10GE SFP+ ports
  • 2 x 40GE QSFP+ ports
  • 1 x expansion slot for 4 x 40GE QSFP+ interface modules
  • Packet forwarding capability: 1080 Mpps
  • Switching capacity: 2.56 Tbit/s/23.04 Tbit/s

Number of switches: Number of access switches on the service plane x ((Number of 10GE ports on an access switch – Number of stack ports) x (1/(1 + Access/aggregation convergence ratio)))/Number of 10GE ports on an aggregation switch

Calculation method: With an access/aggregation convergence ratio of 3 (3:1), Number of aggregation switches on the service plane = Number of access switches on the service plane x (46 (48 10GE ports on an access switch – 2 stack ports) x 0.25)/48 (10GE ports on an aggregation switch). The calculation result is rounded up, and the minimum value is 2 to avoid a single point of failure.
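To make the sizing concrete, the formulas in Table 2-2 can be evaluated with a short script. The following Python sketch simply encodes the formulas above, assuming 48-port switches, 2 stack ports, and a 3:1 access/aggregation convergence ratio as in the table; the function names and the 100-node example are illustrative only.

```python
import math

PORTS = 48          # GE or 10GE ports per switch, as assumed in Table 2-2
STACK_PORTS = 2     # ports reserved for stacking
RATIO = 3           # access/aggregation convergence ratio (3:1)

def at_least_two(x: float) -> int:
    """Round up and enforce a minimum of 2 switches to avoid a single point of failure."""
    return max(2, math.ceil(x))

def mgmt_access_switches(nodes: int) -> int:
    # Two management-plane GE ports plus one BMC port per node.
    return at_least_two(nodes * 3 / (PORTS - STACK_PORTS))

def mgmt_aggregation_switches(nodes: int) -> int:
    return at_least_two(mgmt_access_switches(nodes) * 2 / (PORTS - STACK_PORTS))

def service_access_switches(nodes: int) -> int:
    # Usable downlink share of a 48-port access switch = ratio/(1 + ratio) = 0.75.
    usable_ports = (PORTS - STACK_PORTS) * (RATIO / (1 + RATIO))
    return at_least_two(nodes * 2 / usable_ports)

def service_aggregation_switches(nodes: int) -> int:
    # Uplink ports per service-plane access switch = (48 - 2) x 1/(1 + ratio).
    uplinks = (PORTS - STACK_PORTS) * (1 / (1 + RATIO))
    return at_least_two(service_access_switches(nodes) * uplinks / PORTS)

# Illustrative example for a 100-node cluster.
for fn in (mgmt_access_switches, mgmt_aggregation_switches,
           service_access_switches, service_aggregation_switches):
    print(fn.__name__, fn(100))
```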
