HUAWEI CLOUD Stack 6.5.0 Backup and Restoration Guide 03

Restoration by Category

During restoration, perform all restoration operations specified in this chapter.

Stopping ingressproxy-er

To prevent data exchange during restoration, stop ingressproxy-er before the restoration operation.

  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Stop ingressproxy-er in the Tenant Management zone. First, query the node where the ingressproxy-er process resides:

    kubectl get pod -n fst-manage -owide |grep ingressproxy-er

    ingressproxy-er-53jpc      1/1       Running   0        5d        10.8.95.135   paas-manage-core4-7cc7f8-175t2
    ingressproxy-er-zb6dp      1/1       Running   0        5d        10.8.95.120   paas-manage-core5-7cc7f8-zsqsk

  4. Run the following command to remove the ingressproxy-er health check script:

    rm /var/paas/srv/keepalived/bin/healthchk4er.sh

    NOTE:

    The command must be executed on the ingressproxy-er node.

  5. Delete the er label.

    kubectl label node paas-manage-core4-7cc7f8-175t2 fst-manage.ingressproxyer- -n fst-manage

    node "paas-manage-core4-7cc7f8-175t2" labeled

    kubectl label node paas-manage-core5-7cc7f8-zsqsk fst-manage.ingressproxyer- -n fst-manage

    node "paas-manage-core5-7cc7f8-zsqsk" labeled

    NOTE:

    If the query result shows that the ingressproxy-er process is running on multiple nodes, delete the er label from all these nodes.
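The node names needed for the label deletion in Step 5 are the last column of the pod listing from Step 3. A minimal sketch, assuming the column layout shown above (the helper name is hypothetical, not part of the product tooling):

```shell
# Extract the node names (last column) from a `kubectl get pod -owide`
# listing, so each can be fed to `kubectl label node <node> ...` in Step 5.
# Hypothetical helper; assumes the column layout shown in Step 3.
er_nodes() {
  awk '/ingressproxy-er/ {print $NF}'
}
```

Usage: kubectl get pod -n fst-manage -owide | er_nodes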

Restoring the Gauss Database

Performing Health Check

If the databases to be restored are deployed in the master and slave mode, check whether the database duplication status (Rpl Status) on the manage_db1_ip node is normal.

Procedure
  1. Use PuTTY to log in to the manage_db1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to check the database replication status:

    cd /opt/paas/oss/manager/apps/DBAgent/bin

    ./dbsvc_adm -cmd query-db-instance

    The command output is as follows:

    DBInstanceId                                     ClassId  InstNumber                      Tenant  IP               Port   State  DBType  Version            Role    Rpl Status  MasterID                        GuardMode  DataCheckSum  isSSL 
    deploydbsvr-10_109_178_191-4@10_109_211_34-4   primary  deploydbsvr-10_109_178_191-4   fst-manage  10.109.178.191  32084  Up     gauss   V100R003C20SPC112  Master  Normal      --                              --         717445868     off
    deploydbsvr-10_109_178_191-4@10_109_211_34-4   primary  deploydbsvr-10_109_211_34-4    fst-manage  10.109.211.34   32084  Up     gauss   V100R003C20SPC112  Slave   Normal      deploydbsvr-10_109_178_191-4   --         717445868     off

    The preceding command output is used as an example. The actual command output varies depending on the version.

    • Normal indicates that the database replication status is normal.
    • Abnormal indicates that the database replication status is abnormal.
    • Master indicates the master instance, and Slave indicates the standby instance.
    • 10.109.178.191 indicates the IP address of the node where the database instance resides, and 32084 indicates the number of the port used by the database instance.
    • -- indicates a single-instance node. There is no replication status for a single node.

  3. If the slave database replication status of the database to be restored is in the Abnormal state, restore it to Normal according to the operations described in Abnormal Replication State of Slave Database Instance of HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide.
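The Rpl Status check above can be automated by filtering the query-db-instance output. A sketch assuming the 15-column data-row layout shown above, where Rpl Status is the 11th whitespace-separated field; the helper name is hypothetical:

```shell
# Flag replication problems in `dbsvc_adm -cmd query-db-instance` output.
# Prints "<InstNumber> <Role> <RplStatus>" for any row whose Rpl Status is
# neither Normal nor "--" (single-instance). Hypothetical helper; assumes
# the column layout shown in the example output above.
abnormal_rpl() {
  awk 'NR > 1 && $11 != "Normal" && $11 != "--" {print $3, $10, $11}'
}
```

Usage: ./dbsvc_adm -cmd query-db-instance | abnormal_rpl (empty output means all replication links are Normal).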
Disabling Switchover Between Master and Slave Database Nodes

If the database instance to be restored is a database instance in the data zone, skip this section.

Procedure
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to obtain the name of the pod corresponding to DBHASwitchService:

    kubectl get pod -n fst-manage | grep dbha

    dbhaswitchservice-294515604-bch28       1/1       Running        3          3d
    dbhaswitchservice-294515604-qccfk       1/1       Running        2          3d

  4. Enter the pod corresponding to DBHASwitchService.

    kubectl exec dbhaswitchservice-294515604-bch28 -n fst-manage -it sh

    NOTE:

    dbhaswitchservice-294515604-bch28 is the obtained name of the pod corresponding to DBHASwitchService. If the query result contains multiple pod names, use any one of these names.

  5. Use the switchtool tool to add all nodes to the ignore list, which disables active/standby switchover on them:

    su paas

    cd /opt/apps/DBHASwitchService/bin

    sh switchtool.sh -cmd set-ignore-nodes -nodes all

    Successful.

    exit

Restoring Using a One-Click Restoration Tool
Context

Restore a Gauss database instance using the one-click physical restoration tool.

Precautions

When the one-click physical restoration tool is used to restore a database instance, the restoration status of the instance is displayed in the Status column on the Application Development > Database > RDBMS page of the OM zone console.

Prerequisites
  • You have performed a full physical backup on the Gauss database instance. The latest physical backup files can be used for the restoration.
  • The Gauss database instance is running properly.
  • The one-click physical restoration tool is used to restore only the database instance where full physical backup was performed.
Procedure
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the node where the datamgmtservice service is deployed:

    kubectl get pod -nfst-manage -oyaml `kubectl get pod -nfst-manage | grep datamgmtservice | awk '{print $1}'` | grep hostIP

  4. Use PuTTY to log in to the node queried in Step 3.

    The default username is paas, and the default password is QAZ2wsx@123!.

  5. Run the following command to perform one-click physical restoration (ossdbsvr-10_90_73_178-21@10_90_73_179-21 is used as an example here):

    If no restoration result is displayed after 30 minutes, contact technical support.

    cd /opt/paas/oss/manager/apps/DBAgent/bin

    bash dbsvc_adm -cmd restore-db-instance -instid ossdbsvr-10_90_73_178-21@10_90_73_179-21 -snapshot ossdbsvr-10_90_73_178-21@10_90_73_179-21_ossdbsvr-10_90_73_179-21_20160527151800 -name remotepolicy -method physical

    Beginning restore db instance task.
    NOTE:

    -instid: indicates the name of the database instance to be restored.

    -snapshot: indicates the snapshot name of a backup file, that is, the prefix of a backup file. You can log in to the Tenant Management portal and choose Application Development > Database > Backup List from the main menu. Copy the snapshot name of the latest backup file.

    -name: indicates the name of a backup policy. This parameter must be set to the remote backup policy during backup. If this parameter is not set to the backup policy during backup, copy the backup file to the backup policy directory.

    -method: indicates the restoration method. This parameter must be set to physical.

  6. Use PuTTY to log in to the slave database node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  7. Run the following command to view the restoration result:

    tail -f /var/log/paas/oss/manager/DeployAgent/oss.dbrepair.trace |grep setupReplication

    Information similar to the following is displayed:

    2019-05-28 15:46:54.959 +08:00 (9921|140464093415232)[DBRepair:464]DBRepair.setupReplication>>dbInstanceId:ossdbsvr-10_90_73_178-21@10_90_73_179-21, targetDC:ossdbsvr-10_90_73_178-21, begin time 2019-05-28 15:46:54.959467
    2019-05-28 15:46:54.959 +08:00 (9921|140464093415232)[DBRepair:506]DBRepair.setupReplication>>dbInstanceId:ossdbsvr-10_90_73_178-21@10_90_73_179-21, targetDC:ossdbsvr-10_90_73_178-21, time cost 0 s

    If the preceding content is displayed in the log, the restoration succeeds. If the restoration fails, contact technical support.
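Per Step 7, the restoration is considered successful once a setupReplication line containing "time cost" appears in the log for the restored instance. A sketch that applies that criterion to a captured log slice; the helper name is hypothetical:

```shell
# Check a captured slice of oss.dbrepair.trace for restore completion.
# Succeeds when a setupReplication "time cost" line exists for the given
# dbInstanceId ($1); log text is read from stdin. Hypothetical helper
# based on the success criterion described in Step 7.
restore_succeeded() {
  grep "setupReplication>>dbInstanceId:$1" | grep -q "time cost"
}
```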

Restoring Switchover Between Master and Slave Database Nodes

If the database instance to be restored is a database instance in the data zone, skip this section.

Procedure
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to obtain the name of the pod corresponding to DBHASwitchService:

    kubectl get pod -n fst-manage | grep dbha

    dbhaswitchservice-294515604-bch28        1/1       Running        3       3d
    dbhaswitchservice-294515604-qccfk        1/1       Running        2       3d

  4. Enter the pod corresponding to DBHASwitchService.

    kubectl exec dbhaswitchservice-294515604-bch28 -n fst-manage -it sh

    NOTE:

    dbhaswitchservice-294515604-bch28 is the obtained name of the pod corresponding to DBHASwitchService. If the query result contains multiple pod names, use any one of these names.

  5. Run the following commands to remove ignored nodes and enable master/slave database switchover on these nodes:

    su paas

    cd /opt/apps/DBHASwitchService/bin

    bash switchtool.sh -cmd del-ignore-nodes

    Successful.

    exit

Restoring Redis

The logical restoration of a persisted Redis instance must be performed manually. This section describes how to manually restore Febs instances.

Context

When the Redis instances of Febs are abnormal, you need to manually perform the logical restoration.

Prerequisites

The logical backup of the Redis instances has been performed.

Procedure
  1. Use PuTTY to log in to all database nodes one by one: manage_db1_ip, manage_db2_ip.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following commands to check whether the master instance of the Febs database is located on the database node:

    cd /opt/paas/oss/manager/apps/DBAgent/bin

    bash dbsvc_adm -cmd query-db-instance | grep febsdb

    DBInstanceId                        ClassId  Service Name           Tenant          IP          Port   State  DBType  Version  Role    Rpl Status  MasterID               GuardMode  DataCheckSum  
    febsdb-10_8_41_73-7@10_8_41_66-7    primary  febsdb-10_8_41_66-7    fst-manage      10.8.41.66  32094  Up     redis   3.0.7.7  Slave   Normal      febsdb-10_8_41_73-7    --         --            
    febsdb-10_8_41_73-7@10_8_41_66-7    primary  febsdb-10_8_41_73-7    fst-manage      10.8.41.73  32094  Up     redis   3.0.7.7  Master  Normal      --                     --         --      

  3. Refer to Disabling Switchover Between Master and Slave Database Nodes to suspend the active/standby switchover function.
  4. Run the following command to delete the {tenant name}_dcprocess.flag file from the /opt/paas/oss/manager/var/agent/ directory:

    rm /opt/paas/oss/manager/var/agent/{tenant}_dcprocess.flag

    If there is no {tenant name}_dcprocess.flag file in the /opt/paas/oss/manager/var/agent/ directory, delete the file suffixed with dcprocess.flag.

  5. On the database node where the master instance is located, run the following commands to stop the master instance (febsdb-10_8_41_73-7 is used as an example here):

    While the database is stopped, read and write operations on the Febs database are not allowed.

    su paas

    cd /opt/paas/oss/manager/agent/container/redis/bin

    bash stop_redis.sh /opt/redis/data/febsdb-10_8_41_73-7

    If the following information is displayed, the database is stopped successfully:

    Stop febsdb-10_8_41_73-7...
    Stop febsdb-10_8_41_73-7...done

    If the database fails to stop, repeat the preceding operation.

  6. Run the following commands to switch to the dbuser user and back up the aof file of the Febs instance on the manage_db1_ip node:

    su dbuser

    cp /opt/redis/data/febsdb-10_8_41_73-7/febsdb-10_8_41_73-7.aof /opt/redis/data/febsdb-10_8_41_73-7/febsdb-10_8_41_73-7.aof_bak

  7. Decompress the backup file under the backup directory, for example, /opt/pub/backup_local/, and move the decompressed file to the directory of the master Febs database instance.

    • Local restoration
      1. Use PuTTY to log in to the database node where the slave instance is located as the paas user.

        The default password is QAZ2wsx@123!.

      2. Switch to the root user.

        su - root

      3. Run the following command to copy the file to the /home/paas directory of the master instance node:

        scp febsdb-10_8_41_73-7@10_8_41_66-7_febsdb-10_8_41_66-7_20170627152739_manual_full_day_logical.tar.gz* paas@IP address of master instance node:/home/paas

      4. Use PuTTY to log in to the database node where the master instance is located as the paas user.

        The default password is QAZ2wsx@123!.

      5. Switch to the root user.

        su - root

      6. Run the following command to move the file to the backup directory.

        mv /home/paas/febsdb-10_8_41_73-7@10_8_41_66-7_febsdb-10_8_41_66-7_20170627152739_manual_full_day_logical.tar.gz* /opt/pub/backup_local

      7. Go to the backup directory.

        cd /opt/pub/backup_local

      8. Modify the file permissions.

        chown dbuser:dbgroup febsdb-10_8_41_73-7@10_8_41_66-7_febsdb-10_8_41_66-7_20170627152739_manual_full_day_logical.tar.gz*

      9. Decompress the files.

        tar zxf /opt/pub/backup_local/febsdb-10_8_41_73-7@10_8_41_66-7_febsdb-10_8_41_66-7_20170627152739_manual_full_day_logical.tar.gz -C /opt/redis/data/febsdb-10_8_41_73-7

    • Remote restoration
      1. Go to the backup directory.

        cd /opt/pub/backup_local

      2. Connect to the SFTP server.

        sftp Login user name of the remote SFTP server@IP address of the remote SFTP server

      3. Get the file from the backup directory.

        get /opt/pub/backup_local/febsdb-10_8_41_73-7@10_8_41_66-7_febsdb-10_8_41_66-7_20170627152739_manual_full_day_logical.tar.gz*

        /opt/pub/backup_local indicates the remote backup policy path created by the user.

      4. Exit the SFTP server.

        exit

      5. Switch to the root user.

        su - root

      6. Modify the file permissions.

        chown dbuser:dbgroup /opt/pub/backup_local/febsdb-10_8_41_73-7@10_8_41_66-7_febsdb-10_8_41_66-7_20170627152739_manual_full_day_logical.tar.gz*

      7. Decompress the files.

        tar zxf /opt/pub/backup_local/febsdb-10_8_41_73-7@10_8_41_66-7_febsdb-10_8_41_66-7_20170627152739_manual_full_day_logical.tar.gz -C /opt/redis/data/febsdb-10_8_41_73-7

    NOTE:

    If the decompressed file name is febsdb-10_8_41_66-7.aof, but not febsdb-10_8_41_73-7.aof, run the following command to rename the file:

    mv /opt/redis/data/febsdb-10_8_41_73-7/febsdb-10_8_41_66-7.aof /opt/redis/data/febsdb-10_8_41_73-7/febsdb-10_8_41_73-7.aof

  8. Run the following commands to switch to the paas user and start the Febs master database:

    su paas

    . /opt/paas/oss/manager/bin/engr_profile.sh

    ipmc_adm -cmd startdc

    ============================ Starting data container processes...
    Starting redis process FEBService-0-0 ... success…
    ============================ Starting data container processes is complete.

  9. Run the following commands to check whether the replication links of the master and slave Febs database instances are normal:

    cd /opt/paas/oss/manager/apps/DBAgent/bin

    bash dbsvc_adm -cmd query-db-instance | grep febsdb

  10. Run the following commands to switch to the dbuser user, connect to Redis using redis-cli, log in to the nodes where the Febs master and slave database instances reside respectively, and check whether the key counts of the master and slave instances are consistent:

    su dbuser

    /opt/redis/bin/redis-cli -cipherdir /opt/redis/etc/cipher/ -h 10.8.41.73 -p 32094

    10.8.41.73 and 32094 indicate the IP address and port of the Redis database instance to be connected.

    10.8.41.73:32094> auth febsdb@@<Redis read-only user>@<Redis read-only password>
    OK
    10.8.41.73:32094> info Keyspace
    NOTE:

    Replace <Redis read-only user> with the actual read-only username of the Febs database instance.

    Replace <Redis read-only password> with the actual read-only password of the Febs database instance.

    Information similar to the following is displayed:

    db0:keys=3,expires=0,avg_ttl=0
    db1:keys=1,expires=0,avg_ttl=0

  11. Run the following commands to delete the temporary backup file from the node where the Febs database master instance resides:

    su dbuser

    rm /opt/redis/data/febsdb-10_8_41_73-7/febsdb-10_8_41_73-7.aof_bak

  12. Refer to Restoring Switchover Between Master and Slave Database Nodes to restore the active/standby switchover function.
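The consistency check in Step 10 amounts to comparing total key counts from the master and slave `info Keyspace` dumps. A sketch assuming the dbN:keys=<n>,... line format shown above; the helper name is hypothetical:

```shell
# Sum the per-database key counts from a redis `info Keyspace` dump read
# on stdin, so master and slave totals can be compared after restoration.
# Hypothetical helper; assumes the "dbN:keys=<n>,..." format shown above.
total_keys() {
  awk -F'[=,]' '/^db[0-9]+:keys=/ {sum += $2} END {print sum + 0}'
}
```

Capture the `info Keyspace` output on each node, pipe each through total_keys, and compare the two numbers.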

Restoring Tenant Management Zone cfe-etcd

It takes about 30 minutes to restore the tenant management zone.

Stopping Associated Components of Tenant Management Zone cfe-etcd

Before restoring the tenant management zone cfe-etcd, you need to stop the associated components of the tenant management zone cfe-etcd. These components include swr-api-server, aos-apiserver and aos-cmdbserver.

Procedure
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the statuses of all components:

    kubectl -n fst-manage get pods

    If the status of every component is Running, as shown in the following command output, all components are running properly. In this case, go to Step 4. If any component is not running properly, rectify the environment first.

    NAME                                       READY     STATUS    RESTARTS   AGE       IP             NODE
    aos-apiserver-3698640035-bu9nq             1/1       Running   0          1h        192.168.63.11    10.75.159.54
    aos-apiserver-3698640035-u3ngi             1/1       Running   0          1h        192.168.76.8     10.75.159.53
    aos-workflowengine-2914415178-3nlh6        1/1       Running   0          1h        192.168.63.14    10.75.159.54
    aos-workflowengine-2914415178-6nwte        1/1       Running   0          1h        192.168.60.8     10.75.159.52
    as-api-service-2014532315-5brog            1/1       Running   0          1h        192.168.76.12    10.75.159.53
    keepalived-controller-3029816081-s5bqy     1/1       Running   243        1h        10.75.159.53   10.75.159.53
    kube-apiserver-1504422162-bq7au            1/1       Running   0          2h        192.168.76.4     10.75.159.53
    kube-apiserver-1504422162-j52r4            1/1       Running   0          2h        192.168.63.4     10.75.159.54
    kube-controller-manager-2567681657-0h668   2/2       Running   1          2h        192.168.76.3     10.75.159.53
    kube-controller-manager-2567681657-vl9j9   2/2       Running   0          2h        192.168.60.3     10.75.159.52
    kube-scheduler-4025016076-596zs            1/1       Running   0          2h        192.168.76.5     10.75.159.53
    kube-scheduler-4025016076-ybcrk            1/1       Running   1          2h        192.168.63.5     10.75.159.54

    Record the pod names of swr-api-server, aos-apiserver and aos-cmdbserver. For example, aos-apiserver-3698640035-bu9nq.

  4. Stop swr-api-server, aos-apiserver and aos-cmdbserver.

    1. Run the following command to stop swr-api-server:

      kubectl -n fst-manage edit deployment swr-api-server

      Press Enter to open the editing window.

      Change 3 (the queried value of replicas) in the spec area to 0. Then, run the wq! command to save the change and exit.

      NOTE:

      If no swr-api-server is found, skip this step.

    2. Run the following command to stop aos-apiserver:

      kubectl -n fst-manage edit deployment aos-apiserver

      Press Enter to open the editing window.

      Change 2 (the queried value of replicas) in the spec area to 0. Then, run the wq! command to save the change and exit.

      NOTE:

      If no aos-apiserver is found, skip this step.

  5. Repeat Step 3 until no pods of swr-api-server or aos-apiserver are displayed in the output. The components are then successfully stopped.
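The stop check in Step 5 amounts to verifying that the filtered pod listing is empty. A sketch over `kubectl -n fst-manage get pods` output; the helper name is hypothetical:

```shell
# Succeed only when no pod of the three associated components remains in
# the pod listing read from stdin. Hypothetical helper mirroring the
# grep filter used in the start procedure later in this guide.
components_stopped() {
  ! grep -q -e swr-api-server -e aos-apiserver -e aos-cmdbserver
}
```

Usage: kubectl -n fst-manage get pods | components_stopped && echo "stopped"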

Restoring Tenant Management Zone cfe-etcd

This section describes how to restore all data of tenant management zone cfe-etcd.

NOTE:
  • If the cluster is abnormal during restoration due to network faults, perform the restoration steps again after the network recovers.
  • The following operations need to be performed on the etcd, etcd-event, and etcd-network clusters in the tenant management zone. The etcd cluster is used as an example here.
Prerequisites

Before restoration, ensure that the available space of /var/paas/run (the parent directory of the cfe-etcd data directory) on manage_db1_ip, manage_db2_ip, and manage_db3_ip is greater than 16 GB. Otherwise, the restoration will fail.

NOTE:

/var/paas/run is a soft link that points to /opt/paas/run.

Perform the following operations to check the available disk space:

  1. Use PuTTY to log in to the manage_db1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to check the available space of the parent directory of the cfe-etcd data directory:

    df -h /var/paas/run

    Ensure that the available space of the /var/paas/run directory is greater than 16 GB.

  3. Log in to manage_db2_ip and manage_db3_ip nodes respectively and run the preceding command to check that there is sufficient space in the /var/paas/run directory.
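The 16 GB prerequisite check above can be scripted. This sketch parses `df -k` output (KiB units) rather than the human-readable `df -h` form, so no unit suffixes need handling; the helper name is hypothetical:

```shell
# Succeed when the Available column (field 4 of the data line) of a
# `df -k <dir>` listing read from stdin exceeds 16 GiB (in KiB).
# Hypothetical helper; assumes the standard two-line df output layout.
has_16g_free() {
  awk 'NR == 2 {ok = ($4 > 16 * 1024 * 1024)} END {exit !ok}'
}
```

Usage: df -k /var/paas/run | has_16g_free || echo "insufficient space"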
Procedure
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the status of the etcd pod and node IP address:

    kubectl -n fst-manage get pods -o wide |grep etcd|grep -v cse | grep -v etcd-backup | grep -v etcdflow | grep -v apm-etcd

    Information similar to the following is displayed:

    etcd-server-paas-10-120-244-156                      1/1       Running             0          11h       10.120.244.156    paas-10-120-244-156
    etcd-server-paas-10-109-228-173                     1/1       Running             0          11h       10.109.228.173   paas-10-109-228-173
    etcd-server-paas-10-109-245-10                      1/1       Running             0          11h       10.109.245.10    paas-10-109-245-10

  4. Log in to the nodes (where etcd pods reside) queried in Step 3 as the paas user and run the following commands to move the manifest files in the /var/paas/kubernetes/manifests/ directory to the upper-level directory:

    • etcd nodes:

      cd /var/paas/kubernetes/manifests/

      mv etcd.manifest ../

    • etcd-event nodes:

      cd /var/paas/kubernetes/manifests/

      mv etcd-event.manifest ../

    • etcd-network nodes:

      cd /var/paas/kubernetes/manifests/

      mv etcd-network.manifest ../

    NOTE:

    Do not delete these manifest files. They need to be moved back in subsequent operations.

  5. Repeat Step 3 until no etcd pods are displayed in the output.
  6. Local backup: The etcd backup files of the tenant management zone are stored on the etcd-backup node. The backup directories are as follows:

    /opt/paas/backup_cfe/etcd_backup/etcd

    /opt/paas/backup_cfe/etcd_backup/etcd-event

    /opt/paas/backup_cfe/etcd_backup/etcd-network

    The backup files of the etcd, etcd-event, and etcd-network clusters are backed up on one of the nodes manage_db1_ip, manage_db2_ip, manage_db3_ip respectively (the etcd-backup node). Confirm the latest backup files on manage_db1_ip, manage_db2_ip and manage_db3_ip.

    Log in to the SFTP server to copy the data file used for restoration to the etcd data directory on manage_db1_ip, manage_db2_ip and manage_db3_ip respectively.
    • The default data directory of etcd-event is /var/paas/run/etcd-event; the default data directory of etcd is /var/paas/run/etcd; the default data directory of etcd-network is /var/paas/run/etcd-network.
    • Ensure that only one data file in the directory is used for restoration. The file name must be the name generated by a manually triggered or scheduled backup and must not be changed.

      You can run the following command to check whether there is only one data file that is generated by the backup tools or during scheduled backup in the directory:

      ls -la /var/paas/run/etcd /var/paas/run/etcd-event/ /var/paas/run/etcd-network

    • The backup data file of each cluster in the tenant management zone is named {etcd-name}_{timestamp}_{namespace}_{jobid}.tar.gz. For example, the backup data file of the etcd cluster is named etcd_2017-11-09-16-03-06_fst-manage_0126865b98473fa5dee8a64a1213b50c.tar.gz.

    scp etcd backup file paas@manage_db1_ip IP address:/var/paas/run/etcd/

    scp etcd backup file paas@manage_db2_ip IP address:/var/paas/run/etcd/

    scp etcd backup file paas@manage_db3_ip IP address:/var/paas/run/etcd/

    scp etcd-event backup file paas@manage_db1_ip IP address:/var/paas/run/etcd-event/

    scp etcd-event backup file paas@manage_db2_ip IP address:/var/paas/run/etcd-event/

    scp etcd-event backup file paas@manage_db3_ip IP address:/var/paas/run/etcd-event/

    scp etcd-network backup file paas@manage_db1_ip IP address:/var/paas/run/etcd-network/

    scp etcd-network backup file paas@manage_db2_ip IP address:/var/paas/run/etcd-network/

    scp etcd-network backup file paas@manage_db3_ip IP address:/var/paas/run/etcd-network/

  7. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  8. Restore tenant management zone cfe-etcd.

    Run the following command to locate the nodes with etcd deployed (the last column in the command output is the node list):

    kubectl -n fst-manage get pod -o wide | grep etcd

  9. Run the following commands to move the manifest files of etcd back to its original folder:

    • etcd nodes:

      su paas

      cd /var/paas/kubernetes/manifests/

      mv ../etcd.manifest .

    • etcd-event nodes:

      su paas

      cd /var/paas/kubernetes/manifests/

      mv ../etcd-event.manifest .

    • etcd-network nodes:

      su paas

      cd /var/paas/kubernetes/manifests/

      mv ../etcd-network.manifest .

  10. Repeat Step 3 until the status of every pod is Running, which indicates that the cfe-etcd clusters are restored.

    The repeated operations may take 5 to 10 minutes.
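Step 6 names each backup archive {etcd-name}_{timestamp}_{namespace}_{jobid}.tar.gz. Because the cluster name (etcd, etcd-event, or etcd-network) never contains an underscore, the matching /var/paas/run/<etcd-name>/ data directory can be derived from the file name. A sketch; the helper name is hypothetical:

```shell
# Derive the cluster name from a backup archive named
# {etcd-name}_{timestamp}_{namespace}_{jobid}.tar.gz, e.g. to pick the
# /var/paas/run/<etcd-name>/ target directory. Hypothetical helper;
# relies on underscores separating the four name parts.
backup_cluster() {
  printf '%s\n' "$1" | cut -d_ -f1
}
```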

Starting Associated Components of Tenant Management Zone cfe-etcd

This section describes how to start associated components of tenant management zone cfe-etcd. Associated components include swr-api-server, aos-apiserver and aos-cmdbserver.

Procedure
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Restart swr-api-server, aos-apiserver, and aos-cmdbserver.

    • Start swr-api-server.

      kubectl -n fst-manage edit deployment swr-api-server

      Press the Enter key to open the editing window.

      Change the value of replicas in the spec area from 0 back to the value queried for swr-api-server in 4.a (3 in the preceding example), run wq!, save the change, and exit.

      NOTE:

      If swr-api-server is not stopped, skip this step.

    • Start aos-apiserver:

      kubectl -n fst-manage edit deployment aos-apiserver

      Press the Enter key to open the editing window.

      Change the value of replicas in the spec area from 0 to 2, run wq!, save the change, and exit.

      NOTE:

      If aos-apiserver is not stopped, skip this step.

    • Run the following command to start aos-cmdbserver:

      kubectl -n fst-manage edit deployment aos-cmdbserver

      Press the Enter key to open the editing window.

      Change the value of replicas in the spec area from 0 to 2. Then, run the wq! command to save the change and exit.

      NOTE:

      If no aos-cmdbserver is queried during the stop operation, skip this step.

  4. Run the following command to query the statuses of all components:

    NOTE:

    Repeat Step 4 until all components run properly.

    kubectl -n fst-manage get pods | grep -e swr-api-server -e aos-apiserver -e aos-cmdbserver

    If the statuses of all components are Running, as shown in the following command output, all components are running properly.

    aos-apiserver-3585254387-87ntj             1/1       Running   0          40m
    aos-apiserver-3585254387-9gzjk             1/1       Running   0          40m
    aos-cmdbserver-563760893-jm74h             1/1       Running   0          40m
    aos-cmdbserver-563760893-jwsxt             1/1       Running   0          40m
    swr-api-server-3281265826-99g22            1/1       Running   0          41m
    swr-api-server-3281265826-n2kj9            1/1       Running   0          41m
    swr-api-server-3281265826-z5fcf            1/1       Running   0          41m

(Optional) Restoring a Faulty Tenant Management Zone cfe-etcd Cluster

This section describes how to restore a single faulty node of a tenant management zone cfe-etcd cluster.

NOTE:

The etcd-event node is used as an example.

Procedure
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the IP address of the faulty node:

    1. Run the following command to query the status of the etcd pod:

      kubectl -n fst-manage get pods -o wide |grep etcd|grep -v cse

      Information similar to the following is displayed:

      etcd-0                                     4/4       Running   0         1d        192.168.15.3     manage-cluster1-87c05eac-t6hg4
      etcd-1                                     4/4       Running   0         1d        192.168.9.3      manage-cluster1-87c05eac-k2x7q
      etcd-2                                     4/4       Completed   2       1d        192.168.8.3      manage-cluster1-87c05eac-9dmpc

      The preceding output shows that etcd-2 is in the Completed state and that its restart count keeps increasing. This indicates that etcd-2, which has restarted multiple times, is faulty.

      NOTE:

      Faulty statuses of an etcd pod include Completed, ImagePullBackOff, ErrImagePull, and err. If any of these statuses is displayed, the pod (etcd-2 in this example) is faulty.

    2. Run the following command to query the IP address of the etcd-2 node:

      kubectl -n fst-manage get node manage-cluster1-87c05eac-9dmpc -ojson |grep address

      Information similar to the following is displayed:

              "address": "10.175.8.74",
              "addresses": [
                      "address": "10.175.8.74",
                      "address": "10.175.8.74",
                      "address": "manage-cluster1-87c05eac-9dmpc",
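The faulty-node lookup in step 3 can also be scripted. The sketch below (using the sample listing from step 3.a as embedded text) extracts the node hosting the pod whose status is not Running; on the real node you would pipe the live kubectl output into the same awk filter and feed the result to the kubectl get node command.

```shell
#!/bin/sh
# Find the node hosting the faulty etcd pod. On the node, replace the
# embedded sample with the live output of:
#   kubectl -n fst-manage get pods -o wide | grep etcd | grep -v cse
LISTING='etcd-0 4/4 Running 0 1d 192.168.15.3 manage-cluster1-87c05eac-t6hg4
etcd-1 4/4 Running 0 1d 192.168.9.3 manage-cluster1-87c05eac-k2x7q
etcd-2 4/4 Completed 2 1d 192.168.8.3 manage-cluster1-87c05eac-9dmpc'
# Field 3 is the status; field 7 is the hosting node name.
FAULTY_NODE=$(printf '%s\n' "$LISTING" | awk '$3 != "Running" {print $7}')
echo "faulty node: $FAULTY_NODE"
```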

  4. Run the following command to query detailed fault information of the pod:

    kubectl -n fst-manage describe pod etcd-2

    In the command output, you can view cfe-etcd fault information.

  5. Log in to the cfe-etcd docker.

    1. Log in, as the paas user, to a properly running tenant management zone management node where cfe-etcd resides.
      NOTE:

      Log in to any properly running tenant management zone node where cfe-etcd resides (manage_db1_ip, manage_db2_ip, or manage_db3_ip). Use the IP address of the faulty node obtained in 3.b to identify the nodes that are still running properly.

    2. Run the following command to query the etcd-server process and obtain the docker ID:

      sudo docker ps |grep etcd-server

      The following information is displayed, indicating that the docker ID is 4fb02ef18d47:

      4fb02ef18d47 cfe-etcd:2.1.13 "/bin/sh -c 'umask 07" 4 days ago Up 4 days k8s_etcd-container.df3b3616_etcd-server-paas-192-168-0-39_om_401ef3c3a07890329524fab8bb6ec965_15eac53d
       a42f964bb9b9 paas-cfe-pause-bootstrap "/pause" 7 days ago Up 7 days k8s_POD.6d5cdc5e_etcd-server-paas-192-168-0-39_om_401ef3c3a07890329524fab8bb6ec965_ff94abf8
    3. Run the following command to log in to the docker you queried:

      sudo docker exec -it 4fb02ef18d47 sh

  6. Remove the faulty member from the cfe-etcd cluster.

    1. List the cfe-etcd cluster members.

      ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints https://127.0.0.1:4002 member list -w table

      Record the values in the CLIENT ADDRS column; they will be used as the --endpoints values in the following start-etcd commands.

    2. Check the status of the faulty cfe-etcd cluster.
      NOTE:

      By default, 4001 is the client port of etcd cluster, 4002 is the client port of etcd-event cluster, and 4003 is the client port of etcd-network cluster.

      ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints http://etcd-0.etcd.default.svc.cluster.local:4002,http://etcd-1.etcd.default.svc.cluster.local:4002,http://etcd-2.etcd.default.svc.cluster.local:4002 endpoint health

      Replace endpoints in the preceding command with the values recorded in 6.a.

      If the following information is displayed, the cfe-etcd member 9ec37d80501f0a06 is abnormal:

      failed to check the health of member 9ec37d80501f0a06 on http://etcd-0.etcd.default.svc.cluster.local:4002: Get http://etcd-2.etcd.default.svc.cluster.local:4002/health: dial tcp etcd-2.etcd.default.svc.cluster.local:4002: getsockopt: connection refused
      member 3a12b00595fffd87 is healthy: got healthy result from http://etcd-1.etcd.default.svc.cluster.local:4002
      member 5a39bce0ac1ded46 is healthy: got healthy result from http://etcd-0.etcd.default.svc.cluster.local:4002
    3. Run the following command to remove the faulty member 9ec37d80501f0a06 from the cfe-etcd cluster:

      ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints http://etcd-0.etcd.default.svc.cluster.local:4002,http://etcd-1.etcd.default.svc.cluster.local:4002,http://etcd-2.etcd.default.svc.cluster.local:4002 member remove 9ec37d80501f0a06

      Replace endpoints in the preceding command with the values recorded in 6.a.

      After the removal, the docker automatically restarts.
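The member ID to remove can be pulled out of the "endpoint health" output mechanically. This sketch condenses the sample output from step 6.b into embedded text; inside the container, the extracted ID is what you would pass to the "member remove" subcommand.

```shell
#!/bin/sh
# Extract the unhealthy member ID from "endpoint health" output (sample
# text mirrors step 6.b; URLs shortened for readability).
HEALTH='failed to check the health of member 9ec37d80501f0a06 on http://etcd-0.etcd.default.svc.cluster.local:4002: connection refused
member 3a12b00595fffd87 is healthy: got healthy result from http://etcd-1.etcd.default.svc.cluster.local:4002
member 5a39bce0ac1ded46 is healthy: got healthy result from http://etcd-0.etcd.default.svc.cluster.local:4002'
# The member ID is the 8th whitespace-separated field of the failure line.
BAD_MEMBER=$(printf '%s\n' "$HEALTH" | awk '/failed to check the health of member/ {print $8}')
echo "member to remove: $BAD_MEMBER"
```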

  7. Check the restoration status of the faulty cfe-etcd cluster.

    1. Log in to the node hosting the faulty cfe-etcd member as the paas user.
    2. Run the following command to check that the docker is started successfully:

      sudo docker ps |grep etcd-event

      If the following information is displayed, the docker is successfully started and runs properly:

      e42217ce8af2        10.175.11.229:8081/root/paas-cfe-etcd-cfe:2.0.6                      "/bin/sh -c '/start.s"   12 s ago        Up 12 s                             k8s_etcd-event.4ed02c36_etcd-0_default_ee82904a-b253-11e6-ae8a
    3. Query the status of the cfe-etcd cluster:
      1. Log in again, as the paas user, to a cfe-etcd node that is running properly.
      2. Run the following commands to log in to the docker 4fb02ef18d47 and query the status of the cfe-etcd cluster:

        sudo docker exec -it 4fb02ef18d47 sh

        ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints http://etcd-0.etcd.default.svc.cluster.local:4002,http://etcd-1.etcd.default.svc.cluster.local:4002,http://etcd-2.etcd.default.svc.cluster.local:4002 endpoint health

        Replace endpoints in the preceding command with the values recorded in 6.a.

        If the following information is displayed, the restoration is complete:

        member 2932a24b2ab07be is healthy: got healthy result from http://etcd-2.etcd.default.svc.cluster.local:4002
        member 3a12b00595fffd87 is healthy: got healthy result from http://etcd-1.etcd.default.svc.cluster.local:4002
        member 5a39bce0ac1ded46 is healthy: got healthy result from http://etcd-0.etcd.default.svc.cluster.local:4002
    4. Run the following command to query the status of the etcd pod:
      1. Use PuTTY to log in to the manage_lb1_ip node.

        The default username is paas, and the default password is QAZ2wsx@123!.

      2. Run the following command and enter the password of the root user to switch to the root user:

        su - root

        Default password: QAZ2wsx@123!

      3. Run the following command to query the status of the etcd pod:

        kubectl -n fst-manage get pods -o wide |grep etcd|grep -v cse

        If the following information is displayed, the etcd pod is normal:

        etcd-0                                     4/4       Running   0         1d        192.168.15.3     manage-cluster1-87c05eac-t6hg4
        etcd-1                                     4/4       Running   0         1d        192.168.9.3      manage-cluster1-87c05eac-k2x7q
        etcd-2                                     4/4       Running   24         1d        192.168.8.3      manage-cluster1-87c05eac-9dmpc
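The verification in step 7 reduces to two counts: every member of the three-node cfe-etcd cluster must report healthy, and no failure line may appear. The sketch below checks this against embedded sample text mirroring the "endpoint health" output in 7.c.

```shell
#!/bin/sh
# Verify cluster health from "endpoint health" output. A restored
# three-node cfe-etcd cluster reports exactly three healthy members.
HEALTH='member 2932a24b2ab07be is healthy: got healthy result from http://etcd-2.etcd.default.svc.cluster.local:4002
member 3a12b00595fffd87 is healthy: got healthy result from http://etcd-1.etcd.default.svc.cluster.local:4002
member 5a39bce0ac1ded46 is healthy: got healthy result from http://etcd-0.etcd.default.svc.cluster.local:4002'
HEALTHY=$(printf '%s\n' "$HEALTH" | grep -c 'is healthy')
FAILED=$(printf '%s\n' "$HEALTH" | grep -c 'failed')
echo "healthy=$HEALTHY failed=$FAILED"
```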

Restoring the Software Repository

To prevent inconsistency between the data you expect and the data obtained after the restoration, stop uploading data to and downloading data from the SWR before performing the restoration.

Restoring the Software Repository in a Cluster in the Tenant Management Zone
  1. Perform the following operations before restoration:

    1. Use PuTTY to log in to the manage_lb1_ip node.

      The default username is paas, and the default password is QAZ2wsx@123!.

    2. Run the following command and enter the password of the root user to switch to the root user:

      su - root

      Default password: QAZ2wsx@123!

    3. Run the following command to query the status of all swr pods:

      kubectl get pods -o wide -n fst-manage|grep swr

      swr-api-server-452856201-216bi             1/1       Running   0          7h        10.16.89.3        10.106.211.238
      swr-api-server-452856201-ertlx             1/1       Running   0          7h        10.16.12.3        10.106.211.19
      swr-api-server-452856201-u5mbe             1/1       Running   0          7h        10.16.49.16       10.106.211.214
      swr-api-server-452856201-nl7ff             1/1       Running   0          7h        10.16.79.6        10.106.211.66
    4. Stop all swr pods.

      Run the following command to edit the swr-api-server file in the vi editor:

      kubectl edit deployment swr-api-server -n fst-manage

      Use the vi editor to change the value of replicas to 0.

      spec:
        replicas: 0
        selector:
          matchLabels:
            name: swr-api-server

      Run the :wq command to save the change and exit.

    5. Repeat the command in 1.c until no swr pods are displayed.
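As a hedged alternative to the vi edit above, the replica count can be set directly with kubectl scale, and the repeated status query can be scripted as a polling loop. In this sketch, list_swr is a placeholder function backed by embedded sample text so the loop logic can be read standalone; on the node it would call kubectl directly.

```shell
#!/bin/sh
# Alternative to editing the deployment in vi (sketch; run on the
# manage_lb1_ip node as root):
#   kubectl -n fst-manage scale deployment swr-api-server --replicas=0
# Then poll until no swr pods remain.
list_swr() {
  # Placeholder for: kubectl get pods -n fst-manage | grep swr
  printf '%s' "$SWR_LISTING"
}
SWR_LISTING=''   # simulated: all swr pods already terminated
while [ -n "$(list_swr)" ]; do
  sleep 10
done
echo "all swr pods stopped"
```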

  2. Run the following command to query the node where swr resides in the tenant zone:

    kubectl describe node $(kubectl get pod -n fst-manage -owide|grep swr-api-server | awk '{print $7}') -n fst-manage|grep IP

    Information similar to the following is displayed:

                        kubernetes.io/kubelet.common.envs=HOSTING_SERVER_IP=10.106.211.238
      InternalIP:  10.106.211.238
      DataIP:      10.106.211.238
                        kubernetes.io/kubelet.common.envs=HOSTING_SERVER_IP=10.106.211.19
      InternalIP:  10.106.211.19
      DataIP:      10.106.211.19
                        kubernetes.io/kubelet.common.envs=HOSTING_SERVER_IP=10.106.211.214
      InternalIP:  10.106.211.214
      DataIP:      10.106.211.214
                        kubernetes.io/kubelet.common.envs=HOSTING_SERVER_IP=10.106.211.66
      InternalIP:  10.106.211.66
      DataIP:      10.106.211.66

  3. Log in as the paas user to the node you queried in Step 2 and run the following command to delete data under the /var/paas/dockyard/ directory on the node where the swr database resides:

    rm -rf /var/paas/dockyard/*

  4. Log in as the paas user to the node you queried in Step 2 and run the following commands to copy the to-be-restored data to the /var/paas/dockyard/ directory on the node where the swr database resides:

    • Local restoration

    su root

    cp -r /opt/dockyard/* /var/paas/dockyard/

    chown -R paas:paas /var/paas/dockyard/

    • Remote restoration

    su root

    scp -r Name of the SFTP server user@SFTP server address:/SFTP server backup directory/Name of the node where the SWR service is deployed/* /var/paas/dockyard/

    An example command is scp -r sftpuser@10.118.38.35:/opt/dockyard/manage-swr-22a5deec-7653-xqgll/* /var/paas/dockyard/.

    chown -R paas:paas /var/paas/dockyard/
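After the chown, it is worth confirming that no file under the dockyard directory is still owned by another user. The sketch below demonstrates the check on a temporary directory; on the swr node, DIR would be /var/paas/dockyard and the expected owner would be paas.

```shell
#!/bin/sh
# Verify ownership consistency under a directory tree. Demonstrated on a
# temporary directory with the current user; file names are illustrative.
DIR=$(mktemp -d)
touch "$DIR/layer.tar" "$DIR/manifest.json"
# Count entries NOT owned by the expected user (here, the current user).
WRONG_OWNER=$(find "$DIR" ! -user "$(id -un)" | wc -l)
echo "files with wrong owner: $WRONG_OWNER"
rm -rf "$DIR"
```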

  5. Perform the following operations after restoration:

    1. Use PuTTY to log in to the manage_lb1_ip node.

      The default username is paas, and the default password is QAZ2wsx@123!.

    2. Run the following command and enter the password of the root user to switch to the root user:

      su - root

      Default password: QAZ2wsx@123!

    3. Run the following command to query the status of all swr pods:

      kubectl get pods -n fst-manage -o wide|grep swr

    4. Start all swr pods.

      Run the following command to edit the swr-api-server file in a vi editor:

      kubectl edit deployment swr-api-server -n fst-manage

      Use the vi editor to change the value of replicas to 4 (the actual value prevails).

      spec:
        replicas: 4
        selector:
          matchLabels:
            name: swr-api-server

      Run the :wq command to save the change and exit.

    5. Repeat running the following command to query the status of all swr pods until all four swr services are started:

      kubectl get pods -o wide -n fst-manage|grep swr

      Information similar to the following is displayed after the swr services are properly started:

      swr-api-server-452856201-216bi             1/1       Running   0          7h        10.16.89.3        10.106.211.238
      swr-api-server-452856201-ertlx             1/1       Running   0          7h        10.16.12.3        10.106.211.19
      swr-api-server-452856201-u5mbe             1/1       Running   0          7h        10.16.49.16       10.106.211.214
      swr-api-server-452856201-nl7ff             1/1       Running   0          7h        10.16.79.6        10.106.211.66

      If the status of all SWR instances does not switch to Running after 5 minutes, contact technical support.
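The final check above can be scripted by counting the Running swr-api-server pods against the configured replica count. The sample listing below mirrors the output in this section; on the node, pipe the live kubectl output into the same awk filter.

```shell
#!/bin/sh
# Count Running swr-api-server pods. On the node, replace the sample
# listing with the live output of:
#   kubectl get pods -o wide -n fst-manage | grep swr
LISTING='swr-api-server-452856201-216bi 1/1 Running 0 7h 10.16.89.3 10.106.211.238
swr-api-server-452856201-ertlx 1/1 Running 0 7h 10.16.12.3 10.106.211.19
swr-api-server-452856201-u5mbe 1/1 Running 0 7h 10.16.49.16 10.106.211.214
swr-api-server-452856201-nl7ff 1/1 Running 0 7h 10.16.79.6 10.106.211.66'
EXPECTED=4   # the configured replica count; the actual value prevails
RUNNING=$(printf '%s\n' "$LISTING" | awk '$3 == "Running"' | wc -l)
echo "running: $RUNNING / $EXPECTED"
```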

Starting ingressproxy-er

  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to add the er label deleted in Stopping ingressproxy-er:

    Command for adding the er label to the node in the management zone:

    kubectl label node paas-manage-core4-7cc7f8-175t2 fst-manage.ingressproxyer=ingressproxyer -n fst-manage

    node "paas-manage-core4-7cc7f8-175t2" labeled

    kubectl label node paas-manage-core5-7cc7f8-zsqsk fst-manage.ingressproxyer=ingressproxyer -n fst-manage

    node "paas-manage-core5-7cc7f8-zsqsk" labeled
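When more than one node needs the er label, the two commands above can be wrapped in a loop. This sketch dry-runs by default (KUBECTL defaults to echo); set KUBECTL=kubectl on the manage_lb1_ip node to apply the labels for real. The node names are the examples from this section; substitute the ones recorded in your environment.

```shell
#!/bin/sh
# Re-apply the er label to each node recorded in "Stopping ingressproxy-er".
# KUBECTL defaults to "echo kubectl" so this sketch only prints the commands.
KUBECTL="${KUBECTL:-echo kubectl}"
for NODE in paas-manage-core4-7cc7f8-175t2 paas-manage-core5-7cc7f8-zsqsk; do
  $KUBECTL label node "$NODE" fst-manage.ingressproxyer=ingressproxyer -n fst-manage
done
```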

Updated: 2019-06-14

Document ID: EDOC1100062366