FusionInsight HD V100R002C60SPC200 Product Description 06

Spark Core Enhanced

Adding Spark Application Log to WebUI

In YARN, setting yarn.log-aggregation-enable to true enables container log aggregation. All container logs are then aggregated into an HDFS directory and can otherwise be viewed only by accessing the files in HDFS. With Spark application logs added to the WebUI, you can view the logs directly on a web page. This function is available only when log aggregation is enabled. See Figure 4-16.
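Because the feature depends on yarn.log-aggregation-enable, it can help to confirm the setting before looking for logs in the WebUI. The following is a minimal sketch using only the Python standard library. The sample XML content and the helper name log_aggregation_enabled are illustrative assumptions, not part of the product, and the sketch assumes the usual Hadoop property layout of yarn-site.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml content; a real file usually holds many more properties.
SAMPLE_YARN_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
"""


def log_aggregation_enabled(yarn_site_xml: str) -> bool:
    """Return True if yarn.log-aggregation-enable is set to true in the XML text."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "yarn.log-aggregation-enable":
            return prop.findtext("value", "").strip().lower() == "true"
    # Assumption: a missing property is treated as disabled,
    # matching the stock Hadoop default of false.
    return False


print(log_aggregation_enabled(SAMPLE_YARN_SITE))
```

In a managed cluster the effective value may come from the management console rather than a hand-edited file, so treat this only as a quick local check.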

Figure 4-16 Log aggregation page

The page lists the logs of the ApplicationMaster and of each Executor by job. Logs are sorted by container name (ApplicationMaster or Executor) and serial number. Clicking a log link displays the corresponding aggregated logs.

This feature uses the MapReduce JobHistoryServer to parse aggregated logs. Therefore, the MapReduce service must be installed and JobHistoryServer must be running properly.

DAG Printing

A directed acyclic graph (DAG) shows the execution process of a Spark job, helping users quickly check that the job executes as intended and then optimize it.

In Spark, RDD computation is triggered only when an action is invoked on an RDD. Every RDD action invokes the runJob function of SparkContext, so the DAG printing logic is added to runJob. To print DAGs to the logs, set the spark.logLineage parameter in SparkConf to true.
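The reason runJob is a convenient hook can be illustrated with a toy model. The sketch below is not the Spark API: ToyContext, ToyRDD, run_job, and describe are hypothetical names used only to show that transformations are recorded lazily, while every action passes through one function where the lineage can be logged when spark.logLineage is true.

```python
import logging

logger = logging.getLogger("toy_spark")


class ToyContext:
    """Hypothetical stand-in for SparkContext; not the Spark API."""

    def __init__(self, conf):
        self.conf = conf
        self.jobs_run = 0

    def run_job(self, rdd):
        # Every action funnels through here, so the lineage is logged in one place.
        if self.conf.get("spark.logLineage", "false") == "true":
            logger.info("RDD lineage:\n%s", rdd.describe())
        self.jobs_run += 1
        return rdd.compute()


class ToyRDD:
    """Hypothetical stand-in for an RDD with one optional parent."""

    def __init__(self, ctx, name, compute_fn, parent=None):
        self.ctx = ctx
        self.name = name
        self._compute_fn = compute_fn
        self.parent = parent

    def compute(self):
        return self._compute_fn()

    def map(self, fn):
        # Transformation: only records the step; nothing is computed yet.
        return ToyRDD(self.ctx, "map", lambda: [fn(x) for x in self.compute()], parent=self)

    def collect(self):
        # Action: triggers the job through the context.
        return self.ctx.run_job(self)

    def describe(self):
        # One row per RDD, child first, then its parent.
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "\n".join(names)


ctx = ToyContext({"spark.logLineage": "true"})
doubled = ToyRDD(ctx, "parallelize", lambda: [1, 2, 3]).map(lambda x: x * 2)
print(ctx.jobs_run)       # no job has run yet, only the plan exists
print(doubled.collect())  # the action runs the job and logs the lineage
```

Placing the logging in the single entry point means no individual action needs to know about the option.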

The DAG printing format is as follows:

  1. Information about an RDD is printed in each row, including the type, ID, position in code, and others.
  2. The parent RDD of an RDD is printed in the row after that RDD. If an RDD has multiple parent RDDs, the row of each parent RDD is indented by two characters and begins with an identification string.

Figure 4-17 DAG printing example
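The two formatting rules above can be sketched as a small recursive printer. This is an illustration under stated assumptions, not the product's output: the Node class, the format_lineage function, the sample call sites, and the "+- " identification string are all hypothetical, because the text does not specify the exact string or row layout.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: the identification string is not specified in the text.
MARKER = "+- "


@dataclass
class Node:
    """Hypothetical description of one RDD in a lineage graph."""
    rdd_type: str
    rdd_id: int
    call_site: str
    parents: List["Node"] = field(default_factory=list)


def format_lineage(node: Node, indent: int = 0, marker: str = "") -> List[str]:
    """Return one row per RDD: type, ID, and position in code."""
    rows = [f"{' ' * indent}{marker}{node.rdd_type}[{node.rdd_id}] at {node.call_site}"]
    text_start = indent + len(marker)
    if len(node.parents) == 1:
        # Rule 2, single parent: printed in the next row, aligned with the child.
        rows += format_lineage(node.parents[0], text_start)
    else:
        # Rule 2, multiple parents: indent by two characters and prefix the marker.
        for parent in node.parents:
            rows += format_lineage(parent, text_start + 2, MARKER)
    return rows


source_a = Node("ParallelCollectionRDD", 0, "app.py:9")
mapped = Node("MapPartitionsRDD", 1, "app.py:10", [source_a])
source_b = Node("ParallelCollectionRDD", 2, "app.py:11")
union = Node("UnionRDD", 3, "app.py:12", [mapped, source_b])

print("\n".join(format_lineage(union)))
```

The actual log output of a cluster may differ in detail, so compare against Figure 4-17 or a real log before relying on this layout.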
Updated: 2019-04-10

Document ID: EDOC1000104139
