Overview
Operator fusion, an important means to improve network performance, can be implemented by graph fusion or Unified Buffer fusion (UB fusion).
The system provides a range of built-in graph fusion and UB fusion rules, which are enabled by default. This document describes only some of these rules.
Graph Fusion
Graph fusion is the process in which FE modifies a graph according to the fusion rules. The original operators in the graph are replaced by fused operators to improve compute efficiency. Graph fusion improves operator compute efficiency in the following ways:
- Saves compute time by reducing the mathematical workload of the operators. For example, Conv and BiasAdd can be fused into one operator so that the accumulation is completed directly in the L0C Buffer, which removes the separate add computation.
- Accelerates post-fusion computation by utilizing hardware instructions. In the preceding example, graph fusion moves the accumulation of the "Conv+BiasAdd" structure into the L0C Buffer, so the computation is accelerated by the accumulation capability of the L0C Buffer.
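The graph rewrite described above can be illustrated with a toy example. The following is only a minimal sketch and not how FE is implemented: the Node class, the fuse_conv_biasadd function, and the fused operator name "ConvBiasAdd" are hypothetical names introduced for illustration. It replaces a Conv node whose only consumer is a BiasAdd node with a single fused node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str          # unique node name
    op: str            # operator type, e.g. "Conv"
    inputs: List[str]  # names of the producer nodes


def fuse_conv_biasadd(nodes: List[Node]) -> List[Node]:
    """Toy fusion rule: replace each Conv -> BiasAdd pair with one node."""
    by_name = {n.name: n for n in nodes}
    # Count consumers so that a Conv feeding several nodes is left untouched.
    consumers = {}
    for n in nodes:
        for src in n.inputs:
            consumers[src] = consumers.get(src, 0) + 1

    rewritten, absorbed = [], set()
    for n in nodes:
        producer = by_name.get(n.inputs[0]) if n.inputs else None
        if (n.op == "BiasAdd" and producer is not None
                and producer.op == "Conv"
                and consumers[producer.name] == 1):
            absorbed.add(producer.name)
            # The fused node keeps the BiasAdd name, so downstream edges
            # remain valid. "ConvBiasAdd" is a hypothetical operator name.
            rewritten.append(
                Node(n.name, "ConvBiasAdd", producer.inputs + n.inputs[1:]))
        else:
            rewritten.append(n)
    # Drop the Conv nodes that were merged into a fused node.
    return [n for n in rewritten if n.name not in absorbed]


graph = [
    Node("x", "Data", []),
    Node("w", "Const", []),
    Node("b", "Const", []),
    Node("conv", "Conv", ["x", "w"]),
    Node("add", "BiasAdd", ["conv", "b"]),
    Node("relu", "Relu", ["add"]),
]
fused_graph = fuse_conv_biasadd(graph)
for node in fused_graph:
    print(node.name, node.op, node.inputs)
```

After the rewrite, the graph contains one node fewer, and the fused node takes the input, weight, and bias directly.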
UB Fusion
Unified Buffer is an important storage unit in the Ascend AI Processor. Assume that the compute result of operator A is stored in Unified Buffer and is then moved to Global Memory. To run operator B, the output of operator A needs to be moved from Global Memory back to Unified Buffer. After the compute process of operator B is complete, the output of operator B is moved from Unified Buffer to Global Memory.
Throughout the process, data is moved between Unified Buffer and Global Memory three times, and two of these movements only carry the result of operator A out and back in. With UB fusion, operators A and B are fused so that the result of operator A stays in Unified Buffer, which removes the unnecessary detour through Global Memory. By reducing the data movements between Unified Buffer and Global Memory, UB fusion improves compute efficiency and reduces memory bandwidth consumption.
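The data movements in this scenario can be counted with a small model. The following is a minimal sketch under simplifying assumptions: it considers only a linear chain of operators, tracks only the movement of operator outputs (loading the input of the first operator is ignored, as in the description above), and the function name trace_moves is hypothetical.

```python
from typing import List


def trace_moves(ops: List[str], fused: bool) -> List[str]:
    """List the moves of operator outputs between Unified Buffer (UB)
    and Global Memory (GM) for a linear chain of operators."""
    moves = []
    for i, op in enumerate(ops):
        is_last = i == len(ops) - 1
        if i > 0 and not fused:
            # Without fusion, the previous output must be reloaded from GM.
            moves.append(f"GM->UB: load output of {ops[i - 1]} for {op}")
        if is_last or not fused:
            # Without fusion every output is written back to GM;
            # with fusion only the final output is.
            moves.append(f"UB->GM: store output of {op}")
    return moves


unfused = trace_moves(["A", "B"], fused=False)
fused = trace_moves(["A", "B"], fused=True)

print("Without UB fusion:", len(unfused), "moves")
for move in unfused:
    print("  ", move)
print("With UB fusion:", len(fused), "move")
for move in fused:
    print("  ", move)
```

In this model, the two-operator chain needs three moves without fusion and only one with fusion, because the intermediate result never leaves Unified Buffer.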
Disabling a Fusion Rule
Before building a model, you can disable some of the fusion rules as needed to improve build performance. However, disabling fusion rules does not mean better compute performance. Disable a fusion rule in either of the following ways:
When converting a model with ATC, use the --fusion_switch_file option to specify the path and file name of the fusion rule configuration file.
The following is a template of the fusion rule configuration file. "on" indicates that a fusion rule is enabled, and "off" indicates that a fusion rule is disabled.
{
    "Switch":{
        "GraphFusion":{
            "RequantFusionPass":"on",
            "ConvToFullyConnectionFusionPass":"on",
            "SoftmaxFusionPass":"on",
            "NotRequantFusionPass":"on",
            "SplitConvConcatFusionPass":"on",
            "ConvConcatFusionPass":"on",
            "MatMulBiasAddFusionPass":"on",
            "PoolingFusionPass":"on",
            "ZConcatv2dFusionPass":"on",
            "ZConcatExt2FusionPass":"on",
            "TfMergeSubFusionPass":"on"
        },
        "UBFusion":{
            "TbePool2dQuantFusionPass":"on"
        }
    }
}
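As an illustration of how the template might be used, the following sketch starts from the template above, switches selected rules to "off", writes the result to a file, and assembles an ATC command line that points to it. This is only an example under stated assumptions: the helper name build_switch_config, the file name fusion_switch.cfg, and the values passed to --model, --framework, --output, and --soc_version are placeholders and must be replaced with the values of your own conversion. The command is only printed, not executed.

```python
import copy
import json
import os
import shlex
import tempfile

# The template from this document, with every rule enabled.
TEMPLATE = {
    "Switch": {
        "GraphFusion": {
            "RequantFusionPass": "on",
            "ConvToFullyConnectionFusionPass": "on",
            "SoftmaxFusionPass": "on",
            "NotRequantFusionPass": "on",
            "SplitConvConcatFusionPass": "on",
            "ConvConcatFusionPass": "on",
            "MatMulBiasAddFusionPass": "on",
            "PoolingFusionPass": "on",
            "ZConcatv2dFusionPass": "on",
            "ZConcatExt2FusionPass": "on",
            "TfMergeSubFusionPass": "on",
        },
        "UBFusion": {
            "TbePool2dQuantFusionPass": "on",
        },
    }
}


def build_switch_config(disabled):
    """Return a copy of the template with the given rules set to "off"."""
    config = copy.deepcopy(TEMPLATE)
    for rule in disabled:
        for group in config["Switch"].values():
            if rule in group:
                group[rule] = "off"
                break
        else:
            # Reject names that do not appear in the template.
            raise KeyError(f"unknown fusion rule: {rule}")
    return config


config = build_switch_config(["SoftmaxFusionPass", "TbePool2dQuantFusionPass"])

# Hypothetical file name and location, chosen only for this example.
cfg_path = os.path.join(tempfile.gettempdir(), "fusion_switch.cfg")
with open(cfg_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)

# All argument values except the configuration file path are placeholders.
command = [
    "atc",
    "--model=model.pb",
    "--framework=3",
    "--output=model_out",
    "--soc_version=Ascend310",
    f"--fusion_switch_file={cfg_path}",
]
print(shlex.join(command))
```

Starting from the full template and changing only the selected entries keeps every other rule explicitly "on", so the generated file states the status of each rule listed in the template.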