Mixed Computing
Overview
By default, the Ascend AI Processor runs in fully offloaded mode: all computing operators in the graph are offloaded to and executed on the device.
As a supplement to the fully offloaded mode, the mixed computing mode allows operators that cannot be offloaded to be executed online in the frontend framework on the host, improving the flexibility of the Ascend AI Processor in adapting to TensorFlow. Note the following:
- In mixed computing mode, iteration offloading is not supported. That is, iterations_per_loop must retain the default value 1.
- In addition to the operators that are not offloaded by default, you can specify additional operators to run on the host by using without_npu_compile_scope.
- The FusedBatchNormV3 operator, introduced in TensorFlow in 2019, has a fifth output that is reserved for CUDA and is not supported on the Ascend AI Processor in mixed computing mode. If tf.layers.batch_normalization is used in your training script, you can wrap it in "with compat.forward_compatibility_horizon(2019, 5, 1):" so that this operator is not generated.
Enabling Mixed Computing with Estimator
import tensorflow as tf
from npu_bridge.estimator.npu.npu_config import NPURunConfig
from npu_bridge.estimator import npu_ops

session_config = tf.ConfigProto()
# Enable mixed computing; iterations_per_loop must keep its default value 1.
config = NPURunConfig(session_config=session_config, mix_compile_mode=True, iterations_per_loop=1)
Enabling Mixed Computing with sess.run()
import tensorflow as tf
from npu_bridge.estimator import npu_ops
from npu_bridge.estimator.npu import npu_scope
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

X = tf.random_normal([2,])
Y = tf.random_normal([2,])
# Operators built inside this scope are not offloaded to the device.
with npu_scope.without_npu_compile_scope():
    pred = tf.add(tf.multiply(X, 1.), 0.)
    cost = tf.reduce_sum(tf.abs(pred - Y))

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
custom_op.parameter_map["mix_compile_mode"].b = True
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # Disable remapping.

with tf.Session(config=config) as sess:
    print(sess.run(cost))  # The reduce_sum node is executed on the host.