Migration with sess.run
About sess.run
As a low-level TensorFlow API, sess.run is more flexible than Estimator, but it also makes model implementation more complex.
To use the sess.run API to develop a training script, perform the following steps.
No. | Step
--- | ---
1 | Data preprocessing
2 | Model building, loss calculation, and gradient update
3 | Session creation and resource initialization
4 | Training
The following describes how to migrate a sess.run training script to run on Ascend AI Processor.
Data Preprocessing
The code snippet is ready to use in normal cases. Manual tweaking is required only in the following scenario:
dataset = dataset.batch(batch_size, drop_remainder=True)
This may discard the last few samples in the file to ensure that each batch has a static shape (batch_size). Note that during inference, if the data volume of the last iteration is less than the batch size, you need to pad the last batch with blank data up to the batch size. Otherwise, the script may fail at a trailing assertion that checks that the number of validation results equals the number of validation samples:
assert num_written_lines == num_actual_predict_examples
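If every sample must be predicted, a common workaround is to pad the inference inputs to a multiple of the batch size before batching and to discard the padded outputs afterwards. The following is a minimal sketch only; pad_to_batch, samples, and pad_sample are hypothetical names, not part of the original script:
# Hypothetical helper: pad the inference inputs so that the last batch is full.
# "samples" is the list of real inputs; "pad_sample" is a blank sample with the expected shape.
def pad_to_batch(samples, pad_sample, batch_size):
    real_count = len(samples)
    while len(samples) % batch_size != 0:
        samples.append(pad_sample)
    return samples, real_count
After inference, keep only the first real_count results so that the assertion above still holds.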
Model Building, Loss Calculation, and Gradient Update
The code snippet is ready to use in normal cases. Manual tweaking is required only in the following scenarios:
- If tf.device is used on the original network, delete the related code (see the sketch after this list).
- Replace dropout in the original network with the corresponding AscendCL API. Original TensorFlow code:
layers = tf.nn.dropout()
Code after migration:
from npu_bridge.estimator import npu_ops
layers = npu_ops.dropout()
- Replace gelu in the original network with the corresponding AscendCL API. Original TensorFlow code:
def gelu(x):
    cdf = 0.5 * (1.0 + tf.tanh((np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))
    return x * cdf
layers = gelu(x)
Code after migration:
from npu_bridge.estimator.npu_unary_ops import npu_unary_ops
layers = npu_unary_ops.gelu(x)
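For the tf.device case in the first item above, deleting the related code means removing the device scope and keeping the operations it wraps. A minimal illustration, assuming the same TensorFlow 1.x imports as the rest of the script; the dense layer and its arguments are hypothetical examples, not taken from the original network:
# Original TensorFlow code (illustrative): operations pinned to a GPU device.
with tf.device('/gpu:0'):
    logits = tf.layers.dense(features, units=num_classes)
# Code after migration: the tf.device scope is deleted; the operations remain unchanged.
logits = tf.layers.dense(features, units=num_classes)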
Session Creation and Resource Initialization
When running your training script on Ascend AI Processor by using sess.run, note the following configurations:
- The following configuration option is disabled by default and should not be enabled:
rewrite_options.disable_model_pruning
- The following configuration options are enabled by default and should not be disabled:
- rewrite_options.function_optimization
- rewrite_options.constant_folding
- rewrite_options.shape_optimization
- rewrite_options.arithmetic_optimization
- rewrite_options.loop_optimization
- rewrite_options.dependency_optimization
- rewrite_options.layout_optimizer
- rewrite_options.memory_optimization
- The following configuration option is enabled by default and should be disabled explicitly:
rewrite_options.remapping
- In the distributed scenario, add the GradFusionOptimizer manually:
rewrite_options.optimizers.extend(["GradFusionOptimizer"])
- The following configuration option is disabled by default and should be enabled explicitly for training on Ascend AI Processor:
custom_op.parameter_map["use_off_line"].b = True
Original TensorFlow code:
# Construct the iterator.
iterator = Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
# Obtain the batch data.
next_batch = iterator.get_next()
# Initialize the iterator.
training_init_op = iterator.make_initializer(train_dataset)
# Initialize the variables.
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
# Obtain the number of training/validation steps per epoch.
train_batches_per_epoch = int(np.floor(train_size / batch_size))
Code after migration:
from npu_bridge.estimator import npu_ops
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
# Construct the iterator.
iterator = Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
# Obtain the batch data.
next_batch = iterator.get_next()
# Initialize the iterator.
training_init_op = iterator.make_initializer(train_dataset)
# Initialize the variables.
init = tf.global_variables_initializer()
# Create a session.
config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True  # Must be explicitly enabled for training on Ascend AI Processor.
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # Remapping must be disabled explicitly.
config.graph_options.rewrite_options.optimizers.extend(["GradFusionOptimizer"])  # Required in the distributed training scenario.
sess = tf.Session(config=config)
sess.run(init)
# Obtain the number of training/validation steps per epoch.
train_batches_per_epoch = int(np.floor(train_size / batch_size))
The Ascend platform supports all native functions of tf.Session and also allows you to enable additional features such as automatic mixed precision. For details, see the corresponding API description.
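For example, automatic mixed precision is typically enabled through the NpuOptimizer parameter map when the session is created. The following is a minimal sketch, assuming the precision_mode parameter with the value "allow_mix_precision" is available in your npu_bridge/CANN version; verify against the API description before use:
import tensorflow as tf
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
# Assumption: "allow_mix_precision" enables automatic mixed precision on Ascend AI Processor.
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
sess = tf.Session(config=config)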
Training
The code snippet is ready to use in normal cases.
# Start cyclic iteration.
for epoch in range(num_epochs):
    # Initialize the iterator with the training dataset.
    sess.run(training_init_op)
    for step in range(train_batches_per_epoch):
        # Get the next batch of data.
        img_batch, label_batch = sess.run(next_batch)
        # Run the training op.
        _, train_loss = sess.run([train_op, loss], feed_dict={x: img_batch, y_: label_batch, is_training: True})
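The loop above assumes that the placeholders x, y_, and is_training, as well as train_op and loss, were created during model building. A minimal sketch of the placeholder definitions, assuming the same TensorFlow 1.x imports as the rest of the script; the shapes below are illustrative only:
# Placeholders fed by the training loop; the shapes are illustrative (image classification example).
x = tf.placeholder(tf.float32, [batch_size, 224, 224, 3], name="x")
y_ = tf.placeholder(tf.int64, [batch_size], name="y_")
is_training = tf.placeholder(tf.bool, name="is_training")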