Preparing .npy Data of a Model Running on the GPU or CPU
Prerequisites
Before generating .npy files for a TensorFlow model, make sure that you have a complete, executable, standard TensorFlow training project.
Preparing .npy Files
You can use the TensorFlow debugger (tfdbg) to generate .npy files. The major steps are as follows:
- Add the debugging configuration option to the TensorFlow training project script.
- In Estimator mode, add the tfdbg hook, as shown in Figure 5-1.
- In session.run mode, set the tfdbg wrapper before run, as shown in Figure 5-2.
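The referenced figures are not reproduced here, so the following is only a minimal sketch of how the two debugging options are typically added with the TensorFlow 1.x tfdbg API. The helper names make_debug_hook and wrap_session are hypothetical, and the names estimator, train_input_fn, sess, and train_op in the usage comments stand in for objects from your own training script. TensorFlow is imported inside the helpers, so the sketch assumes the TensorFlow 1.x import path tensorflow.python.debug is available when they are called.

```python
def make_debug_hook():
    """Estimator mode: build a tfdbg hook to pass to estimator.train()."""
    # Assumption: TensorFlow 1.x, where tfdbg lives under tensorflow.python.debug.
    from tensorflow.python import debug as tf_debug
    return tf_debug.LocalCLIDebugHook()


def wrap_session(sess):
    """session.run mode: wrap an existing session before calling run()."""
    from tensorflow.python import debug as tf_debug
    return tf_debug.LocalCLIDebugWrapperSession(sess)


# Usage in Estimator mode (estimator and train_input_fn are placeholders):
#   estimator.train(input_fn=train_input_fn, hooks=[make_debug_hook()])
#
# Usage in session.run mode (sess and train_op are placeholders):
#   sess = wrap_session(sess)
#   sess.run(train_op)
```

With either option in place, starting the training script opens the tfdbg command line instead of training immediately.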
- Run the training script.
At the tfdbg interactive command line, enter run to execute the script.
- Collect .npy files.
After the script finishes, run the lt command to list the stored tensors, then run the pt command to view the content of a tensor and save it as a file in NumPy format.
tfdbg dumps only one tensor at a time. To dump all tensors, perform the following steps:
- Run lt > tensor_name to temporarily store all tensor names to a file.
- Exit the tfdbg command line, enter the Linux command line, and run the following command to generate commands to run in tfdbg:
timestamp=$[$(date +%s%N)/1000] ; cat tensor_name | awk '{print "pt",$4,$4}' | awk '{gsub("/", "_", $3);gsub(":", ".", $3);print($1,$2,"-n 0 -w "$3".""'$timestamp'"".npy")}' > tensor_name_cmd.txt
- The generated tfdbg commands are stored in the tensor_name_cmd.txt file. The .npy file names in these commands follow the naming rules for accuracy comparison. In the command, tensor_name is the file that stores all tensor names, and timestamp is a 16-digit value.
- You can also run the command in a new terminal window without exiting the tfdbg command line.
- Return to the tfdbg command line, enter run to execute the script, and then paste and execute the content of the tensor_name_cmd.txt file generated in the previous step to save all the .npy files.
By default, .npy files are stored using numpy.save(). Slashes (/) and colons (:) are replaced by underscores (_).
If the command cannot be pasted on the CLI, run the mouse off command in the tfdbg command line to disable the mouse mode before pasting again.
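For environments where the awk one-liner is inconvenient, the same transformation can be sketched in Python. This is an illustrative equivalent, not part of the documented procedure. It assumes that each tensor row of the lt output has the tensor name as its fourth whitespace-separated field (as the awk command does with $4) and that tensor names contain a colon, which is used here to skip header lines. The function name build_pt_commands is hypothetical.

```python
import time


def build_pt_commands(lt_lines, timestamp=None):
    """Turn lines of tfdbg `lt` output into `pt ... -w <file>.npy` commands."""
    if timestamp is None:
        # Microseconds since the epoch, like $(date +%s%N)/1000 in the shell command.
        timestamp = time.time_ns() // 1000
    commands = []
    for line in lt_lines:
        fields = line.split()
        # Assumption: the tensor name is the 4th field and looks like "op/name:0".
        # Rows without such a field (for example, header lines) are skipped.
        if len(fields) < 4 or ":" not in fields[3]:
            continue
        tensor = fields[3]
        # Same substitutions as the awk command: "/" -> "_" and ":" -> ".".
        file_stem = tensor.replace("/", "_").replace(":", ".")
        commands.append(f"pt {tensor} -n 0 -w {file_stem}.{timestamp}.npy")
    return commands


# Usage (file names taken from the steps above):
#   with open("tensor_name") as src:
#       cmds = build_pt_commands(src)
#   with open("tensor_name_cmd.txt", "w") as dst:
#       dst.write("\n".join(cmds) + "\n")
```

Unlike the awk command, this sketch drops rows that do not look like tensor entries, so the resulting command file contains only pt commands for real tensors.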
- Check whether names of the generated .npy files comply with the naming rules, as shown in Figure 5-3.
- Names of the .npy files are in the {op_name}.{output_index}.{timestamp}.npy format, where op_name can contain only characters that match the regular expression [A-Za-z0-9_-], timestamp is a 16-digit value, and output_index is a single digit in the range 0–9.
- If the name of an .npy file exceeds 255 characters due to the long operator name, comparison of this operator is not supported.
- Some .npy file names may not meet the naming requirements because of tfdbg or the operating environment. You can manually rename these files based on the naming rules. If a large number of .npy files do not meet the requirements, generate the .npy files again by referring to "How Do I Handle Exceptions in the Generated .npy File Names in Batches?"
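The naming rules above can be checked in bulk with a short script. The following is a minimal sketch rather than an official validation tool. It assumes that the timestamp is the 16-digit microsecond value produced by the date-based command in this section, that op_name is one or more characters from A-Za-z0-9_-, and that the extension is the literal string npy. The names NPY_NAME_PATTERN and check_npy_name are hypothetical.

```python
import re

# {op_name}.{output_index}.{timestamp}.npy, under the assumptions stated above.
NPY_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+\.[0-9]\.[0-9]{16}\.npy")
MAX_NAME_LENGTH = 255  # Longer names are not supported for comparison.


def check_npy_name(file_name):
    """Return True if file_name follows the naming rules for accuracy comparison."""
    if len(file_name) > MAX_NAME_LENGTH:
        return False
    return NPY_NAME_PATTERN.fullmatch(file_name) is not None


# Usage: list the files in a dump directory that need renaming
# ("dump_dir" is a placeholder path).
#   import os
#   bad = [f for f in os.listdir("dump_dir") if not check_npy_name(f)]
```

Files reported by such a check can then be renamed manually or regenerated as described above.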