Built-in Operator Not Encapsulated into an ACL API
Basic Principles
The following is the basic procedure of executing a single operator. For details, see Postprocessing Data Using Single-Operator and Returning Result to Host.
- Initialize resources, including initializing the ACL, setting the loading directory of the single-operator model file and specifying the device for computation.
- Call acl.init to initialize the ACL.
- Build the .json single-operator definition file into an offline model (.om file) that adapts to Ascend AI Processors in advance by referring to ATC Tool Instructions.
- A single-operator model file can be loaded using the following APIs:
Call acl.op.set_model_dir to set the directory for loading the model file. The single-operator model file (.om file) is stored in the directory.
Call acl.op.load to load the single-operator model data from the memory. The memory is managed by the user. Single-operator model data refers to the data that is loaded to the memory from the .om file. The .om file is built from a single operator.
- Call acl.rt.set_device to specify the device for computation.
- Call acl.rt.create_context to explicitly create a context, and call acl.rt.create_stream to explicitly create a stream. The default stream is used if no stream is created explicitly. The default stream is implicitly created with the acl.rt.set_device call. To pass the default stream to any API call, pass NULL directly.
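The initialization step above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper names `_check` and `init_resources`, the directory name `op_models`, and the choice to pass the imported `acl` module in as a parameter are all assumptions made for illustration. It also assumes that pyACL calls return 0 on success and that the create calls return a `(handle, ret)` pair.

```python
def _check(ret, what):
    """Raise if a pyACL call reported a non-zero status code."""
    if ret != 0:
        raise RuntimeError(f"{what} failed, ret={ret}")


def init_resources(acl, device_id=0, model_dir="op_models"):
    """Initialize ACL, set the .om directory, and create a context and stream.

    `acl` is the imported pyACL module (hypothetical calling convention:
    `import acl; init_resources(acl)`). `model_dir` is a placeholder path.
    """
    _check(acl.init(), "acl.init")
    # Directory that holds the single-operator .om files built with ATC.
    _check(acl.op.set_model_dir(model_dir), "acl.op.set_model_dir")
    _check(acl.rt.set_device(device_id), "acl.rt.set_device")
    # Explicit context and stream; both are optional because set_device
    # already creates default ones implicitly.
    context, ret = acl.rt.create_context(device_id)
    _check(ret, "acl.rt.create_context")
    stream, ret = acl.rt.create_stream()
    _check(ret, "acl.rt.create_stream")
    return context, stream
```

Passing the module in keeps the function readable on a machine without an Ascend device and makes it easy to dry-run against a stand-in object.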
- Copy the operator input data from the host to the device.
- Call acl.rt.memcpy to implement synchronous memory copy.
- Call acl.rt.memcpy_async to implement asynchronous memory copy.
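The two copy variants differ mainly in whether the call blocks. The sketch below is an illustration under stated assumptions: `copy_to_device` is a hypothetical helper, `host_ptr` is assumed to be an address the caller already obtained for the input data, and the constants are the numeric values of the allocation policy and copy direction as used in pyACL samples.

```python
ACL_MEM_MALLOC_HUGE_FIRST = 0   # device allocation policy (assumed constant value)
ACL_MEMCPY_HOST_TO_DEVICE = 1   # copy direction: host to device


def copy_to_device(acl, host_ptr, size, stream=None):
    """Allocate `size` bytes on the device and copy the host data into it.

    With no stream the copy is synchronous. With a stream the copy is only
    queued; work submitted later on the same stream runs after it.
    """
    dev_ptr, ret = acl.rt.malloc(size, ACL_MEM_MALLOC_HUGE_FIRST)
    if ret != 0:
        raise RuntimeError(f"acl.rt.malloc failed, ret={ret}")
    if stream is None:
        ret = acl.rt.memcpy(dev_ptr, size, host_ptr, size,
                            ACL_MEMCPY_HOST_TO_DEVICE)
    else:
        ret = acl.rt.memcpy_async(dev_ptr, size, host_ptr, size,
                                  ACL_MEMCPY_HOST_TO_DEVICE, stream)
    if ret != 0:
        raise RuntimeError(f"host-to-device copy failed, ret={ret}")
    return dev_ptr
```

The caller owns the returned device pointer and is expected to release it with acl.rt.free once the operator has finished.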
- Execute the single operator.
A single operator can be executed in the following two modes:
- Construct the operator description information (such as the input and output tensor description and operator attributes), allocate memory for storing the input and output data of the operator, and call acl.op.execute to load and execute the operator.
In this mode, every acl.op.execute call matches the operator description against the models in memory before executing the operator.
- Construct the operator description information (such as the input and output tensor description and operator attributes), allocate memory for storing the input and output data of the operator, call acl.op.create_handle to create a handle, and call acl.op.exec_with_handle to load and execute an operator.
In this mode, when acl.op.create_handle is called, the system matches the model in the memory based on the operator description and caches the result in the handle. The handle improves efficiency in scenarios where the same operator is executed multiple times with the acl.op.exec_with_handle call. Call acl.op.destroy_handle to free the handle when it is no longer needed.
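The difference between the two execution modes can be sketched as below. This is a hedged illustration: `run_once` and `run_with_handle` are hypothetical helpers, the tensor descriptions, data buffers, and attribute object are assumed to have been built by the caller already, and the argument order of each acl.op call is an assumption that should be checked against the pyACL API reference.

```python
def run_once(acl, op_type, input_desc, inputs, output_desc, outputs, attr, stream):
    """Mode 1: match the model and execute in a single acl.op.execute call."""
    ret = acl.op.execute(op_type, input_desc, inputs,
                         output_desc, outputs, attr, stream)
    if ret != 0:
        raise RuntimeError(f"acl.op.execute failed, ret={ret}")
    # Execution is queued on the stream; wait for it to finish.
    ret = acl.rt.synchronize_stream(stream)
    if ret != 0:
        raise RuntimeError(f"acl.rt.synchronize_stream failed, ret={ret}")


def run_with_handle(acl, op_type, input_desc, inputs, output_desc, outputs,
                    attr, stream, repeats=1):
    """Mode 2: match the model once, then reuse the handle for every run."""
    handle, ret = acl.op.create_handle(op_type, input_desc, output_desc, attr)
    if ret != 0:
        raise RuntimeError(f"acl.op.create_handle failed, ret={ret}")
    try:
        for _ in range(repeats):
            ret = acl.op.exec_with_handle(handle, inputs, outputs, stream)
            if ret != 0:
                raise RuntimeError(f"acl.op.exec_with_handle failed, ret={ret}")
        ret = acl.rt.synchronize_stream(stream)
        if ret != 0:
            raise RuntimeError(f"acl.rt.synchronize_stream failed, ret={ret}")
    finally:
        # Free the handle even if one of the runs failed.
        acl.op.destroy_handle(handle)
```

The handle variant pays the model-matching cost once, so it is the more natural fit when the same operator runs repeatedly with the same description.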
- Copy the output data of the operator from the device to the host (memory on the host needs to be allocated in advance).
- Call acl.rt.memcpy to implement synchronous memory copy.
- Call acl.rt.memcpy_async to implement asynchronous memory copy.
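Copying the result back follows the same pattern in the opposite direction, with one extra concern: after an asynchronous copy, the host buffer is only safe to read once the stream has been synchronized. A minimal sketch, assuming a hypothetical helper `copy_to_host` and assuming acl.rt.malloc_host returns a `(pointer, ret)` pair:

```python
ACL_MEMCPY_DEVICE_TO_HOST = 2   # copy direction: device to host


def copy_to_host(acl, dev_ptr, size, stream=None):
    """Allocate a host buffer in advance and copy `size` bytes of output into it."""
    host_ptr, ret = acl.rt.malloc_host(size)
    if ret != 0:
        raise RuntimeError(f"acl.rt.malloc_host failed, ret={ret}")
    if stream is None:
        ret = acl.rt.memcpy(host_ptr, size, dev_ptr, size,
                            ACL_MEMCPY_DEVICE_TO_HOST)
    else:
        ret = acl.rt.memcpy_async(host_ptr, size, dev_ptr, size,
                                  ACL_MEMCPY_DEVICE_TO_HOST, stream)
        if ret == 0:
            # Block until the queued copy has completed before the
            # caller reads the host buffer.
            ret = acl.rt.synchronize_stream(stream)
    if ret != 0:
        raise RuntimeError(f"device-to-host copy failed, ret={ret}")
    return host_ptr
```

The returned host pointer belongs to the caller, who is expected to release it with acl.rt.free_host after the data has been consumed.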
- Destroy streams and contexts, then reset devices, in that order.
- Call acl.rt.destroy_stream to destroy streams.
If no stream is created explicitly and the default stream is used, acl.rt.destroy_stream does not need to be called.
- Call acl.rt.destroy_context to destroy contexts.
If no context is created explicitly and the default context is used, acl.rt.destroy_context does not need to be called.
- Call acl.rt.reset_device to reset devices.
- Call acl.finalize to deinitialize the ACL.
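The release order described above (stream, context, device, then ACL itself) can be sketched as follows. `release_resources` is a hypothetical helper. It assumes that `stream` and `context` are left as `None` when the defaults were used, in which case the corresponding destroy call is skipped.

```python
def release_resources(acl, device_id=0, stream=None, context=None):
    """Tear down in reverse order of creation; skip defaults that were never created."""
    steps = []
    if stream is not None:
        steps.append(("acl.rt.destroy_stream", lambda: acl.rt.destroy_stream(stream)))
    if context is not None:
        steps.append(("acl.rt.destroy_context", lambda: acl.rt.destroy_context(context)))
    steps.append(("acl.rt.reset_device", lambda: acl.rt.reset_device(device_id)))
    steps.append(("acl.finalize", lambda: acl.finalize()))
    for name, call in steps:
        ret = call()
        if ret != 0:
            raise RuntimeError(f"{name} failed, ret={ret}")
```

A typical caller would run this in a `finally` block so that the device is reset even when operator execution raises.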