Runtime Resource Allocation
Allocate runtime resources in the following order: device, context, then stream. Contexts and streams can be created implicitly or explicitly.
- Implicitly created contexts and streams: Suitable for simple apps without complex interaction logic. In multi-threaded programs, however, each thread uses the default context or stream, so the order in which tasks execute in the default stream depends on how the operating system schedules the threads.
- (Recommended) Explicitly created contexts and streams: Suitable for large-scale apps with complex interaction logic. They make the app easier to read and maintain.
- Allocate the device, context, and stream in sequence, using one of the following approaches:
- Call acl.rt.set_device to explicitly specify the device for computation.
- Call acl.rt.create_context to explicitly create a context, and call acl.rt.create_stream to explicitly create a stream.
- If no context and stream are created explicitly, the system uses the default context and stream, which are created implicitly with the acl.rt.set_device call.
To pass the default stream to any API call, pass NULL directly.
- Implicitly specify a device for computation.
Call acl.rt.create_context to explicitly create a context, and call acl.rt.create_stream to explicitly create a stream, without calling acl.rt.set_device first. When the context is created explicitly, the system internally calls acl.rt.set_device to specify the device, using the device ID passed to the acl.rt.create_context call.
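The explicit allocation order described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes acl.init() has already been called, that acl.rt.set_device returns a status code, and that acl.rt.create_context and acl.rt.create_stream each return a (handle, status) pair. The helper names check, allocate_resources, and release_resources are hypothetical, and the release calls (acl.rt.destroy_stream, acl.rt.destroy_context, acl.rt.reset_device) are not covered in this section. The acl module is passed in as a parameter so the sketch can be exercised without Ascend hardware.

```python
def check(ret, api_name):
    """Raise if a pyACL call returned a non-zero status code (assumed convention)."""
    if ret != 0:
        raise RuntimeError(f"{api_name} failed with ret={ret}")


def allocate_resources(acl, device_id=0):
    """Explicitly allocate a device, a context, and a stream, in that order."""
    # 1. Specify the device used for computation.
    check(acl.rt.set_device(device_id), "acl.rt.set_device")
    # 2. Create a context on that device.
    context, ret = acl.rt.create_context(device_id)
    check(ret, "acl.rt.create_context")
    # 3. Create a stream in the current context.
    stream, ret = acl.rt.create_stream()
    check(ret, "acl.rt.create_stream")
    return context, stream


def release_resources(acl, device_id, context, stream):
    """Release resources in the reverse order of allocation (assumed cleanup calls)."""
    check(acl.rt.destroy_stream(stream), "acl.rt.destroy_stream")
    check(acl.rt.destroy_context(context), "acl.rt.destroy_context")
    check(acl.rt.reset_device(device_id), "acl.rt.reset_device")
```

In an application this would typically be called as allocate_resources(acl, 0) after import acl and acl.init(), with release_resources called before the app exits.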
- (Optional) If the application's executable file can run on both the host and the device, call acl.rt.get_run_mode to obtain the run mode of the software stack, and choose the subsequent memory allocation API calls based on that run mode.
- If the executable file of an application is executed on the host, data may need to be transferred between the host and the device. In this case, call acl.rt.memcpy (synchronous mode) or acl.rt.memcpy_async (asynchronous mode) to move the data through memory copy.
- If the executable file of an application is executed on the device, data movement between the host and the device is not involved.
In the Atlas 200 DK scenario, the host and the device are both deployed on the developer board, so data transfer between the host and the device is not involved either.
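The run-mode check described above might look like the sketch below. It is written under several assumptions: acl.rt.get_run_mode returns a (run_mode, status) pair; acl.rt.memcpy takes (dst, dst_max_size, src, count, kind) and acl.rt.memcpy_async takes the same arguments plus a stream; and the constant values mirror the ACL enum definitions (ACL_DEVICE = 0, ACL_HOST = 1, ACL_MEMCPY_HOST_TO_DEVICE = 1). The helper names check and stage_input are hypothetical, and the device buffer dev_ptr is assumed to have been allocated by the caller.

```python
# Assumed constant values; verify them against the constants shipped with your pyACL version.
ACL_DEVICE = 0
ACL_HOST = 1
ACL_MEMCPY_HOST_TO_DEVICE = 1


def check(ret, api_name):
    """Raise if a pyACL call returned a non-zero status code (assumed convention)."""
    if ret != 0:
        raise RuntimeError(f"{api_name} failed with ret={ret}")


def stage_input(acl, src_ptr, dev_ptr, size, stream=None):
    """Return the pointer the device should read from.

    On the host, copy `size` bytes from src_ptr into dev_ptr and return dev_ptr.
    On the device, no host-device transfer is involved, so return src_ptr as is.
    """
    run_mode, ret = acl.rt.get_run_mode()
    check(ret, "acl.rt.get_run_mode")
    if run_mode != ACL_HOST:
        return src_ptr
    if stream is None:
        # Synchronous copy: returns once the data has been copied.
        ret = acl.rt.memcpy(dev_ptr, size, src_ptr, size, ACL_MEMCPY_HOST_TO_DEVICE)
        check(ret, "acl.rt.memcpy")
    else:
        # Asynchronous copy: only enqueued on the stream; the caller must
        # synchronize the stream before relying on the data in dev_ptr.
        ret = acl.rt.memcpy_async(dev_ptr, size, src_ptr, size,
                                  ACL_MEMCPY_HOST_TO_DEVICE, stream)
        check(ret, "acl.rt.memcpy_async")
    return dev_ptr
```

Passing the acl module in as a parameter keeps the branching logic testable without hardware. In a real app the module-level import acl would normally be used directly.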