Processes, Threads, Devices, Contexts, and Streams
For details about the terms, see Terminology.
Devices, Contexts, and Streams
Figure 3-2 shows the relationship between devices, contexts, and streams. The details are as follows:
Term | Description |
---|---|
Task/Kernel | |
Stream | |
Context | |
Device | |
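The hierarchy behind these terms can be sketched as a small data model. This is a conceptual illustration in plain Python, not pyACL code. The classes `Device`, `Context`, and `Stream` and their methods are hypothetical. The model assumes that a device can host several contexts, that each context belongs to exactly one device and owns its streams, and that a stream keeps its tasks in submission order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stream:
    """Hypothetical stream: holds tasks in the order they were submitted."""
    tasks: List[str] = field(default_factory=list)

    def submit(self, task: str) -> None:
        self.tasks.append(task)


@dataclass
class Context:
    """Hypothetical context: belongs to one device and owns its streams."""
    device_id: int
    streams: List[Stream] = field(default_factory=list)

    def create_stream(self) -> Stream:
        stream = Stream()
        self.streams.append(stream)
        return stream


@dataclass
class Device:
    """Hypothetical device: may host several contexts (a model assumption)."""
    device_id: int
    contexts: List[Context] = field(default_factory=list)

    def create_context(self) -> Context:
        ctx = Context(self.device_id)
        self.contexts.append(ctx)
        return ctx


# Usage: one device, one context, one stream carrying two tasks.
dev0 = Device(0)
ctx = dev0.create_context()
s = ctx.create_stream()
s.submit("op1")
s.submit("op2")
print(ctx.device_id, s.tasks)  # 0 ['op1', 'op2']
```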
Threads, Contexts, and Streams
- A context must be bound to a user thread. All device resources are used and scheduled through a context.
- At any given time, a thread uses only one context, and that context determines the device on which the thread's tasks are executed.
- acl.rt.set_context can be called to quickly switch between devices. The sample code is as follows:
```python
...
ctx1, ret = acl.rt.create_context(0)  # ctx1 is bound to device 0 and becomes the current context
stream, ret = acl.rt.create_stream()  # stream is created in ctx1
ret = acl.op.execute(op_type, input_desc, inputs, output_desc, outputs, attr, stream)
ctx2, ret = acl.rt.create_context(1)
# After ctx2 is created, the context used in the current thread changes to ctx2,
# and the corresponding tasks are computed on device 1.
# In this sample, op_type2 is executed on device 1.
stream2, ret = acl.rt.create_stream()
ret = acl.op.execute(op_type2, input_desc, inputs, output_desc, outputs, attr, stream2)
# Switch devices by switching contexts in the current thread so that
# subsequent tasks are computed on device 0.
ret = acl.rt.set_context(ctx1)
ret = acl.op.execute(op3, ..., stream)  # stream was created in ctx1 (device 0)
...
```
- Multiple streams can be created in a thread, and tasks in different streams can run in parallel. In multi-thread scenarios, you can also create one stream in each thread. Each stream is independent on the device, and the tasks in each stream are executed in the order in which they were delivered.
- Thread scheduling is performed by the OS on which the application runs, whereas stream scheduling is performed by the scheduling component on the device.
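The per-thread "current context" rule, and the context switch in the sample above, can be modeled with Python's `threading.local`. This is a minimal sketch, not pyACL. `create_context`, `set_context`, and `current_device` here are hypothetical stand-ins that only mimic the documented behavior: a newly created context becomes current in the calling thread, and setting a context switches the device that subsequent tasks target.

```python
import threading


class Context:
    """Hypothetical stand-in for a context; it records only its device."""

    def __init__(self, device_id: int):
        self.device_id = device_id


_current = threading.local()  # each thread sees its own current context


def create_context(device_id: int) -> Context:
    """Create a context and make it current in the calling thread."""
    ctx = Context(device_id)
    _current.ctx = ctx
    return ctx


def set_context(ctx: Context) -> None:
    """Make an existing context current in the calling thread."""
    _current.ctx = ctx


def current_device() -> int:
    """Device on which tasks delivered from this thread would run."""
    return _current.ctx.device_id


ctx1 = create_context(0)
ctx2 = create_context(1)          # the last created context becomes current
after_create = current_device()   # device 1
set_context(ctx1)                 # switch back to device 0
after_switch = current_device()   # device 0
print(after_create, after_switch)  # 1 0
```

Storing the current context in thread-local storage reflects the rule that only one context is used in a thread at a time, while other threads keep their own.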
Context Migration Between Threads in a Process
- Multiple contexts can be created in a process, but only one context is used in a thread at a time.
- If multiple contexts are created in a thread, the last created context is used by default.
- If multiple contexts are created in a process, call acl.rt.set_context in a thread to specify which context that thread uses.
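Migrating a context from one thread to another can be sketched with the same thread-local model. This is a conceptual illustration, not pyACL. The helpers `create_context`, `set_context`, and `get_context` are hypothetical. The model also assumes that a newly started thread has no current context until it sets one. Here a worker thread adopts a context created in the main thread, and this does not change which context is current in the main thread.

```python
import threading


class Context:
    """Hypothetical stand-in for a context created on a given device."""

    def __init__(self, device_id: int):
        self.device_id = device_id


_local = threading.local()  # per-thread "current context" slot


def create_context(device_id: int) -> Context:
    # The newly created context becomes current in the creating thread.
    ctx = Context(device_id)
    _local.ctx = ctx
    return ctx


def set_context(ctx: Context) -> None:
    _local.ctx = ctx


def get_context():
    # None means this thread has not created or set a context yet (model assumption).
    return getattr(_local, "ctx", None)


results = {}


def worker(shared_ctx: Context) -> None:
    results["before"] = get_context()  # the new thread starts without a context
    set_context(shared_ctx)            # adopt a context created in another thread
    results["after"] = get_context()


ctx_a = create_context(0)
ctx_b = create_context(0)              # ctx_b is now current in the main thread
t = threading.Thread(target=worker, args=(ctx_a,))
t.start()
t.join()

print(results["before"] is None, results["after"] is ctx_a, get_context() is ctx_b)
# True True True
```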
Application Scenarios of Default Contexts and Default Streams
- A context and a stream must exist before tasks can be delivered to the device. The context and stream that are created implicitly (for example, when acl.rt.set_device is called) are the default context and default stream.
To use the default stream in an API call, pass 0 as the stream argument.
- If the implicitly created context is used, acl.rt.get_context, acl.rt.set_context, and acl.rt.destroy_context are not available.
- Implicitly created contexts and streams are suitable for simple applications that use only one device for computation. For multi-thread applications, you are advised to use explicitly created contexts and streams.
The sample code is as follows:
```python
...
ret = acl.init(config_path)
# In the default context, a default stream is created,
# and the stream is available in the current thread.
ret = acl.rt.set_device(device_id)
...
# 0 selects the default stream: op1 and op2 are executed in the default stream.
ret = acl.op.execute(op1, input_desc, inputs, output_desc, outputs, attr, 0)
ret = acl.op.execute(op2, input_desc, inputs, output_desc, outputs, attr, 0)
# Wait until all computing tasks (op1 and op2) are complete,
# then output the results as required.
ret = acl.rt.synchronize_stream(0)
...
# Reset the device. The life cycles of the corresponding
# default context and default stream end.
ret = acl.rt.reset_device(device_id)
```
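The life cycle in the sample above can be sketched as a toy model. Setting the device creates the default stream, handle 0 selects it, synchronizing returns once its tasks are done, and resetting the device ends it. This is plain Python, not pyACL. `DefaultStreamModel` and its methods are hypothetical, and the model only tracks the default stream. It has no real device, so tasks are treated as complete as soon as they are queued.

```python
class DefaultStreamModel:
    """Hypothetical model, not pyACL: set_device implicitly creates a default
    context and default stream, and stream handle 0 selects that stream."""

    DEFAULT_STREAM = 0

    def __init__(self) -> None:
        self.device_id = None
        self.default_tasks = None  # None: no device set, so no default stream exists

    def set_device(self, device_id: int) -> None:
        self.device_id = device_id
        self.default_tasks = []  # default stream created together with the default context

    def execute(self, op: str, stream: int = DEFAULT_STREAM) -> None:
        if self.default_tasks is None:
            raise RuntimeError("no default stream: call set_device first")
        if stream != self.DEFAULT_STREAM:
            raise ValueError("this model only tracks the default stream")
        self.default_tasks.append(op)  # tasks queue up in submission order

    def synchronize_stream(self, stream: int = DEFAULT_STREAM) -> list:
        # No real device: "waiting" just returns the completed tasks in order.
        return list(self.default_tasks)

    def reset_device(self) -> None:
        self.default_tasks = None  # default context and default stream end here


model = DefaultStreamModel()
model.set_device(0)
model.execute("op1", 0)
model.execute("op2", 0)
done = model.synchronize_stream(0)
model.reset_device()
print(done)  # ['op1', 'op2']
```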
Multi-Thread and Multi-Stream Performance
- Thread scheduling depends on the OS, while tasks in streams are scheduled by the scheduling unit on the device. When tasks in different streams of a process contend for resources on the device, performance may be lower than in a single-stream scenario.
- The processor provides different execution components, such as the AI Core, AI CPU, and Vector Core, so that different tasks can be executed by different components. You are advised to create streams according to the execution engines of the operators.
- Overall performance depends on how the application logic is implemented. Generally, a single-thread, multi-stream application performs slightly better than a multi-thread, multi-stream one, because less thread scheduling is involved at the application layer.
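The advice to create streams according to operator execution engines can be sketched as a simple planning step. Each engine gets its own stream, and operators keep their original order within that stream. This is a hypothetical helper in plain Python, not a pyACL API. The operator names and their engine assignments are illustrative assumptions. The sketch also ignores data dependencies between operators; on a real device, any dependency between streams would need explicit cross-stream synchronization.

```python
from typing import Dict, List, Tuple


def group_by_engine(ops: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Assign each (op_name, engine) pair to one stream per engine,
    keeping the submission order of ops within each stream."""
    streams: Dict[str, List[str]] = {}
    for op_name, engine in ops:
        streams.setdefault(engine, []).append(op_name)
    return streams


# Hypothetical operators and engine assignments, for illustration only.
ops = [
    ("conv1", "AI Core"),
    ("topk", "AI CPU"),
    ("conv2", "AI Core"),
    ("nms", "AI CPU"),
]
plan = group_by_engine(ops)
print(plan)
# {'AI Core': ['conv1', 'conv2'], 'AI CPU': ['topk', 'nms']}
```

Keeping one stream per engine lets work for different components proceed side by side, while the in-stream order still matches the order in which the application delivered the tasks.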