Collective Communication API Overview
The high-level NPUDistributedOptimizer API enables users to complete gradient aggregation automatically, without having to be aware of AllReduce, to implement data parallel training. In addition, to meet users' requirements for flexibility, the collective communication library provides common APIs for rank management, gradient segmentation, and collective communication prototypes.
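For example, a training script only needs to wrap its existing optimizer. The following is a minimal sketch that assumes TensorFlow 1.15 with the npu_bridge package installed; the single-variable model and hyperparameter values are placeholders.

```python
# Minimal sketch: assumes TensorFlow 1.15 and an installed npu_bridge package;
# the single-variable model below is a placeholder for a real network.
import tensorflow as tf
from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer

# Toy model: one trainable variable and a quadratic loss.
x = tf.get_variable("x", shape=[], initializer=tf.ones_initializer())
loss = tf.square(x - 3.0)

# Wrapping the base optimizer makes gradient aggregation (AllReduce) happen
# automatically during minimize()/apply_gradients().
optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
optimizer = NPUDistributedOptimizer(optimizer)
train_op = optimizer.minimize(loss)
```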
To use the collective communication APIs, the FwkACLlib and TFPlugin software packages must be installed.
- If the --pylocal mode is used for the installation of FwkACLlib and TFPlugin, the corresponding .whl files are installed in the software installation paths: /fwkacllib/python/site-packages/ in the FwkACLlib installation path and /tfplugin/python/site-packages/ in the TFPlugin installation path (see the import sketch after this list).
- If the --pylocal mode is not used for the installation of FwkACLlib and TFPlugin, the corresponding .whl files are installed in the local Python path.
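In the --pylocal case, the hccl and npu_bridge modules therefore live under the installation paths rather than the default Python path. A sketch of one way to make them importable; the /usr/local/Ascend prefixes below are example installation paths, not fixed values.

```python
# Sketch only: the install prefixes are examples for the --pylocal case;
# substitute your actual FwkACLlib and TFPlugin installation paths.
import sys

sys.path.append("/usr/local/Ascend/fwkacllib/python/site-packages")
sys.path.append("/usr/local/Ascend/tfplugin/python/site-packages")

import hccl.manage.api           # rank management APIs
import npu_bridge.hccl.hccl_ops  # collective communication prototype APIs
```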
| Type | API | Description | Definition File |
|---|---|---|---|
| Rank management | create_group | Creates a collective communication group. | {install_path_fwkacllib}/fwkacllib/python/site-packages/hccl/manage/api.py |
| | destroy_group | Destroys a collective communication group. | |
| | get_rank_size | Obtains the number of ranks (that is, the number of devices) in a group. | |
| | get_local_rank_size | Obtains the number of local ranks on the server where the devices in the group are located. | |
| | get_rank_id | Obtains the rank ID of a device in a group. | |
| | get_local_rank_id | Obtains the local rank ID of a device in a group. | |
| | get_world_rank_from_group_rank | Obtains the world rank ID of a process based on its group rank ID. | |
| | get_group_rank_from_world_rank | Obtains the group rank ID of a process in a group based on its world rank ID. | |
| Gradient segmentation | set_split_strategy_by_idx | Sets the backward gradient segmentation policy in the collective communication group based on the gradient index ID. | {install_path_fwkacllib}/fwkacllib/python/site-packages/hccl/split/api.py |
| | set_split_strategy_by_size | Sets the backward gradient segmentation policy in the collective communication group based on the gradient data volume ratio. | |
| Collective communication prototype | allreduce | Provides the AllReduce function for collective communication in a group to reduce tensors with the same name on all nodes. | {install_path_tfplugin}/tfplugin/python/site-packages/npu_bridge/hccl/hccl_ops.py |
| | allgather | Each device receives the aggregation of tensor data from all ranks, in rank order. | |
| | broadcast | Copies an N-element buffer on the root rank to all ranks. | |
| | reduce_scatter | Performs the same operation as Reduce, except that each rank receives only a subpart of the result. | |
| | send | Sends data to a rank within a collective communication group. | |
| | receive | Receives data from a rank within a collective communication group. | |
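The sketch below combines the three API groups. It assumes a training job in which HCCL has been initialized; the gradient index list, reduction type, and tensor values are illustrative placeholders.

```python
# Illustrative sketch: assumes an NPU training job with HCCL initialized;
# the index list, reduction type, and tensor values are placeholders.
import tensorflow as tf
from hccl.manage.api import get_rank_size, get_rank_id
from hccl.split.api import set_split_strategy_by_idx
from npu_bridge.hccl import hccl_ops

# Rank management: query the default world group.
rank_size = get_rank_size()   # number of devices in the group
rank_id = get_rank_id()       # this device's rank ID in the group

# Gradient segmentation: split the backward gradients into fusion segments
# by gradient index ID (the index values are placeholders).
set_split_strategy_by_idx([20, 100])

# Collective communication prototype: AllReduce a tensor across all ranks.
local_tensor = tf.constant([1.0, 2.0, 3.0])
summed = hccl_ops.allreduce(local_tensor, "sum")
```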