Ascend 310 AI Processor Cannot Be Started Properly
This section applies only to NPU 20.2.X or later.
You can check the NPU version from the driver package name. For example, if the driver package name is A300-3000-NPU_Driver-20.0.0-ARM64-Ubuntu18.04.run, the NPU version is 20.0.0.
Symptom
The Ascend 310 AI Processor cannot be started properly, and the system log records "dev id 0, The address does not comply with the NVMe protocol(must be aligned with 4KB)."
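To confirm the symptom, the log can be searched for the fixed part of the message. The following is a minimal sketch: the function name has_nvme_alignment_error is hypothetical, and it assumes the system log has been saved to a text file (for example, from dmesg output), which may differ on your host.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given log file
# contains the alignment error reported for the Ascend 310 AI Processor.
# $1: path to a saved system log (assumed to be plain text).
has_nvme_alignment_error() {
    # -q: print nothing, report only through the exit status
    # -F: treat the pattern as a fixed string, not a regular expression
    grep -qF 'The address does not comply with the NVMe protocol' "$1"
}

# Example usage (the log path is an assumption):
#   dmesg > /tmp/kernel.log
#   has_nvme_alignment_error /tmp/kernel.log && echo "alignment error found"
```

The search uses only the part of the message after the device ID, so it matches regardless of which "dev id" is reported.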
Possible Cause
The address allocated by the host kernel interface dma_alloc_coherent for the NVMe protocol does not meet the 4 KB alignment requirement.
According to the kernel DMA interface document, using consistent memory is costly on most platforms, and the minimum allocation length is generally one page.
Using Consistent DMA mappings
=============================

To allocate and map large (PAGE_SIZE or so) consistent DMA regions, you should do::

    dma_addr_t dma_handle;

    cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);

where device is a ``struct device *``. This may be called in interrupt context with the GFP_ATOMIC flag.

Size is the length of the region you want to allocate, in bytes.

This routine will allocate RAM for that region, so it acts similarly to __get_free_pages() (but takes size instead of a page order). If your driver needs regions sized smaller than a page, you may prefer using the dma_pool interface, described below.

The consistent DMA mapping interfaces, will by default return a DMA address which is 32-bit addressable. Even if the device indicates (via the DMA mask) that it may address the upper 32-bits, consistent allocation will only return > 32-bit addresses for DMA if the consistent DMA mask has been explicitly changed via dma_set_coherent_mask(). This is true of the dma_pool interface as well.

dma_alloc_coherent() returns two values: the virtual address which you can use to access it from the CPU and dma_handle which you pass to the card.

The CPU virtual address and the DMA address are both guaranteed to be aligned to the smallest PAGE_SIZE order which is greater than or equal to the requested size. This invariant exists (for example) to guarantee that if you allocate a chunk which is smaller than or equal to 64 kilobytes, the extent of the buffer you receive will not cross a 64K boundary.
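The alignment rule at the end of the kernel document excerpt can be worked through numerically. The following is a minimal sketch, not kernel code: the function name coherent_alignment is hypothetical, and it simply doubles the page size until the result covers the requested size, which mirrors the documented invariant rather than any particular kernel implementation.

```shell
# Hypothetical helper: prints the alignment (in bytes) that the kernel
# document promises for a consistent DMA allocation.
# $1: requested size in bytes
# $2: page size in bytes (assumed to be a power of two)
coherent_alignment() {
    size=$1
    align=$2
    # Double the alignment (one PAGE_SIZE order at a time) until it is
    # greater than or equal to the requested size.
    while [ "$align" -lt "$size" ]; do
        align=$((align * 2))
    done
    echo "$align"
}

# Example usage:
#   coherent_alignment 5000 4096    # prints 8192
#   coherent_alignment 100 4096     # prints 4096
```

Under this rule the alignment is always a multiple of the page size, so a conforming dma_alloc_coherent on a host whose page size is a multiple of 4096 bytes would be expected to return 4 KB aligned addresses.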
The possible causes of 4K alignment failures are as follows:
- The value of PAGESIZE in the host kernel is less than 4096 bytes.
- The host kernel's implementation of dma_alloc_coherent differs from the implementation in the Linux community kernel.
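The 4 KB requirement itself is a simple bit test: because 4096 is 2^12, an address is 4 KB aligned exactly when its low 12 bits are zero. The following is a minimal sketch of that check. The function name is_4k_aligned is hypothetical, and the sample addresses are made up for illustration.

```shell
# Hypothetical helper: succeeds (exit status 0) if the address is a
# multiple of 4096, that is, if its low 12 bits are all zero.
# $1: address as a decimal or 0x-prefixed hexadecimal number
is_4k_aligned() {
    # 4095 is 0xFFF, a mask covering the low 12 bits.
    [ $(( $1 & 4095 )) -eq 0 ]
}

# Example usage (illustrative addresses only):
#   is_4k_aligned 0x80001000 && echo "aligned"
#   is_4k_aligned 0x80001200 || echo "not aligned"
```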
Solution
- Run the following command to query the value of PAGESIZE:
  getconf PAGESIZE
  Example output:
  4096
- If the value of PAGESIZE is less than 4096 bytes, modify the host kernel or change the value of PAGESIZE.
- If the value of PAGESIZE is 4096 bytes or greater, check how the host kernel interface dma_alloc_coherent allocates consistent memory, and ensure that the address it returns is 4 KB aligned.
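The decision in the steps above can be sketched as a small script. The function name classify_pagesize and the two result labels are hypothetical. Treating a value of exactly 4096 the same as larger values is an assumption made here, on the basis that a 4096-byte page already satisfies the 4 KB requirement.

```shell
# Hypothetical helper: maps a PAGESIZE value to the next troubleshooting step.
# $1: page size in bytes, for example the output of "getconf PAGESIZE"
classify_pagesize() {
    if [ "$1" -lt 4096 ]; then
        # Page size is too small to guarantee 4 KB alignment.
        echo "too_small"
    else
        # Page size is sufficient; inspect dma_alloc_coherent instead.
        echo "check_dma_alloc"
    fi
}

# Example usage:
#   classify_pagesize "$(getconf PAGESIZE)"
```

A result of too_small corresponds to modifying the host kernel or changing the value of PAGESIZE. A result of check_dma_alloc corresponds to reviewing how dma_alloc_coherent allocates memory in the host kernel.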