Accuracy Optimization Solutions
Data Type Conversion
Performing the intermediate computation in higher precision, by converting fp16 data to fp32, can improve accuracy. Note that the computation result must then be converted back to the original data type.
Take the implementation of res = x * y as an example.
Given input tensors x and y of type fp16, convert them to the fp32 type before computation and convert the result data res back to the original data type fp16 after computation. The following is the sample operator code:
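```python
import te.lang.cce

# data_x and data_y are the input tensors; record the original dtype.
dtype = data_x.dtype
if dtype == "float16":
    # Promote the fp16 inputs to fp32 for the intermediate computation.
    data_x = te.lang.cce.cast_to(data_x, "float32")
    data_y = te.lang.cce.cast_to(data_y, "float32")
res = te.lang.cce.vmul(data_x, data_y)
if dtype == "float16":
    # Cast the fp32 result back to the original fp16 type.
    res = te.lang.cce.cast_to(res, "float16")
```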
The expected data used in accuracy testing must likewise be computed after conversion to fp32.
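To see why intermediate precision matters, the following stdlib-only Python sketch (not TBE code) emulates fp16 and fp32 rounding with the struct module's "e" and "f" format codes. Under IEEE 754 round-to-nearest, a single product such as x * y rounds to the same fp16 value either way, because the exact product of two fp16 values fits in fp32; the difference appears once operations are chained, so the sketch sums 1000 fp16 values instead. The helper names are hypothetical.

```python
import struct


def to_fp16(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def to_fp32(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (fp32) value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def sum_in_fp16(values):
    """Accumulate in fp16: every partial sum is rounded back to fp16."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + v)
    return acc


def sum_via_fp32(values):
    """Cast fp16 inputs to fp32, accumulate in fp32, then cast the result back to fp16."""
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + to_fp32(v))
    return to_fp16(acc)


data = [to_fp16(0.1)] * 1000   # 1000 fp16 inputs
reference = sum(data)          # fp64 sum, exact for these inputs
err16 = abs(sum_in_fp16(data) - reference)
err32 = abs(sum_via_fp32(data) - reference)
print(f"fp16 accumulation error: {err16:.6f}")
print(f"fp32 intermediate error: {err32:.6f}")
```

Accumulating in fp16 lets every partial sum absorb a rounding error that grows with the magnitude of the sum, while the fp32 path rounds to fp16 only once at the end.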
Instruction Accuracy Optimization
Low-accuracy interfaces can be avoided by replacing them with equivalent formulas. The following uses the vexp instruction and the besseli0e operator as an example of improving operator accuracy through formula replacement. The besseli0e operator is an even function, as shown in the following figure.
Therefore, only the range (0, +∞) needs to be considered. Two piecewise formulas are used for fitting, one on (0, 3.75) and one on (3.75, +∞). The following takes the formula on (0, 3.75) as an example.
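Given the coefficients A_0, A_1, ..., A_n, for t = x/3.75 and x ∈ (0, 3.75), the besseli0e operator is computed as follows:
$$\mathrm{besseli0e}(x) = e^{-x} \sum_{i=0}^{n} A_i\, t^{2i}$$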
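Before any hardware instruction is involved, the (0, 3.75) branch can be checked in double precision. The Python sketch below assumes the classic Abramowitz and Stegun 9.8.1 polynomial for I0 (the operator's actual coefficients are not reproduced in this text) and compares it against the defining power series I0(x) = Σ (x/2)^(2k) / (k!)^2. Function names are hypothetical.

```python
import math

# Assumed coefficients: Abramowitz & Stegun 9.8.1 approximation of I0(x) on |x| <= 3.75.
A = (1.0, 3.5156229, 3.0899424, 1.2067492, 0.2659732, 0.0360768, 0.0045813)


def besseli0e_poly(x: float) -> float:
    """Approximate exp(-|x|) * I0(x) for |x| < 3.75 with a polynomial in t = x / 3.75."""
    x = abs(x)                  # besseli0e is even, so only x >= 0 is needed
    t2 = (x / 3.75) ** 2
    poly = 0.0
    for a in reversed(A):       # Horner's rule in t^2
        poly = poly * t2 + a
    return math.exp(-x) * poly


def besseli0e_series(x: float, terms: int = 40) -> float:
    """Reference: exp(-|x|) * sum_k (x/2)^(2k) / (k!)^2."""
    q = (x / 2.0) ** 2
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)     # ratio of consecutive series terms
        total += term
    return math.exp(-abs(x)) * total


for x in (0.1, 1.0, 2.5, 3.7):
    approx, ref = besseli0e_poly(x), besseli0e_series(x)
    print(f"x={x:4.1f}  poly={approx:.8f}  ref={ref:.8f}  rel_err={abs(approx - ref) / ref:.2e}")
```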
The test results show that the relative error at some points exceeds 1‰. Use Octave to visualize the test data and locate the problem. The following figure shows all points whose relative error exceeds 1‰.
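The same screening can also be scripted. The Python sketch below is a stand-in for the Octave analysis, not the original script: it flags points whose relative error exceeds 1‰ and merges neighboring flagged points into intervals, so the problem ranges can be read off directly. The function name and the gap-based merging rule are hypothetical, and the data in the usage lines is synthetic.

```python
def high_error_intervals(xs, actual, expected, tol=1e-3, gap=0.05):
    """Return (start, end) x-intervals containing points whose relative error exceeds tol.

    xs must be sorted; flagged points closer together than `gap` are merged into one interval.
    """
    flagged = [x for x, a, e in zip(xs, actual, expected)
               if e != 0 and abs(a - e) / abs(e) > tol]
    intervals = []
    for x in flagged:
        if intervals and x - intervals[-1][1] <= gap:
            intervals[-1][1] = x        # extend the current interval
        else:
            intervals.append([x, x])    # start a new interval
    return [tuple(iv) for iv in intervals]


# Synthetic data: 2‰ error near both ends of (0, 3.75), 0.5‰ elsewhere.
xs = [i / 100 for i in range(1, 376)]
expected = [1.0] * len(xs)
actual = [1.002 if (x < 0.2 or x > 3.35) else 1.0005 for x in xs]
print(high_error_intervals(xs, actual, expected))
```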
The points with relative error greater than 1‰ are clearly concentrated near the boundaries 0 and 3.75, namely in the ranges (0, 0.2) and (3.35, 3.75). However, when the relative error of the fitting formula itself is plotted over the whole domain, the ranges where it approaches 1‰ are (0.7, 1) and (2.7, 3), which do not match the observed problem ranges. Therefore, the accuracy problem is not caused by the formula itself.
Next, analyze the ranges (0, 0.2) and (3.35, 3.75). The actual computation results at most x-coordinates in these ranges are small, and the abnormal points with relative error greater than 1‰ are scattered throughout both ranges, as shown in the following figure.
The visual analysis leads to the following preliminary conclusions:
- The points with high relative error are concentrated in the ranges (0, 0.2) and (3.35, 3.75).
- The besseli0e results in these ranges are generally small, and so are the abnormal values.
- The points with high relative error are isolated outliers that are unrelated to the besseli0e fitting formula.
Based on these preliminary conclusions, the vexp instruction is the likely cause of the accuracy problem. However, the intermediate results still need to be analyzed to pinpoint the specific step. Specifically, output the intermediate result from the operator, modify the NumPy script that generates the expected data so that it produces the matching intermediate value, and run the system test (ST) to verify each step in turn until the faulty step is found. Note that intermediate errors can accumulate or attenuate as the computation proceeds. For the besseli0e operator, pay attention to points where the computation result is small and the relative error exceeds 8‰. The following figures show the relative error distribution before and after the e^x step computed with the vexp instruction.
It can now be confirmed that the relative error is introduced by the vexp instruction. Because the vexp result is relatively large while the besseli0e result is relatively small, the error at some points in the ranges (0, 0.2) and (3.35, 3.75) reaches about 1‰.
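The step-by-step localization amounts to a loop over dumped intermediates. The Python sketch below assumes each intermediate has already been exported from the operator and paired with a host-computed reference; the step names, values, and the single shared tolerance are hypothetical.

```python
def first_failing_step(steps, tol=1e-3):
    """Return the name of the first step whose maximum relative error exceeds tol.

    steps is an ordered list of (name, device_values, reference_values) tuples:
    one entry per intermediate result dumped from the operator, paired with the
    same intermediate recomputed on the host (for example with NumPy).
    Returns None if every step is within tolerance.
    """
    for name, device, reference in steps:
        worst = max((abs(d - r) / abs(r) for d, r in zip(device, reference) if r != 0),
                    default=0.0)
        if worst > tol:
            return name
    return None


# Hypothetical dumps for three stages of a besseli0e-like computation.
steps = [
    ("polynomial", [1.0, 2.5], [1.0, 2.5]),
    ("exp", [1.0, 20.2], [1.0, 20.0]),      # 1% off at the second point
    ("divide", [0.125, 0.5], [0.125, 0.5]),
]
print(first_failing_step(steps))
```

Checking the stages in computation order matters because an error introduced early can be amplified or masked by later steps.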
Because the problem comes from a single vexp instruction, e^x can instead be fitted with a Taylor series expansion. The expansion of e^x at x = 0 has a simple form, e^x = Σ x^n / n!. Apply the expansion to the low-accuracy ranges (0, 0.2) and (3.35, 3.75) and test the resulting accuracy, as shown in the following figures.
Clearly, the accuracy outside the range (–2, +2) does not meet the requirement, so the range (3.35, 3.75) cannot be handled by direct Taylor series expansion. Further analysis of the range (0, 0.2) shows that the error there is within 1‰. Add the Taylor-expansion implementation of e^x to the besseli0e operator for verification. The resulting relative error is shown in the following figure.
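A double-precision Python sketch of the truncated expansion shows the accuracy falling off with distance from x = 0. The 7-term default is an assumption for illustration, not the order used in the operator, so the exact range where the requirement fails will differ from the one reported here; math.exp serves as the reference.

```python
import math


def taylor_exp(x: float, terms: int = 7) -> float:
    """Truncated Maclaurin series of e^x: sum of x^n / n! for n = 0 .. terms - 1."""
    result, term = 0.0, 1.0
    for n in range(terms):
        result += term
        term *= x / (n + 1)     # next term x^(n+1) / (n+1)!
    return result


for x in (0.1, 0.2, 1.0, 2.0, 3.5):
    rel = abs(taylor_exp(x) - math.exp(x)) / math.exp(x)
    print(f"x={x:3.1f}  relative error={rel:.2e}")
```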
The accuracy on (0, 0.2) now meets the requirement. The expansion is equally accurate on (–0.2, 0), and the operator itself only handles non-negative values because it takes the absolute value of the input before computation. Inputs in the range (3.35, 3.75) can therefore be mapped to (–0.2, +0.2) for computation. Based on the properties of e^x, choose a suitable formula for the derivation, for example:
$$x = Q\ln 2 + v,\quad v \in (-0.2,\ 0.2)$$
This gives Q = 5.1216. Map x to v, compute e^v with the Taylor expansion, and recover e^x from e^x = 2^Q e^v. This completes the e^x formula for the range (0, 3.75).
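The choice of Q can be checked by hand. Assuming Q was chosen so that Q ln 2 falls at the midpoint 3.55 of (3.35, 3.75), which reproduces the stated Q = 5.1216, the mapping works out as follows; 2^Q is a constant, so only e^v needs the Taylor expansion.

```latex
\begin{aligned}
Q\ln 2 &= 3.55 \;\Rightarrow\; Q = \frac{3.55}{\ln 2} \approx 5.1216\\
v &= x - Q\ln 2 = x - 3.55 \in (-0.2,\ 0.2) \quad \text{for } x \in (3.35,\ 3.75)\\
e^{x} &= e^{Q\ln 2}\, e^{v} = 2^{Q} e^{v} = e^{3.55}\, e^{v} \approx 34.81\, e^{v}
\end{aligned}
```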
Select the corresponding formula to implement e^x in each of the three ranges. The ST verifies that the e^x implemented by Taylor series expansion meets the accuracy requirement, and adding it to besseli0e confirms that the operator accuracy is also as expected.
Expansion increases build and execution time and therefore reduces performance, so avoid expansion whenever the accuracy already meets the requirements. For a range that does not meet the requirement, Taylor series expansion guarantees accuracy only in a short interval around the expansion point. Therefore, determine the target range before deriving the formula, and map computations in the failing range onto a range that meets the requirement. For details, see Method 1: Range Mapping.
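A host-side Python sketch of the three-range e^x is useful for producing reference data when testing such an implementation. Here math.exp stands in for the vexp instruction on the middle range, and the 7-term expansion is an assumed order; the names are hypothetical.

```python
import math

LN2 = math.log(2.0)
Q = 3.55 / LN2             # ~5.1216: Q * ln2 is the midpoint of (3.35, 3.75)
TWO_POW_Q = 2.0 ** Q       # constant 2^Q (= e^3.55), computed once


def taylor_exp(v: float, terms: int = 7) -> float:
    """Truncated Maclaurin series of e^v (sum of v^n / n!), meant for |v| < 0.2."""
    result, term = 0.0, 1.0
    for n in range(terms):
        result += term
        term *= v / (n + 1)
    return result


def piecewise_exp(x: float) -> float:
    """e^x for x in (0, 3.75), assembled from three ranges."""
    if x < 0.2:
        return taylor_exp(x)                        # direct expansion near 0
    if x > 3.35:
        return TWO_POW_Q * taylor_exp(x - Q * LN2)  # e^x = 2^Q * e^v, v in (-0.2, 0.2)
    return math.exp(x)                              # stand-in for the vexp instruction


for x in (0.1, 1.0, 3.5, 3.7):
    rel = abs(piecewise_exp(x) - math.exp(x)) / math.exp(x)
    print(f"x={x:3.1f}  relative error={rel:.2e}")
```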
Mathematical Formula Optimization
In mathematics, the Taylor series of a function is an infinite sum of terms expressed in terms of the function's derivatives at a single point. For most common functions, the function equals the sum of its Taylor series near this point. If the derivatives are taken at zero, the Taylor series is also called a Maclaurin series. The Taylor series of a real- or complex-valued function f(x) that is infinitely differentiable at a real or complex number a is the power series
$$f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots$$
In the more compact sigma notation, this can be written as
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$$
where n! denotes the factorial of n, and f^{(n)}(a) denotes the nth derivative of f evaluated at the point a.
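Once the derivatives f^(n)(a) are known, the truncated sum can be evaluated directly. A minimal Python sketch (the helper name is hypothetical), checked against math.sin for the Maclaurin case a = 0 and against math.exp for an expansion at a = 1:

```python
import math


def taylor_polynomial(derivatives, a: float, x: float) -> float:
    """Evaluate sum_n f^(n)(a) / n! * (x - a)^n over the given derivatives at a."""
    return sum(d / math.factorial(n) * (x - a) ** n
               for n, d in enumerate(derivatives))


# Derivatives of sin at a = 0 cycle through 0, 1, 0, -1, ...
sin_derivs = [(0.0, 1.0, 0.0, -1.0)[n % 4] for n in range(10)]
approx = taylor_polynomial(sin_derivs, 0.0, 0.5)
print(approx, math.sin(0.5))
```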
Even when the series converges, the function is not necessarily equal to its Taylor series. In practice, the Taylor series must be truncated to a finite number of terms, and the expansion order can be chosen according to the maximum allowed error. The fitting error is small near the expansion point; generally, the farther a point in the domain lies from the expansion point, the slower the convergence and the larger the error. To simplify the series expression, the expansion is performed at x = 0 (that is, a Maclaurin series expansion) for fitting. For example:
They achieve double 0.01% accuracy with 6th-order and 7th-order Maclaurin series expansions, respectively.
However, some functions converge slowly in certain ranges. The following figure shows the fitting curve of the arcsin function. When x is close to 1, the fitting error is large and a direct Maclaurin expansion is not usable: convergence is so slow that the error remains large even at higher orders. Such functions require segmented fitting. To meet the accuracy requirement, either perform Taylor series expansions around different expansion points, or use a mathematical formula to map the fitting result of a range whose accuracy meets the requirement onto another range.
The following takes the arcsin function as an example to introduce three methods for solving the fitting accuracy problem.
arcsin x Definition
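arcsin x is the inverse of sin x with its range restricted to [–π/2, +π/2]; its domain is [–1, +1]. The function is symmetric with respect to the origin, that is, it is an odd function. Its Maclaurin series expansion is:
$$\arcsin x = \sum_{n=0}^{\infty} \frac{(2n)!}{4^n (n!)^2 (2n+1)}\, x^{2n+1} = x + \frac{x^3}{6} + \frac{3x^5}{40} + \frac{5x^7}{112} + \cdots,\quad |x| \le 1$$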
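The series coefficients have the closed form (2k)! / (4^k (k!)^2 (2k + 1)). The Python sketch below uses it (via math.comb, so Python 3.8 or later is assumed) to show how a fixed 8-term truncation that is accurate near zero degrades as x approaches 1; math.asin is the reference and the function names are hypothetical.

```python
import math


def asin_coefficients(n_terms: int):
    """Coefficients c_k of x^(2k+1): c_k = (2k)! / (4^k (k!)^2 (2k+1))."""
    return [math.comb(2 * k, k) / (4 ** k * (2 * k + 1)) for k in range(n_terms)]


def asin_maclaurin(x: float, n_terms: int = 8) -> float:
    """Truncated Maclaurin series of arcsin x (highest power 2 * n_terms - 1)."""
    x2 = x * x
    result = 0.0
    for c in reversed(asin_coefficients(n_terms)):   # Horner's rule in x^2
        result = result * x2 + c
    return result * x


print(asin_coefficients(4))   # 1, 1/6, 3/40, 5/112 as floats
for x in (0.3, 0.7, 0.95):
    rel = abs(asin_maclaurin(x) - math.asin(x)) / math.asin(x)
    print(f"x={x:.2f}  relative error={rel:.2e}")
```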
In addition, it satisfies the following relations:
Method 1: Range Mapping
Because the Maclaurin expansion of y = arcsin x converges quickly around zero and fits accurately there, the range around zero can be mapped to the range around x = 1 by using a formula. Analysis of arcsin x suggests the following formula for range mapping:
$$\arcsin x = \frac{\pi}{2} - \arcsin\sqrt{1 - x^2},\quad 0 \le x \le 1$$
With the range boundary point at x = √2/2, the Maclaurin fitting result in the range [0, √2/2] can be mapped onto the range [√2/2, 1] by using this formula.
The following figure shows the fitting result.
However, when the Maclaurin expansion order is 13 or lower (seven coefficients), the relative error between x = 0.68 and x = 0.73 is high and does not meet the 0.01% requirement, as shown in the following figure.
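The mapping and the failure window can be reproduced on the host. The Python sketch below assumes the identity arcsin x = π/2 - arcsin √(1 - x²) with the boundary at √2/2 (both are assumptions of the sketch) and uses math.asin as the reference; it lists the sample points where a seven-coefficient (order 13) fit exceeds 0.01% relative error. Names are hypothetical.

```python
import math

BOUNDARY = math.sqrt(0.5)   # assumed range boundary sqrt(2)/2


def asin_maclaurin(x: float, n_terms: int) -> float:
    """Truncated Maclaurin series of arcsin x (highest power 2 * n_terms - 1), Horner form."""
    x2, result = x * x, 0.0
    for k in reversed(range(n_terms)):
        result = result * x2 + math.comb(2 * k, k) / (4 ** k * (2 * k + 1))
    return result * x


def asin_mapped(x: float, n_terms: int) -> float:
    """arcsin x on [0, 1]: direct series up to the boundary, range mapping above it."""
    if x <= BOUNDARY:
        return asin_maclaurin(x, n_terms)
    # arcsin x = pi/2 - arcsin(sqrt(1 - x^2)) maps (sqrt(2)/2, 1] onto [0, sqrt(2)/2)
    return math.pi / 2 - asin_maclaurin(math.sqrt(1.0 - x * x), n_terms)


def failing_points(n_terms: int, tol: float = 1e-4, samples: int = 1000):
    """Sample points in (0, 1] whose relative error against math.asin exceeds tol."""
    xs = (i / samples for i in range(1, samples + 1))
    return [x for x in xs
            if abs(asin_mapped(x, n_terms) - math.asin(x)) / math.asin(x) > tol]


bad = failing_points(7)     # 7 coefficients, i.e. order 13
print(f"order 13 exceeds 0.01% on sample points from {min(bad)} to {max(bad)}")
```

The worst error sits at the boundary itself, where both the direct series and the mapped series are evaluated at their largest argument, √2/2.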
At this point, we can consider using higher-order Maclaurin series expansion to reduce the error.
Method 2: Higher-order Maclaurin Series Expansion
Generally, tools such as MATLAB and Octave can be used to find, by simulation, the lowest expansion order that meets the accuracy requirement. A higher expansion order gives higher accuracy but requires more multiply-accumulate operations, which degrades execution performance.
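The search for the lowest adequate order is easy to script. The Python sketch below is an assumed host-side equivalent of such a simulation (function names are hypothetical, math.asin is the reference, and a 10,000-point grid stands in for the test data): it scans increasing orders of the range-mapped Maclaurin fit and stops at the first one whose maximum relative error is within 0.01%. With the √2/2 boundary assumed here, the scan stops at order 15 (eight coefficients).

```python
import math

BOUNDARY = math.sqrt(0.5)   # assumed range boundary sqrt(2)/2


def asin_maclaurin(x: float, n_terms: int) -> float:
    """Truncated Maclaurin series of arcsin x with n_terms odd powers, in Horner form."""
    x2, result = x * x, 0.0
    for k in reversed(range(n_terms)):
        result = result * x2 + math.comb(2 * k, k) / (4 ** k * (2 * k + 1))
    return result * x


def asin_mapped(x: float, n_terms: int) -> float:
    """Direct series up to the boundary, arcsin x = pi/2 - arcsin(sqrt(1 - x^2)) above it."""
    if x <= BOUNDARY:
        return asin_maclaurin(x, n_terms)
    return math.pi / 2 - asin_maclaurin(math.sqrt(1.0 - x * x), n_terms)


def lowest_order(tol: float = 1e-4, samples: int = 10000, max_terms: int = 20):
    """Return (order, worst_error) for the smallest order meeting tol on (0, 1], or None."""
    xs = [i / samples for i in range(1, samples + 1)]
    for n_terms in range(1, max_terms + 1):
        worst = max(abs(asin_mapped(x, n_terms) - math.asin(x)) / math.asin(x) for x in xs)
        if worst <= tol:
            return 2 * n_terms - 1, worst
    return None


print(lowest_order())
```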
During simulation with Octave, the error falls within 0.01% using 15th-order Maclaurin series expansion (with eight coefficients), as shown in the following figure.
However, for functions that converge poorly in some ranges, even a higher-order Taylor series expansion cannot meet the accuracy requirement. In that case, try mapping the calculation result of a high-accuracy range (for example, the range [0, 0.5]) onto the range that fails the requirement with a suitable formula, or perform Taylor series expansion at different points. The following describes the segmented Taylor series expansion method.
Method 3: Segmented Taylor Series Expansion
When arcsin x is expanded as a Taylor series at a point x = x_0, the infinite series can be written as:
$$\arcsin x = \arcsin x_0 + \sum_{n=1}^{\infty} \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n,\quad f(x) = \arcsin x$$
Consider fitting the range [0, 0.5] with the Maclaurin expansion, and using the Taylor expansion at x_0 to approximate the range above 0.5 where direct Maclaurin fitting yields low accuracy. The result for the range near x = 1 is still obtained by mapping the Maclaurin result from the lower range, as in Method 1.
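A Python sketch of this segmented scheme follows. The split at 0.5, the expansion point x_0 = 0.6, the √2/2 mapping boundary, and the eight-term truncation of both series are assumptions chosen for illustration, not values confirmed by the text, and ASIN_X0 stands in for a constant that would be precomputed. Because repeated derivatives of arcsin are unwieldy, the sketch expands its derivative g(x) = (1 - x²)^(-1/2) with a recurrence obtained from (1 - x²) g'(x) = x g(x) and integrates term by term.

```python
import math

SPLIT = 0.5                  # Maclaurin segment: [0, 0.5]
BOUNDARY = math.sqrt(0.5)    # Taylor segment: (0.5, sqrt(2)/2]; range mapping above
X0 = 0.6                     # assumed expansion point of the Taylor segment
N_TERMS = 8                  # assumed number of terms in both expansions
ASIN_X0 = math.asin(X0)      # constant; would be precomputed offline


def asin_maclaurin(x: float) -> float:
    """Maclaurin series of arcsin x truncated to N_TERMS odd powers (Horner form)."""
    x2, result = x * x, 0.0
    for k in reversed(range(N_TERMS)):
        result = result * x2 + math.comb(2 * k, k) / (4 ** k * (2 * k + 1))
    return result * x


def derivative_coefficients(a: float, count: int):
    """Taylor coefficients b_n at x = a of g(x) = (1 - x^2)^(-1/2), the derivative of arcsin.

    From (1 - x^2) g'(x) = x g(x): (1 - a^2)(n + 1) b[n+1] = (2n + 1) a b[n] + n b[n-1].
    """
    b = [1.0 / math.sqrt(1.0 - a * a)]
    for n in range(count - 1):
        b_prev = b[n - 1] if n > 0 else 0.0
        b.append(((2 * n + 1) * a * b[n] + n * b_prev) / ((1.0 - a * a) * (n + 1)))
    return b


B = derivative_coefficients(X0, N_TERMS)


def asin_taylor(x: float) -> float:
    """arcsin x near X0: arcsin X0 plus the term-by-term integral of g's series."""
    t = x - X0
    return ASIN_X0 + sum(bn * t ** (n + 1) / (n + 1) for n, bn in enumerate(B))


def asin_fit(x: float) -> float:
    """Segment selection for x in [0, sqrt(2)/2]."""
    return asin_maclaurin(x) if x <= SPLIT else asin_taylor(x)


def asin_segmented(x: float) -> float:
    """Piecewise arcsin on [0, 1] with range mapping above sqrt(2)/2."""
    if x <= BOUNDARY:
        return asin_fit(x)
    return math.pi / 2 - asin_fit(math.sqrt(1.0 - x * x))


xs = [i / 10000 for i in range(1, 10001)]
worst = max(abs(asin_segmented(x) - math.asin(x)) / math.asin(x) for x in xs)
print(f"max relative error: {worst:.3e}")
```

Only the constants B and ASIN_X0 would need to be tabulated for a device implementation; the recurrence simply avoids deriving them symbolically.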
As shown in the following figure, the red curve is the Taylor series expansion curve, and the fit around x = 0.6 is good.
The maximum relative error is 0.000070770, meeting the accuracy requirement.
```octave
# y is the approximation of arcsin x, and z is the fitted value.
>> max(abs(y-z)./z)
ans = 0.000070770
```
The error curve can also be plotted with the following code:
```octave
# x is the independent variable in the range [0, 1], y is the approximation of arcsin x,
# z is the fitted value, and m is a vector whose elements are all 0.0001.
figure
plot(x, abs(y-z)./z)
hold on
plot(x, m)
```