# Accuracy Optimization Solutions

## Data Type Conversion

Converting fp16 data to fp32 and performing the intermediate computation in the higher-precision type can improve accuracy. Note that the computation result must then be converted back to the original data type.

Take the implementation of res = x * y as an example.

Given input tensors x and y of type fp16, convert them to fp32 before computation and convert the result **res** back to the original data type fp16 after computation. The following is the sample operator code:

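```python
import te.lang.cce

# data_x and data_y are the input tensors of the operator compute function.
dtype = data_x.dtype
if dtype == "float16":
    # Promote fp16 inputs to fp32 for higher-precision intermediate computation.
    data_x = te.lang.cce.cast_to(data_x, "float32")
    data_y = te.lang.cce.cast_to(data_y, "float32")
res = te.lang.cce.vmul(data_x, data_y)
if dtype == "float16":
    # Convert the result back to the original data type.
    res = te.lang.cce.cast_to(res, "float16")
```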

In accuracy testing, the data also needs to be converted into the fp32 type.
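The benefit of a higher-precision intermediate type is easiest to see when rounding compounds over many steps. The following host-side sketch is plain Python, not TBE code. It emulates fp16 and fp32 rounding with the standard `struct` module's `'e'` and `'f'` formats and accumulates 1000 fp16 copies of 0.1 in two ways. The first rounds every partial sum to fp16. The second rounds to fp32 and casts only the final result back to fp16. The input value and count are arbitrary choices for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_fn):
    """Sum values, rounding every partial sum with round_fn."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


values = [to_fp16(0.1)] * 1000          # fp16 inputs
exact = sum(values)                     # double-precision reference

res_fp16 = accumulate(values, to_fp16)             # all intermediates in fp16
res_fp32 = to_fp16(accumulate(values, to_fp32))    # fp32 intermediates, cast back

err_fp16 = abs(res_fp16 - exact) / exact
err_fp32 = abs(res_fp32 - exact) / exact
print(f"fp16 accumulation: {res_fp16}, relative error {err_fp16:.2e}")
print(f"fp32 accumulation: {res_fp32}, relative error {err_fp32:.2e}")
```

Only the final cast loses precision in the fp32 path. In the fp16 path, each partial sum is rounded to a coarse grid once it grows, and those errors add up.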

## Instruction Accuracy Optimization

Avoid directly calling low-accuracy interfaces by replacing them with equivalent formulas. The following uses the vexp instruction and the besseli0e operator as an example to describe how formula replacement improves operator accuracy. The besseli0e operator is an even function, as shown in the following figure.

Therefore, only the range (0, +∞) needs to be considered. The function is fitted piecewise, with one formula for (0, 3.75) and another for (3.75, +∞). The following uses the formula for (0, 3.75) as an example.

Given coefficient

For and , the formula of the besseli0e operator is as follows:

According to the test results, the relative error at some points is greater than 1‰. Use Octave to visualize the test data and locate the problem. The following figure shows all the points whose relative error is greater than 1‰.

As the figure shows, the points with relative error greater than 1‰ are located near the boundaries 0 and 3.75, that is, in the ranges (0, 0.2) and (3.35, 3.75). A further plot of the relative error of all points in the domain shows that the ranges where the relative error approaches 1‰ are (0.7, 1) and (2.7, 3). These do not match the ranges where the errors actually occur. Therefore, the accuracy problem is not caused by the fitting formula itself.

Next, analyze the ranges (0, 0.2) and (3.35, 3.75). The computed results at most x-coordinates in these ranges are small, and the abnormal points whose relative error is greater than 1‰ are scattered throughout both ranges, as shown in the following figure.

The visual analysis leads to the following preliminary conclusions:

- The points with high relative error are located in the ranges (0, 0.2) and (3.35, 3.75).
- The approximations of **besseli0e** are generally small, and so are the abnormal values.
- The points with high relative error are abnormal points that are unrelated to the fitting formula of besseli0e.
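The screening step above can be sketched in Python. The fitting formula for (0, 3.75) is not reproduced here, so the sketch assumes it is the widely used Abramowitz and Stegun polynomial approximation 9.8.1 of I0(x), with t = x/3.75. It multiplies that approximation by $e^{-|x|}$ to obtain besseli0e and takes the reference value from the power series of I0. The sketch uses double precision and an exact exponential. Under those conditions, the assumed polynomial's own relative error stays far below 1‰, and this kind of check is what rules out the fitting formula as the error source.

```python
import math

# Assumption: Abramowitz & Stegun 9.8.1 coefficients for I0(x) on |x| <= 3.75,
# in powers of t^2 with t = x / 3.75 (the original formula is not shown).
AS_COEFFS = (1.0, 3.5156229, 3.0899424, 1.2067492,
             0.2659732, 0.0360768, 0.0045813)


def besseli0e_fit(x):
    """besseli0e(x) = exp(-|x|) * I0(x), with I0 from the polynomial fit."""
    t2 = (x / 3.75) ** 2
    poly = 0.0
    for c in reversed(AS_COEFFS):      # Horner's scheme in t^2
        poly = poly * t2 + c
    return math.exp(-abs(x)) * poly


def besseli0e_ref(x, terms=40):
    """Reference: I0(x) = sum_k (x^2/4)^k / (k!)^2, scaled by exp(-|x|)."""
    q = x * x / 4.0
    term, total = 1.0, 0.0
    for k in range(terms):
        total += term
        term *= q / ((k + 1) ** 2)
    return math.exp(-abs(x)) * total


def points_above(xs, approx, ref, threshold=1e-3):
    """Return (x, relative error) for every point whose error exceeds threshold."""
    flagged = []
    for x in xs:
        r = ref(x)
        rel = abs(approx(x) - r) / abs(r)
        if rel > threshold:
            flagged.append((x, rel))
    return flagged


xs = [3.75 * i / 1000 for i in range(1, 1000)]    # sample points in (0, 3.75)
max_rel = max(abs(besseli0e_fit(x) - besseli0e_ref(x)) / besseli0e_ref(x) for x in xs)
print("points above 1 per mille:", len(points_above(xs, besseli0e_fit, besseli0e_ref)))
print(f"max relative error of the fit: {max_rel:.2e}")
```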

Based on these preliminary conclusions, the accuracy problem is likely caused by the vexp instruction. To confirm this and locate the specific step, analyze the intermediate results of the computation. Make the operator return an intermediate result, modify the NumPy-generated comparison data to match, and run the ST to verify each step in turn until the faulty step is found. Errors in intermediate results can accumulate or attenuate as the computation proceeds. For the besseli0e operator, focus on points whose intermediate results are small and whose relative error is greater than 0.8‰. The following figures show the relative error distributions of $e^{x}$ and of $e^{x}/e^{x}$ after computation with the vexp instruction.

Figure: $e^{x}$ after vexp computation

Figure: $e^{x}/e^{x}$ after vexp computation

It can now be determined that the relative error is introduced by the vexp instruction. The vexp result is relatively large while the final besseli0e result is relatively small. As a result, the relative error at some points in the ranges (0, 0.2) and (3.35, 3.75) reaches about 1‰.
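The step-by-step localization described above can be sketched as follows. Each intermediate result is compared against its reference, and the comparison stops at the first step whose maximum relative error exceeds the threshold (0.8‰, following the text). The pipeline below is a hypothetical, simplified besseli0e-like computation. `hypothetical_low_accuracy_exp` deliberately injects a relative error of up to 1.5‰ to stand in for a low-accuracy exponential. It does not model the real vexp instruction.

```python
import math


def hypothetical_low_accuracy_exp(x):
    # Hypothetical stand-in: exact exp with an injected relative error up to 1.5e-3.
    return math.exp(x) * (1.0 + 1.5e-3 * math.sin(37.0 * x))


def pipeline(x, exp_fn):
    """Return the intermediate results of a simplified besseli0e-like computation."""
    t2 = (x / 3.75) ** 2
    poly = 1.0 + t2 * (3.5156229 + t2 * 3.0899424)   # truncated polynomial, illustration only
    scale = exp_fn(-x)
    return {"poly": poly, "exp": scale, "result": poly * scale}


def first_bad_step(xs, threshold=8e-4):
    """Compare each step against the reference and return the first failing step."""
    computed = [pipeline(x, hypothetical_low_accuracy_exp) for x in xs]
    reference = [pipeline(x, math.exp) for x in xs]
    for step in ("poly", "exp", "result"):         # pipeline order
        worst = max(abs(c[step] - r[step]) / abs(r[step])
                    for c, r in zip(computed, reference))
        print(f"{step}: max relative error {worst:.2e}")
        if worst > threshold:
            return step
    return None


xs = [0.01 * i for i in range(1, 20)]   # sample points in (0, 0.2)
print("first failing step:", first_bad_step(xs))
```

Checking steps in pipeline order matters. An error introduced early also shows up in every later step, so the first failing step is the one to fix.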

To fix the accuracy problem caused by the single vexp instruction, consider fitting $e^{x}$ by Taylor series expansion. The Taylor series expansion at x = 0 has a simple form, $e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}$. Apply the expansion to the low-accuracy ranges (0, 0.2) and (3.35, 3.75) and test the accuracy, as shown in the following figures.

The figures show that the accuracy outside the range (–2, +2) does not meet the requirement, so the range (3.35, 3.75) cannot be fitted directly with this expansion. Further analysis of the range (0, 0.2) shows that the accuracy is within 1‰. Add the Taylor-expanded $e^{x}$ to the besseli0e operator for verification. The relative error is shown in the following figure.
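The following sketch checks how the truncated Maclaurin series of $e^{x}$ loses accuracy away from x = 0. The number of terms (six, that is, up to $x^{5}$) and the sample points are assumptions for illustration. The text does not state the order used.

```python
import math


def exp_taylor(x, terms=6):
    """Truncated Maclaurin series of e^x with the given number of terms (Horner form)."""
    result = 1.0
    for n in range(terms - 1, 0, -1):
        result = 1.0 + x * result / n
    return result


def rel_error(x, terms=6):
    """Relative error of the truncated series against math.exp."""
    return abs(exp_taylor(x, terms) - math.exp(x)) / math.exp(x)


for x in (0.2, 1.0, 2.0, 3.5):
    print(f"x = {x}: relative error {rel_error(x):.2e}")
```

Near x = 0.2 the error is negligible, while at x = 3.5 it is in the order of 10%. This is why the range (3.35, 3.75) needs a different treatment.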

The accuracy in (0, 0.2) meets the requirement. Because the absolute value of the input is taken before computation, no input falls in (–0.2, 0), so the input in the range (3.35, 3.75) can be mapped to the range (–0.2, +0.2) for computation. Based on the properties of $e^{x}$, select a suitable formula for the derivation, for example:

$$x = Q\ln 2 + v,\quad v \in (-0.2, 0.2)$$

Choosing $Q\ln 2 = 3.55$, the midpoint of (3.35, 3.75), gives $Q = 3.55/\ln 2 \approx 5.1216$. Map $x$ to $v = x - Q\ln 2$ and compute $e^{v}$ by Taylor series expansion. Since

$$e^{x} = 2^{Q} e^{v},$$

we have now obtained the formula of $e^{x}$ for the range (0, 3.75).

Select the corresponding formula to implement $e^{x}$ in each of the three ranges. The ST verifies that the $e^{x}$ implemented by Taylor series expansion meets the accuracy requirement. After this $e^{x}$ is added to besseli0e, verifying the operator accuracy again also gives the expected result.

The expansion step increases build and execution time and therefore reduces performance, so avoid it whenever the accuracy already meets the requirement. For a range that fails the requirement, Taylor series expansion guarantees accuracy only in a short interval around the expansion point. Therefore, determine the target range before deriving the formula, and map the computations of the failing range to those of a range that meets the requirement. For details, see Method 1: Range Mapping.
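The three-range $e^{x}$ described above can be sketched in plain Python. `hw_exp` is a hypothetical parameter that stands in for the vexp instruction in the middle range, and here it defaults to `math.exp`. The six-term Taylor series is an assumed order. Q is computed from the midpoint 3.55 so that subtracting $Q\ln 2$ maps (3.35, 3.75) onto (–0.2, 0.2).

```python
import math

LN2 = math.log(2.0)
Q = 3.55 / LN2          # about 5.1216: Q*ln2 is the midpoint of (3.35, 3.75)
TWO_POW_Q = 2.0 ** Q    # constant 2^Q (equal to e^3.55), precomputed once


def exp_taylor(v, terms=6):
    """Truncated Maclaurin series of e^v (Horner form)."""
    result = 1.0
    for n in range(terms - 1, 0, -1):
        result = 1.0 + v * result / n
    return result


def exp_three_ranges(x, hw_exp=math.exp):
    """e^x for x in (0, 3.75) using the range-specific formulas."""
    if x < 0.2:
        return exp_taylor(x)                  # direct expansion near 0
    if x > 3.35:
        v = x - Q * LN2                       # v in (-0.2, 0.2)
        return TWO_POW_Q * exp_taylor(v)      # e^x = 2^Q * e^v
    return hw_exp(x)                          # stand-in for vexp in between


xs = [3.75 * i / 1000 for i in range(1, 1000)]
worst = max(abs(exp_three_ranges(x) - math.exp(x)) / math.exp(x) for x in xs)
print(f"Q = {Q:.4f}, max relative error = {worst:.2e}")
```

Because $2^{Q}$ is a compile-time constant, the mapped range costs only one subtraction and one extra multiplication on top of the series.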

## Mathematical Formula Optimization

In mathematics, the Taylor series of a function is an infinite sum of terms expressed in terms of the function's derivatives at a single point. For most common functions, the function and the sum of its Taylor series are equal near this point. If the derivatives are taken at zero, the Taylor series is also called a Maclaurin series. The Taylor series of a real or complex-valued function **f(x)** that is infinitely differentiable at a real or complex number **a** is the power series

$$f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^{2} + \frac{f'''(a)}{3!}(x-a)^{3} + \cdots$$

In the more compact sigma notation, this can be written as

$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^{n}$$

where $n!$ denotes the factorial of $n$, and $f^{(n)}(a)$ denotes the $n$th derivative of $f$ evaluated at the point $a$.

Even when the series converges, the function may not equal its Taylor series. In practice, the Taylor series must be truncated to a finite number of terms, and the expansion order can be selected according to the allowed maximum error. The fitting error is relatively small near the expansion point. Generally, the farther a point in the domain is from the expansion point, the slower the convergence and the larger the error. To simplify the series expression, the expansion is performed at x = 0 (that is, Maclaurin series expansion) for fitting. For example:

These functions achieve double 0.01% accuracy with 6th-order and 7th-order Maclaurin series expansions, respectively.
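The statements above about the truncation order and the distance from the expansion point follow from the Lagrange form of the remainder. This form holds for a function that is N + 1 times continuously differentiable between a and x. It shows that the truncation error grows like $|x-a|^{N+1}$, and bounding it is one way to pick the lowest order N that meets the allowed error. As a worked example, consider $e^{x}$ expanded at a = 0 with N = 4 on $|x| \le 0.2$, where $f^{(5)}(\xi) = e^{\xi} \le e^{0.2}$:

$$
f(x) = \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!}(x-a)^{n} + R_{N}(x),\qquad
R_{N}(x) = \frac{f^{(N+1)}(\xi)}{(N+1)!}(x-a)^{N+1},\quad \xi \text{ between } a \text{ and } x
$$

$$
|R_{4}(x)| \le \frac{e^{0.2}\,(0.2)^{5}}{5!} \approx 3.3\times 10^{-6},\qquad |x| \le 0.2
$$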

However, some functions converge slowly in some ranges. The following figure shows the fitting curve of the arcsin function. When **x** is close to **1**, the fitting error is large, and the function cannot be fitted directly by Maclaurin series expansion. Convergence there is so slow that the error remains large even when the expansion order is increased. Such functions require segmented fitting. To meet the accuracy requirement, either perform Taylor series expansion around different expansion points, or use a mathematical formula to map the fitting result of a range whose accuracy meets the requirement to another range.

The following takes the arcsin function as an example to introduce three methods for solving the fitting accuracy problem.

### arcsin x Definition

arcsin **x** is the inverse function of sin **x** with its value range restricted to [–π/2, +π/2]. Its domain is [–1, +1]. The function is symmetric about the origin and is therefore an odd function. Its Maclaurin series expansion is:

$$\arcsin x = \sum_{n=0}^{\infty} \frac{(2n)!}{4^{n}(n!)^{2}(2n+1)}\, x^{2n+1} = x + \frac{1}{6}x^{3} + \frac{3}{40}x^{5} + \cdots,\qquad |x| \le 1$$

In addition, it meets the following requirements:

### Method 1: Range Mapping

The Maclaurin series of y = arcsin x converges quickly around zero and gives high fitting accuracy there. A formula can therefore be used to map the range around zero to the range around **x = 1**. Analysis of arcsin x shows that the formula can be used to perform range mapping. When the range boundary point is , the Maclaurin fitting result in the range can be mapped to the range by using the foregoing formula.

The following figure shows the fitting result.

However, when the Maclaurin series expansion order in the range is less than or equal to 13 (with seven coefficients), the error is high from x = 0.68 to x = 0.73 (that is, the relative error requirement of 0.01% is not met), as shown in the following figure.
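The mapping formula and boundary point are not reproduced here, so the following sketch makes an assumption. It uses the identity $\arcsin x = \pi/2 - \arcsin\sqrt{1-x^{2}}$ for $0 \le x \le 1$ with boundary point $\sqrt{2}/2$, which maps [0, √2/2] onto [√2/2, 1]. The error region around 0.68 to 0.73 is consistent with that boundary, but the original formula may differ. The sketch evaluates the Maclaurin series with a given number of coefficients and reports the maximum relative error over (0, 1].

```python
import math

BOUNDARY = math.sqrt(0.5)   # assumed range boundary sqrt(2)/2


def arcsin_maclaurin(x, num_coeffs):
    """Maclaurin series of arcsin truncated to num_coeffs odd-power terms."""
    coeff, power, total = 1.0, x, 0.0
    for n in range(num_coeffs):
        total += coeff * power
        power *= x * x
        coeff *= (2 * n + 1) ** 2 / ((2 * n + 2) * (2 * n + 3))   # c_{n+1} / c_n
    return total


def arcsin_mapped(x, num_coeffs):
    """arcsin on [-1, 1]: series near 0, identity-based mapping near |x| = 1."""
    sign = -1.0 if x < 0 else 1.0
    ax = abs(x)                                 # arcsin is odd
    if ax <= BOUNDARY:
        y = arcsin_maclaurin(ax, num_coeffs)
    else:
        # arcsin x = pi/2 - arcsin(sqrt(1 - x^2)); the argument is <= sqrt(2)/2
        y = math.pi / 2 - arcsin_maclaurin(math.sqrt(1.0 - ax * ax), num_coeffs)
    return sign * y


def max_rel_error(num_coeffs, samples=10000):
    """Maximum relative error against math.asin over (0, 1]."""
    xs = [i / samples for i in range(1, samples + 1)]
    return max(abs(arcsin_mapped(x, num_coeffs) - math.asin(x)) / math.asin(x)
               for x in xs)


print(f"7 coefficients (order 13): {max_rel_error(7):.2e}")
```

With seven coefficients, the worst point lies at the boundary, where the directly fitted argument is largest. The error there stays above the 0.01% requirement.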

At this point, we can consider using higher-order Maclaurin series expansion to reduce the error.

### Method 2: Higher-order Maclaurin Series Expansion

Tools such as MATLAB and Octave can be used to simulate the lowest expansion order that meets the accuracy requirement. A higher expansion order gives higher accuracy but requires more multiply-accumulate operations, which degrades execution performance.

During simulation with Octave, the error falls within 0.01% using 15th-order Maclaurin series expansion (with eight coefficients), as shown in the following figure.
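The order search can also be sketched in Python instead of Octave. This sketch assumes the mapping uses the boundary √2/2, for example through the identity $\arcsin x = \pi/2 - \arcsin\sqrt{1-x^{2}}$. Under that assumption, it is enough to check the directly fitted interval [0, √2/2]. The mapped side reuses the series on that interval, and its reference values are at least π/4, so its relative error is no larger. The sampling density and the 30-coefficient cap are arbitrary choices.

```python
import math

BOUNDARY = math.sqrt(0.5)   # assumed range boundary sqrt(2)/2


def arcsin_maclaurin(x, num_coeffs):
    """Maclaurin series of arcsin truncated to num_coeffs odd-power terms."""
    coeff, power, total = 1.0, x, 0.0
    for n in range(num_coeffs):
        total += coeff * power
        power *= x * x
        coeff *= (2 * n + 1) ** 2 / ((2 * n + 2) * (2 * n + 3))   # c_{n+1} / c_n
    return total


def lowest_num_coeffs(tolerance=1e-4, samples=2000, max_coeffs=30):
    """Smallest coefficient count whose max relative error on (0, sqrt(2)/2] is within tolerance."""
    xs = [BOUNDARY * i / samples for i in range(1, samples + 1)]
    for k in range(1, max_coeffs + 1):
        worst = max(abs(arcsin_maclaurin(x, k) - math.asin(x)) / math.asin(x)
                    for x in xs)
        if worst <= tolerance:
            return k
    return None


k = lowest_num_coeffs()
print(f"{k} coefficients, expansion order {2 * k - 1}")
```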

However, for functions that converge slowly in some ranges, even higher-order Taylor series expansion cannot meet the accuracy requirement. In that case, try mapping the calculation result of a high-accuracy range onto the range that fails the requirement (for example, mapping the calculation result of range [0, 0.5] to using ). Alternatively, perform Taylor series expansion at different points. The following describes the segmented Taylor series expansion method.

### Method 3: Segmented Taylor Series Expansion

When Taylor series expansion is performed on arcsin x at , the infinite series may be represented as:

Consider using Maclaurin series expansion for fitting in the range [0, 0.5], and Taylor series expansion at to approximate , where fitting directly with the Maclaurin series yields low accuracy. The result of the range is still obtained by mapping the Maclaurin result of range .

As shown in the following figure, the red curve is the Taylor series expansion curve. The fitting effect around 0.6 is satisfactory.

The maximum relative error is 0.000070770, meeting the accuracy requirement.

```octave
# y is the approximation of arcsin x, and z is the fitting value.
>> max(abs(y-z)./z)
ans = 0.000070770
```

The error curve can also be plotted for visual inspection using the following code:

```octave
# x is an independent variable in the range [0, 1], y is the approximation of arcsin x,
# and z is the fitting value.
m = 0.0001 * ones(size(x));   # 0.01% threshold line
figure
plot(x, abs(y-z)./z)
hold on
plot(x, m)
```
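Taylor coefficients of arcsin at a nonzero point can be generated with a short recurrence instead of symbolic differentiation. The sketch below derives them from $g(x) = (1-x^{2})^{-1/2}$, the derivative of arcsin. The coefficients $b_k$ of g around $a$ satisfy $(1-a^{2})(k+1)\,b_{k+1} = (2k+1)\,a\,b_k + k\,b_{k-1}$, which follows from $(1-x^{2})g'(x) = x\,g(x)$. The expansion point a = 0.6, the eight terms, and the target interval [0.5, √2/2] are assumptions for illustration, since the expansion point and ranges used in the text are not shown.

```python
import math


def arcsin_taylor(x, a=0.6, num_terms=8):
    """Taylor series of arcsin around x = a, truncated to num_terms non-constant terms."""
    one_minus_a2 = 1.0 - a * a
    b_prev, b = 0.0, 1.0 / math.sqrt(one_minus_a2)   # b_{-1}, b_0 of g = (1 - x^2)^(-1/2)
    t = x - a
    power = t
    total = math.asin(a)      # constant term; a precomputed constant in a real kernel
    for k in range(num_terms):
        total += b * power / (k + 1)                 # integrate g term by term
        power *= t
        b_prev, b = b, ((2 * k + 1) * a * b + k * b_prev) / (one_minus_a2 * (k + 1))
    return total


samples = 1000
lo, hi = 0.5, math.sqrt(0.5)
xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
worst = max(abs(arcsin_taylor(x) - math.asin(x)) / math.asin(x) for x in xs)
print(f"max relative error on [0.5, sqrt(2)/2]: {worst:.2e}")
```

Setting `a=0.0` reproduces the Maclaurin series, which is a convenient sanity check on the recurrence.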