QoS Queue Scheduling and Packet Discarding on S5700 Series Switches
Introduction
When the network is overloaded, queue scheduling ensures that the switch preferentially processes key services, and packet discarding drops excess packets to keep congestion from worsening. This document uses Huawei S5700 series switches as an example to introduce traffic control methods for relieving network congestion.
Prerequisites
This document uses the S5700 series switches of V200R019C00 as an example. There may be differences in the implementation of different switch models and versions. For details, see the product documentation of the matching series and version.
Context
Queue scheduling and packet discarding are used to cope with network congestion. Congestion refers to reduced forwarding rates and extra delay caused by insufficient network resources. For example, when a large amount of data flows from a high-bandwidth link to a low-bandwidth link, the outbound interface of the low-bandwidth link cannot process all of the incoming traffic. Congestion is common in complex networking environments where IP packet switching and multiple services are deployed.
Increasing the link bandwidth is the best solution to prevent congestion. On hardware devices with limited bandwidth resources, other methods are available for preventing, mitigating, or controlling network congestion. In scenarios where important services need to be preferentially processed upon burst traffic, queue scheduling (also called congestion management) and packet discarding (also called congestion avoidance) are recommended. In scenarios where the service traffic exceeds the bandwidth limit for a long period of time, you need to expand the network capacity or use dedicated devices to control services based on upper-layer applications.
Queue Scheduling and Packet Discarding
PQ Scheduling
Priority queuing (PQ) schedules packets in descending order of priority. Packets in queues with a lower priority can be scheduled only after all packets in queues with a higher priority have been scheduled. A switch enabled with PQ scheduling schedules packets based on priorities 7, 6, 5, 4, 3, 2, 1, and 0 in descending order.
In Figure 1-1, queues 7 to 0 are arranged in descending order of priority. Packets in queue 7 are processed first; the scheduler moves on to queue 6 only after queue 7 becomes empty. Packets in queue 6 are therefore sent at the link rate only while queue 7 is empty, packets in queue 5 are sent at the link rate only while queues 7 and 6 are empty, and so on.
PQ scheduling suits short-delay services well. Assume that data flow X is mapped to the highest-priority queue on each node. When packets of data flow X reach a node, they are processed first.
The PQ scheduling mechanism, however, may result in starvation of packets in queues with lower priorities. For example, if data flows mapped to queue 7 arrive at a 100% link rate in a period, the scheduler does not process flows in queues 0 to 6.
To prevent starvation of packets in some queues, upstream devices need to accurately define service characteristics of data flows so that service flows mapped to queue 7 do not exceed a given percentage of the link capacity. This keeps queue 7 from monopolizing the link, allowing the scheduler to process packets in lower-priority queues.
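The strict-priority behavior described above can be sketched in a few lines of Python. This is an illustrative model, not switch code; the queue contents are made-up packet labels:

```python
from collections import deque

def pq_schedule(queues):
    """Strict priority (PQ): always serve the highest-priority non-empty queue.
    `queues` maps a priority (7 = highest) to a deque of buffered packets."""
    order = []
    while any(queues.values()):
        prio = max(p for p, q in queues.items() if q)  # highest non-empty queue
        order.append((prio, queues[prio].popleft()))
    return order

# Both packets in queue 7 drain before anything in queues 5 and 0 is served.
queues = {7: deque(["v1", "v2"]), 5: deque(["d1"]), 0: deque(["b1"])}
print(pq_schedule(queues))  # → [(7, 'v1'), (7, 'v2'), (5, 'd1'), (0, 'b1')]
```

Note that if packets kept arriving in queue 7 faster than the link rate, this loop would never reach queues 5 and 0, which is exactly the starvation risk described above.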
WRR Scheduling
Figure 1-2 shows the process of Weighted Round Robin (WRR) scheduling.
In WRR scheduling, the device schedules packets in queues in a polling manner based on the queue weight. After one round of scheduling, the weights of all queues are decreased by 1. The queue whose weight is decreased to 0 cannot be scheduled. When the weights of all the queues are decreased to 0, the next round of scheduling starts. For example, the weights of eight queues on an interface are set to 4, 2, 5, 3, 6, 4, 2, and 1. Table 1-1 lists the WRR scheduling results.
| Queue Index | Queue 7 | Queue 6 | Queue 5 | Queue 4 | Queue 3 | Queue 2 | Queue 1 | Queue 0 |
|---|---|---|---|---|---|---|---|---|
| Queue weight | 4 | 2 | 5 | 3 | 6 | 4 | 2 | 1 |
| First round of scheduling | Queue 7 | Queue 6 | Queue 5 | Queue 4 | Queue 3 | Queue 2 | Queue 1 | Queue 0 |
| Second round of scheduling | Queue 7 | Queue 6 | Queue 5 | Queue 4 | Queue 3 | Queue 2 | Queue 1 | - |
| Third round of scheduling | Queue 7 | - | Queue 5 | Queue 4 | Queue 3 | Queue 2 | - | - |
| Fourth round of scheduling | Queue 7 | - | Queue 5 | - | Queue 3 | Queue 2 | - | - |
| Fifth round of scheduling | - | - | Queue 5 | - | Queue 3 | - | - | - |
| Sixth round of scheduling | - | - | - | - | Queue 3 | - | - | - |
| Seventh round of scheduling | Queue 7 | Queue 6 | Queue 5 | Queue 4 | Queue 3 | Queue 2 | Queue 1 | Queue 0 |
| Eighth round of scheduling | Queue 7 | Queue 6 | Queue 5 | Queue 4 | Queue 3 | Queue 2 | Queue 1 | - |
| Ninth round of scheduling | Queue 7 | - | Queue 5 | Queue 4 | Queue 3 | Queue 2 | - | - |
| Tenth round of scheduling | Queue 7 | - | Queue 5 | - | Queue 3 | Queue 2 | - | - |
| Eleventh round of scheduling | - | - | Queue 5 | - | Queue 3 | - | - | - |
| Twelfth round of scheduling | - | - | - | - | Queue 3 | - | - | - |
The statistics show that the scheduling count in each queue is proportional to the queue weight. A higher queue weight indicates a larger scheduling count. WRR scheduling is performed on a per-packet basis, and there is no fixed bandwidth for each queue. If packets are scheduled in the same manner, large-sized packets obtain more bandwidth than small-sized packets.
By default, queues on the S5720I-SI, S5720-LI, S5720S-LI, S5720S-SI, S5720-SI, S5730S-EI, S5730-SI, S5735-L, S5735S-L, S5735S-L-M, S5735-S, S5735-S-I, and S5735S-S use the WRR scheduling mode, and the WRR weight for WRR scheduling is 1 for all these queues. That is, the switch schedules packets in queues 0 to 7 in sequence.
WRR scheduling offsets the disadvantage of PQ scheduling, in which packets in lower-priority queues may not be processed for a long period of time during congestion. In addition, WRR scheduling adapts dynamically: if a queue is empty, WRR skips it and schedules the next queue, which improves bandwidth utilization. WRR scheduling, however, cannot guarantee timely scheduling of short-delay services.
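The round-by-round behavior in Table 1-1 can be reproduced with a small simulation. This is an illustrative sketch, not switch code; the weights come from the example above:

```python
def wrr_rounds(weights, num_rounds):
    """Simulate per-packet WRR over `num_rounds` rounds. Each round serves every
    queue whose remaining weight is positive, then decrements it; when all
    remaining weights reach 0, they are reloaded from `weights`."""
    remaining = dict(weights)
    schedule = []
    for _ in range(num_rounds):
        if all(w <= 0 for w in remaining.values()):
            remaining = dict(weights)  # all weights exhausted: new scheduling cycle
        served = [q for q, w in sorted(remaining.items(), reverse=True) if w > 0]
        schedule.append(served)
        for q in served:
            remaining[q] -= 1
    return schedule

# Weights from Table 1-1: queues 7..0 weighted 4, 2, 5, 3, 6, 4, 2, 1.
weights = {7: 4, 6: 2, 5: 5, 4: 3, 3: 6, 2: 4, 1: 2, 0: 1}
rounds = wrr_rounds(weights, 7)
print(rounds[5])  # → [3]  (only queue 3 is served in the sixth round)
```

The first six entries of `rounds` match rounds one through six in Table 1-1, and the seventh round restarts the cycle, matching the table as well.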
WDRR Scheduling
Weighted Deficit Round Robin (WDRR) scheduling takes the packet length into account. Each queue maintains a deficit counter. If a packet is longer than the queue's current deficit, WDRR allows the deficit to go negative so that the long packet can still be sent; the queue is then skipped in subsequent rounds until its deficit becomes positive again. In WRR scheduling, given the same number of scheduling opportunities, a queue carrying small packets obtains less bandwidth than a queue carrying large packets. WDRR overcomes this shortcoming by accounting for packet length, so that queues with packets of different sizes share bandwidth according to their weights.
When WDRR scheduling is used, you can set the weight for each queue. The switch schedules queues in round-robin manner according to their weights. By default, queues on the S5720-HI, S5730-HI, S5731-H, S5731-S, S5731S-H, S5731S-S, and S5732-H use the WDRR scheduling mode, and the WDRR weight of each queue is 1.
In WDRR scheduling, the deficit indicates the bandwidth deficit of each queue; its initial value is 0. In each round, the system allocates bandwidth to each queue based on its weight, adds the allocation to the deficit, and proceeds as follows:
- If the deficit of a queue is greater than 0, the queue participates in scheduling. The switch sends a packet and subtracts the length of the sent packet from the deficit.
- If the deficit of a queue is 0 or less, the queue does not participate in this round. The current deficit is carried over as the basis for the next round of scheduling.
In Figure 1-3, the weights of queues 7, 6, 5, 4, 3, 2, 1, and 0 are set to 40, 30, 20, 10, 40, 30, 20, and 10, respectively. During scheduling, queues 7, 6, 5, 4, 3, 2, 1, and 0 obtain 20%, 15%, 10%, 5%, 20%, 15%, 10%, and 5% of the bandwidth, respectively. Queues 7 and 6 are used as examples to describe WDRR scheduling. Assume that queue 7 obtains 400 bytes/s bandwidth and queue 6 obtains 300 bytes/s bandwidth.
First round of scheduling
Deficit [7][1] = 0 + 400 = 400
Deficit [6][1] = 0 + 300 = 300
After a packet of 900 bytes in queue 7 and a packet of 400 bytes in queue 6 are sent, the deficits of queues are calculated as follows:
Deficit [7][1] = 400 – 900 = –500
Deficit [6][1] = 300 – 400 = –100
Second round of scheduling
Deficit [7][2] = –500 + 400 = –100
Deficit [6][2] = –100 + 300 = 200
No packet in queue 7 is scheduled because the deficit of queue 7 is negative. After a packet of 300 bytes in queue 6 is sent, the deficit is calculated as follows:
Deficit [6][2] = 200 – 300 = –100
Third round of scheduling
Deficit [7][3] = –100 + 400 = 300
Deficit [6][3] = –100 + 300 = 200
After a packet of 600 bytes in queue 7 and a packet of 500 bytes in queue 6 are sent, the deficits of queues are calculated as follows:
Deficit [7][3] = 300 – 600 = –300
Deficit [6][3] = 200 – 500 = –300
Such a process is repeated and finally queues 7 and 6 respectively obtain 20% and 15% of the bandwidth. Therefore, you can obtain the required bandwidth by setting proper weights.
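The per-round deficit arithmetic above can be checked with a short sketch. For simplicity it sends at most one packet per round, mirroring the worked example; real WDRR implementations may send several packets while the deficit stays positive:

```python
def wdrr_round(deficit, quantum, packets):
    """One WDRR round for a single queue: add the per-round quantum, then send
    the head packet only if the resulting deficit is positive."""
    deficit += quantum
    if deficit > 0 and packets:
        deficit -= packets.pop(0)  # subtract the sent packet's length
    return deficit

# Queue 7 from the example: 400 bytes/s quantum, packets of 900 and 600 bytes.
q7_packets, d7 = [900, 600], 0
history = []
for _ in range(3):
    d7 = wdrr_round(d7, 400, q7_packets)
    history.append(d7)
print(history)  # → [-500, -100, -300], matching the three rounds above
```

The deficit after each round reproduces Deficit[7][1] = -500, Deficit[7][2] = -100, and Deficit[7][3] = -300 from the example.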
WDRR offsets the disadvantages of PQ scheduling and WRR scheduling. That is, in PQ scheduling, packets in queues with lower priorities cannot be scheduled for a long period of time if congestion occurs; in WRR scheduling, bandwidth is allocated unevenly when the packet length of each queue is different or varies significantly. WDRR, however, cannot schedule delay-sensitive services in a timely manner.
Tail Drop
Tail drop is the traditional packet drop policy and the default policy on a switch. When network congestion occurs, packets enter a queue; once the queue is full, subsequently arriving packets are dropped directly and cannot be buffered. This policy cannot provide differentiated services.
This packet drop policy may cause global TCP synchronization: when packets from multiple TCP connections are discarded at roughly the same time, those connections enter congestion avoidance and slow start together, then ramp up together, so the overall traffic volume oscillates sharply.
On the S5720I-SI, S5720-LI, S5720S-LI, S5720S-SI, S5720-SI, S5730S-EI, and S5730-SI, a tail drop profile can be used to specify the maximum number of bytes and packets that can be buffered in a queue. If the maximum number of bytes or packets is reached, the switch considers that congestion occurs and will discard subsequent packets.
If the amount of buffered data in the queue stays below the upper threshold, the switch does not discard any packets.
If the amount of buffered data grows and reaches or exceeds the upper threshold, the switch discards subsequently arriving packets until the amount of buffered data in the queue falls back below the lower threshold.
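As a minimal model of tail drop (packet counts stand in for buffered bytes; illustrative only):

```python
def tail_drop_enqueue(queue, packet, max_depth):
    """Tail drop: buffer the arriving packet if the queue has room;
    otherwise discard it outright, regardless of its importance."""
    if len(queue) < max_depth:
        queue.append(packet)
        return True   # buffered
    return False      # dropped

# With a depth of 3, the first three packets are buffered; the rest are dropped.
queue = []
results = [tail_drop_enqueue(queue, p, 3) for p in range(5)]
print(results)  # → [True, True, True, False, False]
```

Because the drop decision depends only on queue occupancy, all flows sharing the queue lose packets at the same moment, which is what triggers the global TCP synchronization described above.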
WRED Scheduling
Based on RED, the switch supports Weighted Random Early Detection (WRED).
Compared with RED, WRED can discard packets based on the DSCP priority or IP precedence of packets in queues. The upper drop threshold, lower drop threshold, and drop probability can be set for each priority.
When the length of a queue is less than the lower drop threshold, no packet is dropped.
When the length of a queue exceeds the upper drop threshold, all the newly arrived packets are tail dropped.
When the length of a queue is between the lower drop threshold and the upper drop threshold, newly arrived packets are randomly dropped. WRED generates a random number for each incoming packet and compares it with the drop probability of the current queue. If the random number is less than the drop probability, the packet is dropped. A longer queue indicates a higher drop probability.
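The three cases above can be sketched as a drop decision function. The linear ramp between the thresholds is a common WRED formulation and an assumption here; the document does not specify the switch's exact probability curve:

```python
import random

def wred_drop(queue_len, low, high, max_prob):
    """WRED drop decision for one arriving packet. queue_len and the two
    thresholds share a unit (e.g. percent of queue depth); max_prob is the
    maximum drop probability in percent."""
    if queue_len < low:
        return False          # below the lower threshold: never drop
    if queue_len >= high:
        return True           # at or above the upper threshold: tail drop
    # Between the thresholds, drop probability ramps linearly up to max_prob.
    prob = max_prob * (queue_len - low) / (high - low)
    return random.uniform(0, 100) < prob

# Red packets with the recommended 40/60/30 settings from Table 1-2:
print(wred_drop(30, 40, 60, 30))  # → False (queue under 40%: no drops)
print(wred_drop(70, 40, 60, 30))  # → True  (queue over 60%: every packet dropped)
```

A longer queue yields a larger `prob`, so packets are dropped with increasing probability as congestion builds, matching the behavior described above.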
A WRED drop profile can be used to configure packet discarding only on the S5720-EI, S5720-HI, S5730-HI, S5731-H, S5731-S, S5731S-H, S5731S-S, and S5732-H.
WRED is configured in the outbound direction on an interface and is applied to queues. Within each queue, different drop parameters are set for packets of different colors: important packets get higher upper and lower drop thresholds and a lower maximum drop probability than non-important packets.
WRED discards packets in queues based on the drop probability, preventing congestion to a certain degree.
A WRED drop profile processes packets based on their colors. Therefore, you need to color packets before configuring WRED.
The color represents the internal drop priority of packets on a switch and determines the order in which packets in one queue are dropped when traffic congestion occurs. Three colors are defined: green, yellow, and red. The actual drop behavior, however, depends on the parameter settings. For example, if green packets can use a maximum of 50% of the buffer while red packets can use up to 100%, green packets are dropped before red packets; that is, red packets do not necessarily have a higher drop priority than green packets. Table 1-2 describes recommended WRED parameter settings for packets of different colors.
| Packet Color | Lower Drop Threshold (%) | Upper Drop Threshold (%) | Maximum Drop Probability |
|---|---|---|---|
| Green | 80 | 100 | 10 |
| Yellow | 60 | 80 | 20 |
| Red | 40 | 60 | 30 |
For details about how to map packet priorities to internal priorities or colors, see "Configuring Priority Mapping" under "CLI-based Configuration Guide - QoS Configuration Guide - Priority Mapping Configuration (DiffServ Domain Mode)" in the S2720, S5700, and S6700 Series Ethernet Switches Product Documentation.
Mappings Between Internal Priorities and Queue Indexes
By default, internal priorities and interface queues are mapped on a one-to-one basis. In real-world applications, you may need to change the mappings between internal priorities and queues, or map different internal priorities to the same queue to save the device buffer. The device sends packets to different interface queues based on the internal priorities, and performs queue scheduling or packet discarding accordingly for each queue. Table 1-3 lists the default mappings between internal priorities (also known as CoS values or local priorities) and queue indexes on a fixed switch. For more information, see "Default Settings for Priority Mapping" under "CLI-based Configuration Guide - QoS Configuration Guide - Priority Mapping Configuration" in the product documentation of the corresponding version.
| Internal Priority | Queue Index |
|---|---|
| BE | 0 |
| AF1 | 1 |
| AF2 | 2 |
| AF3 | 3 |
| AF4 | 4 |
| EF | 5 |
| CS6 | 6 |
| CS7 | 7 |
Multiple methods are available for changing the queues to which traffic is allocated, including:
- Configure MQC-based priority re-marking in the inbound direction. If a traffic policy contains remark 8021p, remark ip-precedence, or remark dscp, the system maps the re-marked priorities of packets to internal priorities and sends the packets to queues based on the mapped priorities.
- Run the qos local-precedence-queue-map local-precedence queue-index command to configure the mappings between internal priorities and queue indexes. The mappings between internal priorities and queue indexes take effect only on an inbound interface. That is, traffic enters queues based on the mappings.
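The lookup described above can be modeled as a simple table with optional overrides. This is an illustrative sketch of the mapping logic, not device code, and the override value shown is a hypothetical example:

```python
# Default internal-priority-to-queue mapping from Table 1-3.
DEFAULT_QUEUE_MAP = {"BE": 0, "AF1": 1, "AF2": 2, "AF3": 3,
                     "AF4": 4, "EF": 5, "CS6": 6, "CS7": 7}

def queue_for(priority, overrides=None):
    """Return the interface queue for an internal priority, applying any
    configured overrides (analogous in spirit to the
    qos local-precedence-queue-map command)."""
    if overrides and priority in overrides:
        return overrides[priority]
    return DEFAULT_QUEUE_MAP[priority]

print(queue_for("EF"))                         # → 5 (default mapping)
print(queue_for("AF1", overrides={"AF1": 0}))  # → 0 (AF1 remapped to share queue 0)
```

Mapping several internal priorities to the same queue, as in the second call, corresponds to the buffer-saving remapping mentioned above.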
Configuration Examples
Configuring WRR Scheduling
Among S5700 series switches, only the S5720I-SI, S5720-LI, S5720S-LI, S5720S-SI, S5720-SI, S5730S-EI, and S5730-SI support WRR scheduling.
Assume that the switch receives voice, video, and data packets from the Internet through upstream port GE0/0/1 and forwards the packets to users through downstream port GE0/0/2. The 802.1p priorities of voice, video, and data packets are 7, 5, and 2, respectively. To mitigate network congestion and ensure bandwidth for high-priority and delay-sensitive services, set the related parameters according to the following table.
| Service Type | 802.1p Priority | WRR Weight |
|---|---|---|
| Voice | 7 | 0 |
| Video | 5 | 20 |
| Data | 2 | 10 |
After the network is connected, perform the following operations:
- Configure the service packet priorities trusted by the inbound interface.
# Configure GE0/0/1 to trust 802.1p priorities of packets. Service packets are differentiated based on the 802.1p priority. Therefore, after packets enter GE0/0/1, the switch performs mapping based on the 802.1p priority.
[Switch] interface gigabitethernet 0/0/1
[Switch-GigabitEthernet0/0/1] trust 8021p
[Switch-GigabitEthernet0/0/1] quit
- Configure a scheduling profile.
# Create a scheduling profile named p1 and set queue scheduling parameters. By default, packets with 802.1p priorities 7, 5, and 2 enter queue 7, queue 5, and queue 2 respectively.
[Switch] qos schedule-profile p1
[Switch-qos-schedule-profile-p1] qos wrr //Configure the WRR scheduling mode for interface queues.
[Switch-qos-schedule-profile-p1] qos queue 7 wrr weight 0 //Set the WRR weight of voice traffic to 0.
[Switch-qos-schedule-profile-p1] qos queue 5 wrr weight 20 //Set the WRR weight of video traffic to 20.
[Switch-qos-schedule-profile-p1] qos queue 2 wrr weight 10 //Set the WRR weight of data traffic to 10.
[Switch-qos-schedule-profile-p1] quit
- Apply the scheduling profile.
# Apply the scheduling profile p1 to downstream interface GE0/0/2 on the switch.
[Switch] interface gigabitethernet 0/0/2
[Switch-GigabitEthernet0/0/2] qos schedule-profile p1
[Switch-GigabitEthernet0/0/2] quit
- (Optional) Verify the configuration.
Run the display qos queue statistics interface interface-type interface-number [ queue queue-index ] command to view queue-based traffic statistics on the interface. In this example, pay attention to the number of packets that traverse queue 7, queue 5, and queue 2 (value of Passed Packets in the command output). The WRR weight of queue 5 is 20, and the WRR weight of queue 2 is 10. Therefore, the number of times that packets in queue 5 are scheduled is twice that in queue 2. The weight of queue 7 is set to 0, and the queue is scheduled in PQ mode. In this case, the overall scheduling mode is PQ+WRR. During scheduling, the switch first schedules traffic in queue 7 in PQ mode and schedules traffic in queues 5 and 2 in WRR mode only after all the traffic in queue 7 is scheduled.
[Switch] display qos queue statistics interface GigabitEthernet 0/0/2
...
------------------------------------------------------------
Queue ID            : 2
CIR(kbps)           : 0
PIR(kbps)           : 1,000,000
Passed Packets      : 0
Passed Rate(pps)    : 0
Passed Bytes        : 0
Passed Rate(bps)    : 0
Dropped Packets     : 0
Dropped Rate(pps)   : 0
Dropped Bytes       : 0
Dropped Rate(bps)   : 0
------------------------------------------------------------
...
------------------------------------------------------------
Queue ID            : 5
CIR(kbps)           : 0
PIR(kbps)           : 1,000,000
Passed Packets      : 0
Passed Rate(pps)    : 0
Passed Bytes        : 0
Passed Rate(bps)    : 0
Dropped Packets     : 0
Dropped Rate(pps)   : 0
Dropped Bytes       : 0
Dropped Rate(bps)   : 0
------------------------------------------------------------
...
------------------------------------------------------------
Queue ID            : 7
CIR(kbps)           : 0
PIR(kbps)           : 1,000,000
Passed Packets      : 0
Passed Rate(pps)    : 0
Passed Bytes        : 0
Passed Rate(bps)    : 0
Dropped Packets     : 0
Dropped Rate(pps)   : 0
Dropped Bytes       : 0
Dropped Rate(bps)   : 0
------------------------------------------------------------
Configuring Packet Discarding Based on a WRED Drop Profile
Among S5700 series switches, packet discarding can be configured based on a WRED drop profile only on the S5720-EI, S5720-HI, S5730-HI, S5731-H, S5731-S, S5731S-H, S5731S-S, and S5732-H.
The recommended values in Table 1-2 are used to configure packet discarding based on a WRED drop profile on GE0/0/3. The 802.1p priority of traffic from department 1 is 2, and that of traffic from department 2 is 5.
After the network is connected, perform the following operations:
- Configure priority mapping.
# Configure a DiffServ domain sp, color packets, and apply the DiffServ domain sp to GE0/0/1 and GE0/0/2. A WRED drop profile processes packets based on their colors. Therefore, you need to color packets before configuring WRED.
[Switch] diffserv domain sp //Configure a DiffServ domain.
[Switch-dsdomain-sp] 8021p-inbound 2 phb af2 red //Mark the packets from department 1 (packets with 802.1p priority 2) red and retain the default mappings between 802.1p values and PHBs.
[Switch-dsdomain-sp] 8021p-inbound 5 phb ef yellow //Mark the packets from department 2 (packets with 802.1p priority 5) yellow and retain the default mappings between 802.1p values and PHBs.
[Switch-dsdomain-sp] quit
[Switch] interface gigabitEthernet 0/0/1
[Switch-GigabitEthernet0/0/1] trust upstream sp
[Switch-GigabitEthernet0/0/1] quit
[Switch] interface gigabitEthernet 0/0/2
[Switch-GigabitEthernet0/0/2] trust upstream sp
[Switch-GigabitEthernet0/0/2] quit
- Configure a WRED drop profile.
# Create a WRED drop profile named sp and set WRED parameters for red and yellow packets according to the recommended values.
[Switch] drop-profile sp
[Switch-drop-sp] color yellow low-limit 60 high-limit 80 discard-percentage 20
[Switch-drop-sp] color red low-limit 40 high-limit 60 discard-percentage 30
[Switch-drop-sp] quit
- Apply the WRED drop profile.
# Apply the WRED drop profile sp to queues 2 and 5 on the outbound interface GE0/0/3. By default, packets with 802.1p priorities 2 and 5 enter queue 2 and queue 5, respectively, and their internal priorities are AF2 and EF, respectively.
[Switch] interface gigabitEthernet 0/0/3
[Switch-GigabitEthernet0/0/3] qos queue 2 wred sp
[Switch-GigabitEthernet0/0/3] qos queue 5 wred sp
- (Optional) Verify the configuration.
# Check the configuration of DiffServ domain sp.
[Switch] display diffserv domain name sp
diffserv domain name:sp
8021p-inbound 0 phb be green
8021p-inbound 1 phb af1 green
8021p-inbound 2 phb af2 red
8021p-inbound 3 phb af3 green
8021p-inbound 4 phb af4 green
8021p-inbound 5 phb ef yellow
8021p-inbound 6 phb cs6 green
8021p-inbound 7 phb cs7 green
8021p-outbound be green map 0
...
# Check the configuration of the WRED drop profile sp. The WRED parameters of green packets are not set in the WRED drop profile sp. Therefore, the upper drop threshold, lower drop threshold, and maximum drop probability of green packets are set to the default value 100.
[Switch] display drop-profile name sp
Drop-profile[1]: sp
Queue depth : default
Color    Low-limit    High-limit    Discard-percentage
-----------------------------------------------------------------
Green    100          100           100
Yellow   60           80            20
Red      40           60            30
-----------------------------------------------------------------
Based on the preceding configuration, when the amount of department 2 traffic buffered in queue 5 reaches 60% of the queue length, the switch starts to randomly discard newly arrived packets in that queue. When it reaches 80%, the switch discards all newly arrived packets, which is equivalent to the tail drop policy.
Related Information
Queue Scheduling Modes and Packet Loss Policies - QoS Implementation - QoS Issues (Issue 5)
QoS Configuration Guide in the S2720, S5700, and S6700 Series Ethernet Switches Product Documentation