What is QoS?
One of the key aspects that must be considered when designing Enterprise Campus solutions is Quality of Service (QoS). QoS encompasses several categories of techniques, including:
- Shaping
- Policing
- Congestion management
- Congestion avoidance
- Link efficiency mechanisms
QoS involves a wide variety of techniques, especially in networks that offer multimedia services (voice and/or video), because these services are usually delay sensitive and require low latency and low jitter. Traffic generated by these applications must be prioritized using QoS techniques. You must understand what QoS is if you plan to study for Cisco or CompTIA exams, as well as to work on live networks.
We cover QoS basics in our free CompTIA Network+ book.
We teach QoS in detail in our Cisco CCNP video course.
Congestion Management
Congestion happens for many different reasons in modern networks. In situations where a specific link is constantly congested, the link may need to be upgraded, but when experiencing occasional congestion on a particular link, QoS congestion management techniques can be used to take care of the problem.
The core approach used for congestion management is queuing. Applying congestion management means replacing the default First In First Out (FIFO) method with a more sophisticated queuing strategy. An interface consists of two different queuing areas (see figure below):
- Hardware queue (or Transmit Ring – TX Ring)
- Software queue
Interface Queue Types
The hardware queue on the interface always uses the FIFO method for packet treatment. This mode of operation ensures that the first packet in the hardware queue is the first packet that will leave the interface. The only TX Ring parameter that can be modified on most Cisco devices is the queue length.
The software queue is the place where most of the congestion management manipulations occur. The software queue is used to order packets before they use the hardware queue, and they can be configured with different queuing strategies.
Congestion often occurs where high-speed LAN connections aggregate into lower speed WAN connections. Aggregation means the WAN link must handle the cumulative traffic of all the users sharing the connection.
There are many different approaches (queuing strategies) that can be used in congestion management, such as the following:
- First In First Out (FIFO)
- Priority Queuing (PQ)
- Round Robin (RR)
- Weighted Round Robin (WRR)
- Deficit Round Robin (DRR)
- Modified Deficit Round Robin (MDRR)
- Shaped Round Robin (SRR)
- Custom Queuing (CQ)
- Fair Queuing (FQ)
- Weighted Fair Queuing (WFQ)
- Class Based Weighted Fair Queuing (CBWFQ)
- Low Latency Queuing (LLQ)
Note: All of the techniques mentioned above are used in the interface software queue. The hardware queue always uses FIFO.
FIFO is the least complex method of queuing. It simply transmits packets in the order in which they were received. This is also the default queuing mechanism for software queues on high-speed Cisco interfaces. Having a sufficient budget to overprovision the congested links would allow the use of FIFO on all of the interfaces (hardware and software queues). However, in most situations this is not possible, so more advanced queuing techniques, such as WFQ, CBWFQ, or LLQ, have to be employed. These modern queuing strategies ensure that important packets receive priority during times of congestion.
FIFO used in the software queue makes no decisions based on packet priority, which is usually signaled using QoS markings. Relying on FIFO on a congested link means traffic can suffer delay and jitter, and important traffic might be starved and never reach its destination.
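The tail-drop behavior of a full FIFO queue can be illustrated with a short Python sketch (an illustrative model only, not router code; the queue depth and packet names are invented for the example):

```python
from collections import deque

def fifo_enqueue(queue, packet, max_depth):
    """Tail-drop FIFO: when the queue is full, the arriving packet is
    discarded regardless of how important it is."""
    if len(queue) >= max_depth:
        return False  # tail drop
    queue.append(packet)
    return True

q = deque()
dropped = [p for p in ["voice", "web", "ftp", "voice"]
           if not fifo_enqueue(q, p, max_depth=3)]
print(list(q))   # packets leave in arrival order: ['voice', 'web', 'ftp']
print(dropped)   # the fourth arrival is dropped even though it is voice: ['voice']
```

Note that the dropped packet is a voice packet: FIFO has no way to prefer it over the bulk traffic already queued.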
WFQ is a Cisco default technique used on slow-speed interfaces (less than 2 Mbps) because it is considered more efficient than FIFO in this case. WFQ functions by dynamically sorting the traffic into flows, and then dedicating a queue for each flow while trying to allocate the bandwidth fairly. It will do this by inspecting the QoS markings and giving priority to higher priority traffic.
WFQ is not the best solution in every scenario because it does not provide enough control in the configuration (it does everything automatically), but it is far better than the FIFO approach because interactive traffic flows that generally use small packets (e.g., VoIP) get prioritized to the front of the software queue. This ensures that high-volume talkers do not use all of the interface bandwidth. The WFQ fairness aspect also makes sure that high-priority interactive conversations do not get starved by high-volume traffic flows.
Weighted Fair Queuing Logic
As illustrated in the figure above, the different WFQ traffic flows are placed into different queues before entering the WFQ scheduler, which will allow them to pass to the hardware queue based on the defined logic. If one queue fills, the packets will be dropped, but this will also be based on a WFQ approach (lower priority packets are dropped first), as opposed to the FIFO approach of tail dropping.
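The finish-time idea behind the WFQ scheduler can be sketched in a few lines of Python (a simplified model of the algorithm; real WFQ also derives flow weights from IP Precedence markings):

```python
def wfq_schedule(packets):
    """Toy WFQ: each packet is (flow_id, size, weight). A packet's virtual
    finish time is the flow's previous finish time plus size/weight, and
    packets are transmitted in order of increasing finish time."""
    virtual_finish = {}
    scheduled = []
    for seq, (flow, size, weight) in enumerate(packets):
        finish = virtual_finish.get(flow, 0) + size / weight
        virtual_finish[flow] = finish
        scheduled.append((finish, seq, flow))
    return [flow for _, _, flow in sorted(scheduled)]

# Small VoIP packets accumulate finish time slowly, so the interactive
# flow is served first even though the FTP packets arrived earlier.
print(wfq_schedule([("ftp", 1500, 1), ("voip", 60, 1),
                    ("ftp", 1500, 1), ("voip", 60, 1)]))
# ['voip', 'voip', 'ftp', 'ftp']
```

This is exactly the fairness property described above: the high-volume talker cannot monopolize the interface.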
Because WFQ lacks a certain level of control, another congestion management technology called Custom Queuing (CQ) was created. Even though CQ is a legacy technology, it is still implemented in some environments. CQ is similar to WFQ, but it operates by manually defining up to 16 static queues and allocating a number of bytes or packets to each queue. The network designer can assign a byte count for each queue (i.e., the number of bytes that are to be sent from each queue per cycle). Queue number 0 is reserved for the system to avoid starvation of key router messages.
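One CQ service cycle can be modeled as follows (a toy Python sketch; the queue names and byte budgets are arbitrary example values):

```python
def custom_queue_cycle(queues, byte_counts):
    """One CQ round-robin cycle: visit each queue in turn and drain
    packets until that queue's byte budget is used up."""
    sent = []
    for name, budget in byte_counts.items():
        while queues[name] and budget >= queues[name][0]:
            pkt = queues[name].pop(0)
            budget -= pkt
            sent.append((name, pkt))
    return sent

queues = {"q1": [500, 500, 500], "q2": [500, 500, 500]}
# q1's byte count is twice q2's, so q1 gets roughly 2:1 bandwidth
print(custom_queue_cycle(queues, {"q1": 1000, "q2": 500}))
# [('q1', 500), ('q1', 500), ('q2', 500)]
```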
Even though Custom Queuing provides flexible congestion management, this does not work well with VoIP implementations because of the round robin nature of CQ. For example, four queues are allocated a different number of packets (Q1=10 packets, Q2=20 packets, Q3=50 packets, and Q4=100 packets) over a time interval. Even though Q4 has priority, the interface is still using a round robin approach (Q4-Q3-Q2-Q1-Q4…and so on). This is not appropriate for VoIP scenarios because voice traffic needs strict priority for a constant traffic flow that will minimize jitter. As a result, another legacy technology called Priority Queuing (PQ) was invented. PQ places packets into four priority queues:
- High
- Medium
- Normal
- Low
As mentioned, VoIP traffic is placed in the high-priority queue to ensure absolute priority. However, this can lead to the starvation of other queues, so PQ is not recommended for use in modern networks.
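The strict-priority dequeue logic, and the starvation risk it creates, can be sketched like this (illustrative only; the packet labels are invented):

```python
PRIORITY_ORDER = ["high", "medium", "normal", "low"]

def pq_dequeue(queues):
    """Strict priority: always serve the highest non-empty queue.
    Lower queues transmit only when everything above them is empty."""
    for level in PRIORITY_ORDER:
        if queues[level]:
            return queues[level].pop(0)
    return None

queues = {"high": ["voip1", "voip2"], "medium": [],
          "normal": ["web"], "low": ["ftp"]}
print([pq_dequeue(queues) for _ in range(4)])
# ['voip1', 'voip2', 'web', 'ftp'] -- bulk traffic waits until voice drains
```

If the high queue never empties, the lower queues never transmit at all, which is exactly the starvation problem noted above.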
If VoIP is not used in the network, the most recommended congestion management technique is CBWFQ, which defines the amount of bandwidth that the various forms of traffic will receive. Minimum bandwidth reservations are defined for different classes of traffic.
Class Based Weighted Fair Queuing Logic
As illustrated in the figure above, CBWFQ logic is based on a CBWFQ scheduler that receives information from queues defined for different forms of traffic. Traffic that does not match any manually defined queue automatically falls into the “class-default” queue. These queues can be assigned minimum bandwidth guarantees for all traffic classes. CBWFQ offers powerful methodologies for controlling exactly how much bandwidth these various classifications will receive. Within each class queue, packets are treated FIFO, so the network designer should not combine too many forms of traffic inside a single queue.
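The bandwidth-proportional scheduling idea can be approximated with a credit-based Python sketch (a simplification; the bytes_per_pct conversion factor is an arbitrary assumption for this example):

```python
def cbwfq_cycle(queues, bandwidth_pct, bytes_per_pct=20):
    """Toy CBWFQ pass: each class earns byte credit proportional to its
    bandwidth guarantee and sends packets while credit remains.
    bytes_per_pct converts a percentage into a per-cycle byte budget."""
    sent = []
    for cls, pkts in queues.items():
        credit = bandwidth_pct[cls] * bytes_per_pct
        while pkts and pkts[0] <= credit:
            credit -= pkts[0]
            sent.append((cls, pkts.pop(0)))
    return sent

queues = {"video": [600, 600, 600], "data": [600, 600, 600]}
# video is guaranteed 60% of the link, data 30%
print(cbwfq_cycle(queues, {"video": 60, "data": 30}))
# [('video', 600), ('video', 600), ('data', 600)]
```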
Because CBWFQ alone cannot give voice traffic the strict priority it requires, another QoS technique was developed: LLQ. As shown in the figure below, LLQ adds a priority queue (usually for voice traffic) to the CBWFQ system, so LLQ is often referred to as an extension of CBWFQ (i.e., LLQ=PQ-CBWFQ).
Low Latency Queuing Logic
Adding a priority queue to CBWFQ will not lead to starvation because this queue is policed so that the amount of bandwidth guaranteed for voice cannot exceed a particular value. Since voice traffic gets its own priority treatment, the remaining traffic forms will use WFQ based on bandwidth reservation values.
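The policed priority queue can be modeled roughly as follows (a toy sketch; the token accounting stands in for the real policer):

```python
def llq_dequeue(priority_q, class_qs, pq_tokens, pkt_cost=1):
    """LLQ: serve the priority queue first, but only while policer
    tokens remain; priority traffic above the policed rate is dropped
    rather than being allowed to starve the CBWFQ classes."""
    if priority_q:
        if pq_tokens >= pkt_cost:
            return priority_q.pop(0), pq_tokens - pkt_cost
        priority_q.pop(0)  # over the policed rate: drop it
    for q in class_qs:
        if q:
            return q.pop(0), pq_tokens
    return None, pq_tokens

tokens, out = 2, []
pq, cqs = ["v1", "v2", "v3"], [["web"]]
for _ in range(3):
    pkt, tokens = llq_dequeue(pq, cqs, tokens)
    out.append(pkt)
print(out)  # ['v1', 'v2', 'web'] -- the third voice packet was policed away
```

The key point is the drop in the third call: the voice class keeps its low latency, but once it exceeds its policed bandwidth, the excess is sacrificed instead of the data classes.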
Congestion Avoidance
Congestion avoidance is another category of Differentiated Services QoS, often deployed in WANs. When the hardware and software queues fill up, newly arriving packets are tail dropped at the end of the queue, which can lead to voice traffic starvation and/or to the TCP global synchronization process described earlier. Using congestion avoidance techniques can guard against global synchronization problems. The most popular congestion avoidance mechanism is called Random Early Detection (RED), and Cisco’s implementation is called Weighted Random Early Detection (WRED). These QoS tools try to prevent congestion from occurring by randomly dropping less important traffic before the queue gets full.
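The WRED drop-probability ramp between the minimum and maximum thresholds can be expressed directly (the threshold and probability values below are illustrative, not Cisco defaults):

```python
def wred_drop_prob(avg_depth, min_th, max_th, max_prob):
    """WRED drop probability: zero below the minimum threshold, a linear
    ramp up to max_prob at the maximum threshold, and tail drop beyond."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_prob * (avg_depth - min_th) / (max_th - min_th)

# Higher-priority traffic gets higher thresholds, so at the same average
# queue depth it is dropped less aggressively than low-priority traffic.
print(wred_drop_prob(30, min_th=20, max_th=40, max_prob=0.1))  # 0.05
print(wred_drop_prob(30, min_th=30, max_th=50, max_prob=0.1))  # 0.0
```

The “weighted” part of WRED is exactly this per-class threshold difference: important traffic starts being dropped later and less often.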
Shaping and Policing
Shaping and policing are not the same technique, although many people confuse them. Shaping controls the way traffic is sent by buffering excess packets. Policing, on the other hand, drops or re-marks (penalizes) packets that exceed a given rate. Policing might be used to prevent certain applications from consuming all of the connection resources on a fast WAN link, or to offer applications with clear bandwidth requirements only as many resources as they need.
Shaping is often used to prevent congestion in situations where there is asymmetric bandwidth. An example of this is a headquarters router that connects to a branch office router that has a lower bandwidth connection. In this type of environment, shaping can be employed when the headquarters router sends data so that it does not overwhelm the branch office router. Many times the contract between an ISP and its customer specifies a Committed Information Rate (CIR) value. This represents the amount of bandwidth purchased from the ISP. Shaping can be used to ensure that the data sent conforms to the specified CIR.
When comparing shaping and policing, shaping can be used only in the egress direction, while policing can be used in both the ingress and the egress directions. Another key distinction is that policing will drop or re-mark the packet, while shaping will queue the excess traffic. Because of this behavior, policing uses less buffering. Finally, shaping has the advantage of supporting Frame Relay congestion indicators by responding to Forward Explicit Congestion Notification (FECN) and Backward Explicit Congestion Notification (BECN) messages.
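The behavioral difference (policing drops excess traffic while shaping delays it) can be demonstrated with two toy token-bucket functions (a simplification of real token-bucket accounting; byte values are arbitrary):

```python
def police(packets, tokens):
    """Policer: packets that exceed the token budget are dropped outright."""
    sent = []
    for size in packets:
        if size <= tokens:
            tokens -= size
            sent.append(size)
    return sent

def shape(packets, tokens_per_interval, intervals):
    """Shaper: excess packets wait in a buffer and are released in
    later intervals instead of being dropped."""
    queue, schedule = list(packets), []
    for _ in range(intervals):
        tokens, this_interval = tokens_per_interval, []
        while queue and queue[0] <= tokens:
            size = queue.pop(0)
            tokens -= size
            this_interval.append(size)
        schedule.append(this_interval)
    return schedule

burst = [500, 500, 500, 500]
print(police(burst, tokens=1000))       # [500, 500] -- excess dropped
print(shape(burst, 1000, intervals=2))  # [[500, 500], [500, 500]] -- excess delayed
```

The same 2000-byte burst over a 1000-byte budget yields half the data with the policer, but all of the data (smoothed over two intervals) with the shaper.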
Link Efficiency Mechanisms
Link efficiency mechanisms include compression and Link Fragmentation and Interleaving (LFI). Compression involves reducing the size of certain packets to increase the available bandwidth and decrease delay and includes the following types:
- Transmission Control Protocol (TCP) header compression (compresses the IP and TCP headers, reducing the overhead from 40 bytes to 3 to 5 bytes)
- Real-time Transport Protocol (RTP) header compression (compresses the IP, UDP, and RTP headers of voice packets, reducing the overhead to 2 to 4 bytes)
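The bandwidth savings from header compression are easy to quantify. For example, for a G.729 call sending a 20-byte payload 50 times per second (the 6-byte layer-2 overhead below is an assumption made for illustration):

```python
def voip_bandwidth_kbps(payload_bytes, header_bytes, pps, l2_bytes=6):
    """Per-call bandwidth = (payload + IP/UDP/RTP headers + assumed
    layer-2 overhead) * 8 bits * packets per second, in kbps."""
    return (payload_bytes + header_bytes + l2_bytes) * 8 * pps / 1000

print(voip_bandwidth_kbps(20, 40, 50))  # 26.4 kbps with full 40-byte headers
print(voip_bandwidth_kbps(20, 2, 50))   # 11.2 kbps with cRTP (2-byte headers)
```

For small voice payloads, the 40-byte header is twice the size of the data it carries, which is why compressing it roughly halves the per-call bandwidth.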
LFI techniques are efficient on slow links, where certain problems might appear even when applying congestion management features. These problems are generated by big data packets that arrive at the interface before other small, more important packets. If a big packet enters the FIFO TX Ring before a small VoIP packet arrives at the software queue, the VoIP packet gets stuck behind the data packet and might have to wait a long time for the data packet's transmission to finish. To solve this problem, LFI splits the large data packet into smaller pieces (fragments). The voice packets are then interleaved between these fragments, so they do not have to wait for the large packet to be completely transmitted first. This process is illustrated in the figure below:
Link Fragmentation and Interleaving
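The motivation for fragmentation follows from simple serialization arithmetic (the 64 kbps link speed and 10 ms delay target below are example values):

```python
def serialization_delay_ms(frame_bytes, link_kbps):
    """Time for one frame to clock out of the interface, in milliseconds."""
    return frame_bytes * 8 / link_kbps

def fragment_size_for_delay(target_ms, link_kbps):
    """Largest fragment size that still meets the per-fragment delay target."""
    return int(target_ms * link_kbps / 8)

print(serialization_delay_ms(1500, 64))  # 187.5 ms -- far too long for voice
print(fragment_size_for_delay(10, 64))   # 80-byte fragments keep delay at 10 ms
```

A voice packet stuck behind a full 1500-byte frame on a 64 kbps link waits 187.5 ms, which alone exceeds typical end-to-end delay budgets for voice; fragmenting to 80 bytes bounds each wait at 10 ms.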
There are three different flavors of LFI used today:
- Multilink Point-to-Point Protocol (PPP) with interleaving (used in PPP environments)
- FRF.12 (used with Frame Relay data connections)
- FRF.11 Annex C (used with Voice over Frame Relay – VoFR)
QoS Design Recommendations for Voice Transport
Network designers should use some general guidelines when designing Quality of Service for voice transport. These QoS techniques are typically applied on WAN connections at T1 speeds or lower. On high-bandwidth connections, QoS is less critical because congestion is much less likely to occur.
QoS techniques are most effective on bursty connections, typically in Frame Relay environments where CIRs and burst rates are usually specified in the contract. Traffic bursts occur when sending large packets over the network or when the network is very busy during certain periods of the day. QoS techniques should be mandatory if the Enterprise Network uses any kind of delay-sensitive applications (e.g., applications that must function in real time, such as presentations over the Internet or video training sessions) and other traffic on that connection might affect the user’s experience.
Congestion management should be considered only when the network experiences congestion, and this should be planned for by analyzing the organization’s policies and goals and following the network design steps (PPDIOO). Before applying any QoS configuration, traffic should be carefully analyzed to detect congestion problems. The best QoS mechanism should be chosen based on the specific situation, and this can include packet classification and marking, queuing techniques, congestion avoidance techniques, or bandwidth reservation mechanisms (e.g., RSVP).
Network designers should also be familiar with the most important QoS mechanisms available for IP Telephony, such as the following:
- Compressed RTP (cRTP)
- LFI
- PQ-WFQ
- LLQ
- AutoQoS
cRTP is a compression mechanism that reduces the size of the IP, UDP, and RTP headers from 40 bytes to 2 to 4 bytes. cRTP is configured on a link-by-link basis, and Cisco recommends using this technique on links slower than 768 kbps.
Note: cRTP should not be configured on devices that have high processor utilization (above 75%).
LFI is a QoS mechanism used to reduce serialization delay. PQ-WFQ, also referred to as IP RTP Priority, adds a single priority queue to the WFQ technique, which is used for VoIP traffic. All other traffic is queued based on the WFQ algorithm. When using PQ-WFQ, the router places VoIP RTP packets in a strict priority queue that is always serviced first.
LLQ (a.k.a. PQ-CBWFQ) also provides a single priority queue but it is preferred over the PQ-WFQ technique because it guarantees bandwidth for different classes of traffic. All voice traffic is assigned to the priority queue, while VoIP signaling and video traffic is assigned to its own traffic class. For example, FTP can be assigned to a low-priority traffic class and all other data traffic can be assigned to a regular traffic class.
AutoQoS is a Cisco IOS feature that uses a very simple CLI to enable Quality of Service for VoIP in WAN and LAN environments. This is a great feature to use on Cisco Integrated Services Routers (ISRs) (routers that integrate data and media collaboration features) because it provides many capabilities for controlling VoIP transport protocols. AutoQoS inspects the device capabilities and automatically enables features such as LFI and cRTP where necessary. It is usually used in small- to medium-sized businesses that need to deploy IP Telephony quickly but do not have experienced staff who can plan and deploy complex QoS features. Large companies can also deploy IPT using AutoQoS, but the auto-generated configuration should be carefully reviewed, tested, and tuned to meet the organization’s needs.
For configuration details, see the Cisco QoS configuration guide.
Note: QoS techniques should be carefully configured on all of the devices involved in voice transport, not just on individual devices.
Summary
Converged networks are networks with the capacity to transport a multitude of applications and data, including high-quality video and delay-sensitive data such as real-time voice. Although bandwidth-intensive applications stretch network capabilities and resources, they also complement, add value, and enhance every business process.
Converged networks must provide secure, predictable, measurable, and sometimes guaranteed services. In order to ensure successful end-to-end business solutions, Quality of Service (QoS) is required to manage network resources.
In this post we covered what is QoS and how it works in the real world. Please join our newsletter for more articles like this.