In this post, we will look at some common structured approaches to troubleshooting reported network problems. Before we delve into these approaches, however, it is essential to understand the implications of troubleshooting a network problem without one.
We cover network troubleshooting in detail in our Cisco CCNP ENCOR video course.
One of the most common mistakes engineers make is not following a structured troubleshooting approach when attempting to identify the root cause of a problem. While ad-hoc troubleshooting may eventually produce the desired result, it is unpredictable and often very inefficient. One of the most commonly employed ad-hoc techniques, especially among experienced troubleshooters, is the shoot-from-the-hip approach. With this method, after collecting information, the troubleshooter draws on intimate knowledge of the network or on past experience and then immediately implements a change in the hope that it will resolve the issue.
The primary problem with this approach is that while it may work for seasoned engineers who can call on their experience or knowledge of the network, it does not work for inexperienced engineers. A structured, systematic approach, on the other hand, reduces the time the troubleshooter spends on the problem and makes the overall troubleshooting process more efficient. The shoot-from-the-hip troubleshooting method is illustrated in the figure below:
The Shoot-from-the-Hip Troubleshooting Method
As was previously stated, there is no single way to troubleshoot. Different problems call for different approaches. However, regardless of the approach used, it is important to adhere to a structured troubleshooting approach. Common structured troubleshooting methods include the following:
- The top-down troubleshooting method
- The bottom-up troubleshooting method
- The follow-the-traffic-path troubleshooting method
- The compare configurations troubleshooting method
- The divide-and-conquer troubleshooting method
- The component-swapping troubleshooting method
These troubleshooting methods will be described in the following sections.
The Top-Down Troubleshooting Method
When using the top-down troubleshooting approach, the troubleshooter begins troubleshooting at the Application layer of the OSI Model and works his or her way down to the Physical layer. This approach works best when you believe that the problem resides within an application and not within the network or internetwork devices. For example, if a user reports that he or she cannot access a particular server but is able to ping the server IP address, then it can be assumed that Layers 3 through 1 are working fine because there is IP connectivity between the user’s machine and the server. The troubleshooting process would therefore begin at the Application layer.
The Bottom-Up Troubleshooting Method
When using the bottom-up troubleshooting approach, the troubleshooter starts troubleshooting at the Physical layer of the OSI Model and works his or her way up to the Application layer. This approach is based on the assumption that the problem resides in the lower half of the OSI Model. The bottom-up approach is one of the most commonly used troubleshooting methods and works well in smaller networks. However, it is typically inefficient in larger networks, as it becomes more difficult to discover which network device is actually causing the problem.
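The difference between the top-down and bottom-up methods is simply the order in which the layers are examined. The following is a minimal Python sketch of that idea, not a real diagnostic tool. The function name `first_failing_layer` and the per-layer check callables are hypothetical; in practice each check would be a manual or scripted test for that layer.

```python
from typing import Callable, Dict, List, Optional

# OSI layers listed from the bottom (Physical) to the top (Application).
OSI_LAYERS: List[str] = [
    "Physical", "Data Link", "Network", "Transport",
    "Session", "Presentation", "Application",
]


def first_failing_layer(
    checks: Dict[str, Callable[[], bool]],
    method: str = "bottom-up",
) -> Optional[str]:
    """Run the supplied per-layer checks in the order the method dictates.

    Returns the first layer whose check fails, or None if every
    supplied check passes. Layers without a check are skipped.
    """
    if method == "bottom-up":
        order = OSI_LAYERS
    elif method == "top-down":
        order = list(reversed(OSI_LAYERS))
    else:
        raise ValueError(f"unknown method: {method}")

    for layer in order:
        check = checks.get(layer)
        if check is not None and not check():
            return layer
    return None


# Hypothetical results: the link is up, but IP connectivity and the
# application are both failing.
checks = {
    "Physical": lambda: True,
    "Network": lambda: False,
    "Application": lambda: False,
}
print(first_failing_layer(checks, "bottom-up"))  # Network
print(first_failing_layer(checks, "top-down"))   # Application
```

With the same set of faults, the two methods encounter different layers first, which is why the choice of method should reflect where the problem most likely resides.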
The Follow-the-Traffic-Path Troubleshooting Method
The follow-the-traffic-path troubleshooting method requires intimate knowledge of the network and of its traffic flows, which, under best practices, should be captured in the network documentation. This troubleshooting approach is based on the path that the traffic or packets take through the network. A common practice when collecting information is to request a traceroute from the user reporting the problem. The troubleshooter can then use this method to eliminate internetwork devices along the path the traffic takes.
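Following the traffic path amounts to walking the documented hops in order and noting where responses stop. The sketch below illustrates this under stated assumptions: `locate_break` is a hypothetical helper, the hop addresses come from the documentation-only ranges, and the reachability test is passed in as a function rather than performed against a live network.

```python
from typing import Callable, List, Optional, Tuple


def locate_break(
    path: List[str],
    is_reachable: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Walk the documented path in order.

    Returns (last_good, first_bad): the last hop that responded and
    the first hop that did not. first_bad is None when every hop
    responds; last_good is None when the very first hop fails.
    """
    last_good: Optional[str] = None
    for hop in path:
        if not is_reachable(hop):
            return last_good, hop
        last_good = hop
    return last_good, None


# Hypothetical path and results: the first two hops respond,
# the third does not.
path = ["192.0.2.1", "198.51.100.1", "203.0.113.10"]
responding = {"192.0.2.1", "198.51.100.1"}
print(locate_break(path, lambda hop: hop in responding))
# ('198.51.100.1', '203.0.113.10')
```

The devices before `last_good` can be set aside, and attention can be focused on the segment between the last responding hop and the first non-responding one.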
The Compare Configurations Troubleshooting Method
The compare configurations, or spot-the-difference, method entails comparing the configuration on the current device with an older or archived version of the configuration that was confirmed to be working. Another commonly used approach is to compare the device configuration with that of a similarly configured device that is working.
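When both configurations are available as text, the comparison can be automated with a standard diff. The following sketch uses `difflib.unified_diff` from the Python standard library. The helper name `config_diff` and the two configuration fragments are made up for illustration and are not taken from any real device.

```python
import difflib
from typing import List


def config_diff(known_good: str, current: str) -> List[str]:
    """Return unified-diff lines between an archived, known-good
    configuration and the configuration currently on the device."""
    return list(difflib.unified_diff(
        known_good.splitlines(),
        current.splitlines(),
        fromfile="known-good",
        tofile="current",
        lineterm="",
    ))


# Hypothetical configuration fragments that differ in one line.
archived = """interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 10"""

running = """interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 20"""

for line in config_diff(archived, running):
    print(line)
```

Lines prefixed with `-` exist only in the known-good configuration and lines prefixed with `+` exist only in the current one, so the changed VLAN assignment stands out immediately.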
The Divide-and-Conquer Troubleshooting Method
The divide-and-conquer troubleshooting method begins at the Network layer of the OSI Model and then goes either up or down the stack, depending on the results of the test. For example, assume that a user reports that he or she is unable to access a particular server. Using this approach, if a ping to the server IP address is successful, Layers 1 through 3 can be presumed to be working, so the troubleshooter would move up the stack toward the Application layer. On the other hand, if the ping fails, the troubleshooter would move down the stack toward the Physical layer.
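The decision described above can be expressed as a small function. This is only a sketch: it assumes the initial test is a ping, that a successful ping clears the Network layer and everything below it, and that a failed ping points to the Network layer or below. The function name `divide_and_conquer_plan` is hypothetical.

```python
from typing import List

# OSI layers listed from the bottom (Physical) to the top (Application).
OSI_LAYERS: List[str] = [
    "Physical", "Data Link", "Network", "Transport",
    "Session", "Presentation", "Application",
]


def divide_and_conquer_plan(ping_succeeded: bool) -> List[str]:
    """Return the layers still to be examined, in the order to check them."""
    network = OSI_LAYERS.index("Network")
    if ping_succeeded:
        # The Network layer and below are presumed good: work upward.
        return OSI_LAYERS[network + 1:]
    # The fault is at the Network layer or below: work downward.
    return OSI_LAYERS[network::-1]


print(divide_and_conquer_plan(True))
# ['Transport', 'Session', 'Presentation', 'Application']
print(divide_and_conquer_plan(False))
# ['Network', 'Data Link', 'Physical']
```

Either result removes roughly half of the stack from consideration after a single test, which is where the method gets its name.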
The divide-and-conquer troubleshooting method also works well when several troubleshooters are working on the same problem. Once all possible causes of the problem have been hypothesized, individual troubleshooters can be asked to test and verify individual hypotheses. The advantage of using this approach when multiple troubleshooters are all working on the same problem is that it increases efficiency and reduces the likelihood that two or more people are doing the same thing (i.e., duplication of effort) while other aspects are being neglected.
The Component-Swapping Troubleshooting Method
The component-swapping troubleshooting method entails replacing components one at a time and observing whether the problem moves with the component. For example, returning to the example of a user with intermittent network connectivity used at the beginning of this chapter, if the user is still experiencing issues after the network cable is replaced, the next step would be to move the user to another switch port. If that does not resolve the issue, the workstation NIC could be replaced next, and so forth. If the problem disappears after a component, such as the network cable, is replaced, then it can be concluded that the replaced component was faulty.
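The swap-and-observe loop can be sketched as follows. The function name `find_faulty_component`, the component names, and the callback are all hypothetical; in practice the callback stands in for physically swapping the part and asking the user whether the problem persists.

```python
from typing import Callable, List, Optional


def find_faulty_component(
    components: List[str],
    problem_persists_after_swap: Callable[[str], bool],
) -> Optional[str]:
    """Swap one component at a time, in the given order.

    Returns the first component whose replacement makes the problem
    disappear, or None if the problem persists after every swap.
    """
    for component in components:
        if not problem_persists_after_swap(component):
            return component
    return None


# Hypothetical scenario: the switch port is the faulty component, so
# the problem persists after any swap except that one.
swap_order = ["network cable", "switch port", "NIC"]
faulty = "switch port"
print(find_faulty_component(swap_order, lambda c: c != faulty))
# switch port
```

Ordering the list from the cheapest and least disruptive swap to the most disruptive one mirrors the cable, switch port, NIC sequence in the example above.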