CBT IT Certification Training

Unlimited IT Certification Courses via Streaming Video


Network Troubleshooting

Given a scenario, implement a network troubleshooting methodology. This chapter breaks down a multi-step network troubleshooting methodology. Although the scope of a problem may be unclear at the beginning, troubleshooting is considered a science, as it always involves a predetermined process. Network engineers who follow this process will get better at troubleshooting over time, optimizing it based on their experience.

Contents
Gather Information
Identify the Affected Areas in the Network
Determine whether Anything Has Changed in the Network
Establish the Most Probable Cause of the Problem
Determine Whether Escalation Is Necessary
Create an Action Plan and a Possible Solution
Implement and Test the Solution
Analyze the Results
Document the Process and Solution
Summary

A methodology is needed in order to speed up the process of finding and solving problems. Although the methodology can differ from case to case, it is important to use one that best suits your company, network, and internal structure. A proposed methodology includes the following steps:

Step 1: Gather information

Step 2: Identify the affected areas in the network

Step 3: Determine whether anything has changed in the network

Step 4: Establish the most probable cause of the problem

Step 5: Determine whether escalation is necessary

Step 6: Create an action plan and a possible solution

Step 7: Implement and test the solution

Step 8: Analyze the results

Step 9: Document the process and solution

This process is represented by the flow diagram in Figure 6.1 below:

Figure 6.1 – Network Troubleshooting Methodology

In the following sections, we will analyze each of the nine troubleshooting methodology steps.
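As a rough illustration, the nine steps can be modeled as an ordered enumeration with one loop back from the analysis step. The Python sketch below is an outline under stated assumptions, not a reproduction of Figure 6.1: the names `Step` and `next_step`, and the `retry_from` parameter that chooses which phase to return to, are hypothetical.

```python
from enum import IntEnum


class Step(IntEnum):
    GATHER_INFORMATION = 1
    IDENTIFY_AFFECTED_AREAS = 2
    DETERMINE_CHANGES = 3
    ESTABLISH_PROBABLE_CAUSE = 4
    DETERMINE_ESCALATION = 5
    CREATE_ACTION_PLAN = 6
    IMPLEMENT_AND_TEST = 7
    ANALYZE_RESULTS = 8
    DOCUMENT = 9


def next_step(current, solved=True, retry_from=Step.GATHER_INFORMATION):
    """Return the step that follows `current`, or None after documentation.

    Only ANALYZE_RESULTS branches: if the problem is not solved, the
    process loops back to `retry_from` (an assumed, caller-chosen phase).
    """
    if current is Step.DOCUMENT:
        return None
    if current is Step.ANALYZE_RESULTS and not solved:
        return retry_from
    return Step(current + 1)
```

For example, `next_step(Step.ANALYZE_RESULTS, solved=False, retry_from=Step.ESTABLISH_PROBABLE_CAUSE)` sends the process back to step 4 rather than repeating the information gathering.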

Gather Information

Information gathering is the first step in the network troubleshooting methodology, as it is in many other IT processes, including security incident handling. Also called the reconnaissance phase, it aims to obtain, as quickly as possible, all the relevant information that will assist you during the next phases.

The event that triggers the information gathering phase is a network issue, most likely reported by an end-user who discovered it or was directly affected by it. The end-user might report it by e-mail, by phone, or by opening a help desk ticket.

One of the first things that should occur in this phase is interviewing the affected parties and impacted users to find out the exact nature of the network incident. You should correlate these interviews with the application/device logs and error messages gathered from the affected systems. You also need to isolate the issue early and figure out whether the problem affects a single user (end-station) or a group of users, such as an entire VLAN or network segment.

Depending on the number of users affected, you can start the troubleshooting process at the Physical Layer or at an upper layer in the OSI reference model. For example, if you have multiple users in a VLAN/subnet and none of them are able to access a specific application, you should not begin by examining the cabling from their workstations to see if they are plugged in; instead, you should immediately move up the OSI model, as the probability of physical connectivity issues is low. Focus on the things all those users have in common from a networking perspective and this might lead to a particular switch that they are all connected to and that might be malfunctioning.
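The idea of focusing on what the affected users have in common can be sketched as a simple intersection. This is a minimal Python sketch; the function name `common_elements` and the attribute names (`switch`, `vlan`, `port`) are assumptions made for illustration, and in practice the attributes would come from your own network documentation.

```python
def common_elements(affected):
    """Return the attributes (and values) shared by every affected user.

    `affected` maps a user name to a dict of networking attributes.
    """
    records = list(affected.values())
    if not records:
        return {}
    shared = dict(records[0])
    for attrs in records[1:]:
        # Keep only the attributes whose value matches for this user too.
        shared = {k: v for k, v in shared.items() if attrs.get(k) == v}
    return shared


# Hypothetical data: two users who cannot reach the same application.
affected_users = {
    "alice": {"switch": "sw-access-3", "vlan": 20, "port": "Gi0/1"},
    "bob": {"switch": "sw-access-3", "vlan": 20, "port": "Gi0/2"},
}
```

Here `common_elements(affected_users)` keeps the switch and the VLAN and drops the per-user port, which points the investigation at the shared switch rather than at individual cabling.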

If the problem reported involves an application issue, you should try to gather some screenshots from the affected users to see how the problem manifested. Depending on the situation, you might need to walk the users through certain areas over the phone or remotely connect to their station via RDP or other terminal services to analyze the issue. If this is not possible, the network administrator or operator might have to personally go to the affected system and do some hands-on analysis in order to properly gather the necessary information.

After all of these steps are completed, you should document the relevant information gathered, including the following:

  • The type of problem
  • Problem description
  • Which systems were affected
  • How the systems were affected
  • In what context the problem manifested

Possible effects of the problem on the systems affected include:

  • Slow performance/response
  • Data corruption
  • Logon issues
  • Resource access issues
  • Misconfigurations
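One way to keep this information consistent from one incident to the next is a small structured record. The following Python sketch is only an assumption about how such a record might look; the names `Effect`, `IncidentRecord`, and `missing_items` are hypothetical and do not come from any particular ticketing tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Effect(Enum):
    SLOW_PERFORMANCE = "slow performance/response"
    DATA_CORRUPTION = "data corruption"
    LOGON_ISSUE = "logon issue"
    RESOURCE_ACCESS = "resource access issue"
    MISCONFIGURATION = "misconfiguration"


@dataclass
class IncidentRecord:
    problem_type: str = ""
    description: str = ""
    affected_systems: list = field(default_factory=list)
    how_affected: list = field(default_factory=list)  # Effect values
    context: str = ""

    def missing_items(self):
        """Return the names of the items that are still empty."""
        return [name for name, value in vars(self).items() if not value]
```

Calling `missing_items()` before leaving the information gathering phase gives a quick check that none of the five items was skipped.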

Another critical aspect you should cover during the information gathering phase is identifying the specific moment at which the problem manifested. The time of occurrence should be correlated with different network events that happened in that period, including changes made by users in different areas. This information, together with details about the symptoms and error messages, should offer a complete information set specific to this phase and should allow you to proceed to the next step in the troubleshooting process.
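Correlating the time of occurrence with recent network events can be as simple as filtering a change log by a time window. The sketch below assumes a change log held as `(timestamp, description)` pairs; the function name `changes_before` and the 24-hour default window are illustrative assumptions, not a recommended value.

```python
from datetime import datetime, timedelta


def changes_before(incident_time, changes, window=timedelta(hours=24)):
    """Return the changes made within `window` before the incident.

    `changes` is an iterable of (timestamp, description) pairs.
    The most recent change is listed first.
    """
    start = incident_time - window
    recent = [(ts, desc) for ts, desc in changes if start <= ts <= incident_time]
    return sorted(recent, key=lambda change: change[0], reverse=True)


# Hypothetical change log entries.
change_log = [
    (datetime(2024, 3, 1, 9, 0), "Firmware upgrade on sw-access-3"),
    (datetime(2024, 3, 5, 13, 30), "ACL updated on core switch"),
    (datetime(2024, 3, 5, 15, 0), "Printer added to VLAN 20"),
]
```

With an incident reported at `datetime(2024, 3, 5, 14, 0)`, only the ACL update falls inside the default window; widening the window to seven days also returns the firmware upgrade.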

Identify the Affected Areas in the Network

In order to properly and rapidly identify the affected areas of the network, you should make use of good mapping tools, including:

  • Packet sniffers (like Wireshark)
  • Detailed topology diagrams (both physical and logical)
  • Other schematics of the network

You need to understand the physical and logical network topology in order to identify the affected areas and trace the problem throughout the network. In addition, you should also be able to use tools like ping, Event Viewer, and other monitoring tools.

Note: If the organization uses some kind of security policy, you should also understand this policy as part of the troubleshooting process.

Solid network documentation helps in this phase, including documentation about IP addressing within the network. VLSM and address aggregation are useful here because summarizing routes can keep a problem within one network area from affecting the entire routing domain. When a single aggregate address represents many networks, each aggregate defines a problem domain, which helps narrow the search during troubleshooting.
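To make the aggregation idea concrete, the following Python sketch uses the standard library `ipaddress` module to summarize four contiguous subnets and then find which aggregate a failing host belongs to. The subnet values and the helper name `problem_domain` are assumptions chosen for illustration.

```python
import ipaddress

# Hypothetical access subnets in one area of the network.
subnets = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.2.0/24"),
    ipaddress.ip_network("10.1.3.0/24"),
]

# collapse_addresses merges contiguous networks into the fewest aggregates.
summaries = list(ipaddress.collapse_addresses(subnets))


def problem_domain(host, aggregates):
    """Return the aggregate network that contains `host`, or None."""
    address = ipaddress.ip_address(host)
    for network in aggregates:
        if address in network:
            return network
    return None
```

The four /24 networks collapse into the single aggregate 10.1.0.0/22, so a fault reported for host 10.1.2.57 can be placed in that problem domain immediately.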

Another part of this phase is understanding which applications, services, and protocols are used by every group of users within the network. Knowing which areas of the organization are using a certain type of application can help to quickly isolate the problem domain and continue the troubleshooting process.

Something that helps considerably in this troubleshooting step is a modular network design built from multiple layers and modules, such as the following:

  • Core Layer
  • Distribution Layer
  • Access Layer
  • Management module
  • Remote Access module
  • VPN module

If you have WAN connectivity to remote and branch offices, it would help to train remote network operators in the troubleshooting process so they will be able to remotely help from their respective offices. In addition, if the problem includes other technology areas, you should consult with the colleagues responsible for those areas and maybe even form a troubleshooting team in order to fix the problem as soon as possible.

Depending on the situation, to minimize the effect of the problem and shorten the troubleshooting process, you should make sure you have:

  • Backups of the system
  • Roll-back techniques, especially in situations in which modifying device configurations does not solve the problem
  • Spare parts
  • Failover between devices and modules

Determine whether Anything Has Changed in the Network

Depending on the actual problem, the network administrator/operator must follow a mental flowchart that starts with the problem reported by the user, for example, an issue logging on to a system. First, make sure you go through the standard network troubleshooting process using your knowledge of the OSI reference model. If a user is on a system that is up and running but the user's credentials are not being passed to a central server, you should try to determine whether this is a single-user issue or a widespread issue, because this will dramatically affect the troubleshooting method.

If it’s a single user issue, you should use standard networking troubleshooting tools, including:

  • Ping
  • Event logs
  • RDP to access the system

If it’s a widespread issue, you should examine the services and the system logs on the servers the users are trying to access to see if you can learn any information about the issue (maybe some kind of authentication problem).
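A quick way to tell the two cases apart is to count how many distinct users appear in the failed logon events collected from the server. This Python sketch assumes those events have already been reduced to a list of user names; the function name `classify_scope` and the threshold of two users are hypothetical choices.

```python
def classify_scope(failed_logons, threshold=2):
    """Classify an issue by the number of distinct users reporting failures.

    `failed_logons` holds one user name per failed attempt.
    `threshold` is the assumed number of distinct users at which the
    issue is treated as widespread.
    """
    distinct_users = set(failed_logons)
    if not distinct_users:
        return "none"
    if len(distinct_users) >= threshold:
        return "widespread"
    return "single-user"
```

Three failures from the same account still count as a single-user issue, while failures from several accounts suggest checking the services and logs on the server side.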

Establish the Most Probable Cause of the Problem

At this point, you should have discovered what the problem is and how it manifested, through symptoms such as the following:

  • Service outage/inaccessible
  • Slow service
  • Logon issues
  • Dropped sessions
  • Data corruption

The next step is to find the cause of the problem, for example:

  • Cabling problems
  • Connectivity between an Access Layer switch and a Distribution Layer switch, either in the server room or in the wiring closet
  • DoS attack on a system (e.g., router, switch, or server)
  • Software issue (user misconfiguration or user adding some type of application)
  • IP addressing issue (DHCP problem)

You should ask everyone involved in the incident what the last change in the system was and try to obtain details on this. In this troubleshooting phase, you should consider every possible cause, but you should put them in order (based on the symptoms) and start with the most obvious ones. In the end, you will know exactly what you should test to solve the particular issue.
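Ordering the candidate causes by how well they match the symptoms can be sketched as a simple scoring pass. In the Python sketch below, the function name `rank_causes` is hypothetical and the symptom table is invented purely for illustration; it is not a diagnostic reference.

```python
def rank_causes(observed, cause_symptoms):
    """Order candidate causes by how many observed symptoms each explains.

    `cause_symptoms` maps a cause to the set of symptoms it can produce.
    Causes that explain none of the observed symptoms are dropped.
    """
    observed = set(observed)
    scored = [
        (len(observed & symptoms), cause)
        for cause, symptoms in cause_symptoms.items()
    ]
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cause for score, cause in scored if score > 0]


# Illustrative table only: real mappings depend on your environment.
example_table = {
    "cabling problem": {"service outage", "dropped sessions"},
    "DHCP problem": {"service outage", "logon issues"},
    "DoS attack": {"slow service", "dropped sessions"},
}
```

For the observed symptoms "slow service" and "dropped sessions", this example table ranks the DoS attack first and the cabling problem second, and leaves out the DHCP problem.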

Determine Whether Escalation Is Necessary

Many organizations use the Information Technology Infrastructure Library (ITIL) framework, which is a systematic approach for information technology management in an organization. One of the domains specified by this library is incident management. Many organizations have internal help desk or service desk structures.

While help desks usually serve outside customers and vendors, service desks serve internal customers and other departments within the same company. Users can issue a trouble ticket to the service desk system and that will be processed through some type of workflow using e-mail or other automatic process. At some point, the service desk operators have to decide whether the solution is beyond their capabilities and responsibilities and, if it is, whether they should escalate this to a higher-level team and involve other people in the process. Usually, organizations have a three-tier escalation model:

  • Level 1: The service desk operators are in direct contact with the customer/users. This is where the problem is reported.
  • Level 2: Service desk personnel who are more qualified than Level 1 technicians are used for escalation.
  • Level 3: This is the highest escalation level and it often includes network engineers and application developers.
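The escalation decision can be expressed as a small rule: stay at the current level if the problem can be resolved there, otherwise move up one level until Level 3 is reached. The names `Tier` and `escalate` in this Python sketch are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    LEVEL_1 = 1  # service desk operators in direct contact with users
    LEVEL_2 = 2  # more qualified service desk personnel
    LEVEL_3 = 3  # network engineers and application developers


def escalate(current, can_resolve):
    """Return the tier that should handle the ticket next."""
    if can_resolve or current is Tier.LEVEL_3:
        return current  # resolved here, or already at the highest level
    return Tier(current + 1)
```

For example, `escalate(Tier.LEVEL_1, can_resolve=False)` returns `Tier.LEVEL_2`, while a ticket at `Tier.LEVEL_3` stays there.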

Create an Action Plan and a Possible Solution

Most of the time the action plan needed in a troubleshooting methodology is based on experience and analysis of documentation created by previous network engineers or operators. The process of creating an action plan involves documenting every one of the previous steps in the troubleshooting process, often by taking notes and using a PDA or an audio recorder to capture all the meaningful information along the way.

An important thing to remember is that you should act on one event at a time. If several problems occur simultaneously, you should prioritize them based on the way they affect users and the impact on the network and even on the business. Depending on the situation, you may want to delegate other technicians to specific technology areas in order to cover all the affected zones at the same time.
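Prioritizing simultaneous problems by impact might look like the following Python sketch. The ordering rule used here (business-critical problems first, then the number of affected users) is an assumption, as are the names `OpenProblem` and `prioritize`; each organization will weigh impact differently.

```python
from dataclasses import dataclass


@dataclass
class OpenProblem:
    name: str
    users_affected: int
    business_critical: bool


def prioritize(problems):
    """Sort problems so that the highest-impact one is handled first."""
    return sorted(
        problems,
        key=lambda p: (p.business_critical, p.users_affected),
        reverse=True,
    )


# Hypothetical problems reported at the same time.
queue = [
    OpenProblem("printer offline", users_affected=40, business_critical=False),
    OpenProblem("payment gateway unreachable", users_affected=5, business_critical=True),
    OpenProblem("slow file share", users_affected=12, business_critical=False),
]
```

In this example the payment gateway is handled first despite affecting the fewest users, because it is flagged as business critical.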

Once the action to be taken has been identified, you should implement a single fix/solution at a time. Do not try to mix different solutions just to hurry things up. Most often this will lead to other problems. You should move on to the next fix only if the previous one does not work.
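The rule of applying a single fix at a time can be sketched as a loop that applies one fix, checks the result, and undoes the fix before trying the next. In this Python sketch the function name `try_fixes` and the `(name, apply, revert)` structure are assumptions made for illustration.

```python
def try_fixes(fixes, problem_solved):
    """Apply fixes one at a time and return the name of the one that worked.

    `fixes` is a list of (name, apply, revert) tuples of callables.
    `problem_solved` is a callable that tests whether the issue is gone.
    Returns None if no fix solved the problem.
    """
    for name, apply, revert in fixes:
        apply()
        if problem_solved():
            return name
        revert()  # undo this fix so that solutions are never mixed
    return None
```

Because each unsuccessful fix is reverted before the next one is applied, the fix that finally works can be identified with certainty and documented later.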

Another important rule is that backup comes first and rollback second. Before implementing the fix, you should have a backup of the data, system, and configuration files, as well as a way to roll back to the last known good configuration if the fix does not work.
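For file-based configurations, the backup and rollback rule can be sketched with a Python context manager that copies the file aside before the change and restores it if the change fails. The names `backed_up` and `demo`, the `.bak` suffix, and the file name `router.cfg` are hypothetical; real devices usually need their own vendor-specific backup procedure.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def backed_up(config_path):
    """Copy the file aside before a change; restore it if the change raises."""
    path = Path(config_path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)        # backup first
    try:
        yield backup
    except Exception:
        shutil.copy2(backup, path)    # roll back to the last known good copy
        raise


def demo():
    """Apply a change that fails verification and return the file afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "router.cfg"
        config.write_text("hostname R1\n")
        try:
            with backed_up(config):
                config.write_text("hostname BROKEN\n")
                raise RuntimeError("verification failed")
        except RuntimeError:
            pass
        return config.read_text()
```

In `demo()`, the failed change is rolled back automatically, so the file again contains the original `hostname R1` line.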

Other special cases are the ones in which the problem is intermittent or the solution cannot be implemented outside production hours. If this is the case, you should carefully schedule a change control window and follow a strict procedure to cover every possible solution. You should always have a backup plan in case the primary solution fails; this will allow you to speed up the troubleshooting process and make maximum use of the scheduled maintenance window.

Do not panic if things get out of control. Ask your colleagues for assistance when needed and stay calm so you can maintain control of the situation.

Implement and Test the Solution

One of the most critical parts of the troubleshooting process is testing the solution. If the solution involves some type of major change or fundamental modification to the network infrastructure or design, the recommendation is to test the solution in an isolated prototype environment first.

If possible, as a best-case scenario, you should have an exact mirror of the network topology, or at least try to get as close to this as possible. This could mean trying to create a subset of the network infrastructure on which the solution can be tested. For example, the solution might involve applying some type of service pack or software upgrade on different devices, and this should be carefully tested in an isolated environment before launching such a drastic change into production.

During the implementation and testing phase, network technicians often create scripts that execute multiple tasks at once in order to save time. Likewise, you should prepare a detailed implementation plan and testing procedure before starting the actual work, to minimize the problems that might occur. In addition, you should have more senior technicians available in case things get out of control.

From a testing standpoint, solutions should be tested based on their complexity, starting with the simple ones first, in order to achieve maximum efficiency.

Analyze the Results

The testing phase results may or may not be favorable, so you should have an iterative process in place that will permit you to go back to a different phase of the process until you find the right solution. This means that, as part of the troubleshooting process, you should know which phase you should go back to. For example, if you performed enough information gathering and you are sure that you have all the facts, you can skip this phase.

The iterative process may also involve the following actions:

  • Using an audio recording device throughout the process to capture the actions taken
  • Sharing the results with the online community, using work groups and bulletin boards that can help you obtain answers quickly
  • Escalation to a Level 2 or Level 3 technician

After finding and implementing the solution to the problem reported, most network professionals just move on without taking one important step into consideration: implementing preventative measures, to avoid having the issue occur again in the future. This includes procedures that mitigate the problem, such as building a redundant network topology with failover capabilities to minimize the effect of a device going down.

As a network technician, you should be prepared for unexpected risk, as the things you do to fix a problem will often have unexpected consequences for other users, systems, or applications.

Document the Process and Solution

The documentation process should cover all phases of the troubleshooting process. Using a PDA or an audio recording device can assist in recording every step of the process, including mistakes and unexpected consequences. Another source for obtaining information to be used in the documentation process is logging servers that generate customized reports based on customized filters.
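A filtered report of the kind a logging server produces can be approximated in a few lines. This Python sketch assumes log entries are available as dictionaries with `device` and `severity` keys (hypothetical field names) and uses the standard syslog severity numbering, in which 0 is the most severe level and 7 the least.

```python
from collections import Counter

SEVERITY_NAMES = {
    0: "emergency", 1: "alert", 2: "critical", 3: "error",
    4: "warning", 5: "notice", 6: "informational", 7: "debug",
}


def severity_report(entries, device=None, max_severity=4):
    """Count entries per severity name, filtered by device and severity.

    Entries with a severity number above `max_severity` (less severe)
    are left out. With device=None, entries from all devices are counted.
    """
    counts = Counter()
    for entry in entries:
        if device is not None and entry["device"] != device:
            continue
        if entry["severity"] > max_severity:
            continue
        counts[SEVERITY_NAMES[entry["severity"]]] += 1
    return dict(counts)


# Hypothetical entries collected during the incident.
sample_entries = [
    {"device": "sw-access-3", "severity": 3},
    {"device": "sw-access-3", "severity": 6},
    {"device": "core-1", "severity": 4},
    {"device": "sw-access-3", "severity": 3},
]
```

Calling `severity_report(sample_entries, device="sw-access-3")` counts the two error entries for that switch and ignores its informational message.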

Various Web-based tools from different vendors are available for documentation purposes and for creating customized reports and summaries (using XML or other formats). A common document management system used for this purpose is Microsoft SharePoint, which offers the capability of using document libraries.

The end goal of this process is to generate a series of reports and summaries that complete the troubleshooting process and allow the technician, who is responsible for delivering documentation on the solution and on the entire process, to present the final resolution to management. Part of the goal in this phase is constant improvement, which includes storing the solution in a common knowledge database that can provide valuable information for similar cases in the future.
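A common knowledge database is only useful if past solutions can be found again. The following Python sketch shows one possible lookup by symptom keywords; the entry layout (`symptoms`, `solution`) and the function name `find_solutions` are assumptions, not the schema of any specific product.

```python
def find_solutions(knowledge_base, symptoms):
    """Return stored entries that share a symptom, best match first.

    `knowledge_base` is a list of dicts with "symptoms" and "solution" keys.
    Matching ignores letter case.
    """
    wanted = {s.lower() for s in symptoms}
    matches = []
    for entry in knowledge_base:
        overlap = wanted & {s.lower() for s in entry["symptoms"]}
        if overlap:
            matches.append((len(overlap), entry))
    # The sort is stable, so equal scores keep their stored order.
    matches.sort(key=lambda match: match[0], reverse=True)
    return [entry for _, entry in matches]


# Hypothetical stored cases.
kb = [
    {"symptoms": ["slow service"], "solution": "Replaced faulty uplink cable"},
    {"symptoms": ["Logon issues", "slow service"], "solution": "Restarted directory service"},
]
```

A search for "logon issues" and "slow service" returns the directory service case first, because it matches both symptoms.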

Summary

Troubleshooting is considered a science, as it always involves a pre-determined process. Network engineers who follow this process will get better at troubleshooting over time, optimizing it based on their experience.

A methodology is needed in order to speed up the process of finding and solving problems. Although the methodology can differ from case to case, it is important to use one that best suits your company, network, and internal structure. A proposed methodology includes the following steps:

Step 1: Gather information

Step 2: Identify the affected areas in the network

Step 3: Determine whether anything has changed in the network

Step 4: Establish the most probable cause of the problem

Step 5: Determine whether escalation is necessary

Step 6: Create an action plan and a possible solution

Step 7: Implement and test the solution

Step 8: Analyze the results

Step 9: Document the process and solution

Copyright Reality Press Ltd . / Paul Browning
