Diagnosing Network Connectivity Problems

by Walt Stoneburner
"It was working just a second ago..."


The Symptom

• Router God: Amazing Technical advice with a humorous slant.
• Sam Spade network web tool
You were working on the Internet and now suddenly you're facing connectivity problems. Perhaps you can't telnet out, you can't ssh in, everything works except for the web server... what's going on?

If at all possible, boil things down to two machines that won't talk to each other during a given scenario. You want to be at one, and your buddy at the other; get real time communication happening between you, ideally on a portable phone.

FACT: If it was working, it can work again.

FACT: You might not have changed something, but something changed.

Consequently, your first goal is to find out where the change happened, then what it was, and finally how to put it back.

Start with the utterly obvious

  • Is the wall supplying power?
  • Does everything plugged in to the wall have power? Hubs and all.
  • Is each device powered on?
  • Is there evidence the device is operational, and not hung?
  • If a network card, is it firmly seated in its bus slot?
  • Are all network cords where they belong?
  • Are all the network cords plugged in firmly?

Start with problem machine

  • Can you get physical access to the host that's unreachable?
  • Can anyone? It helps to have a remote partner.
  • Ping a site that should always be up, like www.Google.com.
    Can you get to some hosts but not others?
      Yes, the problem is not at your local network.
      No, check around your local machine's network first.
  • When you generate network traffic, does your ethernet light flicker?
      No, suspect that you don't have an interface up.
        If using a laptop, did you boot docked/undocked correctly?
      Yes, you're sending packets.
  • Can it reach the destination machine by using an IP address?
    Yes, this may be a DNS problem.

Try an uninterested external machine

  • Pick some external machine (let's call it X) not involved in your communication crisis.
    Can your problem host (call it machine A) see machine X?
      Yes, your network is good; examine routing and firewall options.
      No, examine hardware, cables, and local configuration.
  • Can machine X see machine A?
    If packets are flowing only half way, suspect your ethernet card needs replacing.
  • Can the destination machine (call it B) see machine X?
      Yes, the network is good, and machine X has services up.
  • Can machine X connect to the destination machine (B)?
      Yes, the network is good at machine B's end and it is running services.
      No, suspect the network at the destination machine.
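
The checks in this list can be written down as a small lookup, which helps when you and your buddy are comparing notes over the phone. This is only a sketch of the reasoning above: the function name triage and its argument names are made up for illustration, and the hints are the ones already given in the list.

```python
def triage(a_sees_x, x_sees_a, b_sees_x, x_sees_b):
    """Turn the four reachability checks into a list of hints.

    A is the problem host, B the destination, X the uninterested machine.
    Each argument is True if that direction worked.
    """
    hints = []
    if a_sees_x:
        hints.append("A's network is good; examine routing and firewall options")
    else:
        hints.append("examine A's hardware, cables, and local configuration")
    if a_sees_x != x_sees_a:
        # Traffic works in one direction only.
        hints.append("packets flow only half way at A; suspect A's ethernet card")
    if b_sees_x:
        hints.append("the network is good, and X has services up")
    if x_sees_b:
        hints.append("the network is good at B's end and B is running services")
    else:
        hints.append("suspect the network at B, the destination machine")
    return hints


for hint in triage(a_sees_x=True, x_sees_a=True, b_sees_x=True, x_sees_b=False):
    print(hint)
```

The hints are deliberately plain strings, so they can be read aloud or pasted into a trouble ticket.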

Advanced stuff

  • Put a network sniffer on the line. Connect through an Ethernet HUB, not a switch. If using a dual-speed hub, make sure your sniffer machine is at the same speed (10 or 100Mbps) as the machine you're trying to troubleshoot -- dual-speed hubs are actually 2-port switches! (Yes, it's possible to mirror traffic on some managed switches to do network sniffing; if you know how to do this, you have no business reading this guide.)
    Do you see your own outgoing traffic?
      No, check your interface.
      Yes, make sure that the ethernet card light blinks, as do intermediate hubs. Validate your traffic can be seen by other hosts.
  • Do you see incoming traffic to you?
      Yes, your card is working. Suspect the remote network.
      No, check the local network, suspect the ethernet card.
  • Do you see cross talk of traffic on the physical subnet intended for other machines?
      Yes, your hardware is working.
      No, focus on your local hardware and physical connection.

Ping another host

For Linux, use:
# /bin/ping -v host

For Windows, use:
C:\> ping -t host

The idea is to keep a continuous ping going so that you can see if the other side is responding or not. You may want to watch a firewall's log, or a sniffer, to see if your requests are getting out and a response is coming back. If you're only seeing half the traffic, suspect your network card; substitute another and see what happens. If you are seeing no traffic, examine your routing tables. If you are seeing the ping traffic, but other kinds of traffic aren't making it through on different ports, it's time to suspect a firewall.

Note: some firewalls block ping traffic, considering it a probe to see which machines are answering. You may want to use another service, such as telnet.

Perform a traceroute

For Linux, use:
# /usr/bin/traceroute -n host

For Windows, use:
C:\> tracert -d host

Start by suppressing name resolution, as shown above; otherwise DNS problems can look a lot like slow/no connectivity. Plus, when the DNS server is broken, lookups won't work.

You are looking to see that you can make it from one host all the way to the other. Long delays (high turnaround times) indicate network congestion. Asterisks indicate a host is not responding, or that the firewall isn't allowing ICMP packets through (you'll have to try telnet or something else).

If you notice a circular pattern where host A hops to host B, and host B hops back to host A, then someone along the way has a problem with their routers; unless it's your ISP (who you should call and report this to), there's not much you can do but wait it out -- assume they are aware of it. Most likely their routing tables haven't propagated correctly.

Circular routing often indicates that an endpoint access router is down and the two upstream routers don't know where to send the packet since the dynamic route associated with the downed connection has gone away.

If instead the traceroute dies at a particular point, try other hosts serviced by that ISP. The very clever can use whois (or Sam Spade) and ping and try to derive some IP numbers to try. It may be that an ISP is down for some reason.
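
The circular pattern described above is easy to miss in a long trace. As a rough sketch, assuming you have already copied the hop addresses out of the traceroute output into a list (the function name find_routing_loop and the sample addresses are made up for illustration), you could look for an A, B, A, B run like this:

```python
def find_routing_loop(hops):
    """Return the first pair (a, b) that repeats as a, b, a, b, or None.

    hops is a list of router addresses in the order traceroute printed
    them; use None for a hop that showed only asterisks.
    """
    for i in range(len(hops) - 3):
        a, b, c, d = hops[i:i + 4]
        if a is not None and b is not None and a != b and a == c and b == d:
            return (a, b)
    return None


trace = ["10.0.0.1", "192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.2"]
print(find_routing_loop(trace))
```

This only catches a loop between two routers; a loop through three or more would need a longer window.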

Telnet to another host

For Linux, use:
# /bin/telnet host 23

For Windows, use:
C:\> telnet host 23

Port 23 is the port for logging in (telnet). If a host doesn't allow logins, you can also try port 25 (the mail server, SMTP); if the host responds with any text, enter QUIT and hit return. You can also try port 80 (the web server); if the connection is made you won't see an error message and instead the system will wait for input (and it may not echo it). Enter GET / and hit return; some servers want a second return before they answer. If you see text and are disconnected, you made a connection.

    Concerning port 80, a lot of ISPs are using transparent proxy devices, which cache traffic destined for port 80. Same for port 25 (as an anti-spam measure). You've got to read the output to make sure you're talking to the host you intend.

IMPORTANT: If you can make a connection to the remote host and get a reply on any port, then your problem is not connectivity, but firewall rules.

If you suspect a port should be responding, but it looks like it isn't, then try connecting to it from the localhost first. Do this by using the machine's IP address or 127.0.0.1. Additionally, you may wish to see which services are listening.
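
When you have several ports to try, the telnet test can be scripted. The following is a minimal sketch using Python's standard socket module; the function name probe and the three-second timeout are assumptions made for illustration, not part of any tool mentioned here. It connects, waits briefly for any text the service volunteers (as a mail server on port 25 does), and reports what happened.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connection, much like 'telnet host port'.

    Returns (status, banner). status is "open", "refused", "timeout"
    or "error"; banner is any text the service sent without being asked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            try:
                banner = s.recv(256).decode("ascii", "replace").strip()
            except socket.timeout:
                # Connected, but the service is waiting for us to speak
                # first (a web server on port 80 behaves this way).
                banner = ""
            return ("open", banner)
    except ConnectionRefusedError:
        return ("refused", "")      # host reachable, nothing listening
    except socket.timeout:
        return ("timeout", "")      # no answer at all; filtered or down
    except OSError:
        return ("error", "")        # no route, name lookup failure, etc.
```

For example, probe("127.0.0.1", 25) tries the local mail port. A result of "refused" still tells you the host itself answered, which is different from "timeout", where nothing came back.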

Check your Interface

For Linux, use:
# /sbin/ifconfig -a

For Windows, use:
C:\> ipconfig /all

You're looking to make sure that you have an interface defined for your ethernet, that it has a hardware address (MAC), that the internet address is defined, that there is a broadcast address, and a mask.

Quite often the interface will report metrics, such as how many packets, errors, dropped, overruns, etc. there were.

If you are sending, but not receiving, or receiving, but not sending, be suspicious of your network card; consider replacing it with another and see if the problem goes away.

High errors suggest a faulty line or cable. Too many overruns may mean you have too much on this leg and need to subnet your traffic. Too high a drop count suggests that a router between you and the other guy is swamped with traffic (or is trying to convert between 10Mbps and 100Mbps) and needs more RAM for a larger buffer.
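
On Linux the same counters can be read from /proc/net/dev, which makes the "sending but not receiving" check easy to automate. This is a sketch only: the function names and the sample text are made up for illustration, and the column positions assume the usual /proc/net/dev layout (eight receive columns followed by eight transmit columns).

```python
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:       0       0    0    0    0     0          0         0    84000    1400    0    0    0     0       0          0
"""


def parse_proc_net_dev(text):
    """Parse the text of /proc/net/dev into {interface: counters}."""
    stats = {}
    for line in text.splitlines()[2:]:       # the first two lines are headers
        if ":" not in line:
            continue
        name, numbers = line.split(":", 1)
        f = [int(n) for n in numbers.split()]
        if len(f) < 12:
            continue
        stats[name.strip()] = {
            "rx_packets": f[1], "rx_errs": f[2], "rx_drop": f[3],
            "tx_packets": f[9], "tx_errs": f[10], "tx_drop": f[11],
        }
    return stats


def one_way_interfaces(stats):
    """Interfaces that have sent packets but received none, or the reverse."""
    return [name for name, s in stats.items()
            if (s["rx_packets"] == 0) != (s["tx_packets"] == 0)]


print(one_way_interfaces(parse_proc_net_dev(SAMPLE)))
```

On a live machine you would pass in open("/proc/net/dev").read() instead of SAMPLE.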

Dumping the Routing Table

For Linux, use one of these:
# /sbin/route -n
# /bin/netstat -ar

For Windows, use:
C:\> route print

What you are looking for is that you have a routing table defined, and that the default route points to a gateway, and that the gateway is reachable.

You should be able to ping your gateway.
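
One reason a gateway can be unreachable is that its address does not fall inside the subnet of any local interface, typically after a mistyped address or mask. Here is a small sketch of that check using Python's standard ipaddress module; the function name and the example addresses are made up for illustration.

```python
import ipaddress


def gateway_is_on_link(address, netmask, gateway):
    """True if the gateway lies inside the interface's own subnet."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return ipaddress.ip_address(gateway) in iface.network


# Address and mask as reported by ifconfig or ipconfig,
# gateway as shown in the routing table.
print(gateway_is_on_link("192.168.1.17", "255.255.255.0", "192.168.1.1"))
print(gateway_is_on_link("192.168.1.17", "255.255.255.0", "192.168.2.1"))
```

A False result does not prove the routing table is wrong, but it is worth a second look before you start swapping hardware.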

Dumping the ARP Table

For Linux, use:
# /sbin/arp -n

For Windows, use:
C:\> arp -a

The ARP table should show the explicit MAC address (and corresponding IP address) of any network cards on the physical local subnet. If you can see other machines in this list, besides your own, your network card is functional, and you may have a routing problem.

You can get a general broadcast to all machines on your local subnet by doing a ping 255.255.255.255. This sends a generic message out to every card physically on the same wire and asks them to respond, so you will get duplicate messages. If ping complains, you may need to try a network address or provide an additional switch option on the command line (on Linux, ping -b).

    Note that broadcast pings don't work on all machines/operating systems. Additionally, it often requires root privileges to do it.
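
If 255.255.255.255 is refused, the broadcast address of your own subnet is the usual thing to try instead; that reading of "a network address" is an assumption on my part. It can be worked out from the address and mask that ifconfig or ipconfig reported. A short sketch with Python's standard ipaddress module (the function name and addresses are made up for illustration):

```python
import ipaddress


def subnet_broadcast(address, netmask):
    """Return the broadcast address of the subnet this interface is on."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return str(iface.network.broadcast_address)


print(subnet_broadcast("192.168.1.17", "255.255.255.0"))
```

The result should agree with the broadcast address your interface already reports; if it doesn't, the address or mask is misconfigured.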

Note that hubs, bridges, and switches will pass along broadcasts; routers and firewalls most likely will not. If you see limited traffic, expect that you have networking hardware to take into account too. Usually if networking hardware is acting up, then all machines on the same physical wire will misbehave in the same way. Do they? If it's isolated, then suspect the firewall.

Which services are listening

For Linux, use:
# /bin/netstat -an

For Windows, use:
C:\> netstat -a

You'll see a list that shows your machine and the port it's listening on (LISTENING on Windows, LISTEN on Linux), and optionally a connected machine and the port it is using (ESTABLISHED). The status of TIME_WAIT means that a connection was closed and the system is waiting a set period of time before making the port available for reuse. Some TIME_WAIT entries are normal on a busy server; if you see far more than your traffic explains, you may be under a denial of service attack.
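
With a long listing it is easier to count the states than to eyeball them. The sketch below tallies the last column of each TCP row in saved netstat -an output; the function name and the sample listing are made up for illustration, and it assumes the state is the last column (which is not true if you add options that append a process column).

```python
from collections import Counter

SAMPLE = """Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.17:22         192.168.1.5:51514       ESTABLISHED
tcp        0      0 192.168.1.17:80         192.168.1.9:40022       TIME_WAIT
tcp        0      0 192.168.1.17:80         192.168.1.9:40023       TIME_WAIT
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""


def count_tcp_states(netstat_output):
    """Count TCP connection states in the text printed by 'netstat -an'."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with tcp/tcp6 on Linux or TCP on Windows.
        if fields and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states


for state, count in count_tcp_states(SAMPLE).most_common():
    print(state, count)
```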

If you see a connection, but do not recognize the port in use, you can interrogate the system as to which process is using it.

The handle tool, from www.sysinternals.com, can provide "lsof"-like functionality under Windows.

List Open Filehandles

For Linux, use:
# /usr/local/sbin/lsof -i :portnumb

Unfortunately, I know of nothing that allows this from Windows.

Even still, the lsof program is not a "standard" Linux utility and must be obtained from ftp://vic.cc.purdue.edu/pub/tools/unix/lsof/.

The utility has to be compiled for the particular version of the kernel, since it needs to know how to peek inside the kernel structures (which can and do change).

More often than not, the port number will be listed in /etc/services or be that of an application you installed.
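
Looking a port up in /etc/services by hand gets tedious when there are several unknowns. This sketch parses the file's text into a lookup table; the function name and the sample entries are made up for illustration, and it assumes the usual "name port/protocol aliases" layout with # comments.

```python
SAMPLE = """# Network services
smtp            25/tcp          mail
telnet          23/tcp
http            80/tcp          www     # WorldWideWeb HTTP
"""


def parse_services(text):
    """Parse /etc/services text into {(port, protocol): service name}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]         # drop comments
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            # Keep the first name listed for a port.
            table.setdefault((int(port), proto), fields[0])
    return table


services = parse_services(SAMPLE)
print(services.get((25, "tcp")))
print(services.get((31337, "tcp")))   # None: not a listed service
```

On a live system you would pass in open("/etc/services").read() instead of SAMPLE. A port that is missing from the table is not proof of anything by itself; it just means you have to identify the owning application some other way.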

If not, suspect that your system may have an intruder. Head over to www.freshmeat.net and do a search for rootkit to find the latest rootkit detection software.

If you are using Windows, another useful tool is Sam Spade; it gives you easy access to netblock owner information.

Whois Services

For Linux, use:
# /usr/bin/whois domainname

WHOIS will tell you who owns the domain, and what name servers are responding. It's possible that DNS is not operating or is pointing to the wrong machine.

Be sure to use a domain name (foobar.com) rather than a host name (www.foobar.com).

DNS Look Up

For Linux, try any of these options:
# /usr/local/bin/dig host
# /usr/local/bin/dig @dnsserver host
# /usr/local/bin/host host
# /usr/local/bin/nslookup host

For Windows, use:
C:\> nslookup hostname

You can specify either a hostname or an IP address; this will ask the domain name server(s) if they know about the machine. A machine should always respond to its IP address, but may not always be listed in DNS. Furthermore, many names may alias back to a single IP address. For this reason, a name may resolve to an IP address, but the IP address may not resolve back to the same name.
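
The forward and reverse lookups can be compared in a few lines. This is a sketch built on Python's standard socket.gethostbyname and socket.gethostbyaddr; the function name is made up for illustration, and the two resolver arguments exist only so the lookups can be swapped out when experimenting.

```python
import socket


def forward_and_reverse(name,
                        forward=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Resolve name to an IP address, then resolve that address back.

    Returns (ip, reverse_name); either is None if that lookup failed.
    """
    try:
        ip = forward(name)
    except OSError:                  # covers socket.gaierror
        return (None, None)
    try:
        reverse_name = reverse(ip)[0]
    except OSError:                  # covers socket.herror
        reverse_name = None
    return (ip, reverse_name)
```

Calling forward_and_reverse("www.foobar.com") returns the address and whatever name the address maps back to. A different reverse name is not an error, for the aliasing reason given above; a result of (None, None) means the forward lookup itself failed.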

If your connection works with an IP address but not with the hostname, then it's very likely that either the DNS server isn't responding, is populated with bad data, doesn't have the host listed, the cache has expired, or the negative cache has not expired (DNS servers keep track of which names have no known IPs when you ask them).

Note that nslookup is becoming obsolete, and that dig and host are the replacements; they do more, better.

Ethernet Sniffer

For Linux, try:
# /usr/local/sbin/tcpdump

There are a number of tools, including snort and ethereal (since renamed Wireshark), which also capture and display traffic.

The goal is to see if you see any traffic, ideally two-way. First from anywhere (such as other machines on the local physical subnet), then from and destined to you, and finally over the port you're interested in.

Note that you may have to provide various filters, like grep, in order to see just the traffic you're really interested in.
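
Besides piping through grep, tcpdump accepts a filter expression of its own, such as "host 192.168.1.5 and port 80", so that only matching packets are captured. The helper below is a sketch that just assembles such a command line; the function name and its arguments are made up for illustration, and running the result still needs root privileges.

```python
def tcpdump_command(host=None, port=None, interface=None):
    """Build a tcpdump argument list limited to the traffic of interest."""
    cmd = ["tcpdump", "-n"]                  # -n: don't resolve names
    if interface:
        cmd += ["-i", interface]
    terms = []
    if host:
        terms.append(f"host {host}")
    if port:
        terms.append(f"port {port}")
    if terms:
        cmd.append(" and ".join(terms))      # the filter expression
    return cmd


print(" ".join(tcpdump_command(host="192.168.1.5", port=80, interface="eth0")))
```

Start with no filter to confirm you see any traffic at all, then add the host, then the port, matching the order suggested above.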
