Requirements for WAN Hosts being Monitored
Les Cottrell and
Tom Glanzman,
Last Update: February 26, 2000
Introduction
We have been plagued with problems with our
wide-area network. Often these problems are simply that the network link has
insufficient capacity to handle the traffic. Other problems reflect poorly
configured or malfunctioning equipment. It is often difficult and frustrating
to follow up on problems between SLAC and your site. SLAC and others have
developed a set of tools, called
PingER, to aid in the identification and
diagnosis of some types of networking problems. A number of SLAC
collaborator sites are already participating in this monitoring program,
which has helped to solve real problems.
The program currently has several components and provides these services:
- Ping. SLAC periodically pings a selected machine at the collaborator site
with 100 and 1000 byte messages, then records the result (elapsed time, lost
packets). These records are automatically accumulated over time and converted
into a variety of tables and plots, all of which are viewable via WWW.
- Traceroute server. One can determine the network path between two sites
(in general, there may be different paths in each direction) for the purposes
of identifying bottlenecks and outages. A simple cgi script (for your WWW
server) can provide the path from the WWW server back to the requester.
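As an illustration of the reverse traceroute idea described above, here is a minimal sketch of such a cgi script in Python. It is not the script used by SLAC. It assumes a Unix-style `traceroute` program is on the server's path and that the web server sets the standard CGI variable `REMOTE_ADDR`; the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Reverse traceroute sketch: trace from this WWW server back to the requester."""
import ipaddress
import os
import subprocess


def requester_address(environ):
    """Return the requester's IP address from the CGI environment, or None."""
    raw = environ.get("REMOTE_ADDR", "")
    try:
        # Accept only a literal IP address so nothing else reaches the command.
        return str(ipaddress.ip_address(raw))
    except ValueError:
        return None


def build_command(address):
    # Assumption: a Unix-style `traceroute` binary is available on PATH.
    return ["traceroute", address]


def main():
    # CGI response header, followed by a blank line.
    print("Content-Type: text/plain")
    print()
    address = requester_address(os.environ)
    if address is None:
        print("Could not determine a valid requester address.")
        return
    try:
        result = subprocess.run(
            build_command(address), capture_output=True, text=True, timeout=120
        )
        print(result.stdout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"traceroute failed: {exc}")


if __name__ == "__main__":
    main()
```

Passing the address as a list element rather than through a shell, and validating it first, keeps a malformed request from injecting extra commands.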
The net result of this program is that it facilitates communication between
networking experts at SLAC and other sites, provides a long-term record of
network performance and enables a certain amount of diagnosis when a problem
is suspected. For some examples of such reverse traceroute servers see:
Traceroute Servers for HENP & ESnet.
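To make the recorded ping results (elapsed time, lost packets) concrete, the following sketch shows one way a single set of pings could be reduced to summary figures. It is an illustration only, not the PingER code. The function name, the choice of statistics, and the sample values are assumptions.

```python
from statistics import median


def summarize_pings(rtts_ms):
    """Summarize one set of pings.

    rtts_ms holds one entry per ping sent: the round-trip time in
    milliseconds, or None when no reply was received (a lost packet).
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    return {
        "sent": sent,
        "lost": lost,
        "loss_percent": 100.0 * lost / sent if sent else 0.0,
        "min_ms": min(replies) if replies else None,
        "median_ms": median(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


# Hypothetical sample: five pings, one of which was lost.
sample = [182.0, 175.5, None, 190.2, 178.1]
summary = summarize_pings(sample)
print(summary)
```

Accumulating such summaries over months is what allows a long-term figure such as reachability to be computed for a site.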
Requirements of the Remote Node/Site
- Identify by host name (i.e. it should be registered in the Domain Name
System (DNS)) a reliable host at the remote site that is lightly loaded, or at
least consistently loaded, and that can be monitored via ping (i.e. pings must
not be blocked or rate limited). The first choice is a web server, though a
name server or a mail gateway may also be suitable. The host must NOT be a
router, and should preferably be a UNIX host with a robust, modern TCP stack.
- The host must not be a proxy host that is located at another site.
- The host should be "close" to the offsite connection in order to avoid the
effects of congestion in the local area network.
- The host should have high availability.
By design the host should be available 24 hours a day, 7 days a week.
The host should be high on the
list of machines to be restored to service and accessibility after an outage.
Typically we monitor the host/site for several months before selecting
it as a beacon site, and for the last several months the reachability should
be over 98%.
- Ideally the host should be on an uninterruptible power source (UPS).
- The host name should be unlikely to change
(even though the underlying hardware and IP address may change) and
must refer to a single
host, i.e. there must be one and only one
IP address associated with that host name (it is not multi-homed and not one of
a set of hosts chosen by a load balancing name server for example).
The host name should identify the site, e.g. do not use a server that is not
located on the site, and where possible use a top level domain that identifies
the country, e.g. DON'T use .net, .org, .com, .mil or .edu for a non-U.S.
site.
- If there are concerns about the
extra publicity (and therefore possible increased security concerns) for the
machine then an already well-known machine should be chosen.
- You may want to consider giving the chosen host a DNS alias name of
ping, e.g. ping.slac.stanford.edu. This will not only help identify the
function of the host, but also require no changes in the monitoring
configuration if a remote site decides to change the physical host to be
pinged, provided that the name is moved to the new host.
- Provide the platform type and operating system of the host, its location in
terms of the city and/or the rough latitude and longitude, and its
connectivity.
- It would also be nice to know the Internet Service Provider(s) and link
speed(s) of the site.
- No software needs to be installed, no accounts need to be made available,
and no special servers need to be run. The extra network load is 100 bits/s on
average for each monitoring site that monitors this remote site. This can be
reduced to 10 bits/s in cases of very poor network connectivity.
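The requirement above that a host name have one and only one IP address can be checked before proposing a host. The sketch below is one possible check using Python's standard `socket.getaddrinfo`. The helper names are hypothetical, and restricting the lookup to IPv4 is an assumption. The example at the end uses made-up records in the shape `getaddrinfo` returns, so it runs without a network.

```python
import socket


def distinct_addresses(addrinfo):
    """Collect the distinct IP addresses from socket.getaddrinfo() results."""
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is (address, port).
    return sorted({entry[4][0] for entry in addrinfo})


def has_single_address(hostname):
    """Return (ok, addresses); ok is True when the name maps to one address."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = distinct_addresses(infos)
    return len(addresses) == 1, addresses


# Made-up records: the second name resembles a load-balanced host name.
single = [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0))]
balanced = single + [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.11", 0))]
print(distinct_addresses(single))
print(distinct_addresses(balanced))
```

A single lookup cannot rule out a load balancing name server that hands out one address at a time, so repeating the check at intervals would be prudent.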
Optional:
Owner: Les Cottrell