Authors: Les Cottrell and Warren Matthews. Created: June 5; last updated on July 9, 1999
The Surveyor (and RIPE) monitoring project relies on a dedicated PC running Unix being placed at each monitoring site. Each PC in turn relies on a Global Positioning System (GPS) device to obtain accurate time and to synchronize time between the monitors. The monitors send packets to each other at Poisson randomized time intervals and use these packets to gather one-way end-to-end delay and loss measurements. Surveyor also makes concurrent traceroutes, which provide route history information. Surveyor is more accurate and better for short term measurement, especially for sites which have good connectivity. Surveyor currently provides daily snapshots of performance. The community for Surveyor is Internet 2, though there are monitors at non Internet 2 sites, in particular at 3 High Energy Physics (HEP) sites (CERN, FNAL and SLAC) that are also PingER monitor sites.
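As a rough illustration of what sending at Poisson randomized intervals means, the sketch below draws exponentially distributed gaps between send times, which is the standard way to generate a Poisson process. The function name, the mean gap and the duration are hypothetical choices for illustration and are not taken from Surveyor itself.

```python
import random


def poisson_send_times(mean_interval_s, duration_s, seed=None):
    """Return sorted send times (seconds) of a Poisson process on [0, duration_s)."""
    rng = random.Random(seed)
    times = []
    # Gaps between events of a Poisson process are exponentially distributed
    # with rate 1 / mean_interval_s.
    t = rng.expovariate(1.0 / mean_interval_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(1.0 / mean_interval_s)
    return times


# Hypothetical parameters: a mean gap of 0.5 s over a one minute window.
schedule = poisson_send_times(mean_interval_s=0.5, duration_s=60.0, seed=1)
print(len(schedule), "packets scheduled")
```

Randomizing the gaps in this way avoids sampling the network in step with any periodic behaviour, which a fixed sending interval could do.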
PingER uses the ICMP echo facility (ping) and thus only makes
round trip measurements. PingER uses an
existing host with no special software installed at the monitored site and does not
require a GPS system.
PingER is a more lightweight solution: it requires less management, uses less
bandwidth, requires less storage, and
nothing needs to be installed at the remotely monitored sites.
PingER is good for remote sites with poor connectivity.
PingER, today, has more reports available for showing long term trends.
The community of interest for PingER is ESnet, High Energy and Nuclear Physics
(HENP) sites and the Cross Industry Working Team (XIWT). More general
information comparing Surveyor and PingER can be found in
Comparison of some Internet Active End-to-end Performance
Measurement projects.
The distribution of these pings (see the magenta squares in the chart
to the right or above)
indicates a sharp peak (95% of the Round Trip Times (RTTs) are contained within
a 9.5 msec. interval) centered around 220 msec. There is both a high and a low RTT
tail. The figure also shows the Surveyor delay frequency distributions (green
and blue triangles) for
the same time period. The Surveyor distributions also show sharp peaks with a
high RTT tail. The medians of the two delay distributions (113 msec. SLAC to CERN
and 105 msec. CERN to SLAC) add up to roughly
the RTT seen by ping (221 msec.). Note, they are not expected to be exactly
equal since the packet sizes are different.
The SLAC to CERN delay distribution (the blue dots)
also exhibits a low RTT tail similar to that seen in the ping distribution.
During this period Surveyor observed packet losses of 0.71% from CERN to SLAC,
0.68% from SLAC to CERN and the pings observed 1.04% for the round trip.
We also binned the Surveyor and ping data into 1 minute bins with the contents
of each time bin being the average Surveyor one-way delay or ping RTT for
that minute. We also added together the Surveyor one-way delays from each
direction for each minute to create a Surveyor round trip delay for
each minute. This data is
shown in the chart below or to the left.
The magenta and black dots (the bottom and next to
bottom sets of points) show the Surveyor one-way
delays, the green dots show the Surveyor round trip delays, and the blue dots
(the top set of dots)
show the ping RTT. Note that the left hand y axis is for the SLAC to CERN Surveyor delay and
the Surveyor round trip delay, and the right hand y axis is for the CERN to
SLAC Surveyor delay and the ping RTT. The use of 2 separate y axes enables us to display
the points so they do not overlap and hide one another. Careful examination of
this chart reveals that the green and blue dots track one another very well,
reproducing all the peaks and flat periods.
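The one-minute binning and the summing of the two one-way delays described above can be sketched as follows. This is a minimal sketch under assumed conventions: samples are (timestamp in seconds, value in msec.) pairs, the function names are hypothetical, and only minutes that have data in both directions are given a round trip value. The sample values are illustrative, not measured data.

```python
from collections import defaultdict
from statistics import mean


def bin_by_minute(samples):
    """Average (timestamp_s, value_ms) samples into bins keyed by minute index."""
    bins = defaultdict(list)
    for timestamp_s, value_ms in samples:
        bins[int(timestamp_s // 60)].append(value_ms)
    return {minute: mean(values) for minute, values in bins.items()}


def round_trip_per_minute(delay_ab, delay_ba):
    """Add the two one-way per-minute averages for minutes present in both."""
    common = delay_ab.keys() & delay_ba.keys()
    return {m: delay_ab[m] + delay_ba[m] for m in sorted(common)}


# Illustrative one-way delay samples (not measured data).
slac_to_cern = bin_by_minute([(0, 112.0), (30, 114.0), (65, 113.0)])
cern_to_slac = bin_by_minute([(10, 105.0), (70, 106.0), (130, 104.0)])
rtt = round_trip_per_minute(slac_to_cern, cern_to_slac)
print(rtt)
```

The same binning function can be applied to the ping RTT samples so that both data sets share the same one-minute time base.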
We optimized the adjustment of the ping timestamps mentioned above
by varying the adjustment from -60
minutes to +60 minutes and calculating the correlation coefficient R between the
timestamp adjusted ping RTTs and the Surveyor round-trip delays. The results are shown
to the left or below. It is seen that there is a sharp peak at an adjustment of +2
minutes with a width (IQR) of about 5.5 minutes. By the time the adjustment is off
by 30 minutes or more in either direction, it is seen that the correlation has disappeared.
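The scan over timestamp adjustments can be sketched as below. This is only an illustration of the idea, assuming both data sets have already been reduced to one value per time bin and that the adjustment is tried in whole bins; the function names and the synthetic data are hypothetical and are not the code or data used for the results above.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient R; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def scan_offsets(ping_rtt, surveyor_rtt, offsets):
    """Shift the ping bins by each offset (in bins) and correlate with Surveyor.

    Both inputs map a bin index (e.g. a minute number) to a round trip value.
    """
    results = {}
    for off in offsets:
        pairs = [(rtt, surveyor_rtt[m + off])
                 for m, rtt in ping_rtt.items() if (m + off) in surveyor_rtt]
        if len(pairs) >= 3:  # too few overlapping bins gives a meaningless R
            xs, ys = zip(*pairs)
            results[off] = pearson_r(xs, ys)
    best = max(results, key=results.get)
    return best, results


# Synthetic data: the ping clock is 2 bins behind, so a +2 adjustment aligns it.
rng = random.Random(0)
surveyor = {m: 218.0 + rng.uniform(0.0, 10.0) for m in range(60)}
ping = {m: surveyor[m + 2] + 3.0 for m in range(58)}
best, results = scan_offsets(ping, surveyor, range(-5, 6))
print("best adjustment (bins):", best)
```

Plotting `results` against the offset gives the kind of peaked curve described above, with R falling away as the adjustment moves from the best value.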
Comparing ping with Surveyor
We made some high statistics (~250K samples)
long term measurements with ping from SLAC to
CERN from May 9 thru May 12, 1999. The pings were made using the standard
ping utility with 100 data bytes
(including the 8 ICMP bytes but not the IP header), were made at one
second intervals and had a timeout of 20 seconds. The host (ping client)
issuing the ping echo requests was an IBM RS/6000 250/80 running AIX 4.1.5.
It is the same host (minos) that is used for the PingER monitoring at SLAC.
The host echoing the pings (ping server) at CERN was the same host that
is monitored by PingER (ping.cern.ch).
We then investigated the causes of the low and high RTT tails. The time
distribution of pings with a high RTT (> 260 msec.) is shown below. For
Tuesday May 11th, several clusters of high RTT are apparent.
The cluster around 18:00 hours UTC is seen below. A route change occurred
(seen both from SLAC to CERN and in the reverse direction) at about 18:10
hours UTC causing traffic to take a shorter but more congested route (note
the increase in lost packets). See
Ping high statistics results for more
details. As can be seen this change in RTT performance is also evident in the
Surveyor reports for the same period. Comparing the Surveyor graph with
the ping graph above it is also evident that the ping clusters at about
01:00, 07:00 and 14:00 hours also show up in the Surveyor data.
Scatter plotting Surveyor round
trip delays for each minute vs the ping RTT for the same minute yields the
chart below or to the right. It should be noted that the timestamps of the
pings were adjusted (see below) to the nearest minute to account
for the lack of an accurate record of time correlation between the
clocks of the
hosts making the measurements (i.e. between surveyor and minos) at the time the
measurements were made.
It is seen that the points roughly follow a straight line
with an R² of 0.918, indicating a strong correlation
between the two sets of measurements.
Part of the reason for the slope not being one may be due to the
difference in packet sizes used by Surveyor and these pings.
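For readers who want to reproduce this kind of fit, the sketch below computes an ordinary least-squares line and its R² for paired per-minute values. It assumes the ping RTT is taken as x and the Surveyor round trip delay as y, which is our reading of the scatter plot; the function name and the sample values are hypothetical and are not the measured data.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R squared."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R squared = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Illustrative per-minute values in msec. (not measured data).
ping_rtt = [219.0, 220.0, 222.0, 225.0, 230.0]
surveyor_rtt = [216.0, 218.0, 219.0, 223.0, 226.0]
slope, intercept, r2 = linear_fit(ping_rtt, surveyor_rtt)
print(f"slope={slope:.3f} intercept={intercept:.1f} R2={r2:.3f}")
```

A slope different from one together with a high R² is what one would expect if the two measurements track each other closely but differ by a systematic factor.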
Comparing PingER and Surveyor
To enable us to compare the Surveyor data with the PingER data,
Matt Zekauskas of the Surveyor project kindly made available to us
Surveyor data for the six pairs between CERN, FNAL and SLAC from November
1998 thru May 1999. We aggregated the Surveyor data to match the
time "ticks" used in PingER (hourly, daily, monthly).
Then we reformatted the data
into PingER format and made it available via the
PingER tools.
We then exported the data from PingER to Excel and
added the delays and losses from site a to site b and
b to a to create an RTT between
a and b (see Tutorial
on Internet Monitoring and PingER at SLAC for how to combine
the one way results to come up with the round trip results).
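The combination step can be sketched as below. Adding the two one-way delays and the two one-way losses follows the description above; the alternative loss formula, which treats losses in the two directions as independent, is our assumption for comparison and is not confirmed by the tutorial. The function name and input values are hypothetical.

```python
def combine_one_way(delay_ab_ms, delay_ba_ms, loss_ab, loss_ba):
    """Combine one-way results for a->b and b->a into round trip estimates.

    Losses are fractions (0.01 means 1%). Returns the round trip delay,
    the summed loss, and the loss assuming independent directions.
    """
    rtt_ms = delay_ab_ms + delay_ba_ms
    # Simple sum of the two one-way losses, as described in the text.
    loss_sum = loss_ab + loss_ba
    # Assumption: a round trip succeeds only if both directions deliver,
    # and the two directions lose packets independently.
    loss_independent = 1.0 - (1.0 - loss_ab) * (1.0 - loss_ba)
    return rtt_ms, loss_sum, loss_independent


# Illustrative inputs (not measured data).
rtt_ms, loss_sum, loss_indep = combine_one_way(113.0, 105.0, 0.01, 0.02)
print(rtt_ms, loss_sum, loss_indep)
```

For the small loss rates typical of these paths the two loss estimates differ only slightly, since the product of the two one-way losses is negligible.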
PingER is more parsimonious with resources (bandwidth, disk space and CPU) and does not require a dedicated host and GPS aerial to be installed at every site. This makes it attractive for sites that have limited bandwidth, or are unwilling to install a dedicated host and GPS aerial. It has also turned out to be attractive to groups such as the XIWT that have limited resources to gather and analyze the data. Though PingER is less accurate, especially at short time scales (< an hour), it is very good for looking at long term trends and groupings of sites, where limited statistics are less of a problem.
The strong correlation, both visually and statistically, between the Surveyor and PingER data for RTT, and between the Surveyor and ping RTT data (on which PingER is built), indicates that the results from both projects can be used together in complementary ways.