

To: dnsop@cafax.se, aroot@ops.ietf.org, hardie@equinix.com (Ted Hardie)
From: hardie@equinix.com
Date: Tue, 31 Oct 2000 13:32:58 -0800 (PST)
Reply-to: hardie@equinix.com
Sender: owner-dnsop@cafax.se
Subject: Anycast root metrics and analysis

I've attached the data for the aroot reachability times from here for
10/30/2000.  As you'll note, there are long stretches where things are
fine, but also lots of spikes of 10 seconds or more, which indicate
that the service was essentially unreachable.

Looking over the past few sets of data, that is a pretty typical graph
for us.  Thinking about that, I've realized that one of the things that
worries me about this anycast methodology is that I'm not sure how
well it plays with traditional criteria for selecting among the roots.
The BIND selection system uses round trip time as the fundamental
measurement for determining which root server will be used.  This
isn't a single measurement, of course, but a race run and re-run based
on the number of requests coming in.  The use of RTT and periodic
checks on server response time recognizes the importance of quick
responses for the DNS system; it also recognizes that the selection of
which root server to use should be something that requires very little
intervention on the part of DNS operators.
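
As a rough illustration of the RTT race described above, here is a
minimal Python sketch.  It is not BIND's actual algorithm: the class
name, the smoothing factor `alpha`, the `decay` applied to servers that
were not queried, and the RTT figures are all assumptions chosen for
illustration.  It only shows the general idea that the fastest server
wins most queries while slower ones are still re-tested periodically,
with no operator involvement.

```python
class ServerSelector:
    """Toy RTT-driven selector (illustrative only, not BIND's code)."""

    def __init__(self, servers, alpha=0.3, decay=0.98):
        # A smoothed RTT of 0.0 marks a server as untried, so every
        # server is queried at least once before any ranking happens.
        self.srtt = {s: 0.0 for s in servers}
        self.alpha = alpha    # weight given to the newest measurement
        self.decay = decay    # slow "forgiveness" for unused servers

    def pick(self):
        # Choose the server with the lowest smoothed RTT.
        return min(self.srtt, key=self.srtt.get)

    def record(self, server, rtt_ms):
        old = self.srtt[server]
        if old == 0.0:
            self.srtt[server] = rtt_ms
        else:
            self.srtt[server] = (1 - self.alpha) * old + self.alpha * rtt_ms
        # Shrink everyone else's score so they eventually get re-run.
        for other in self.srtt:
            if other != server:
                self.srtt[other] *= self.decay


def simulate(true_rtt_ms, queries):
    """Send `queries` lookups and count how often each server is used."""
    selector = ServerSelector(true_rtt_ms)
    counts = {s: 0 for s in true_rtt_ms}
    for _ in range(queries):
        server = selector.pick()
        counts[server] += 1
        selector.record(server, true_rtt_ms[server])
    return counts
```

Calling `simulate({"a": 20.0, "b": 80.0, "c": 150.0}, 500)` sends the
bulk of the queries to "a" while still probing "b" and "c" now and then.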

What worries me is that the anycast system described in Masataka's
draft may end up requiring a fair amount of maintenance from the BGP
community to work optimally according to the DNS's native metrics.
The way things are put together now, this system would increase the
number of servers acting as roots but would not increase the number of
servers available for selection to any one location on the network.
Without any additional tweaking, an individual network will get the
anycast server which is reachable via the shortest AS path-length.
That may or may not map onto the server instance that would generate
the best performance.  Through our Verio link, for example, we reach
the embratel instance, but the best performance would be reached to
the mbone-cern aroot (as tracerouting its other interface
demonstrates).  As Masataka's draft notes, this is fixable, but it
isn't trivial.
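
The mismatch between the two metrics can be sketched in a few lines of
Python.  Everything below is hypothetical except the origin AS 4128:
the instance names, the intermediate AS numbers (taken from the
private-use range) and the RTT values are made up, and real BGP best
path selection looks at more than AS path length (local preference,
MED and so on).  The sketch only shows that "shortest AS path" and
"lowest RTT" can pick different instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    as_path: tuple   # AS numbers from our network to the anycast origin
    rtt_ms: float    # measured round trip time to this instance


def bgp_choice(instances):
    # Simplified BGP view: shortest AS path wins, RTT is never consulted.
    return min(instances, key=lambda i: len(i.as_path))


def rtt_choice(instances):
    # DNS-native view: the quickest responder wins.
    return min(instances, key=lambda i: i.rtt_ms)


def example_instances():
    # Hypothetical data: a near-by (in AS hops) but slow instance,
    # and a farther (in AS hops) but fast one.
    return [
        Instance("instance-A", (64512, 4128), 310.0),
        Instance("instance-B", (64513, 64514, 4128), 95.0),
    ]
```

With this data `bgp_choice` returns instance-A while `rtt_choice`
returns instance-B, which is the kind of gap that has to be closed by
hand with routing policy.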

As an example, I am AS 14609.

The AS used in this experiment is 4128.  Imagine that AS 559 and AS 1
are the trusted peers I want to hear routes to it from.  I want to
reject those routes from everyone else.

Of course, it is best if I peer directly with 559 or 1.  If not, I
need to make sure my upstream will announce routes to the Anycast AS
through 559 or 1, and not some less-preferred provider.  This is not
impossible, and is typical of today's inter-provider coordination, but
it is one more thing that could fall through the cracks.

! Cisco IOS snippet

! as-path route filter: only accept routes to AS 4128
! when it is seen immediately behind AS 559 or 1.
ip as-path access-list 75 permit _559_4128$
ip as-path access-list 75 permit _1_4128$
ip as-path access-list 75 deny _4128_

! apply the as-path filter via a route map
! (on its own this drops every route that list 75 does not permit;
! further permit clauses are needed to accept the rest of the table)
route-map EXT-PEER permit 4128
 match as-path 75

router bgp 14609
 ! set up a peer-group policy that uses our route map
 ! (among other typical policies)
 neighbor ext-peers peer-group
 neighbor ext-peers next-hop-self
 neighbor ext-peers send-community
 neighbor ext-peers remove-private-AS
 neighbor ext-peers version 4
 neighbor ext-peers soft-reconfiguration inbound
 ! distribute list 100 has the bogon filters: RFC1918, etc.
 neighbor ext-peers distribute-list 100 in
 neighbor ext-peers distribute-list 100 out
 ! the route map filters what we accept, so it is applied inbound
 neighbor ext-peers route-map EXT-PEER in

 ! apply the peer group to each peer BGP session
 neighbor 10.2.2.2 remote-as 559
 neighbor 10.2.2.2 peer-group ext-peers
 neighbor 10.2.2.2 description "peering with anycast-clueful AS 559"

 neighbor 10.2.2.4 remote-as 1
 neighbor 10.2.2.4 peer-group ext-peers
 neighbor 10.2.2.4 description "peering with anycast-clueful AS 1"
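
To see which AS paths as-path access-list 75 lets through, the three
entries can be checked offline with a small Python sketch.  This is an
approximation, not Cisco's matcher: it assumes that the IOS `_`
metacharacter can be emulated as "start of string, end of string,
space, comma, brace or parenthesis", and the helper names are
hypothetical.  Entries are evaluated in order, the first match decides,
and a path that matches nothing is denied.

```python
import re

# Assumed emulation of the IOS "_" metacharacter.
_DELIM = r"(?:^|$|[ ,{}()])"

# The three entries of as-path access-list 75, in order.
ACL_75 = [
    ("permit", "_559_4128$"),
    ("permit", "_1_4128$"),
    ("deny", "_4128_"),
]


def ios_to_python(pattern):
    """Translate an IOS as-path regex into a Python regex."""
    return pattern.replace("_", _DELIM)


def permitted(as_path, acl=ACL_75):
    """Return True if the AS path (space-separated ASNs) is permitted."""
    for action, pattern in acl:
        if re.search(ios_to_python(pattern), as_path):
            return action == "permit"
    return False  # implicit deny at the end of the list
```

For example, `permitted("559 4128")` and `permitted("1 4128")` are
true, while `permitted("701 4128")` is false because 4128 is not
directly behind 559 or 1.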


The access lists in the route filter would need to be updated whenever
a different instance of the anycast server is notably better.  Of
course, none of this would be necessary if there were a perfect match
between AS Path length and RTT performance, but that match isn't
perfect.  That means that maintaining the best access to the full set
of available roots requires separate testing and interdomain routing
updates rather than simply re-running the RTT race described above
from time to time.

In Masataka's draft, one of the arguments put forward is that there is
a parallel between locally scoped unicast addresses and the use of
anycast addresses (section 2, "Suggested Operation").  In the case
that an anycast system is so well distributed that every AS runs its
own copy of an anycast AS and root server, the parallel would indeed
be strong.  In the current case, though, the very small number of
participants in the system seems to demonstrate that the BGP policy of
the adjacent and transit ASes is going to dominate the selection of
anycast root server instances.  That produces a conflict of metrics
and suboptimal results.

You can, of course, argue that the increase in topological diversity
is worth this and that the network will move toward optimal results as
increasing numbers of ASes offer transit to their local copies of the
root server.  I would like to suggest, though, that there is a
compromise possibility here: take one of the existing root servers
and assign it and its network to a new AS.  Any network
which wanted to run a local copy of the root could do so using that AS
and network, but they would be forbidden from offering transit to that
AS.  The current provider could maintain the existing service by
offering transit through their current AS to the new AS.  This would
mean that the BGP community would need to filter out any other
announcements which might happen to leak, but it could become a
standard part of the "bogon" filters and would not need a great deal
of continuing support.  That seems to offer increased topological
diversity, a limited operational impact on the BGP community, and
some clarification of the "whose neck to wring" problem (since the
answer is always either your own or the existing provider's).

What do others think of this analysis?
				regards,
					Ted Hardie






