To: dnsop@cafax.se, aroot@ops.ietf.org, hardie@equinix.com (Ted Hardie)
From: hardie@equinix.com
Date: Tue, 31 Oct 2000 13:32:58 -0800 (PST)
Reply-to: hardie@equinix.com
Sender: owner-dnsop@cafax.se
Subject: Anycast root metrics and analysis

I've attached the data for the aroot reachability times from here for 10/30/2000. As you'll note, there are long stretches where things are fine, but also lots of spikes of 10 seconds or more, which indicate that the service was essentially unreachable. Looking over the past few sets of data, that is a pretty typical graph for us.

Thinking about that, I've realized that one of the things that worries me about this anycast methodology is that I'm not sure how well it plays with traditional criteria for selecting among the roots. The BIND selection system uses round trip time as the fundamental measurement for determining which root server will be used. This isn't a single measurement, of course, but a race run and re-run based on the number of requests coming in. The use of RTT and periodic checks on server response time recognizes the importance of quick responses for the DNS system; it also recognizes that the selection of which root server to use should be something that requires very little intervention on the part of DNS operators.

What worries me is that the anycast system described in Masataka's draft may end up requiring a fair amount of maintenance from the BGP community to work optimally according to the DNS's native metrics. The way things are put together now, this system would increase the number of servers acting as roots but would not increase the number of servers available for selection to any one location on the network. Without any additional tweaking, an individual network will get the anycast server which is reachable via the shortest AS path. That may or may not map onto the server instance that would generate the best performance. Through our Verio link, for example, we reach the embratel instance, but the best performance would come from the mbone-cern aroot (as tracerouting its other interface demonstrates).

As Masataka's draft notes, this is fixable, but it isn't trivial. As an example, I am AS 14609. The AS used in this experiment is 4128.
Imagine that 559 and 1 are trusted peers I want to hear the anycast route from. I want to reject everyone else. Of course, it is best if I peer directly with 559 or 1. If not, I need to make sure my upstream will announce routes to the anycast AS through 559 or 1, and not some less-preferred provider. This is not impossible, and is typical of today's inter-provider coordination, but it is just another thing that could slip through the cracks.

    ! Cisco IOS snippet
    ! as-path route filter: only accept routes to AS 4128
    ! when it is seen immediately behind AS 559 or 1.
    ip as-path access-list 75 permit _559_4128$
    ip as-path access-list 75 permit _1_4128$
    ip as-path access-list 75 deny _4128_
    ! apply the as-path filter via a route map
    route-map EXT-PEER permit 4128
     match as-path 75
    router bgp 14609
     ! set up a peer-group policy that uses our route map
     ! (among other typical policies)
     neighbor ext-peers peer-group
     neighbor ext-peers next-hop-self
     neighbor ext-peers send-community
     neighbor ext-peers remove-private-AS
     neighbor ext-peers version 4
     neighbor ext-peers soft-reconfiguration inbound
     ! distribute list 100 has the bogon filters: RFC1918, etc.
     neighbor ext-peers distribute-list 100 in
     neighbor ext-peers distribute-list 100 out
     ! the as-path filter applies to routes we hear, so it is inbound
     neighbor ext-peers route-map EXT-PEER in
     ! apply the peer group to each peer BGP session
     neighbor 10.2.2.2 remote-as 559
     neighbor 10.2.2.2 peer-group ext-peers
     neighbor 10.2.2.2 description "peering with anycast-clueful AS 559"
     neighbor 10.2.2.4 remote-as 1
     neighbor 10.2.2.4 peer-group ext-peers
     neighbor 10.2.2.4 description "peering with anycast-clueful AS 1"

The access lists in the route filter would need to be updated whenever a different instance of the anycast server is notably better. Of course, none of this would be necessary if there were a perfect match between AS path length and RTT performance, but that match isn't perfect.
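
To make the mismatch concrete, here is a minimal sketch in Python. The two instance names come from the experiment and 4128 is the experiment's origin AS, but the AS paths (built from documentation-range AS numbers) and the RTT figures are invented for illustration. Real BGP best-path selection also weighs much more than path length, so treat this as a toy model and not a description of any router's behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    as_path: tuple   # AS numbers from our neighbor (first) to the origin (last)
    rtt_ms: float    # measured round trip time in milliseconds


# Illustrative values only: the transit ASes are documentation numbers
# (64496-64511), the origin is the experiment's AS 4128, RTTs are made up.
INSTANCES = [
    Instance("embratel", (64496, 4128), 310.0),
    Instance("mbone-cern", (64497, 64498, 4128), 170.0),
]


def bgp_choice(instances):
    """Pick as untweaked BGP would in this toy model: shortest AS path."""
    return min(instances, key=lambda inst: len(inst.as_path))


def rtt_choice(instances):
    """Pick as an RTT-driven resolver would: lowest round trip time."""
    return min(instances, key=lambda inst: inst.rtt_ms)


if __name__ == "__main__":
    print("BGP selects:", bgp_choice(INSTANCES).name)
    print("RTT prefers:", rtt_choice(INSTANCES).name)
```

With these assumed figures, routing delivers queries to one instance while an RTT race would have picked the other, and the resolver never gets to run that race because it sees only a single anycast address.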
That means that maintaining the best access to the full set of available roots requires separate testing and interdomain routing updates rather than simply periodically checking the results of the RTT race.

In Masataka's draft, one of the arguments put forward is that there is a parallel between locally scoped unicast addresses and the use of anycast addresses (section 2, "Suggested Operation"). If an anycast system were so well distributed that every AS ran its own copy of the anycast AS and root server, the parallel would indeed be strong. In the current case, though, the very small number of participants in the system seems to demonstrate that the BGP policy of the adjacent and transit ASes is going to dominate the selection of anycast root server instances. That produces a conflict of metrics and suboptimal results. You can, of course, argue that the increase in topological diversity is worth this and that the network will move toward optimal results as increasing numbers of ASes offer transit to their local copies of the root server.

I would like to suggest, though, that there is a possible compromise here: take one of the existing root servers and assign it and its network to a new AS. Any network which wanted to run a local copy of the root could do so using that AS and network, but it would be forbidden from offering transit to that AS. The current provider could maintain the existing service by offering transit through its current AS to the new AS. This would mean that the BGP community would need to filter out any other announcements which might happen to leak, but that filter could become a standard part of the "bogon" filters and would not need a great deal of continuing support.
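
The filtering rule in this proposal can be sketched in a few lines of Python. This is one reading of the proposal, not a tested policy: the AS numbers are placeholders from the documentation range and not real assignments, and the names NEW_ROOT_AS, PROVIDER_AS and accept_route are hypothetical. NEW_ROOT_AS stands for the AS assigned to the root server's network, and PROVIDER_AS for the current provider, the only AS allowed to offer transit to it.

```python
NEW_ROOT_AS = 64500   # placeholder: AS assigned to the root server's network
PROVIDER_AS = 64501   # placeholder: current provider, the only permitted transit


def accept_route(as_path):
    """Return True if a received AS path is acceptable under the proposal.

    as_path lists AS numbers from the neighbor (first) to the origin (last).
    """
    if not as_path or as_path[-1] != NEW_ROOT_AS:
        # Not a route originated by the root's AS; this rule does not apply.
        return True
    if len(as_path) == 1:
        # Our own local copy of the root, heard directly.
        return True
    # Otherwise the route must arrive via the current provider;
    # anything else is a leaked transit announcement.
    return as_path[-2] == PROVIDER_AS


if __name__ == "__main__":
    for path in [(64500,), (64502, 64501, 64500), (64502, 64500)]:
        print(path, "accept" if accept_route(path) else "reject")
```

Because the rule depends only on two fixed AS numbers, it would not need revisiting as instances come and go, which is what would let it sit alongside the other bogon filters.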
This arrangement seems to offer increased topological diversity, a limited operational impact on the BGP community, and some clarification of the "whose neck to wring" problem (since the answer is always either your own or the existing provider's).

What do others think of this analysis?

regards,
Ted Hardie