To: dnsop@cafax.se
From: Robert Elz <kre@munnari.OZ.AU>
Date: Sat, 08 May 1999 12:45:47 +1000
Reply-To: dnsop@cafax.se
Sender: owner-dnsop@cafax.se
Subject: Re: Experiments in multi-placed root servers
I think I understand what the aim is here, and how it is to be accomplished. I'm afraid I can't see what the problem is supposed to be, at least not one serious enough to prevent an experiment. The issues would seem to be whether or not the routing system can reasonably handle a route being injected in multiple different places, each originating at (and ending at) a different local net, but all with the same number. Then, whether the root can handle more servers. And last, if there happens to be a problem (that is, one local to a single server), whether it will be possible to locate the server at which the problem is being caused.

For the first of those, the answer is very simple - just try it. I'm sure the internet core is not going to be bothered by seeing one more number - it shouldn't matter a lot which particular version (or versions) of the route the core actually believes. Note: we don't need to stick root servers on this advertised net to test this - just pick the address at which the root server would be located, advertise the route to it from all over the place, and then have a bunch of people (like, say, the members of this list) ping the address, and see if they get responses (and perhaps even measure the RTTs). A few traceroutes would not hurt either.

Perhaps that has already been done, I don't know. If not, it should be. If it has, the results ought be reported. In any case, the proposal can clearly go nowhere unless this can be made to work. If the core sees the route from multiple sources, throws up its hands (metaphorically, of course) and refuses to deal with the route at all, then until (perhaps) some changes are made there, nothing more can be done. If pings get through and replies are received, then there's no reason to believe that DNS packets wouldn't work just as well.

On the second point, we already have 13 root servers. It is hard to believe that increasing that to 14, 15, or 16 (as a first step) is going to make anything noticeably less stable than it now is.
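The ping test described above could easily be scripted. A minimal sketch (Python; the shared address used here is a made-up placeholder from documentation space, not a real assignment, and the RTT parsing assumes the common BSD/Linux ping output format):

```python
import re
import subprocess

# Hypothetical shared address for the replicated root server - substitute
# whatever address is actually chosen for the experiment.
SHARED_ROOT_ADDR = "192.0.2.1"

def parse_rtt_ms(ping_line):
    """Extract the round-trip time in ms from one line of ping output,
    e.g. '64 bytes from 192.0.2.1: icmp_seq=1 ttl=56 time=23.4 ms'.
    Returns None for lines without an RTT (timeouts, headers, etc.)."""
    m = re.search(r"time[=<]([\d.]+)\s*ms", ping_line)
    return float(m.group(1)) if m else None

def probe(addr=SHARED_ROOT_ADDR, count=5):
    """Ping the shared address and collect the RTTs seen from this vantage
    point.  Many people running this from many places would show whether
    the multiply-advertised route works at all, and roughly how well."""
    out = subprocess.run(["ping", "-c", str(count), addr],
                         capture_output=True, text=True).stdout
    return [r for r in map(parse_rtt_ms, out.splitlines()) if r is not None]
```

Each participant would report whether `probe()` returned any RTTs at all, and how large they were.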
For sure, the more servers there are the greater the probability that one of them will be misbehaving (or not behaving at all) at any particular time - but the incremental risk here looks pretty small to me. Further, this never stopped more root servers being added before.

And last, on the third, we'll never really know until we test it with real live root servers. Fortunately, the failure mode here is not too unpalatable. If one of the replicated servers (one sharing an address with others) fails in some way, then a smaller fraction of the net will be affected than if one of the current root servers fails. (nb: if it simply fails to respond at all, that is normal, and the resolver will just pick a different root server address to try - it's only when bad answers are returned that there's a real problem. Fortunately, from the root servers that is pretty rare. Where "bad" answer here means one different from what the root zone maintainer instructed the root to reply, not one that is perhaps not the ideal reply .. eg: if, by accident, the delegation of COM were removed from the root zone, giving NXDOMAIN for every .COM lookup would not be "bad" for this purpose...)

I would expect that there would be two fairly easy methods available to find such a bad root server. The most obvious is just to run a traceroute to the shared IP - the hop before the shared net number will indicate which root server is being reached from any particular source. (Actually run the traceroute at the time queries are being sent to it and bad answers returned; the routing system might not be a paragon of stability, but it doesn't generally flap that wildly either.)

Second, each of these new replicated root servers ought have two addresses: one the address that is advertised in the DNS as the IP address of X.root-servers.net, and the other one which isn't advertised that way at all, but which is unique to each server (ie: an address taken from the normal addressing space wherever the server is connected).
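Picking the penultimate hop out of traceroute output could itself be automated. A sketch (Python; the address formats assume the usual `name (a.b.c.d)` traceroute line layout, and all addresses in the test are fabricated examples):

```python
import re

def penultimate_hop(traceroute_lines, shared_addr):
    """Given the lines of a traceroute to the shared root address, return
    the address of the hop just before it - which identifies the router,
    and hence which replicated server, this vantage point is reaching."""
    hops = []
    for line in traceroute_lines:
        m = re.search(r"\((\d+\.\d+\.\d+\.\d+)\)", line)
        if m:
            hops.append(m.group(1))
    for i, addr in enumerate(hops):
        if addr == shared_addr and i > 0:
            return hops[i - 1]
    return None
```

Run against a traceroute captured while the bad answers are being returned, this points at the upstream of whichever instance the querier is actually reaching.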
Those could be made widely known outside the DNS (perhaps with an alternate name for each of the servers, not listed as an NS for '.'). Then, as needed, each of the servers could trivially be queried using its alternate address if there's a problem. (If there's anyone attempting to set up such a root server who doesn't understand how to associate multiple addresses with the server, then they're probably not qualified to be running it.)

About the only other issue we need to deal with, I think, is the possibility of someone inserting a new route to their own server (not an authorised one, and one which does not provide the answers that root servers are supposed to provide), and capturing a part of the net that way. To an extent, that is possible right now of course, and is prevented only by the route filtering that the various net providers accomplish. I don't see that anything needs to change there, with the sole exception that for this one magic IP number, the people who do route filtering would not all be following a common database indicating where the route should be directed (that is, who is permitted to advertise it). Or perhaps they would (I'm afraid I haven't been following the routing policy discussions in the past several years).

If it is now possible for a route to originate from several sources, then this would just be one of those (it would appear just like a widely multi-homed site to the routing system). If that isn't possible, then different ISPs would simply have to pick one of the possible sources each, and allow that one. They would be wise to pick a nearby one, but nothing breaks if they simply pick one at random (or perhaps one "special" one that the routing policy says is "the" way to the address - by default everyone would go to one of these replicated servers, and only by arrangement would a different one be used).
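Once each replicated server also has a unique alternate address, finding a misbehaving instance reduces to sending the same test query to every alternate address and flagging the odd one out. A toy consensus check (Python; the alternate names and answers here are invented for illustration - in practice the answers would come from real DNS queries to those addresses):

```python
from collections import Counter

def odd_ones_out(answers):
    """answers maps each server's alternate name (or address) to the answer
    it gave for some test query.  Return the servers whose answer differs
    from the majority - the candidates for the 'bad' replicated server."""
    if not answers:
        return []
    majority, _ = Counter(answers.values()).most_common(1)[0]
    return sorted(s for s, a in answers.items() if a != majority)
```

A server answering NXDOMAIN where its siblings return the delegated NS set would show up immediately.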
If anything new is needed there, it seems like it would be quite small - and in any case this need would be discovered, and handled (or not), during the initial "ping" test of this scheme (the first step above) - long before any root server data were affected.

Lastly, I don't see anywhere in this where the DNS would be relying upon the routing system any more than it does now. We already rely upon it delivering packets to all the DNS servers (root, and otherwise); that's not about to change. Beyond that, if there's some other problem with this multi-address scheme that results in packets not being delivered to this magic address, that's just equivalent to a root server being down. I'm sure that happens from time to time, and I know I don't notice it especially...

I'd suggest that the ping test be done, if it hasn't been, and once that is known to work OK, a replicated root server test be tried (just using one of the 13 servers to start with, of course).

kre