To: Edward Lewis <Ed.Lewis@neustar.biz>, Stephane Bortzmeyer <bortzmeyer@nic.fr>
Cc: Dan Maharry <dan@mcd.coop>, ietf-provreg@cafax.se
From: Ted Hardie <hardie@qualcomm.com>
Date: Thu, 5 Apr 2007 11:41:46 -0700
In-Reply-To: <a06230906c23adfcbbfbc@[192.168.1.102]>
Sender: owner-ietf-provreg@cafax.se
Subject: [ietf-provreg] Re: EPP Extensions for IDN
At 1:37 PM -0400 4/5/07, Edward Lewis wrote:
>At 18:21 +0200 4/5/07, Stephane Bortzmeyer wrote:
>>On Wed, Apr 04, 2007 at 03:31:27PM +0100,
>> Dan Maharry <dan@mcd.coop> wrote
>> a message of 60 lines which said:
>>
>>> - the language code for the domain name
>>
>>No, IDN is about scripts, not about languages, so there is no reason
>>to transmit the language.

You know, this stuff is pretty hard. There are several things just mildly wrong with the text below. I'll try to correct them, but the key thing to remember is probably this: for registrations within a domain, a policy needs to set out which code points may be registered, which may not be registered *at all*, and which code points will be considered equivalent (and so may not be registered independently of their equivalent brethren). RFC 3743 is worth reading for one set of registries' take on this; their approach is conditioned by complexities that don't hit everyone, but it is still worth trying to bend your brain around.

>I agree with Stephane. I was about to reply the opposite way yesterday but then I consulted our in-house expert on such matters.
>
>E.g., If you are in Taiwan and write "east capital" in the local written language you get two Han characters. If you are in Japan and write "east capital" in the local written language you get the same exact looking pair of Kanji/Han characters. Never mind that one is Chinese and the other Japanese and thus differ when read aloud, they mean the same thing and look the same on paper.

That is, after CJK unification in Unicode, the same code points are used in both China and Japan for Han characters/Kanji. If you select the code points used for "Tokyo" in Kanji, they will also be semantically correct in Chinese written in traditional (aka "complex") characters.

>Now shift to the People's Republic of China. You will get a different first character but when read aloud get the same sound as in Taiwan.
That's because the mainland has "simplified" (reduced the strokes of) the Han character for "east." There is a second code point for the simplified form of the first of the two ideographs used in Tokyo. It is semantically equivalent, and a registration policy might connect the two.

>Let's shift to transliterating the Han to Latin characters. You might get at least two different and possibly three different strings of Latin characters and marks corresponding to the same concept. (BTW, "east capital" is most commonly known around the world as Tokyo.) E.g., "Middle mountain" is written as Zhongshan in the PRC and Cheongshan in Taiwan. I don't know "Middle" in Japanese romaji.

Hmm, the Wade-Giles transliteration for Zhong is Chung, so I would say that the pinyin transliteration Zhongshan maps to the Wade-Giles transliteration Chungshan.

>At this point my expert said - for IDN policies we stick to just what are the equivalent looking characters. The two native script forms (the Taiwan/Japan and the PRC) are considered equivalent, but not other semantic, homonyms, or transliterated forms.
> IDN cares about the look (the script) and not the language meaning.

I suspect your expert was simplifying this for you. Looking equivalent won't work by itself, as there are code points that look equivalent and are valid Unicode but are generally excluded. I think your expert was trying to get at the idea that they consider the ideographs distinct from any transliteration, but may treat ideographs as equivalent if they see one as a variant of the other (see RFC 3743 for the language variant table version of this idea).

>I.e., if I register "tokyo.biz" I do not consider "dongjing.biz" or "dong1jing1.biz" or "$1$2.biz" as variants - where the $1 and $2 are the appropriate characters. I don't get "activity.biz" (based on dong4jing4 meaning (n)activity) either. (To an untrained ear, the tones of the Chinese language are lost making dong1jing1 sound like dong4jing4.)
In general, pronunciation is a very large rathole, as there are multiple dialects, and whether two things sound the same will vary from dialect to dialect.

>My new understanding is that EPP is already suitable for IDN. I realized this after my headache evaporated.
>
>PS - My mail would be a lot clearer if my mail reader was capable of more than just ASCII. ;)

regards,
Ted Hardie
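For readers new to the variant-table idea discussed above (RFC 3743), here is a minimal Python sketch. The table is hypothetical and deliberately tiny: it permits only 東 (U+6771, traditional "east"), its simplified form 东 (U+4E1C), and 京 (U+4EAC, "capital"), and treats the first two as variants of each other. It illustrates the general approach under those assumptions; it is not RFC 3743's exact procedure (which also records preferred variants per language) and not any registry's actual policy. A code point missing from the table is simply not registrable, which is one way a policy can exclude look-alike code points that are nonetheless valid Unicode.

```python
from itertools import product

# Hypothetical variant table, loosely in the spirit of RFC 3743:
# each permitted code point maps to the set of code points the registry
# treats as equivalent to it (including itself). Anything absent from
# the table is not registrable at all under this illustrative policy.
VARIANTS: dict[str, set[str]] = {
    "\u6771": {"\u6771", "\u4E1C"},  # 東 (traditional) <-> 东 (simplified)
    "\u4E1C": {"\u4E1C", "\u6771"},
    "\u4EAC": {"\u4EAC"},            # 京 has no variant in this table
}


def variant_bundle(label: str) -> set[str]:
    """Return every label this policy treats as equivalent to `label`.

    Raises ValueError if any code point is not permitted by the table.
    """
    choices = []
    for ch in label:
        if ch not in VARIANTS:
            raise ValueError(f"U+{ord(ch):04X} is not registrable under this policy")
        choices.append(sorted(VARIANTS[ch]))
    # The bundle is every combination of per-character variants.
    return {"".join(combo) for combo in product(*choices)}


bundle = variant_bundle("\u6771\u4EAC")  # 東京
print(sorted(bundle))  # -> ['东京', '東京']
```

Under this sketch, registering 東京 would reserve or block 东京 along with it, matching the "registration policy might connect the two" point. A Latin transliteration such as "dongjing" never enters the bundle; with this CJK-only table it is rejected outright rather than treated as a variant.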