To: Edward Lewis <Ed.Lewis@neustar.biz>, Stephane Bortzmeyer <bortzmeyer@nic.fr>
Cc: Dan Maharry <dan@mcd.coop>, ietf-provreg@cafax.se
From: Ted Hardie <hardie@qualcomm.com>
Date: Thu, 5 Apr 2007 11:41:46 -0700
In-Reply-To: <a06230906c23adfcbbfbc@[192.168.1.102]>
Sender: owner-ietf-provreg@cafax.se
Subject: [ietf-provreg] Re: EPP Extensions for IDN

At 1:37 PM -0400 4/5/07, Edward Lewis wrote:
>At 18:21 +0200 4/5/07, Stephane Bortzmeyer wrote:
>>On Wed, Apr 04, 2007 at 03:31:27PM +0100,
>> Dan Maharry <dan@mcd.coop> wrote
>> a message of 60 lines which said:
>>
>>> - the language code for the domain name
>>
>>No, IDN is about scripts, not about languages, so there is no reason
>>to transmit the language.

You know, this stuff is pretty hard.  There are several things just mildly
wrong with the text below.  I'll try to correct them, but the key
thing to remember is probably this:  for registrations within a domain,
a policy needs to set out which code points may be registered,
which may not be *at all*, and which code points will be considered
equivalent (and so may not be registered independently of their
equivalent brethren).  RFC 3743 is worth reading for one
set of registries' take on this; their approach is conditioned by
complexities that don't hit everyone, but it is still worth trying
to bend your brain around.
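
As a rough illustration of those three parts of a policy (permitted code
points, everything else refused, and equivalences that block independent
registration), here is a minimal Python sketch.  The tables and the names
PERMITTED, VARIANTS, canonical and check are hypothetical and chosen only
for this example; they are not taken from RFC 3743 or from any registry's
actual policy.

```python
# Hypothetical per-zone policy tables, for illustration only.
PERMITTED = {0x6771, 0x4E1C, 0x4EAC}  # U+6771, U+4E1C, U+4EAC may be registered
VARIANTS = {0x4E1C: 0x6771}           # U+4E1C is treated as equivalent to U+6771


def canonical(label):
    """Replace each code point by its preferred variant, if it has one."""
    return "".join(chr(VARIANTS.get(ord(ch), ord(ch))) for ch in label)


def check(label, registered):
    """Classify a candidate label against the policy.

    `registered` is a set of canonical forms of labels already in the zone.
    Returns "forbidden" if any code point is not permitted, "blocked" if an
    equivalent label is already registered, and "ok" otherwise.
    """
    if any(ord(ch) not in PERMITTED for ch in label):
        return "forbidden"
    if canonical(label) in registered:
        return "blocked"
    return "ok"


# One label is already registered: U+6771 U+4EAC.
registered = {canonical("\u6771\u4eac")}
```

With that set, check("\u4e1c\u4eac", registered) gives "blocked", because
the candidate differs from the registered label only by a code point the
table declares equivalent, while a label containing any code point outside
PERMITTED gives "forbidden".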

>I agree with Stephane.  I was about to reply the opposite way yesterday but then I consulted our in-house expert on such matters.
>
>E.g., If you are in Taiwan and write "east capital" in the local written language you get two Han characters.  If you are in Japan and write "east capital" in the local written language you get the same exact looking pair of Kanji/Han characters.  Never mind that one is Chinese and the other Japanese and thus differ when read aloud, they mean the same thing and look the same on paper.

That is, after CJK unification in Unicode, the same code points are used in both
China and Japan for Han characters/Kanji.  If you select the code points that
are used for "Tokyo" in Kanji, they will also be semantically correct in Chinese
written in traditional (aka "complex") characters.

>Now shift to the People's Republic of China.  You will get a different first character but when read aloud get the same sound as in Taiwan.  That's because the mainland has "simplified" (reduced the strokes) of the Han for "east."

There is a second code point for the simplified character for the first of the two
ideographs used in Tokyo.  It is semantically equivalent, and a registration
policy might connect the two.
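
To make the point concrete, the sketch below (Python, standard library
only) lists the code points of the two spellings and encodes each as an
ASCII label.  It assumes Python's built-in "idna" codec, which implements
the IDNA 2003 ToASCII operation; the helper names code_points and a_label
are made up for this example.

```python
# The two spellings of "east capital" discussed above.
traditional = "\u6771\u4eac"  # U+6771 U+4EAC
simplified = "\u4e1c\u4eac"   # U+4E1C U+4EAC (simplified first character)


def code_points(label):
    """List the code points of a label in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in label]


def a_label(label):
    """Encode a label to its ASCII form with Python's IDNA 2003 codec."""
    return label.encode("idna")


print(code_points(traditional), a_label(traditional))
print(code_points(simplified), a_label(simplified))
```

The two strings differ in their first code point and encode to different
"xn--" labels, so as far as the DNS and the protocol are concerned they
are unrelated names.  Any link between them has to come from the
registry's policy.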

>Let's shift to transliterating the Han to Latin characters.  You might get at least two different and possibly three different strings of Latin characters and marks corresponding to the same concept. (BTW, "east capital" is most commonly known around the world as Tokyo.)  E.g., "Middle mountain" is written as Zhongshan in the PRC and Cheongshan in Taiwan.  I don't know "Middle" in Japanese romaji.

Hmm, the Wade-Giles transliteration for Zhong is Chung, so I would say that the
pinyin transliteration Zhongshan maps to the Wade-Giles transliteration
Chungshan.

>At this point my expert said - for IDN policies we stick to just what are the equivalent looking characters.  The two native script forms (the Taiwan/Japan and the PRC) are considered equivalent, but not other semantic, homonyms, or transliterated forms.
> IDN cares about the look (the script) and not the language meaning. 

I suspect your expert was simplifying this for you.  Equivalent looking won't
work by itself, as there are code points that look equivalent and are valid
Unicode, but are generally excluded.  I think your expert was trying to get
at the idea that they consider the ideographs distinct from any transliteration,
but may treat ideographs as equivalent if they see one as a variant of the
other (see RFC 3743 for the language variant table version of this idea).


>I.e., if I register "tokyo.biz" I do not consider "dongjing.biz" or "dong1jing1.biz" or "$1$2.biz" as variants - where the $1 and $2 are the appropriate characters.  I don't get "activity.biz" (based on dong4jing4 meaning (n)activity) either.  (To an untrained ear, the tones of the Chinese language are lost making dong1jing1 sound like dong4jing4.)

In general, pronunciation is a very large rathole, as there are multiple dialects
and things that sound the same will vary from dialect to dialect.

>My new understanding is that EPP is already suitable for IDN.  I realized this after my headache evaporated.
>
>PS - My mail would be a lot clearer if my mail reader was capable of more than just ASCII. ;)
			regards,
				Ted Hardie
