To: "'Robert Burbidge'" <robert.burbidge@poptel.coop>, "Ietf-Provreg (E-mail)" <ietf-provreg@cafax.se>
From: "Hollenbeck, Scott" <shollenbeck@verisign.com>
Date: Mon, 12 Aug 2002 12:30:14 -0400
Sender: owner-ietf-provreg@cafax.se
Subject: RE: Byte order marks and character sets

> 1. Page 5 of current draft says "EPP use with current character sets other
> than UTF-8 is not described in the document". Is this meant to imply that
> the current draft assumes UTF-8 for all XML document interchange? If so it
> would be preferable to say so in explicit terms rather than the double
> negative at present.

This is addressed in more detail in section 5, though perhaps a "UTF-8 is
RECOMMENDED" would make things clearer.  I don't think it wise to say that
only UTF-8 can be used, as the protocol may well be useful in localized
environments where something other than UTF-8 or UTF-16 is a better fit.

> 2. Byte order marks are only considered in passing on page 76 in reference
> to RFC3023. This RFC is relevant and useful if an underlying transport uses
> MIME headers for passing EPP documents around. However, I don't think it
> offers any useful advice for transports that don't have a MIME type or
> similar - such as the current TCP binding specification. Our EPP server
> interprets incoming BOMs if the client provides them, and if no BOM is
> provided UTF-8 encoding is assumed. But when we generate responses, should
> we provide the UTF-8 BOM or not? In an ideal world all clients would have a
> conforming XML parser and it would not matter for UTF-8 data. However, having
> seen some of the published EPP clients it looks like no one handles BOM
> correctly for server responses. [Note: if you have written a client that
> does handle BOM correctly please accept my apologies!]

So what's your suggested improvement?  Section 4.3.3 of the XML rec (a
normative reference) already says that "XML processors must be able to use
this character to differentiate between UTF-8 and UTF-16 encoded documents",
and there's no mention of BOMs in RFC 2279 (UTF-8) so I don't know what you
mean by "the UTF-8 BOM".
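
For illustration only, the behaviour described in the quoted text (honour a
BOM if the sender supplies one, otherwise assume UTF-8) could be sketched as
below. The function name sniff_encoding is hypothetical and does not come from
any EPP draft or implementation. The sketch covers only the UTF-8 and UTF-16
marks and is not the full encoding autodetection of the XML recommendation.

```python
import codecs
from typing import Tuple


def sniff_encoding(data: bytes) -> Tuple[str, bytes]:
    """Return (encoding name, payload with any leading BOM removed).

    Hypothetical helper: checks only the UTF-8 and UTF-16 byte order
    marks and falls back to UTF-8 when none is present. UTF-32 and
    encoding declarations inside the XML prolog are not examined.
    """
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8", data[len(codecs.BOM_UTF8):]
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be", data[len(codecs.BOM_UTF16_BE):]
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le", data[len(codecs.BOM_UTF16_LE):]
    # No BOM: assume UTF-8, as the server in the quoted text does.
    return "utf-8", data


if __name__ == "__main__":
    encoding, payload = sniff_encoding(codecs.BOM_UTF8 + b"<epp/>")
    print(encoding, payload.decode(encoding))
```

Stripping the mark before handing the payload to a parser sidesteps the
question of whether a given parser tolerates a leading BOM.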

> 3. TCP bindings section 5 "does not introduce or present any
> internationalization or localization issues". If we don't make the BOM
> situation explicit then we absolutely will be introducing long term i18n
> issues for unicode (etc.) contact or domain names.

Sorry, I don't see BOMs as a transport issue.  It's an encoding issue that
should be addressed in the core document, and as far as I can tell it
already is through the normative reference to the W3C XML recommendation.

-Scott-
