To:
"'Robert Burbidge'" <robert.burbidge@poptel.coop>, "Ietf-Provreg (E-mail)" <ietf-provreg@cafax.se>
From:
"Hollenbeck, Scott" <shollenbeck@verisign.com>
Date:
Mon, 12 Aug 2002 12:30:14 -0400
Sender:
owner-ietf-provreg@cafax.se
Subject:
RE: Byte order marks and character sets
> 1. Page 5 of current draft says "EPP use with current character sets other
> than UTF-8 is not described in the document". Is this meant to imply that
> the current draft assumes UTF-8 for all XML document interchange? If so it
> would be preferable to say so in explicit terms rather than the double
> negative at present.

This is addressed in more detail in section 5, though perhaps a "UTF-8 is RECOMMENDED" statement would make things clearer. I don't think it wise to say that only UTF-8 can be used, as the protocol may clearly be useful in localized environments where an encoding other than UTF-8 or UTF-16 is more appropriate.

> 2. Byte order marks are only considered in passing on page 76 in reference
> to RFC3023. This RFC is relevant and useful if an underlying transport uses
> MIME headers for passing EPP documents around. However, I don't think it
> offers any useful advice for transports that don't have a MIME type or
> similar, such as the current TCP binding specification. Our EPP server
> interprets incoming BOMs if the client provides them, and if no BOM is
> provided UTF-8 encoding is assumed. But when we generate responses, should
> we provide the UTF-8 BOM or not? In an ideal world all clients would have a
> conforming XML parser and it would not matter for UTF-8 data. However, having
> seen some of the published EPP clients it looks like no one handles BOMs
> correctly for server responses. [Note: if you have written a client that
> does handle BOMs correctly please accept my apologies!]

So what's your suggested improvement? Section 4.3.3 of the XML recommendation (a normative reference) already says that "XML processors must be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents", and there's no mention of BOMs in RFC 2279 (UTF-8), so I don't know what you mean by "the UTF-8 BOM".

> 3. TCP bindings section 5 "does not introduce or present any
> internationalization or localization issues".
> If we don't make the BOM situation explicit then we absolutely will be
> introducing long-term i18n issues for Unicode (etc.) contact or domain names.

Sorry, I don't see BOMs as a transport issue. It's an encoding issue that should be addressed in the core document, and as far as I can tell it already is, through the normative reference to the W3C XML recommendation.

-Scott-
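The receiving behavior described in point 2 (honor a BOM if the client sends one, otherwise assume UTF-8) can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not taken from any EPP draft or from the server mentioned above: `decode_epp_document` is an invented name, the sketch recognizes only the UTF-8 and UTF-16 BOMs, and it ignores the XML encoding declaration, which a conforming XML parser would also consult when no BOM is present.

```python
import codecs

# Hypothetical table of byte order marks this sketch recognizes.
# None of these prefixes overlap, so the lookup order does not matter.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def decode_epp_document(data: bytes) -> str:
    """Strip a leading BOM, if present, and decode the rest accordingly.

    Assumption: with no BOM the bytes are treated as UTF-8; any XML
    encoding declaration is NOT examined by this sketch.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Decode only the bytes after the BOM so it does not leak
            # into the text as a U+FEFF character.
            return data[len(bom):].decode(encoding)
    # No BOM: fall back to UTF-8, as the server described in point 2 does.
    return data.decode("utf-8")
```

Stripping the BOM before handing the text on means a downstream component never sees a stray U+FEFF, which is exactly the kind of client-side mishandling point 2 worries about.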