To: "Ietf-Provreg (E-mail)" <ietf-provreg@cafax.se>
From: Robert Burbidge <robert.burbidge@poptel.coop>
Date: Mon, 12 Aug 2002 16:40:32 +0100
Sender: owner-ietf-provreg@cafax.se
Subject: Byte order marks and character sets

I find the current state of documentation regarding character sets and byte order marks somewhat unsatisfactory.

1. Page 5 of the current draft says "EPP use with current character sets other than UTF-8 is not described in the document". Is this meant to imply that the current draft assumes UTF-8 for all XML document interchange? If so, it would be preferable to say so in explicit terms rather than with the present double negative.

2. Byte order marks are only considered in passing on page 76, in reference to RFC 3023. This RFC is relevant and useful if an underlying transport uses MIME headers for passing EPP documents around. However, I don't think it offers any useful advice for transports that don't have a MIME type or similar, such as the current TCP binding specification.

Our EPP server interprets incoming BOMs if the client provides them, and if no BOM is provided, UTF-8 encoding is assumed. But when we generate responses, should we provide the UTF-8 BOM or not? In an ideal world all clients would have a conforming XML parser and it would not matter for UTF-8 data. However, having seen some of the published EPP clients, it looks like none of them handles a BOM correctly in server responses. [Note: if you have written a client that does handle BOMs correctly, please accept my apologies!]

3. Section 5 of the TCP binding says that it "does not introduce or present any internationalization or localization issues". If we don't make the BOM situation explicit, then we absolutely will be introducing long-term i18n issues for Unicode (etc.) contact or domain names.

This may also have some bearing on the current discussions surrounding international postal addresses, although this is not my field of expertise ... I'll leave it for others to consider.

Rob Burbidge
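
For illustration, the receive-side behaviour described in point 2 (honour a BOM if the client sends one, otherwise assume UTF-8) might look something like the Python sketch below. This is not taken from any actual EPP server or from the drafts; the function name decode_epp_payload and the BOM table are hypothetical, and the sketch deliberately ignores any encoding declared in the XML declaration itself.

```python
import codecs

# Hypothetical lookup table of byte order marks and the codec each implies.
# UTF-32 entries come first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_epp_payload(payload: bytes) -> str:
    """Decode an incoming XML payload received over a MIME-less transport.

    If the payload starts with a recognised BOM, strip it and decode with
    the matching codec. Otherwise assume UTF-8 (an assumption mirroring
    the server behaviour described above, not a rule from the drafts).
    """
    for bom, encoding in _BOMS:
        if payload.startswith(bom):
            return payload[len(bom):].decode(encoding)
    return payload.decode("utf-8")
```

A client that reads responses through a routine of this shape would be indifferent to whether the server emits a UTF-8 BOM, which is the open question raised in point 2.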