

To: "'Robert Burbidge'" <robert.burbidge@poptel.coop>, "Ietf-Provreg (E-mail)" <ietf-provreg@cafax.se>
From: "Hollenbeck, Scott" <shollenbeck@verisign.com>
Date: Mon, 12 Aug 2002 13:47:42 -0400
Sender: owner-ietf-provreg@cafax.se
Subject: RE: Byte order marks and character sets

> See http://www.w3.org/TR/REC-xml#sec-guessing, which mentions a BOM for
> UTF-8. My understanding is that a BOM does exist for UTF-8, although the
> document is discussing non-normative inference of character encodings.
> The .NET framework, for example, will generate the EF BB BF UTF-8 BOM.
> A few minutes with Google also found
> http://www.unicode.org/unicode/faq/utf_bom.html#25.
> 
> My problem seems to be that no clients read the UTF-8 BOM. I think they
> should understand it even if it's optional; that's what a conforming XML
> parser should do. If I'm right, then the problem is not strictly a problem
> with the EPP spec but with client implementations, which is outside the
> strict scope of this discussion group, but hey....

Hmm, OK, now I see what you're saying.  Sigh, if folks used XML declarations
this wouldn't be an issue...

This is an XML issue, not directly an EPP issue, but it could be described
in the EPP document since the normative references aren't quite clear.
Given that it's not a normative part of the XML Recommendation and it's not
mentioned in RFC 2279, I'm not sure of the best way to address this at the
moment.  I'm going to ping some of the folks I worked with when writing my
IETF XML guidelines I-D to see what they have to say.

FWIW, I know of at least one XML parser that seems to digest a UTF-8 BOM
just fine: Xerces-J v2.0.2.  I just ran a quick test and it worked properly.
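
For a client whose parser does not cope with the BOM, one possible workaround
is to drop the three EF BB BF octets before handing the data to the parser.
The following is only a rough sketch in Python, not anything taken from the
EPP drafts: the helper names strip_utf8_bom and parse_epp and the namespace
URI in the sample are made up for illustration.

```python
import codecs
import xml.etree.ElementTree as ET


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def parse_epp(data: bytes) -> ET.Element:
    """Parse an XML instance, tolerating an optional leading UTF-8 BOM."""
    return ET.fromstring(strip_utf8_bom(data))


# Hypothetical instance; the namespace URI is a placeholder, not the EPP one.
sample = codecs.BOM_UTF8 + b'<epp xmlns="urn:example:hypothetical"><hello/></epp>'
root = parse_epp(sample)
print(root.tag)
```

Input without a BOM passes through unchanged, so the same code path should
work for both kinds of sender.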

-Scott-
