
Re: Display types?



>>>>> "Carl" == Carl Ellison <cme@acm.org> writes:

 Carl> We started out with UTF-8 as the character set and the
 Carl> discussion on the list pushed us back to Latin-1.  I believe
 Carl> the primary objections were from Europe.

That's odd.  I assume you mean "Western Europe", i.e., that part of
Europe served by Latin-1.  This is really depressing, since that area
contains some countries that would scream very loudly if we went to
plain old ASCII, yet apparently find nothing wrong with
being equally language-chauvinistic.

Unless you want to serve only the languages used in the EEC, Latin-1
is grossly inadequate.  I don't believe this is a valid approach.

 Carl> If we were to assume UTF-8 for a character set, we also have
 Carl> the problem that it's a variable width character set, which
 Carl> means that the byte count that precedes a bytestring would not
 Carl> always equal the character count.  This would have little
 Carl> effect on a program but might get in the way of a human
 Carl> examining the canonical form from a text display.
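
For concreteness, here is a minimal sketch of the mismatch Carl
describes.  It assumes a "<decimal byte count>:<bytes>" layout for the
length-prefixed bytestring; the message above only says that a byte
count precedes the bytestring, so the exact layout and the helper name
`canonical` are illustrative assumptions, not the draft's definition.

```python
def canonical(s: str, encoding: str) -> bytes:
    """Hypothetical helper: emit <byte count>:<bytes> for a string.

    The "<count>:<bytes>" layout is an assumption made for illustration.
    """
    data = s.encode(encoding)
    # The prefix counts bytes, not characters.
    return str(len(data)).encode("ascii") + b":" + data


word = "caf\u00e9"  # four characters, the last one outside ASCII

print(len(word))                   # character count: 4
print(canonical(word, "latin-1"))  # b'4:caf\xe9'      (prefix matches character count)
print(canonical(word, "utf-8"))    # b'5:caf\xc3\xa9'  (prefix is one larger)
```

For pure ASCII strings the two prefixes agree; they diverge only when
a character needs more than one byte in UTF-8.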

Why would the mismatch between character count and byte count affect a 
human?  I must be missing something here.  If variable length is the
issue then Unicode with wide chars would serve (provided they don't
start using codes outside the Basic Multilingual Plane, i.e., codes
outside the 16-bit space).  UTF-8 has an advantage, though, in that it
encodes the 7-bit ASCII subset of Latin-1 with the same bitstrings as
Latin-1 does; the upper half of Latin-1 takes two bytes per character.
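
A small sketch for checking the widths involved, using only Python's
standard codecs.  The sample characters and the helper names `widths`
and `same_as_latin1` are my own choices for illustration.  Note that
the byte-for-byte match between UTF-8 and Latin-1 holds only for the
7-bit range, and that 16-bit wide chars stay fixed-width only inside
the 16-bit space.

```python
def widths(ch: str) -> dict:
    """Bytes needed for one character in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),  # big-endian, no byte-order mark
    }


def same_as_latin1(ch: str) -> bool:
    """True if UTF-8 and Latin-1 produce identical bytes for ch."""
    try:
        return ch.encode("latin-1") == ch.encode("utf-8")
    except UnicodeEncodeError:
        return False  # character does not exist in Latin-1


samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin-1 upper half (e with acute)",
    "\u20ac": "outside Latin-1, inside the 16-bit space",
    "\U00010348": "outside the 16-bit space",
}

for ch, label in samples.items():
    print(label, widths(ch), same_as_latin1(ch))
```

Only the first sample encodes identically in Latin-1 and UTF-8, and
only the last one needs more than two bytes in UTF-16.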

	paul
