
Re: Display types?

>>>>> "Niels" == Niels Möller <nisse@lysator.liu.se> writes:

 Niels> Paul Koning <pkoning@xedia.com> writes:
 >> Why would the mismatch between character count and byte count
 >> affect a human?  I must be missing something here.  If variable
 >> length is the issue then Unicode with wide chars would serve
 >> (provided they don't start using codes outside the Basic
 >> Multilingual Plane, i.e., codes outside the 16-bit space).  UTF-8
 >> has an advantage, though, in that it encodes characters from the
 >> Latin-1 set with the same bitstrings as Latin-1 does.

 Niels> The last sentence is wrong. Perhaps you meant some other
 Niels> encoding than UTF-8 here? For instance, my last name (written
 Niels> in hex, to survive any transformations in the mail system) is
 Niels> "4df6 6c6c 6572" in latin-1, "4dc3 b66c 6c65 72" in UTF-8, and
 Niels> "004d 00f6 006c 006c 0065 0072" in UTF-16 (assuming network
 Niels> byte order).

Right; as has been pointed out, I was mixing this up with UTF-8's
backwards compatibility with 7-bit ASCII.
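
For anyone who wants to check the hex strings above, here is a minimal
Python sketch. It is only an illustration (the variable names are made
up for this example); it encodes the same name three ways and compares
the ASCII-only and Latin-1 cases.

```python
# The o-umlaut is written as an escape so this source stays pure ASCII.
name = "M\u00f6ller"

latin1 = name.encode("latin-1")
utf8 = name.encode("utf-8")
utf16be = name.encode("utf-16-be")  # big-endian (network byte order), no BOM

print(latin1.hex())   # 4df66c6c6572
print(utf8.hex())     # 4dc3b66c6c6572
print(utf16be.hex())  # 004d00f6006c006c00650072

# Pure 7-bit ASCII text is byte-identical in Latin-1 and UTF-8 ...
ascii_same = "Moller".encode("latin-1") == "Moller".encode("utf-8")

# ... but a Latin-1 character above 0x7f takes two bytes in UTF-8,
# so the two encodings of the full name differ.
latin1_same = latin1 == utf8

print(ascii_same, latin1_same)  # True False
```

So the compatibility UTF-8 offers is with ASCII only: 0xf6 in Latin-1
becomes the two bytes 0xc3 0xb6 in UTF-8.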

The UTF-8 and Unicode discussion in the Plan 9 document quoted earlier
makes the point that Unicode in its 16-bit form has a major problem:
the byte order is neither well defined nor cleanly handled. That does
sound familiar, and it's totally unacceptable to introduce such a
thing anywhere.
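
The byte-order problem can be seen in a few lines of Python. This is a
sketch for illustration only, not something taken from the Plan 9
document, and the variable names are hypothetical. The byte order mark
shown at the end is the usual workaround in UTF-16, not something
discussed in this thread.

```python
import codecs

ch = "\u00f6"  # o-umlaut

big = ch.encode("utf-16-be")     # bytes 00 f6
little = ch.encode("utf-16-le")  # bytes f6 00

# The same two bytes read with the wrong byte order decode without any
# error, but to a different character (U+F600, a private-use code point).
misread = big.decode("utf-16-le")

# A byte order mark (BOM) in front of the data lets a decoder pick the
# right order; the generic "utf-16" codec consumes it.
with_bom = codecs.BOM_UTF16_BE + big
recovered = with_bom.decode("utf-16")

# UTF-8 is defined as a byte sequence, so it has no byte-order variants.
utf8 = ch.encode("utf-8")        # always c3 b6

print(big.hex(), little.hex(), hex(ord(misread)), recovered == ch, utf8.hex())
```

Without a BOM or some out-of-band agreement, a receiver has no reliable
way to tell which of the two 16-bit forms it has been handed.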

