
Re: encodings: do we need binary at all? -Reply



I'd like to distinguish among three things: representing data in binary,
which I think is generally a good thing for on-the-wire efficiency and
server parsing performance; ASN.1, which is really just a formal language
for describing data structures; and BER, which is one particular
mechanism for encoding that data.

Why binary?  Because there is much data in the world which cannot be
represented as 7-bit ASCII or even 8-bit character strings.  Unicode and
4-byte character strings come to mind, and they are important string
types for names (unless you're only interested in Western European
names).  For those of us trying to deliver product value into the
Pacific Rim, or into the Middle East and Africa, this is a real issue.
Novell couldn't use the X.509 certificate of 1988 in part because it had
insufficient naming attribute support for our Unicode names.  And no, we
do NOT restrict our commercial customers to using only the US ASCII
character set for naming.  And we will not.  (Or perhaps you'd like to
mandate that all internet appliance users learn American English in
order to play in your world?)

ASN.1 is just a formal language to describe content.  I see it as a
convenience (or, rather, I would if compilers for it were easier to latch
onto).  Make no mistake, the packet definitions of DNS or HTTP or Photuris
make similar formal requirements on data content and structure.  They
also describe the on-the-wire encoding.

I'm not crazy about BER, but I like it better than 7-bit ASCII.  Novell
does a rather IETF-like thing - our on-the-wire encodings are those
which are most efficient for our marketplace - C language memory
structures are rendered 32-bit aligned, in Intel (little-endian) byte
order, and copied to the wire.  That makes sense for us because the vast
majority of our server and client marketplace runs on Intel architecture
processors.  Big-endian machines have to marshal (i.e., byte swap)
because there are fewer of them in our market space (up to now, anyway).
(Personally, I HATE little-endian memory layout, so I'm NOT trying to
start a flame war on that ground - nor am I trying to encourage others
to adopt our approach, I'm just describing what we do.)
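
A minimal sketch of that style, with a hypothetical request structure
(this is not Novell's actual wire format): the little-endian sender just
copies the structure to the wire, and only a big-endian receiver has to
swap.

    /* Hypothetical example - a 32-bit-aligned structure carried on the
     * wire in little-endian (Intel) byte order, as described above. */
    #include <stdint.h>
    #include <string.h>

    struct wire_request {
        uint32_t opcode;
        uint32_t sequence;
        uint32_t payload_len;
    };

    /* Sender on a little-endian host: no marshalling, just a copy. */
    static size_t pack_request(const struct wire_request *req, uint8_t *buf)
    {
        memcpy(buf, req, sizeof *req);
        return sizeof *req;
    }

    /* Reverse the bytes of a 32-bit value. */
    static uint32_t swap32(uint32_t v)
    {
        return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
               ((v << 8) & 0x00ff0000u) | (v << 24);
    }

    static int host_is_little_endian(void)
    {
        const uint32_t probe = 1;
        return *(const uint8_t *)&probe == 1;
    }

    /* Receiver: only a big-endian host has to marshal (byte swap). */
    static void unpack_request(const uint8_t *buf, struct wire_request *req)
    {
        memcpy(req, buf, sizeof *req);
        if (!host_is_little_endian()) {
            req->opcode      = swap32(req->opcode);
            req->sequence    = swap32(req->sequence);
            req->payload_len = swap32(req->payload_len);
        }
    }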

Let's be clear - that's why XDR and network byte order look the way they
do - because the predominant machines in use by their developers -
Motorola-based Sun workstations, for instance - used that ordering.  Too
bad Intel didn't, but it didn't.  But for our principal marketplace, clients and
servers don't have to marshal because they share the same
byte-ordering.
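
For contrast, a minimal sketch of the network-byte-order style XDR uses:
every integer goes on the wire big-endian, so little-endian (Intel)
hosts always swap, and the standard BSD socket helpers htonl()/ntohl()
do the conversion.

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* htonl, ntohl */

    /* Put a 32-bit value on the wire in network (big-endian) order. */
    static size_t pack_u32_net(uint32_t value, uint8_t *buf)
    {
        uint32_t net = htonl(value);      /* host order -> big-endian */
        memcpy(buf, &net, sizeof net);
        return sizeof net;
    }

    /* Read a 32-bit value back into host byte order. */
    static uint32_t unpack_u32_net(const uint8_t *buf)
    {
        uint32_t net;
        memcpy(&net, buf, sizeof net);
        return ntohl(net);                /* big-endian -> host order */
    }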

Now, it's possible to do better than either XDR or Novell does today, by
having the client and server agree on the ordering before transfers
begin, so that if they share the same ordering, neither needs to
marshal.  IDL does that, and it seems like a good thing.  I'd go another
step further and ask that the client be able to choose whether it will
or won't marshal, so the server can be tasked with marshalling for the
client if the client is really lightweight, or the server can be freed
from marshalling if the client has CPU to spare.  Both cases have to be
dealt with.
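
A minimal sketch of the "agree up front" idea; the magic value and
header here are hypothetical, not any particular protocol.  The sender
writes in its own native order and leads with a known magic number.  If
the receiver reads the magic back unchanged, nobody marshals; if it
arrives reversed, somebody swaps - and which side does the swapping is
exactly the sort of thing the client and server could negotiate.

    #include <stdint.h>
    #include <string.h>

    #define ORDER_MAGIC 0x01020304u   /* hypothetical byte-order probe */

    struct stream_header {
        uint32_t order_magic;    /* written in the sender's native order */
        uint32_t record_count;
    };

    /* Reverse the bytes of a 32-bit value. */
    static uint32_t swap32(uint32_t v)
    {
        return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
               ((v << 8) & 0x00ff0000u) | (v << 24);
    }

    /* Returns 0 if sender and receiver share a byte order, 1 if the
     * receiver must swap, -1 if the stream isn't recognized. */
    static int receiver_must_swap(const uint8_t *buf)
    {
        uint32_t magic;

        memcpy(&magic, buf, sizeof magic);
        if (magic == ORDER_MAGIC)
            return 0;
        if (swap32(magic) == ORDER_MAGIC)
            return 1;
        return -1;
    }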

But what has any of this to do with certificate encodings?  The dominant
issue for certificates is that the hash of the certificate contents must
be reproducible, regardless of who is or isn't marshalling, and
regardless of whether the producer or consumer is big-endian,
little-endian, or some other endian.

If some alternative certificate structure is to be chosen, it must have
the following characteristics, I think:

1) Support string attributes which may include embedded nulls - which
pretty much mandates some sort of length-prepended string format.  This
is so it's not necessary to pad or replace nulls to trick scanf parsers.
In fact, it is so that parsing the certificate doesn't require a
scanf-like character-by-character examination at all: "oops, got a
string here, so copy [length] octets if you want to copy it at all."
(There's a sketch of this after the list.)

2) Ideally, avoid having to copy data between buffers at all: a pointer
into the certificate ought to let you reference the contents.  When you
have 1000 or more simultaneous requests hitting a server, you don't want
it spending all its time copying data from one place in memory to
another - you can, but that's not how you get any performance.  And I
realize you may have to byte swap in place for certain data types, and
that's okay.

3) The rules for computing hashes on the certificate must be agreed
upon.  Yes, it's easier if you treat the certificate as a single octet
string, but that's not the only way to do it, as long as your hashing
rules for padding and byte ordering are consistently implemented.  In
reality, rendering the certificate into a single contiguous,
byte-order-specific array of octets just pushes the problem out of the
hash algorithm and into the BER/XDR/NDR/DER/whatever encoder code.
Where would you rather have it?
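
A minimal sketch of points 1 and 2 (the field layout is hypothetical,
not a real certificate format): the string carries an explicit length,
so embedded nulls are harmless, and the parser hands back a pointer into
the receive buffer instead of copying.  For point 3, the simplest
consistent rule is to run the digest over the certificate octets exactly
as they arrived, before any in-place byte swapping.

    #include <stddef.h>
    #include <stdint.h>

    struct string_ref {        /* points into the receive buffer; no copy */
        const uint8_t *data;
        uint32_t       len;
    };

    /* Read a 32-bit little-endian length followed by that many octets.
     * Returns the number of bytes consumed, or 0 if the buffer is
     * truncated. */
    static size_t parse_lstring(const uint8_t *buf, size_t buflen,
                                struct string_ref *out)
    {
        uint32_t len;

        if (buflen < 4)
            return 0;
        len = (uint32_t)buf[0]         | ((uint32_t)buf[1] << 8) |
              ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
        if (buflen - 4 < len)
            return 0;
        out->data = buf + 4;           /* reference the contents in place */
        out->len  = len;
        return 4 + (size_t)len;
    }

With that, walking a certificate is just hopping from one
length-prefixed field to the next, and the bytes you hash are the bytes
you parsed.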

Enough.  Yes, we need binary encodings (so we don't have a gratuitous
+12.5% increase in data size, for one thing).  We also need
length-preceded strings, so the recipient (usually a server) doesn't
have to spend a lot of time trying to guess where you meant a string to
end.

Ed Reed
Novell