
binary vs. S-expressions



As I said at the SPKI working group meeting, I strongly think
that we should use a binary-only format.

I have a number of reasons for this opinion:

1. S-expression format adds approximately 1000 lines of parsing code
   in every implementation.  In addition, another 2000 or so lines
   will be needed for testing all variations of the format.

2. The ascii format has lots of potential for interoperability
   problems, misunderstandings, and implementation bugs.  
   These have to do with things such as:
    - what is counted as whitespace?
    - which characters are used in base-64 encoding?
    - when is whitespace compressed (e.g., Ron's sample parser compresses
      whitespace inside strings, even hashes)
    - can strings contain 8-bit characters (e.g., Ron's sample parser
      sometimes indexes an array with negative values on systems with
      signed chars)
    - there are likely to be subtle implementation bugs that may not
      even come up in testing (e.g., representing strings as
      null-terminated C strings will fail, because ascii 0 is a valid
      character in SPKI strings).
   One is much less likely to make this sort of mistake with a simple
   length-prefixed binary format.

3. The half-binary ascii format is not reliably transferable over
   e-mail.  It is still binary and can contain arbitrary characters
   (including nulls or mime escapes).  National characters are likely
   to cause major problems, and in particular support headaches,
   as people don't understand why things sometimes fail when they work
   most of the time.

4. One cannot display the S-expression format to a user and expect the
   user to be able to make a reasonable determination based on it.
   The certificate might start with a comment on the first line, have
   the comment contain a neatly laid out valid-looking certificate,
   and have the real certificate be outside the window or screen on
   the last line.  Fostering the illusion that the ascii certificate
   is meaningful to users is likely to lead to subtle security
   problems due to the user taking it as something other than what it
   is.  It is better to only show the user true, parsed data about the
   certificate.

5. The S-expression format is very verbose.  This will cause problems
   with smartcards.
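
To illustrate the length-prefixed binary format mentioned in point 2,
here is a minimal sketch in Python.  The layout (a 4-byte big-endian
length followed by the raw bytes) and the names encode_field and
decode_fields are assumptions made for this example only; they are not
taken from any SPKI draft.

```python
import struct
from typing import List


def encode_field(data: bytes) -> bytes:
    # Hypothetical layout: 4-byte big-endian length, then the raw bytes.
    return struct.pack(">I", len(data)) + data


def decode_fields(buf: bytes) -> List[bytes]:
    # Walk the buffer field by field; nothing is scanned for delimiters,
    # so nulls, 8-bit characters and whitespace need no special handling.
    fields = []
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < 4:
            raise ValueError("truncated length prefix")
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if n > len(buf) - pos:
            raise ValueError("field length exceeds remaining input")
        fields.append(buf[pos:pos + n])
        pos += n
    return fields


# Fields containing ascii 0 and 8-bit bytes round-trip unchanged.
fields = [b"issuer", b"key\x00with\x00nulls", bytes([0xE4, 0xF6, 0xFF])]
encoded = b"".join(encode_field(f) for f in fields)
assert decode_fields(encoded) == fields
```

The whole parser is one loop with two bounds checks, which is the kind
of simplicity argued for above.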

As Ron points out, readable formats do help a bit in debugging and
troubleshooting.  However, the relevance of this is small after the
initial development; from then on, reliability, robustness and
interoperability are the major issues.


I strongly suggest making the certificate format pure binary (perhaps
something along the lines of Carl's initial draft).  I'm willing to
work with other people to produce a draft on this.

I also want to point out that there is a need for a *standard* pure
7-bit ascii format that can be passed reliably in e-mail and
documents, much like PGP certificates.  My understanding is that the
current S-expression format does not fill this role, as it may contain
components that may interfere with e-mail processing (mime escapes,
lines starting "From", particular types of newlines/tabs in strings,
binary data, etc.).  This format could be a simple base-64 encoding of
the binary format, similar to what PGP uses (with minor changes to
avoid problems with some mail systems clobbering it).
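
A rough sketch of such an encoding, in Python, might look like the
following.  The BEGIN/END marker strings, the 64-character line width,
and the function names armor and dearmor are all hypothetical choices
for this example; it also omits the checksum that PGP's armor carries.

```python
import base64

# Hypothetical marker lines, chosen for this sketch only.
BEGIN = "-----BEGIN BINARY CERTIFICATE-----"
END = "-----END BINARY CERTIFICATE-----"


def armor(binary: bytes) -> str:
    # Base-64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so every
    # line is 7-bit ascii and contains no spaces, tabs or nulls.
    b64 = base64.b64encode(binary).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([BEGIN] + lines + [END]) + "\n"


def dearmor(text: str) -> bytes:
    # Ignore anything outside the markers (mail headers, signatures)
    # and tolerate trailing whitespace or CRLF line endings.
    lines = [ln.strip() for ln in text.splitlines()]
    start = lines.index(BEGIN)
    end = lines.index(END, start + 1)
    return base64.b64decode("".join(lines[start + 1:end]), validate=True)


# Every possible byte value survives the round trip.
data = bytes(range(256))
assert dearmor(armor(data)) == data
```

Because an encoded line never contains a space, it cannot begin with
the "From " sequence that some mail systems rewrite.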

Clients, servers, and certification tools should not need to know
about any format other than the binary format.  (Some people have
suggested supporting both ascii and binary formats; that is the
worst possible alternative.)

    Tatu
