
canonical form for S-expressions / binary form



There was some discussion in Memphis of binary versus textual
representation for S-expression certificates.  I believe
Tatu wished to avoid embedding S-expression parsers in "client"
programs (i.e., those that use but do not produce certificates, or at
least do not let users sign certificates).

I see four separate requirements for certificate forms.

1) On-the-wire format

Some have argued that this should be a simple binary format, parseable
without a Lisp "read" function.

2) Signature format

This is the format over which the hash function is run when signing.

3) Document and "source" format

This is analogous to the format being used on the mailing list in
discussing semantics and processing, and the format which might be
used in specifications.

4) Presentation language

For expert users, format 3 may be the preferred way to view
certificates.
There might be a need for a format that is more easily understood by
those who don't deeply understand certificate semantics.
There might be a need for converting key hashes found in certificates
to name references using other (verified) certificates, so that the
user doesn't have to do this.  For example, I would like to see a
representation of the certificates used in a reduction, glued together
so that I don't have to compare hash values.


Given this list of needed formats, I would argue that (1) and (2)
should be identical.
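A minimal Python sketch of what (1) and (2) being identical buys a
verifier: it hashes exactly the bytes it received, with no re-encoding
step.  SHA-256 and the name signing_digest are assumptions for
illustration (nothing here fixes a hash function); the wire bytes use
Ron's length-prefixed form, discussed further on.

```python
import hashlib

def signing_digest(cert_bytes: bytes) -> bytes:
    # With (1) == (2), the bytes on the wire are the bytes that were signed,
    # so the verifier hashes them as-is (SHA-256 is an assumed choice).
    return hashlib.sha256(cert_bytes).digest()

received = b"(#1:a#1:b(#2:cd#1:e#3:fgh))"

# A textual rendering of the same expression is a different byte string and
# hashes differently, so the hash must be taken over one agreed form.
reformatted = b"(a b (cd e fgh))"
print(signing_digest(received) != signing_digest(reformatted))  # True
```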

The format currently being used for (3) seems generally agreeable.

There should be a function mapping a 3-form to a 12-form (the single
form serving as both (1) and (2)), and a function mapping a 12-form to
the canonical version of a 3-form.  Adding whitespace (pretty-printing)
to a 3-form should yield another 3-form whose 3->12 mapping is the same.
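A rough Python sketch of the 3->12 mapping and the pretty-print
property, assuming the 3-form is parenthesized, whitespace-separated
ASCII atoms with no quoting, and assuming the 12-form is Ron's
length-prefixed form below.  The function names are hypothetical.
canonical_3_form prints from the parsed tree; parsing the 12-form
bytes themselves is a separate step.

```python
import re

# Assumed 3-form tokens: a paren, or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")

def parse_3_form(text: str):
    """Parse 3-form text into nested lists of ASCII byte-string atoms."""
    tokens = TOKEN.findall(text)  # whitespace is skipped entirely
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume ')'
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok.encode("ascii")

    expr = parse()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return expr

def to_12_form(expr) -> bytes:
    """Encode a parsed tree in Ron's '#len:bytes' form."""
    if isinstance(expr, bytes):
        return b"#%d:" % len(expr) + expr
    return b"(" + b"".join(to_12_form(e) for e in expr) + b")"

def canonical_3_form(expr) -> str:
    """Print a parsed tree with single spaces and no line breaks."""
    if isinstance(expr, bytes):
        return expr.decode("ascii")
    return "(" + " ".join(canonical_3_form(e) for e in expr) + ")"

pretty = """(a b
  (cd e
      fgh))"""
print(to_12_form(parse_3_form(pretty)))        # b'(#1:a#1:b(#2:cd#1:e#3:fgh))'
print(canonical_3_form(parse_3_form(pretty)))  # (a b (cd e fgh))
```

Because whitespace never reaches the tree, any pretty-printing of the
same 3-form maps to the same 12-form bytes.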

I believe that specification of the 4-form should be deferred or
should be left as a local implementation issue.


I have no problem with Ron's proposal 

	Example: The S-expression 
		(a b (cd e fgh)) 
	has canonical form for signing
		(#1:a#1:b(#2:cd#1:e#3:fgh))

as formats '1' and '2'.  I'm not sure whether this is meant to be
7-bit clean or whether arbitrary byte strings are allowed.  If
non-ASCII 'strings' were represented by some hex encoding in the
3-form, they could be collapsed back to raw bytes in Ron's format for
compactness.
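A small Python sketch of Ron's encoding, to make the 7-bit question
concrete.  Since the decimal length says exactly how many bytes follow,
the format itself can carry arbitrary byte values (even '(' or ')')
inside an atom; whether the result is 7-bit clean depends only on the
atom contents.  encode and is_7bit_clean are hypothetical names.

```python
def encode(expr) -> bytes:
    """Encode nested lists of byte strings in Ron's length-prefixed form."""
    if isinstance(expr, bytes):
        # The length delimits the atom, so its body may hold any byte value.
        return b"#%d:" % len(expr) + expr
    if isinstance(expr, list):
        return b"(" + b"".join(encode(e) for e in expr) + b")"
    raise TypeError("atoms must be bytes, lists must be list")

def is_7bit_clean(data: bytes) -> bool:
    """True if every byte is below 0x80."""
    return all(b < 0x80 for b in data)

ron = encode([b"a", b"b", [b"cd", b"e", b"fgh"]])
print(ron)                 # b'(#1:a#1:b(#2:cd#1:e#3:fgh))'
print(is_7bit_clean(ron))  # True

raw = encode([b"key", b"\xde\xad)("])
print(raw)                 # b'(#3:key#4:\xde\xad)()'
print(is_7bit_clean(raw))  # False
```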

The grammar above appears to have '(', ')', and '#' as the only
characters significant at the start of an element.  One might consider
encoding the length in binary rather than in ASCII to save a few bytes.
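One way to try the binary-length idea, sketched in Python: keep '#' as
the atom marker but replace "decimal digits + ':'" with a LEB128-style
varint (7 bits per byte, high bit set on all but the last byte).  The
varint choice is an assumption for illustration, not part of Ron's
proposal.  The ':' is no longer needed, because the last length byte
is recognizable by its clear high bit.

```python
def uvarint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 bits per byte, high bit = more follow."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)

def encode_ascii_len(expr) -> bytes:
    # Ron's form: '#' + decimal length + ':' + raw bytes
    if isinstance(expr, bytes):
        return b"#%d:" % len(expr) + expr
    return b"(" + b"".join(encode_ascii_len(e) for e in expr) + b")"

def encode_binary_len(expr) -> bytes:
    # Hypothetical variant: '#' + varint length + raw bytes
    if isinstance(expr, bytes):
        return b"#" + uvarint(len(expr)) + expr
    return b"(" + b"".join(encode_binary_len(e) for e in expr) + b")"

example = [b"a", b"b", [b"cd", b"e", b"fgh"]]
print(len(encode_ascii_len(example)), len(encode_binary_len(example)))  # 27 22

big = [b"x" * 1000]
print(len(encode_ascii_len(big)), len(encode_binary_len(big)))  # 1008 1005
```

The saving is about one byte per short atom and a few per long one,
at the cost of the wire form no longer being printable text.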

In general, I think it would be reasonable to define the 12-form to be
easily machine parseable and compact, without any concern for human
readability.  As long as the 12->3 function is straightforward, humans
need only ever view the 3-form.  This could allow both the 'binary'
folks and the 'S-expression' folks to be happy at the same time.
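A sketch of how small the 12->3 direction can be, in Python, assuming
the 12-form is Ron's ASCII-length format.  The parser only has to
recognize '(', ')', and '#'.  The renderer prints the canonical 3-form
and shows atoms that are not plain tokens in a hypothetical #...# hex
notation, following the hex suggestion above.  The set of bare token
characters is also an assumption.

```python
# Characters assumed safe to show bare in the 3-form; other atoms go to hex.
TOKEN_CHARS = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._")

def parse_12(data: bytes):
    """Parse Ron's '#len:bytes' form into nested lists of bytes."""
    pos = 0

    def parse():
        nonlocal pos
        c = data[pos:pos + 1]
        if c == b"(":
            pos += 1
            items = []
            while True:
                if pos >= len(data):
                    raise ValueError("unterminated list")
                if data[pos:pos + 1] == b")":
                    pos += 1
                    return items
                items.append(parse())
        if c == b"#":
            colon = data.find(b":", pos)
            digits = data[pos + 1:colon]
            if colon < 0 or not digits.isdigit():
                raise ValueError("bad length prefix at offset %d" % pos)
            start = colon + 1
            end = start + int(digits)
            if end > len(data):
                raise ValueError("truncated atom")
            pos = end
            return data[start:end]
        raise ValueError("unexpected byte at offset %d" % pos)

    expr = parse()
    if pos != len(data):
        raise ValueError("trailing bytes")
    return expr

def render_3(expr) -> str:
    """Render a parsed tree as canonical 3-form text."""
    if isinstance(expr, bytes):
        if expr and all(b in TOKEN_CHARS for b in expr):
            return expr.decode("ascii")
        return "#" + expr.hex() + "#"
    return "(" + " ".join(render_3(e) for e in expr) + ")"

print(render_3(parse_12(b"(#1:a#1:b(#2:cd#1:e#3:fgh))")))  # (a b (cd e fgh))
print(render_3(parse_12(b"(#3:key#2:\xff\x00)")))          # (key #ff00#)
```

A matching 3->12 parser would need to decode the #...# notation back
to raw bytes for the round trip to hold.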

        Greg Troxel <gdt@bbn.com>