
Base-64 encoding proposal




I think the "right" way to think of base-64 encodings is as follows.
Rather than thinking of base-64 as a way of encoding some object
(a byte string or maybe an S-expression), just think of base-64 
as a 6-bit per character channel rather than an 8-bit per character
channel.  You just have to read enough 6-bit chars to get your
next 8 bits of data, which you spit out.  This has nice side effects,
such as eliminating the need for a separate "fragmentation" mechanism.

Specific proposal:

Use braces to delimit the 6-bit channel portions:

  ... 8-bit-channel-here ... { ... 6-bit-channel-here ... }  ... 8-bit...

In the 6-bit channel we use the usual six-bit characters:
	A -- Z    a -- z   0 -- 9  +  /
having respective significance 0...63.  

In the six-bit channel, four six-bit characters yield three eight-bit
characters.  In general, s six-bit characters yield t = floor(s*6/8)
eight-bit characters, with s*6-t*8 bits "left over" waiting for more
input to happen.  When the six-bit channel finishes (with the closing
right brace) any waiting bits are thrown away.   For example, the
input
	{AA}
represents a single eight-bit character (having value zero); four bits
are thrown away.  The value
	{BBB}
represents two eight-bit characters (with two bits thrown away), and
	{CCCC}
represents three eight-bit characters (with no bits thrown away).
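
The bit accounting above can be sketched in a few lines of Python.  This
is only an illustration, not part of the proposal: the names ALPHABET,
VALUE and decode_six_bit_run are made up here, and it assumes bits are
packed most-significant-first, as in ordinary base-64.

```python
# Hypothetical helper names; the alphabet is the one listed above (values 0..63).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_six_bit_run(chars):
    """Decode the characters between one pair of braces.

    Returns (decoded_bytes, leftover_bits), where leftover_bits is the
    number of waiting bits thrown away at the closing brace.
    Assumes most-significant-bit-first packing.
    """
    acc = 0      # bit accumulator
    nbits = 0    # number of not-yet-emitted bits held in acc
    out = bytearray()
    for ch in chars:
        acc = (acc << 6) | VALUE[ch]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)   # emit the top eight bits
            acc &= (1 << nbits) - 1             # keep only the leftover bits
    return bytes(out), nbits


for run in ("AA", "BBB", "CCCC"):
    data, leftover = decode_six_bit_run(run)
    s, t = len(run), len(data)
    # t = floor(s*6/8) and the leftover is s*6 - t*8, as stated above
    print(run, data.hex(), t == (s * 6) // 8, leftover == s * 6 - t * 8)
```

Under those assumptions the three runs give one, two and three
eight-bit characters with 4, 2 and 0 bits discarded respectively.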
	
The six-bit channel is thus just an alternative way of coding up eight-bit
characters that is more robust against email damage, etc.  Once the
input is retranslated back into 8-bit characters, they are processed
just as if they were originally sent in the eight-bit channel.

In the 6-bit channel characters other than those listed above (and the
right brace) may be present, but are IGNORED.   In particular, white
space is ignored.  Thus, a switch into a six-bit channel can allow
one to insert line breaks or white-space.  For example,

	mary-had-a-lit{
	}tle-lamb-its-flee{
	}ce-was-black-as-coal

is the same as
	
	mary-had-a-little-lamb-its-fleece-was-black-as-coal

Thus, we do not need a separate fragmentation mechanism.  Within a
six-bit channel, it is recommended to include some whitespace after
each 64 six-bit characters.
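
A rough sketch of a reader for the combined stream, to show how the
brace switching and the "ignore everything else" rule take care of
fragmentation.  The function name decode_stream is hypothetical, the
bit order is assumed to be most-significant-first, and the sketch does
nothing special about an unterminated left brace.

```python
# Hypothetical names; alphabet as listed above, keyed by byte value.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
VALUE = {byte: i for i, byte in enumerate(ALPHABET)}


def decode_stream(data: bytes) -> bytes:
    """Translate a mixed 8-bit / {6-bit} input into plain 8-bit characters."""
    out = bytearray()
    in_six_bit = False      # an input channel opens in 8-bit mode
    acc = nbits = 0
    for byte in data:
        if not in_six_bit:
            if byte == ord("{"):
                in_six_bit = True
                acc = nbits = 0
            else:
                out.append(byte)            # 8-bit channel: pass through
        elif byte == ord("}"):
            in_six_bit = False              # waiting bits are thrown away
        elif byte in VALUE:
            acc = (acc << 6) | VALUE[byte]
            nbits += 6
            if nbits >= 8:
                nbits -= 8
                out.append((acc >> nbits) & 0xFF)
                acc &= (1 << nbits) - 1
        # any other character inside braces (white space etc.) is ignored
    return bytes(out)


fragmented = b"mary-had-a-lit{\n\t}tle-lamb-its-flee{\n\t}ce-was-black-as-coal"
print(decode_stream(fragmented).decode("ascii"))
```

The empty brace pairs contribute no six-bit characters, so the line
breaks inside them simply vanish from the output.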

Note that the eight-bit characters transmitted using the six-bit
channel have the same significance as if they were transmitted
directly.  Thus, if 
	{X7}
represents a left parenthesis (I didn't check this), then
	{X7}a)
is the same as
	(a)
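
For anyone who wants to check particular codings, here is a small
sketch of the sending side.  The name encode_six_bit_run and the
"group" parameter are hypothetical, and most-significant-bit-first
packing is assumed.  With the value assignment given earlier, a left
parenthesis (0x28) comes out as {KA}, whereas {X7} would decode to
0x5F (an underscore).

```python
# Hypothetical names; alphabet as listed earlier (values 0..63).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_six_bit_run(data: bytes, group: int = 64) -> str:
    """Wrap `data` in braces as six-bit characters.

    A newline is inserted after each `group` six-bit characters,
    following the recommendation about white space.
    """
    acc = 0
    nbits = 0
    chars = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6:
            nbits -= 6
            chars.append(ALPHABET[(acc >> nbits) & 0x3F])
            acc &= (1 << nbits) - 1
    if nbits:
        # Fill the last character with zero bits; the reader discards them.
        chars.append(ALPHABET[(acc << (6 - nbits)) & 0x3F])
    lines = ["".join(chars[i:i + group]) for i in range(0, len(chars), group)]
    return "{" + "\n".join(lines) + "}"


print(encode_six_bit_run(b"(") + "a)")   # same significance as (a)
```

No "=" padding is produced, since the closing brace already marks the
end of the six-bit run.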

An input channel opens by default in 8-bit mode, and needs to be
switched into 6-bit mode using a left brace, if that's what you
want.

One can choose to code up an entire S-expression this way, or only
parts of it.  The six-bit coding is entirely invisible to the rest
of the SPKI/SDSI mechanism, and doesn't even need to be mentioned again.

Cheers,
	Ron Rivest



