Base-64 encoding proposal
I think the "right" way to think of base-64 encodings is as follows.
Rather than thinking of base-64 as a way of encoding some object
(a byte string or maybe an S-expression), just think of base-64
as a 6-bit per character channel rather than an 8-bit per character
channel. You just have to read enough 6-bit chars to get your
next 8 bits of data, which you spit out. This has nice side effects,
such as eliminating the need for a separate "fragmentation" mechanism.
Specific proposal:
Use braces to delimit the 6-bit channel portions:
... 8-bit-channel-here ... { ... 6-bit-channel-here ... } ... 8-bit...
In the 6-bit channel we use the usual six-bit characters:
A -- Z a -- z 0 -- 9 + /
having respective significance 0...63.
In the six-bit channel four six-bit characters yield three eight-bit
characters. In general, s six-bit characters yield t = floor(s*6/8)
eight-bit characters, with s*6-t*8 bits "left over" waiting for more
input to arrive. When the six-bit channel finishes (with the closing
right brace) any waiting bits are thrown away. For example, the
input
{AA}
represents a single eight-bit character (having value zero); four bits
are thrown away. The value
{BBB}
represents two eight-bit characters (with two bits thrown away), and
{CCCC}
represents three eight-bit characters (with no bits thrown away).
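The bit accounting above can be sketched in Python as follows. This is only an illustration, not part of the proposal: the names ALPHABET, VALUE and decode_six_bit are made up here, and the function takes just the text between the braces.

```python
import string

# The 64 six-bit characters, in order of significance 0..63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_six_bit(body: str) -> bytes:
    """Decode the text between '{' and '}' into eight-bit characters."""
    acc = 0      # holds only the bits that have not been emitted yet
    nbits = 0    # how many bits are currently waiting in acc
    out = bytearray()
    for ch in body:
        if ch not in VALUE:
            continue  # anything outside the alphabet is skipped
        acc = (acc << 6) | VALUE[ch]
        nbits += 6
        if nbits >= 8:
            # Enough bits for one eight-bit character: emit the top eight.
            nbits -= 8
            out.append(acc >> nbits)
            acc &= (1 << nbits) - 1
    # The closing brace has been reached: the nbits waiting bits are dropped.
    return bytes(out)


print(decode_six_bit("AA"))    # one eight-bit character
print(decode_six_bit("BBB"))   # two eight-bit characters
print(decode_six_bit("CCCC"))  # three eight-bit characters
```

With this sketch, s six-bit characters always produce floor(s*6/8) eight-bit characters, matching the three examples.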
The six-bit channel is thus just an alternative way of coding up eight-bit
characters that is more robust against email damage, etc. Once the
input is retranslated back into 8-bit characters, they are processed
just as if they were originally sent in the eight-bit channel.
In the 6-bit channel, characters other than those listed above (and the
right brace) may be present, but are IGNORED. In particular, white
space is ignored. Thus, a switch into a six-bit channel can allow
one to insert line breaks or white-space. For example,
mary-had-a-lit{
}tle-lamb-its-flee{
}ce-was-black-as-coal
is the same as
mary-had-a-little-lamb-its-fleece-was-black-as-coal
Thus, we do not need a separate fragmentation mechanism. Within a
six-bit channel, it is recommended to include some whitespace after
every 64 six-bit characters.
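For the sending side, a minimal encoder might look like the sketch below. It is an assumption-laden illustration: the name encode_six_bit and the group parameter are hypothetical, the final partial character is padded with zero bits (which the receiver throws away at the closing brace), and a newline is used as the whitespace between groups of 64.

```python
import string

# The 64 six-bit characters, in order of significance 0..63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_six_bit(data: bytes, group: int = 64) -> str:
    """Return data as one braced six-bit run, broken after every `group` characters."""
    acc = 0      # bits not yet turned into six-bit characters
    nbits = 0    # how many bits are waiting in acc
    chars = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6:
            nbits -= 6
            chars.append(ALPHABET[acc >> nbits])
            acc &= (1 << nbits) - 1
    if nbits:
        # Pad the last partial character with zero bits (assumed convention).
        chars.append(ALPHABET[acc << (6 - nbits)])
    lines = ["".join(chars[i:i + group]) for i in range(0, len(chars), group)]
    return "{" + "\n".join(lines) + "}"


print(encode_six_bit(b"abc"))
print(encode_six_bit(bytes(100)))
```

No "=" padding is produced, since leftover bits are simply discarded when the channel closes.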
Note that the eight-bit characters transmitted using the six-bit
channel have the same significance as if they were transmitted
directly. Thus, since
{KA}
represents a left parenthesis (the bits 001010 000000 begin with 00101000, i.e. 0x28),
{KA}a)
is the same as
(a)
An input channel opens by default in 8-bit mode, and needs to be
switched into 6-bit mode using a left brace, if that's what you
want.
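Putting the pieces together, a reader for the whole input channel could be sketched as below. This is a rough illustration under stated assumptions, not a definitive implementation: read_channel is a made-up name, the input is taken as a Python string whose characters all have code points below 256, and a left brace seen in 8-bit mode is always treated as the switch into 6-bit mode.

```python
import string

# The 64 six-bit characters, in order of significance 0..63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def read_channel(text: str) -> bytes:
    """Translate mixed 8-bit / braced 6-bit input into plain eight-bit characters."""
    out = bytearray()
    six_bit = False  # the channel opens in 8-bit mode
    acc = 0          # bits waiting in the 6-bit channel
    nbits = 0        # how many bits are waiting
    for ch in text:
        if not six_bit:
            if ch == "{":
                six_bit, acc, nbits = True, 0, 0
            else:
                out.append(ord(ch))  # assumes code points below 256
        elif ch == "}":
            six_bit = False          # waiting bits are thrown away
        elif ch in VALUE:
            acc = (acc << 6) | VALUE[ch]
            nbits += 6
            if nbits >= 8:
                nbits -= 8
                out.append(acc >> nbits)
                acc &= (1 << nbits) - 1
        else:
            pass                     # other characters in 6-bit mode are ignored
    return bytes(out)


print(read_channel("mary-had-a-lit{\n}tle-lamb-its-flee{\n}ce-was-black-as-coal"))
print(read_channel("{KA}a)"))
```

Here {KA} is one six-bit spelling of a left parenthesis (0x28), so the second call yields the same three characters as the plain input (a). Everything downstream sees only the returned eight-bit characters.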
One can choose to code up an entire S-expression this way, or only
parts of it. The six-bit coding is entirely invisible to the rest
of the SPKI/SDSI mechanism, and doesn't even need to be mentioned again.
Cheers,
Ron Rivest