# Base-64 proposal

```
Please ignore my last response to George Michaelson's questions.  He
responding.  Here is a better reply, and a slightly modified base64
proposal.

* The basic model for SPKI/SDSI is an ordinary 8-bit channel providing a
sequence of bytes (octets) that are then interpreted according to the
SPKI/SDSI rules to be parsed into lists, etc.

* In some cases it is desirable to code the elements of the 8-bit channel
with 6-bit (base-64) characters (hextets?) for protection against
mailer and channel damage.  The usual encoding (RFC 1521) uses characters
A -- Z   a -- z  0 -- 9  + /
to denote the hextets with decimal value 0--63, respectively.

* Thus, we propose an "augmented 8-bit channel" that allows one to drop
into six-bit mode temporarily.  We propose that in an augmented 8-bit
channel, a left brace "{" signals dropping into 6-bit mode, and that
when in six-bit mode, a right brace "}" signals popping back out into
augmented 8-bit mode.

...8-bits...{...6-bits...}...8-bits...

The difference between ordinary 8-bit mode and augmented 8-bit mode is
that the left brace is used to signal the transition to 6-bit mode.
In augmented 8-bit mode a right brace is treated as an ordinary character.

* The result of processing an byte sequence that is in augmented 8-bit
mode is a byte sequence in ORDINARY eight-bit mode.  That is, if the
result of decoding the six-bit bytes you get a left brace, that brace
does NOT recursively signal another embedded six-bit channel.

* How do you represent a left brace in augmented 8-bit mode?
The natural answer is as follows:
A left brace is octal 173 = binary 01111011
which breaks into hextets as 011110 11xxxx  (x = doesn't matter)
Then 011110 encodes as "e"
and  11xxxx encodes as "g" (as one possibility)
Thus, by writing
{eg}
in the augmented 8-bit channel, you get a left brace into the output
ordinary 8-bit channel.

One could introduce other mechanisms for accomplishing this as well, but
this is sufficient.  One natural alternative would be to let "\{" in the
augmented channel produce "{" in the output, and "\\" in the input
produce "\" in the output.  But this requires two active characters
left brace and backslash, which seems overkill.  I suggest that we
not introduce redundant mechanisms.

(Note that my previous note about using #1:{ was just wrong.)

* A reader for an augmented 8-bit channel is just a simple state machine
that remembers whether it is reading in 8-bit mode or 6-bit mode, and
(if in six-bit mode) what "left-over bits" there are.  Each character
may cause a state change and may cause an 8-bit character to be output
for the output channel (which is the ordinary 8-bit channel).  More
specifically, to process an input character x, do the following.
The state-machine has a left-over bits register that may hold up
to 12 bits, and a length-indicator for that register.
-- if in eight-bit mode:
if x is not a left brace, output x and return.
otherwise if x is left brace:
switch to 6-bit mode, clear the left-over bits
register, zero its length indicator, and return.
-- if in six-bit mode:
if x is a right brace:
switch to 8-bit mode and return.
if x is not a base-64 character (alphanumeric or +/)
ignore this character and return.
Otherwise, append the six-bit value represented by x to
the contents of the left-over-bits register, and increase
the length indicator by six.  If the left-over-bits register
now has 8 or more bits, output the leftmost 8 of those bits,
and delete them from the register, and subtract 8 from the
length indicator.

Thus, the "get character" routine can be written so as to repeatedly
read characters from the augmented channel until an 8-bit character is
output for the ordinary channel.

* Note that you can't nest or recurse, since the output channel is an
ordinary 8-bit channel, not an augmented 8-bit channel.  If one wants
to protect a byte-sequence from mutilation, it is silly and wasteful
to recursively encode stuff that is already 6-bit encoded.  With the
current proposal, if you have for some reason a partially encoded
sequence in augmented 8-bit representation:

xxxx...xxx{YYYYYYYY}zzzzzzzzz

then the correct way to protect the entire thing is then just to recode the
eight-bit portions xx's and zz's:

{XXX....XXX}{YYYYYYYY}{ZZZZ....ZZZZZ}

where the XX's are the 6-bit encoding of the xx's, and the ZZ's are the
six-bit encoding of the zz's.

Thus, it is possible to efficiently encode an sequence, even if portions
of it were already 6-bit encoded.  You only need to know that your input
is already in augmented 8-bit form in order to know that you need to
watch out for already-encoded stuff, so that you can just copy it verbatim
to the result.

* The fact that non-base-64 characters are ignored in six-bit mode gives
one the possibility of getting "fragmentation", as noted in my earlier
note:
abc{
}def
produces
abcdef
when decoded.

* This proposal for augmented 8-bit channels is formally independent of
anything else in SPKI/SDSI.  Indeed, it could be used to represent
any kind of data, not just SPKI/SDSI data.

Ron Rivest

```

Follow-Ups: