Re: Linkology proposal for SPKI/SDSI


At 12:57 PM 4/20/97 EDT, Ron Rivest wrote:
>This note gives some thoughts on links, with a specific proposal for a
>better way of handling them in SPKI/SDSI.  I was motivated in part by
>my reading of the XML document
>	http://www.w3.org/pub/WWW/TR/WD-xml-link-970406.html 
>on linking, although what is given here is much simpler and directed
>specifically at SPKI/SDSI.  Comments and discussion invited.
>At the moment we have s-expressions containing subexpressions like
>	(hash md5 {...})
>	(hash md5 {...} <url>)
>	<url>
>	(ref <key> name)
>this is ad-hoc, and also fails to make a clear distinction between hash-values
>and links, between the pointers and the thing pointed to.



	I spent lunch today going over your full earlier message and have a mixed 
reaction.  I can understand that there is a real problem which both you and 
W3C are trying to solve.  On the other hand, I believe our corner of the 
world is significantly simpler and that our data structures can and should 
reflect that relative simplicity.

	As I said in my earlier reply, every time we use a hash, it is a link.
Let me formalize that remark here.

	The links you are concerned about tell you how to get to some data.  That 
data might be a file on the web, or it might be some portion of a file.  [I 
remember proposals in W3C DSig which were concerned with identifying 
portions of the text of one file.]  Your link construct also addresses the 
quoting problem.  E.g., when you speak about "http://www.clark.net/" are 
you referring to the page at that address or to the 21-byte character string 
itself?  A link might refer to a cluster of files (e.g., an HTML page and
all the inline images).  It might want to refer to a file and files that
file points to, down some depth.  This is the issue we decided not to
address when I brought it up in Memphis.

	These are important issues to resolve.  Finding a good way to refer 
unambiguously to such things would doubtless be a Good Thing.  Of course, I 
am a little suspicious of academic work which isn't in response to some 
crying need by the user community, and I haven't heard cries of pain in this 
area, but I'm a big fan of pure research and if the specification and 
quoting problems can be solved in a simple mechanism, then I think the world 
will have gained.

	For the purpose of digitally signed things, like certificates, there is an 
additional issue.  Our links to objects need to be cryptographically secure.
Therefore, there must be a secure hash of the intended object within the 
body being signed.

	The hash has an interesting property.  It's necessary for our use of links, 
but it's also sufficient to some extent.  That is, there is a procedure for 
finding which contiguous range of bytes inside which existing object is 
referred to by the hash.  The problem with that procedure is that it is a 
little inefficient to scan the entire web and compute hashes of every 
range of bytes. :)
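
	The procedure above can be sketched as a brute-force search.  This is 
only an illustration: the function name find_range and the two tiny 
stand-in objects are hypothetical, and MD5 is used because it is the hash 
named in the (hash md5 {...}) form quoted above.

```python
import hashlib


def find_range(objects, target_hex):
    """Brute-force search for the byte range whose MD5 equals target_hex.

    objects maps a (hypothetical) object name to its bytes. Returns
    (name, start, end) for the first match, or None if nothing matches.
    """
    for name, data in objects.items():
        for start in range(len(data)):
            for end in range(start + 1, len(data) + 1):
                if hashlib.md5(data[start:end]).hexdigest() == target_hex:
                    return name, start, end
    return None


# Two tiny stand-in objects; the real search space would be the whole web.
objects = {"a": b"hello world", "b": b"spki links"}
target = hashlib.md5(b"world").hexdigest()
match = find_range(objects, target)
```

	The number of candidate ranges grows with the square of each object's 
length, which is why the search is only practical over a small, known set 
of objects.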

	The question, to me, is what needs to go inside the signed body of a 
certificate.  I believe the answer is clear.  By policy, we expect a requester
of access to deliver to the server all the information that server would
need to evaluate the request.  If a requester has delivered all necessary
objects to the server, then the set of objects to hash and locate by hash
is much smaller -- small enough that indexing by hash is probably
the most efficient.  This makes the hash of an object the preferred
link, not just for security where it must be used, but also for 
performance.  Therefore, I expect a requester to send along objects,
some of which are certificates, some keys and some other things.
All would be hashed and hung off a hash table.  If an object is big enough, 
the requester might send along instructions for accessing the object instead
(letting the server operate with the same network traffic, without involving 
the requester).  However, none of those things -- objects or little programs 
to drive the server in accessing objects [which is what I believe these link 
proposals will end up being] -- needs to be inside the certificate.  The 
certificate must contain hashes and doesn't need to contain anything else.
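
	A minimal sketch of the hash table described above, assuming the 
requester delivers a handful of objects as byte strings.  The helper names 
(md5_hex, index_by_hash, resolve) and the sample objects are hypothetical; 
nothing here is taken from the SPKI/SDSI drafts.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def index_by_hash(objects):
    """Hang every delivered object off a table keyed by its hash."""
    return {md5_hex(obj): obj for obj in objects}


def resolve(table, digest):
    """Follow a hash link; None means the requester did not deliver it."""
    return table.get(digest)


# Stand-ins for the keys, certificates and other things a requester sends.
delivered = [b"(public-key ...)", b"(cert ...)", b"some other object"]
table = index_by_hash(delivered)

# The certificate body carries only the hash, which acts as the link.
link = md5_hex(b"(cert ...)")
linked_object = resolve(table, link)
```

	A lookup that returns None tells the server that the requester failed 
to deliver something the request depends on.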

	I am especially opposed to the idea of a #include for certificates.
If by (grab) you meant that a byte string being handed to a hash function
should be interrupted at that link while the linked object's bytes are
funneled to the hash, then I oppose the idea of (grab).  It is a possibly
less than fully rational prejudice of mine that the only thing which should
be handed to a hash function for signature verification is a contiguous
set of bytes which arrived from the signer.  I do not believe in "some
assembly required", re-canonicalization or "batteries not included".
The signer had all the bytes together to hash and then sign, and should
send those same bytes to anyone who wants to verify the signature.  This
is why I keep insisting that the real transport mechanism be the
canonical form.  [PEM certificate verification required re-canonicalization
and I got bit by it, since PEM used X.509/ASN.1/DER and expected DER to be
true to its promise that there would be only one encoding of a given
thing -- but that wasn't true.  They had overlooked one little thing
and that was enough to keep PEM from accepting some valid certificates.]
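
	To illustrate why the verifier should hash exactly the bytes the signer 
sent, here is a small sketch.  It covers only the hashing step, not the 
signature itself, and the two sample s-expressions are hypothetical: they 
have the same structure but differ in whitespace.

```python
import hashlib


def digest_as_received(received: bytes) -> str:
    """Hash one contiguous run of bytes exactly as it arrived."""
    return hashlib.md5(received).hexdigest()


# The bytes the signer hashed and signed.
signer_bytes = b"(cert (issuer alice) (subject bob))"
signed_digest = digest_as_received(signer_bytes)

# A re-encoding with the same structure but extra spaces.
reencoded = b"(cert  (issuer alice)  (subject bob))"

same_bytes_ok = digest_as_received(signer_bytes) == signed_digest
reencoded_ok = digest_as_received(reencoded) == signed_digest
```

	Here same_bytes_ok is True and reencoded_ok is False: any step that 
rebuilds the bytes before hashing risks producing the second case.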

	I see the specification of links (eventually satisfying the desire to 
identify individual sentences within a document, perhaps) as similar to the 
design of a programming language.  I see this being as drawn-out and 
emotionally charged as all new programming language designs have been over 
the years.  Perhaps my great grandchildren will see consensus reached -- or 
maybe not.

	I do believe the work is valuable and that someone needs to address it.  
However, I believe it is not relevant to the structure of a certificate 
itself.  For our purposes, a cryptographic hash provides both unambiguous 
links and efficiency (given our assumptions).

	It also solves the quoting problem.  That is, the hash of the 21-character 
string "http://www.clark.net/" is different from the hash of the page at 
that location.  In general web reference terms, where no hashes are 
involved, there is a difference between a URL itself and the page to which 
it points and one might want to refer to the latter even without knowing 
what is at the latter (therefore, without being able to compute a hash over 
it).  However, that situation is not one we face.  If we are making secure 
references to the content of that location, we need to know what that 
content is and hash it.  Having hashed it, we have eliminated the possibility
of saying "whatever you might find at this location is what I'm saying
(tag ...) about".
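
	The quoting point can be shown directly.  In this sketch the page 
content is a made-up placeholder, not the real page at that address.

```python
import hashlib

# The 21-character string itself.
url = b"http://www.clark.net/"

# Hypothetical stand-in for whatever bytes the page at that URL contains.
page = b"<html><body>hypothetical page content</body></html>"

url_digest = hashlib.md5(url).hexdigest()
page_digest = hashlib.md5(page).hexdigest()

# The two hashes name different things, so there is nothing to quote.
refers_to_same_thing = url_digest == page_digest
```

	Both digests have the same fixed length, yet they differ, so a hash 
can never be mistaken for a reference to the other object.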

	If someone does solve the link problem, then I see those solutions finding 
their way into the Oort cloud of related objects which accompanies a 
certificate -- e.g., public keys, other certificates, xrls, ....  I 
don't see them entering the body of a certificate.  We already have all the 
links we need and they are sufficient for all our purposes.

	It is an entirely different issue that a cryptographic hash, because it 
implicitly solves the quoting problem and because it is fixed length, may be 
a superior naming mechanism which the web itself might want to adopt....  I 
don't propose to take the SPKI list into that discussion, however (except
that I just did :) ).

 - Carl


|Carl M. Ellison  cme@cybercash.com   http://www.clark.net/pub/cme |
|CyberCash, Inc.                      http://www.cybercash.com/    |
|207 Grindall Street   PGP 2.6.2: 61E2DE7FCB9D7984E9C8048BA63221A2 |
|Baltimore MD 21230-4103  T:(410) 727-4288  F:(410)727-4293        |