November 20, 2014

Ambiguity, Semantic web, speech acts, truth and beauty


(I think this post is pretty academic for the web dev crowd, oh well)

When talking about URLs and URNs or semantic web or linked data, I keep on returning to a topic. Carl Hewitt gave me a paper about inconsistency which this post reacts to.

The traditional AI model of semantics and meaning don't work well for the web. 
Maybe this is old-hat somewhere but if you know any writings on this topic, send me references.

In the traditional model (from Bobrow's essay in Representation and Understanding), the real world has objects and people and places and facts; there is a KRL Knowledge Representation Language in which statements about the world are written, using terms that refer to the objects in the real world. Experts use their expertise to write additional statements about the world, and an "Inference Engine" processes those statements together to derive new statements of facts.

This is like classic deduction "Socrates is a man, all men are mortal, thus Socrates is mortal" or arithmetic (37+53) by adding 7+3, write 0 carry 1 plus 3 plus 5 write 9, giving 90.

And to a first approximation, the semantic web was based on the idea of using URLs as the terms to refer to real world, and relationships, and RDF as an underlying KRL where statements consisted of triples.

Now we get to the great and horrible debate over "what is the range of the http function" which has so many untenable presumptions that it's almost impossible to discuss. That the question makes sense.
That you can talk about two resources being "the same". That URLs are 'unambiguous enough', and the only question is to deal with some niggly ambiguity problems, with a proposal for new HTTP result codes.

So does http://larry.masinter.net refer to me or my web page? To my web page now or for all history, to just the HTML of the home page or does it include the images loaded, or maybe the whole site?

"http://larry.masinter.net" "looks" "good".

So I keep on coming back to the fundamental assumption, the model for the model.

Coupled with my concern that we're struggling with identity (what is a customer, what is a visitor) in every field, and phishing and fraud on another front.

Another influence has been thinking about "speech acts". It's one thing to say "Socrates is a man" and completely different thing to say "Wow!". "Wow!" isn't an assertion (by itself), so what is it? It's a "speech act" and you distinguish between assertions and questions and speech acts.

A different model for models, with some different properties:

Every speech is a speech act.

      There are no categories into assertion, question, speech act. Each message passed is just some message intending to cause a reaction, on receipt. And information theory applies: you can't supply more than the bits sent will carry. "http://larry.masinter.net" doesn't intrinsically carry any more than the entropy of the string can hold. You can't tell by any process whether it was intended to refer to me or to my web page.

Truth is too simple, make belief fundamental. 

   So in this model, individuals do not 'know' assertions, they only 'believe'  to a degree. Some things are believed so strongly that they are treated as if they were known. Some things we don't believe at all. A speech act accomplishes its mission if the belief of the  recipient changes in the way the sender wanted.   Trust is a measure of influence: your speech acts that look like statements influence my beliefs about the world insofar as I trust you. The web page telling me my account balance influences my beliefs about how much I owe.

Changing the model helps think about security

Part of the problem with security and authorization is we don't have a good model for  reasoning about it. Usually we divide the world into "Good guys" and "bad guys": Good guys make true statements ("this web page comes from bank trustme")  while bad guys lie. (Let's block the bad guys.)   By putting trust and ambiguity at the base of the model and not as an after-patch we have a much better way of describing what we're trying to accomplish.

Inference, induction, intuition are just different kinds of processing

   In this model, you would like influence of belief to resemble logic in the cases where there is trust and those communicating have some agreement about what the terms used refer to. But inference is subject to its own flaws ("Which Socrates? What do you mean by mortal? Or 'all men'"). 

Every identifier is intrinsically ambiguous

Among all of the meanings the speaker might have meant, there is no inbound right way to disambiguate. Other context, out of band, might give the receiver of the message with a URL more information about what the sender might have meant. But part of the inference, part of the assessment of trust, would have to take into account belief about the sender's model as to what the sender might have meant. Precision of terms is not absolute.

URNs are not 'permanent' nor 'unambiguous', they're just terms with a registrar

I've written more on this which i'll expand elsewhere. But URNs aren't exempt from ambiguity, they're generally just URLs with different assigned organizations to disambiguate if called on.

Metadata, linked data, are speech acts too.

When you look in or around an object on the net, you can often  find additional data, trying to tell you things about the object. This is the metadata. But it isn't "truth", metadata is also a communication act, just one where one of the terms used is the object.

There's more but I think I'll stop here. What do you think?