Re: [blogite] SmartReferer 0.1.2


From: Ian Hickson (ian@hixie.ch)
Date: Sat Dec 28 2002 - 02:41:25 GMT


On Fri, 27 Dec 2002, lenz wrote:
>
> http://www.oinko.net/smartreferer/

The idea is intriguing. (For those who haven't read it: it's basically a
referrer sanitisation system: given a requested URI and a referrer URI, it
will effectively give you the permalink of the referring page.)

Comments:

: SmartReferer [...] SmartPort

"Port" is the wrong technical term; and SmartXXXX makes this sound like
non-technical marketing-speak. I recommend avoiding the invention of new
trade names in specifications, and sticking with technically accurate
pre-existing jargon. (e.g. "referrer authentication" or "canonical
referrer determination".)

: If links are the economics of the web, SmartReferer makes it easier and
: neater to make a precise balance of who is linking you

That sounds like marketing-speak, and doesn't belong in a spec.

: The SR autodiscovery process

The autodiscovery process given (relying on a fixed URI) is a very poor
design. Many sites on the net are limited to subdirectories, and so cannot
serve anything from a fixed URI at the root of the host. Furthermore,
administrators are very easily annoyed by repeated 404s appearing in their
logs (witness the fuss behind the favicon.ico or P3P systems).

: case-insensitive longest matching subsequence

This is, IMHO, a poor design. URIs are case sensitive (only the scheme and
host are case insensitive), and two URIs that differ only by a trailing
slash, e.g.

   http://www.example.com/foo

...and

   http://www.example.com/foo/

...are NOT the same resource.
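
To make the point concrete, here is a minimal Python sketch of a comparison
that folds case only where that is safe (scheme and host) and compares the
rest byte for byte. The function name same_uri is made up for this example;
it is not part of the SmartReferer spec, and it deliberately ignores
userinfo and percent-encoding.

```python
from urllib.parse import urlsplit


def same_uri(a: str, b: str) -> bool:
    """Return True if two URIs match, folding case only in scheme and host."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme == pb.scheme          # urlsplit already lowercases the scheme
        and pa.hostname == pb.hostname  # .hostname is lowercased by urlsplit
        and pa.port == pb.port
        and pa.path == pb.path          # path is compared case sensitively
        and pa.query == pb.query        # so is the query part
    )


print(same_uri("HTTP://WWW.Example.com/foo", "http://www.example.com/foo"))
print(same_uri("http://www.example.com/foo", "http://www.example.com/foo/"))
print(same_uri("http://www.example.com/foo", "http://www.example.com/Foo"))
```

Only the first pair compares equal; the trailing slash and the capital "F"
each produce a different URI.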

: writing special-case XML parsers from scratch

That is to be discouraged. XML parsers are complicated things, and
reducing the total number of them in the world is a good thing.

Any XML resource should be able to use <![CDATA[ ]]> blocks or whatever,
without having to worry about running into limitations of custom parsers.
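
As an illustration, a stock parser copes with CDATA without any special
handling. This sketch uses Python's standard xml.etree.ElementTree; the
<title> element and its content are invented for the example.

```python
import xml.etree.ElementTree as ET

# A CDATA section lets the author write "&" and "<" without escaping them.
source = "<title><![CDATA[Tom & Jerry <live>]]></title>"

element = ET.fromstring(source)
print(element.text)
```

A hand-rolled parser that only understands escaped text would have to
reimplement this, along with entities, encodings, and the rest.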

: If the returned SmartPort URL has a querystring part, it should be left
: untouched and no res= and from= parameters should be added.

I disagree that "SmartPort URL"s are the way to go here, but if they are,
then I think this part of the spec contradicts the part of the spec that
wants this to work well with static backends.

: No single SmartPort document shall be longer than 32k in size. If it is,
: it should be truncated accordingly.

That appears to be an arbitrary limitation. Also, truncating an XML file
makes it ill-formed, and XML processors MUST refuse to handle ill-formed
XML files.
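
A quick way to see this, sketched with Python's standard XML parser. The
element names are placeholders, not the real SmartPort format, and the cut
at 40 characters stands in for the 32k cut.

```python
import xml.etree.ElementTree as ET

# Placeholder document; the element names are assumptions for this sketch.
document = "<smartport><entry>http://www.example.com/foo</entry></smartport>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if a conforming parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Cutting the file at a fixed size leaves the elements unclosed.
truncated = document[:40]

print(is_well_formed(document))
print(is_well_formed(truncated))
```

The complete document parses; the truncated one is rejected outright, so
the receiver gets nothing rather than a shorter list.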

: Spam protection

In my opinion, it is in the interest of the market to leave spam
protection at the informative (non-normative) level, and let different
implementations develop their own systems. If you explicitly state what
protections are to be used, then spammers will know exactly what to avoid
doing.

Also, blacklists and whitelists are a maintenance nightmare.

: XML format of SmartPort files

This really shouldn't be an appendix.

: <owner> ... <description> ... <title-long> ... <icon>

Specifications should pick one area, and only try to address that one
area. In this case, metadata should not be addressed by a referrer
authentication system. Leave metadata to the RDF or Dublin Core guys,
don't try to mix it in with your own spec. (Trackback made this mistake.)

: You have a right to develop software based on this document provided
: that such software will be distribuited as freeware under the GNU
: General Public Licence.

The whole point of specifications is that they should be freely
implementable by anyone. Why limit it to a tiny subset of the population?

: In order to implement the SmartReferer 'receiver role' protocol
: (i.e. the autodiscovery mechanism) in a commercial software or in a
: paid-for environment, you will have to hold a signed licence for doing
: so.

I am not a lawyer, but unless you own a patent on this stuff, I think this
is not a restriction you can levy.

Generally, I'm not convinced this is the way to go. As I understand it,
the problem is this:

   Given a requested URI and a referrer for that URI, determine the
   canonical URI for the referring resource for the purposes of a link
   back to the referring resource from the requested resource.

In this respect, it appears to be very similar to pingback, where pingback
is a way for the referring resource to specifically announce the existence
of a link on the referring resource to the requested resource.

As I see it there are three types of referring resources:

   1. Those that are dynamically generated.

   2. Those that are static pages but automatically generated, typically
      resulting in having multiple pages that contain a particular link,
      but only one canonical URI for that link.

   3. Those that are static pages with unique URIs.

The first category can cope with any canonical referrer discovery system,
since it can be programmed to respond as required.

The second category poses the most trouble, but it can easily cope with
any system that only requires modifying the referring pages or providing a
single static response file.

The third is simple: the canonical referrer is the actual referrer, minus
any fragment identifier.
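
For this third category the rule fits in one line. A sketch in Python,
where canonical_static_referrer is a name invented for the example:

```python
from urllib.parse import urldefrag


def canonical_static_referrer(referrer: str) -> str:
    """Return the referrer URI with any fragment identifier removed."""
    return urldefrag(referrer).url


print(canonical_static_referrer("http://www.example.com/foo#section-2"))
```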

Note that the most common scenario is where a site's main page, or a
content aggregator, has grouped many resources under one URI, with the
result that sites get multiple hits from subtly different URIs, e.g.:

   http://example.org
   http://example.org/
   http://www.example.org/
   http://www.example.org/index.html
   http://example.org/?lastModified=2089420986

The last one is especially common, and illustrates one problem, which is
that many of these are URIs that the site itself doesn't know about.
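
To show why this can't be fixed on the receiving side alone, here is a
sketch that applies the only normalisation that is safe without asking the
site (lowercase the host, treat an empty path as "/", drop the fragment) to
the five URIs above. The normalise function is an assumption of this
example, not something from the spec.

```python
from urllib.parse import urlsplit, urlunsplit

variants = [
    "http://example.org",
    "http://example.org/",
    "http://www.example.org/",
    "http://www.example.org/index.html",
    "http://example.org/?lastModified=2089420986",
]


def normalise(uri: str) -> str:
    """Apply purely syntactic normalisation; no knowledge of the site."""
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


distinct = sorted({normalise(uri) for uri in variants})
for uri in distinct:
    print(uri)
```

Only the first two collapse into one. Whether "www." matters, whether
index.html is the default document, and whether the query part is
significant are all things only the referring site can answer.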

In the problem scenario, we have two URIs:

   the requested URI
   the referrer URI

No assumptions can be made; the referrer might not be HTTP, for example,
so the only possible way of determining the canonical referrer URI is to
ask the URI we have available.

The logical next step, therefore, is to request the referrer URI.

At this point, we have several options as far as a spec goes. Pingback's
mechanism is probably the simplest: provide either an HTTP header or a
specially formatted <link> element pointing to a source for further
details on the canonical URI for this resource, given the information that
it should contain a link to the requested URI.
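
A rough sketch of that discovery step in Python, working on a response
that has already been fetched. The header name X-Canonical-Referrer and
the rel value "canonical-referrer" are placeholders invented for this
example; nothing here defines what they should actually be called.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional

HEADER_NAME = "x-canonical-referrer"  # hypothetical header name
LINK_REL = "canonical-referrer"       # hypothetical rel value


class _LinkFinder(HTMLParser):
    """Remember the href of the first <link> carrying the wanted rel."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if LINK_REL in rels:
            self.href = attributes.get("href")


def discover_endpoint(headers: Mapping[str, str], body: str) -> Optional[str]:
    """Prefer the HTTP header; fall back to a <link> element in the body."""
    for name, value in headers.items():
        if name.lower() == HEADER_NAME:
            return value
    finder = _LinkFinder()
    finder.feed(body)
    return finder.href


page = '<html><head><link rel="canonical-referrer" href="http://example.org/rpc"></head></html>'
print(discover_endpoint({}, page))
```

Checking the header first means a server can answer without the client
having to parse the body at all.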

Note that there might be several. For example, if

   http://ln.hixie.ch/

...links to

   http://www.example.net/

...in two blog entries, then there are two canonical versions of the
referrer URI. (As far as I can tell the current spec doesn't deal with
this, by the way.)

The obvious next step is to make the HTTP header or <link> element point
to an XML-RPC server, which can then be communicated with to get a list,
using an interface such as:

   pingback.getCanonicalURI(referrerURI, requestedURI) : array of URIs

However, this does not cater for the static case.
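
For the dynamic case, the interface above could be sketched like this
using Python's standard XML-RPC modules. The lookup table and its
example.org permalinks are made up; a real server would consult its own
database. The request is dispatched in-process (through the dispatcher's
internal _marshaled_dispatch helper) so the sketch runs without a network.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher

# Hypothetical data: one front page links to the target from two entries.
PERMALINKS = {
    ("http://blog.example.org/", "http://www.example.net/"): [
        "http://blog.example.org/?entry=1",
        "http://blog.example.org/?entry=2",
    ],
}


def get_canonical_uri(referrer_uri, requested_uri):
    """Return every permalink on referrer_uri that links to requested_uri."""
    return PERMALINKS.get((referrer_uri, requested_uri), [])


dispatcher = SimpleXMLRPCDispatcher()
dispatcher.register_function(get_canonical_uri, "pingback.getCanonicalURI")

# Build the XML-RPC request a client would send, then handle it locally.
request = xmlrpc.client.dumps(
    ("http://blog.example.org/", "http://www.example.net/"),
    methodname="pingback.getCanonicalURI",
)
response = dispatcher._marshaled_dispatch(request.encode("utf-8"))
(result,), _ = xmlrpc.client.loads(response)

for uri in result:
    print(uri)
```

Returning an array rather than a single URI is what lets the server report
both entries when a page links to the target more than once.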

I don't know how we can truly cope with the static case. My own Web log,
for example, can be accessed through at least 6 separate domain names,
with any number of different arguments... and it only has one URI if you
ignore the query part, since all the permalinks are merely the domain with
a query part added on the end. And a defaulting rule can't be used,
because it also has some other files in a /resources/ directory that are
unrelated to the Web log material. (The current spec doesn't cope with
this either.)

I'll let you know if I can think of a solution for the static case.

Note: You may only use these ideas if you agree not to limit their use.

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Message sent over the Blogite mailing list.
Archives:     http://www.aquarionics.com/misc/archives/blogite/
Instructions: http://www.aquarionics.com/misc/blogite/


This archive was generated by hypermail 2.1.5 : Sat Dec 28 2002 - 21:05:01 GMT