Re: [blogite] SmartReferer 0.1.2

From: lenz (lenzoink@libero.it)
Date: Sat Dec 28 2002 - 20:18:31 GMT


Ian,

Thanks for your long answer. I'd like to explain a little more about SmartReferer (SR):

1. The document I published on the web is version 0.1.x, which is why it
still contains things that should not be there (for example, the part about
the economics of links, and the SmartXXX names). Please consider it a draft
and treat it accordingly. I just wanted to stimulate discussion. :-)

2. SR is designed this way because I wanted it to be easy to deploy on any
web site. It should work on the free hosting packages available out there,
and it should be fully implementable by the average Perl script running
with half of its libraries missing, as is typical of such hosting packages.
That's why I write about self-implemented XML parsers and such (evil)
things.

3. The point of SR is not simply permalink autodiscovery given a referer;
it is rather to obtain a resource description (be it a title, an abstract,
an image or even a permalink) that can readily be used, in a semi- or fully
automated way, to establish a reciprocal link to the resource that links to
our site.

I talk about a "resource" rather than an HTML page because on
database-driven sites the URLs http://foo/bar/page.php?id=74 and
http://foo/bar/printable.php?id=74 may point to the same entry while
differing in HTML presentation.

Deciding what counts as a "resource" in SR is left to the sender site, as
it would be quite hard to generalize. It is the responsibility of the code
generating an SR XML port to tag resources correctly and return permalinks.
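
A minimal Python sketch of what "tagging resources" could mean on the
sender side. It assumes, hypothetically, that every script on the site
identifies an entry by its `id` query parameter and that page.php is the
preferred presentation; `resource_key`, `permalink` and
`CANONICAL_TEMPLATE` are illustrative names, not part of SR:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical site layout: every script taking an "id" parameter renders
# the same underlying entry, and page.php is the preferred presentation.
CANONICAL_TEMPLATE = "http://foo/bar/page.php?id={id}"

def resource_key(url):
    """Return the entry id shared by all presentations of a resource, or None."""
    ids = parse_qs(urlsplit(url).query).get("id")
    return ids[0] if ids else None

def permalink(url):
    """Map any presentation URL of an entry to its single permalink."""
    key = resource_key(url)
    return CANONICAL_TEMPLATE.format(id=key) if key is not None else None

print(permalink("http://foo/bar/printable.php?id=74"))
```

With this mapping, both URLs above resolve to the same key and the same
permalink; a page without an id simply has no resource-level permalink.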

SR also lets you offer an alternative catch-all reply, which is probably
better suited to smaller, static sites or to sites whose content does not
allow permalinks. In this case, instead of returning a description and a
permalink for the specified resource, you return a general description and
URL for the site itself. Or you may not want any other site to deep-link
you, and prefer that everybody link to your main page. In SR, you are
welcome to do that (this freedom is "paid" for by the fact that you
established a one-way link, which is an advantage to the linked site).
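
The choice between a per-resource reply and the catch-all reply could look
like this on the sender side. This is only a sketch: `Reply`, `ENTRIES`,
`SITE` and the `allow_deep_links` switch are hypothetical, and a real
sender would look entries up in its own database rather than in a dict:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reply:
    title: str
    url: str

# Hypothetical site-wide description used as the catch-all reply.
SITE = Reply(title="Example Site", url="http://foo/")

# Hypothetical per-resource table standing in for a database lookup.
ENTRIES = {"74": Reply(title="Entry 74", url="http://foo/bar/page.php?id=74")}

def describe(resource_id: Optional[str], allow_deep_links: bool = True) -> Reply:
    """Return the resource's own description, else the site-wide catch-all."""
    if allow_deep_links and resource_id in ENTRIES:
        return ENTRIES[resource_id]
    # Unknown resource, static site, or deep links refused: link the site.
    return SITE
```

A site that refuses deep links sets `allow_deep_links=False` and every
receiver gets the main-page reply.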

4. As for the legal status of the document, I believe I hold natural
rights over it. I can say "I want it to be public domain", "I want it to be
GPLed", or "I want to restrict the right to develop commercial software out
of it". Of course, enforcing such rights may be a different matter
entirely. As of 0.1.2, I want the whole protocol to be available to GPL
developers, and the sender part of it to be free to use even in commercial
environments. The conditions under which commercial developers can embed
the receiver part of it are yet to be decided.

To facilitate discussion, I will post the technical details in a separate
message (tomorrow, I hope). :-)

Thanks again,
l.

At 03.41 28/12/02, you wrote:

>On Fri, 27 Dec 2002, lenz wrote:
> >
> > http://www.oinko.net/smartreferer/
>
>The idea is intriguing. (For those who haven't read it: it's basically a
>referrer sanitisation system: given a requested URI and a referer URI, it
>will effectively give you the permalink of the referring page.)
>
>Comments:
>
>: SmartReferer [...] SmartPort
>
>"Port" is the wrong technical term; and SmartXXXX makes this sound like
>non-technical marketing-speak. I recommend avoiding the invention of new
>trade names in specifications, and sticking with technically accurate
>pre-existing jargon. (e.g. "referrer authentication" or "canonical
>referrer determination".)
>
>
>: If links are the economics of the web, SmartReferer makes it easier and
>: neater to make a precise balance of who is linking you
>
>That sounds like marketing-speak, and doesn't belong in a spec.
>
>
>: The SR autodiscovery process
>
>The autodiscovery process described (relying on a fixed URI) is a very poor
>design. Many sites on the net are limited to subdirectories. Furthermore,
>administrators are very easily annoyed by repeated 404s appearing in their
>logs (witness the fuss behind the favicon.ico or P3P systems).
>
>
>: case-insensitive longest matching subsequence
>
>This is, IMHO, a poor design. URIs are explicitly case sensitive, and two
>URIs that differ only by a trailing slash, e.g.
>
> http://www.example.com/foo
>
>...and
>
> http://www.example.com/foo/
>
>...are NOT the same resource.
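
To make the objection concrete, here is a Python sketch contrasting a loose
comparison with an exact one. `loose_match` is a simplified stand-in for
the quoted rule (it ignores case and a trailing slash), not the spec's
longest-subsequence algorithm. Strictly, RFC 3986 treats the scheme and
host as case-insensitive, but the path and query are case-sensitive, and a
server may serve different content for /foo and /foo/:

```python
def loose_match(a: str, b: str) -> bool:
    # Simplified stand-in for the quoted rule: ignore case and trailing slash.
    return a.lower().rstrip("/") == b.lower().rstrip("/")

def exact_match(a: str, b: str) -> bool:
    # Without knowing the server, only identical URIs are safely the same.
    return a == b

A = "http://www.example.com/foo"
B = "http://www.example.com/foo/"
print(loose_match(A, B), exact_match(A, B))  # True False
```
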
>
>
>: writing special-case XML parsers from scratch
>
>That is to be discouraged. XML parsers are complicated things, and
>reducing the total number of them in the world is a good thing.
>
>Any XML resource should be able to use <![CDATA[ ]]> blocks or whatever,
>without having to worry about running into limitations of custom parsers.
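
A small Python illustration of the CDATA point, using the standard
library's XML parser against a hand-rolled regular expression. The
`<port>` and `<title>` element names are hypothetical and not taken from
the SR format:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical port document; element names are illustrative only.
DOC = "<port><title><![CDATA[Fish & Chips <2002>]]></title></port>"

def title_with_parser(doc: str):
    # A conforming parser unwraps the CDATA section and returns plain text.
    return ET.fromstring(doc).findtext("title")

def title_with_regex(doc: str):
    # A hand-rolled extractor that assumes titles contain plain text only.
    match = re.search(r"<title>(.*?)</title>", doc)
    return match.group(1) if match else None

print(title_with_parser(DOC))  # Fish & Chips <2002>
print(title_with_regex(DOC))   # <![CDATA[Fish & Chips <2002>]]>
```

The regex version leaks the CDATA markup into the title, which is exactly
the kind of limitation a custom parser imposes on document authors.
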
>
>
>: If the returned SmartPort URL has a querystring part, it should be left
>: untouched and no res= and from= parameters should be added.
>
>I disagree that "SmartPort URL"s are the way to go here, but if they are,
>then I think this part of the spec contradicts the part of the spec that
>wants this to work well with static backends.
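
For reference, the quoted rule could be implemented roughly as follows (a
Python sketch; reading `res` as the requested resource and `from` as the
referrer is an assumption). It also shows the tension: a static port file,
such as a plain sr.xml, receives the extra parameters but cannot vary its
reply on them:

```python
from urllib.parse import urlsplit, urlencode

def port_request_url(port_url: str, res: str, frm: str) -> str:
    """Build the URL a receiver would fetch, following the quoted rule."""
    if urlsplit(port_url).query:
        # Quoted rule: an existing query string is left untouched.
        return port_url
    # Otherwise append res= and from= (meanings assumed, see above).
    return port_url + "?" + urlencode({"res": res, "from": frm})
```
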
>
>
>: No single SmartPort document shall be longer than 32k in size. If it is,
>: it should be truncated accordingly.
>
>That appears to be an arbitrary limitation. Also, truncating an XML file
>makes it ill-formed, and XML processors MUST refuse to handle ill-formed
>XML files.
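
A quick Python check of the truncation point: cutting a document at the
32k cap (taken here as 32768 bytes, an assumption) leaves unclosed
elements, and a conforming parser rejects the result. `parse_capped` is a
hypothetical helper:

```python
import xml.etree.ElementTree as ET

LIMIT = 32 * 1024  # the quoted 32k cap, assumed to mean 32768 bytes

def parse_capped(data: bytes):
    """Truncate to the cap, then parse; return None if the result is ill-formed."""
    try:
        return ET.fromstring(data[:LIMIT])
    except ET.ParseError:
        return None

small = b"<port><title>ok</title></port>"
big = b"<port>" + b"<item>x</item>" * 5000 + b"</port>"  # about 70 KB
print(parse_capped(small) is not None, parse_capped(big) is None)  # True True
```
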
>
>
>: Spam protection
>
>In my opinion, it is in the interest of the market to leave spam
>protection at the informative (non-normative) level, and let different
>implementations develop their own systems. If you explicitly state what
>protections are to be used, then spammers will know exactly what to avoid
>doing.
>
>Also, blacklists and whitelists are a maintenance nightmare.
>
>
>: XML format of SmartPort files
>
>This really shouldn't be an appendix.
>
>
>: <owner> ... <description> ... <title-long> ... <icon>
>
>Specifications should pick one area, and only try to address that one
>area. In this case, metadata should not be addressed by a referrer
>authentication system. Leave metadata to the RDF or Dublin Core guys,
>don't try to mix it in with your own spec. (Trackback made this mistake.)
>
>
>: You have a right to develop software based on this document provided
>: that such software will be distributed as freeware under the GNU
>: General Public Licence.
>
>The whole point of specifications is that they should be freely
>implementable by anyone. Why limit it to a tiny subset of the population?
>
>
>: In order to implement the SmartReferer 'receiver role' protocol
>: (i.e. the autodiscovery mechanism) in a commercial software or in a
>: paid-for environment, you will have to hold a signed licence for doing
>: so.
>
>I am not a lawyer, but unless you own a patent on this stuff, I think this
>is not a restriction you can levy.
>
>
>Generally, I'm not convinced this is the way to go. As I understand it,
>the problem is this:
>
> Given a requested URI and a referer for that URI, determine the
> canonical URI for the referring resource for the purposes of a link
> back to the referring resource from the requested resource.
>
>In this respect, it appears to be very similar to pingback, where pingback
>is a way for the referring resource to specifically announce the existence
>of a link on the referring resource to the requested resource.
>
>As I see it there are three types of referring resources:
>
> 1. Those that are dynamically generated.
>
> 2. Those that are static pages but automatically generated, typically
> resulting in having multiple pages that contain a particular link,
> but only one canonical URI for that link.
>
> 3. Those that are static pages with unique URIs.
>
>The first category can cope with any canonical referer discovery system,
>since it can be programmed to respond as required.
>
>The second category poses the most trouble, but it can easily cope with
>any system that only requires modifying the referring pages or providing a
>single static response file.
>
>The third is simple: the canonical referrer is the actual referrer, minus
>any fragment identifier.
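
The third case is small enough to show directly. A Python sketch using the
standard library's `urldefrag`; the function name and example URL are
illustrative:

```python
from urllib.parse import urldefrag

def canonical_static_referrer(referrer: str) -> str:
    # Category 3: the canonical referrer is the referrer minus its fragment.
    url, _fragment = urldefrag(referrer)
    return url

print(canonical_static_referrer("http://www.example.org/2002/12/post.html#comments"))
# http://www.example.org/2002/12/post.html
```
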
>
>
>Note that the most common scenario is where a site's main page, or a
>content aggregator, has grouped many resources under one URI, with the
>result that sites get multiple hits from subtly different URIs, e.g.:
>
> http://example.org
> http://example.org/
> http://www.example.org/
> http://www.example.org/index.html
> http://example.org/?lastModified=2089420986
>
>The last one is especially common, and illustrates one problem, which is
>that many of these are URIs that the site itself doesn't know about.
>
>
>In the problem scenario, we have two URIs:
>
> the requested URI
> the referrer URI
>
>No assumptions can be made; the referrer might not be HTTP, for example,
>so the only possible way of determining the canonical referrer URI is to
>ask the URI we have available.
>
>The logical next step, therefore, is to request the referrer URI.
>
>At this point, we have several options as far as a spec goes. Pingback's
>mechanism is probably the simplest: provide either an HTTP header or a
>specially formatted <link> element pointing to a source for further
>details on the canonical URI for this resource, given the information that
>it should contain a link to the requested URI.
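
A Python sketch of the receiver's discovery step, modelled on Pingback's
order (check the HTTP header first, then fall back to a `<link>` element in
the page). The header name `X-Canonical-Referrer` and the rel value
`canonical-referrer` are hypothetical; nothing here is standardised, and
fetching the page is left out so only the parsing is shown:

```python
from html.parser import HTMLParser

# Hypothetical names, modelled on Pingback's X-Pingback header and
# <link rel="pingback"> element.
HEADER = "x-canonical-referrer"
REL = "canonical-referrer"

class _LinkFinder(HTMLParser):
    """Record the href of the first <link> whose rel list contains REL."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.href is None:
            attrs = dict(attrs)
            rels = (attrs.get("rel") or "").lower().split()
            if REL in rels:
                self.href = attrs.get("href")

def discover_endpoint(headers: dict, html: str):
    """Return the service URL from the HTTP header, else from a <link> element."""
    for name, value in headers.items():
        if name.lower() == HEADER:
            return value
    finder = _LinkFinder()
    finder.feed(html)
    return finder.href
```
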
>
>
>Note that there might be several. For example, if
>
> http://ln.hixie.ch/
>
>...links to
>
> http://www.example.net/
>
>...in two blog entries, then there are two canonical versions of the
>referrer URI. (As far as I can tell the current spec doesn't deal with
>this, by the way.)
>
>
>The obvious next step is to make the HTTP header or <link> element point
>to an XML-RPC server, which can then be communicated with to get a list,
>using an interface such as:
>
> pingback.getCanonicalURI(referrerURI, requestedURI) : array of URIs
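
A self-contained Python sketch of that interface, running a throwaway
XML-RPC server in-process and calling it with the standard library client.
The method name follows the line above; the `LINKS` table and its
example.org/example.net URLs are placeholders. Returning a list covers the
case of several canonical URIs discussed earlier:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Placeholder data: (referrer, requested) -> permalinks on the referrer
# that link to the requested URI.
LINKS = {
    ("http://blog.example.org/", "http://www.example.net/"): [
        "http://blog.example.org/?entry=1",
        "http://blog.example.org/?entry=2",
    ],
}

def get_canonical_uri(referrer_uri, requested_uri):
    """Return every canonical URI on the referrer that links to requested_uri."""
    return LINKS.get((referrer_uri, requested_uri), [])

def demo():
    # Port 0 lets the OS pick a free port for this local demonstration.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_canonical_uri, "pingback.getCanonicalURI")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
            return proxy.pingback.getCanonicalURI(
                "http://blog.example.org/", "http://www.example.net/")
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(demo())
```
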
>
>However, this does not cater for the static case.
>
>
>I don't know how we can truly cope with the static case. My own Web log,
>for example, can be accessed through at least 6 separate domain names,
>with any number of different arguments... and it only has one URI if you
>ignore the query part, since all the permalinks are merely the domain with
>a query part added on the end. And a defaulting rule can't be used,
>because it also has some other files in a /resources/ directory that are
>unrelated to the Web log material. (The current spec doesn't cope with
>this either.)
>
>
>I'll let you know if I can think of a solution for the static case.
>
>
>Note: You may only use these ideas if you agree not to limit their use.
>
>--
>Ian Hickson )\._.,--....,'``. fL
>"meow" /, _.. \ _\ ;`._ ,.
>http://index.hixie.ch/ `._.-(,_..'--(,_..'`-.;.'
>
>Message sent over the Blogite mailing list.
>Archives: http://www.aquarionics.com/misc/archives/blogite/
>Instructions: http://www.aquarionics.com/misc/blogite/

This archive was generated by hypermail 2.1.5 : Sun Dec 29 2002 - 05:05:02 GMT