Re: [blogite] Pingback misc


From: Simon Willison (simon@incutio.com)
Date: Sat Sep 07 2002 - 17:03:12 BST


At 16:38 07/09/2002 +0000, Jim Dabell wrote:
> > My biggest objection to the HTTP header method is this: A very, very
> > large proportion of the sites linked to will not have PingBack of any
> > kind. This means that for most sites you are linking to you will have to
> > perform a HEAD request, see that they don't have a PingBack header, then
> > perform a second GET request and see that they don't have a <link>
> > element either. That's two requests, which is actually a greater overhead
> > than just sending a GET!
>
>Surely the overhead for HEAD in this context is negligible?

The actual HEAD request may be a very small amount of data, but what
concerns me is the overhead of opening another connection to the server.
Thinking about it, though, you could do a normal GET request and cut it
short the moment you hit an X-PINGBACK header, so it is definitely worth
considering. In fact, I see no reason not to include "You may optionally
send an HTTP x-pingback header, but the <link> element is required" in the
spec.
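A client scanning the response headers for that hint could look something
like this sketch (the exact header name is still being decided, and the
function name here is just illustrative):

```python
def scan_for_pingback_header(raw_response):
    """Scan raw HTTP response bytes for an X-Pingback header.

    Returns the advertised URL, or None if the header section ends
    without one. A client reading the socket line by line can close
    the connection as soon as this yields a value.
    """
    for line in raw_response.split(b"\r\n"):
        if not line:
            break  # blank line marks the end of the headers
        if line.lower().startswith(b"x-pingback:"):
            return line.split(b":", 1)[1].strip().decode("ascii")
    return None
```

The same loop works whether the server sent the header or not - in the
common case of no PingBack support it simply falls off the end of the
header section and returns None.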

> > Also remember that you can read data from a socket line by line (or
> > character by character if needs be). This means that you can close the
> > socket connection from the GET request the moment you receive a <link
> > rel="pingback"> element OR you hit the </head> tag (as the link element
> > will not appear past that point). This means the overhead isn't actually
> > that bad.
>
>Think about all the screwed up html without </head> or <body>, coupled with
>thousands of keywords etc. Sure, you can drop the connection after 32k or
>whatever, but that's a big difference to a few lines of HTTP headers. You
>could also require that the <link> is the first element of <head>, but I
>think adding parsing requirements on top of standard html is a mistake.

At the end of the day, how this is handled is up to the people implementing
their own clients. My client will work like this, but it should not be a
requirement that /all/ clients work like this:

1. Send the GET request.
2. Read the headers line by line - if an X-Pingback header (or whatever we
decide to call it) is found, stop receiving from the socket.
3. Start reading the HTML.
4. If a <link rel="pingback"> element is found, stop receiving.
5. If a </head> tag is found, stop receiving.
6. If a <body> tag is found, stop receiving.
7. If we've received 5 KB with no sign of a <link> tag, stop receiving.

That way I receive a maximum of 5 KB and I can be 99% certain I will spot
the PingBack information, if it exists. Again, this is how I plan to
implement my client but it is not the required (or even necessarily the
recommended) way of doing things.

> > This doesn't even have to be done via XML-RPC (although that should be
> > the favoured method).
>
>The "correct" approach imho would seem to be <link>ing (or HEADing) to a
>description of the available pingback interfaces instead of directly to the
>service. You could list several:
>
><pingback>
> <interface priority="1" type="xml-rpc">
> http://www.example.com/xml-rpc-address
> </interface>
> <interface priority="1" type="http-get">
> http://www.example.com/submit.php?linker=
> </interface>
> <interface priority="2" type="email">
> pingback@example.com
> </interface>
> <interface priority="10" type="manual-web-form">
> http://www.example.com/feedback/
> </interface>
> <interface priority="10" type="manual-email">
> feedback@example.com
> </interface>
></pingback>

This has been touched on before, and I think it worth some serious
consideration. Pointing to an XML file describing a site's PingBack support
is definitely a more "logical" use of the <link> tag than pointing to the
XML-RPC server directly (especially considering the XML-RPC server can't be
used by a normal browser anyway). It also means that sites can have all of
their PingBack information in one file - this allows for PingBack clients
to cache the file and also lets site authors update the PingBack
information for all of the pages on their (possibly static) site at once -
like having a central style sheet.

There's a bit more overhead involved in grabbing the XML file but as it can
be cached anyway I don't see this as a huge problem.
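If the file followed Jim's proposed format, a client could consume it with
a few lines of XML parsing and pick the lowest-numbered (highest-priority)
interface it knows how to speak - a sketch against a format that is itself
only a proposal at this stage:

```python
import xml.etree.ElementTree as ET

def parse_interfaces(xml_text):
    """Parse the proposed <pingback> description file.

    Returns a list of (priority, type, address) tuples sorted so the
    most-preferred interface comes first. Entries without a priority
    attribute are assumed least preferred.
    """
    root = ET.fromstring(xml_text)
    interfaces = [
        (int(el.get("priority", "99")), el.get("type"), el.text.strip())
        for el in root.findall("interface")
    ]
    return sorted(interfaces)
```

A client would then walk the sorted list and use the first interface type
it supports, falling back to the manual options only as a last resort.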

In fact, the more I think about it the more I agree that auto-discovery is
a nicer approach than any kind of central server. Auto-discovery gets rid
of the need for those URL patterns I suggested earlier (which were a bit
strange). It's also a nice, simple concept - an HTML page containing data
about who you should inform if you link to the page.

Cheers,

Simon

Web Developer, www.incutio.com
Weblog: http://www.bath.ac.uk/~cs1spw/blog/

Message sent over the Blogite mailing list.
Archives: http://www.aquarionics.com/misc/archives/blogite/
Instructions: http://www.aquarionics.com/misc/blogite/



This archive was generated by hypermail 2.1.5 : Sat Sep 07 2002 - 18:05:00 BST