Re: [blogite] Pingback misc

From: Jim Dabell (jim-blogite@jimdabell.com)
Date: Sat Sep 07 2002 - 18:26:31 BST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday 07 September 2002 4:03 pm, Simon Willison wrote:
> At 16:38 07/09/2002 +0000, Jim Dabell wrote:
> > > My biggest objection to the HTTP header method is this: A very, very
> > > large proportion of the sites linked to will not have PingBack of any
> > > kind. This means that for most sites you are linking to you will have
> > > to perform a HEAD request, see that they don't have a PingBack
> > > header, then perform a second GET request and see that they don't
> > > have a <link> element either. That's two requests, which is actually
> > > a greater overhead than just sending a GET!
> >
> >Surely the overhead for HEAD in this context is negligible?
>
> The actual HEAD request may be a very small amount of data, but what
> concerns me is the overhead of opening another connection to the server.
> Thinking about it though, you could do a normal GET request but cut it
> short the moment you hit an X-PINGBACK header, so it is definitely worth
> considering. In fact, I see no reason not to include "You may optionally
> send an HTTP x-pingback header, but the <link> element is required" in
> the spec.

I'm a little rusty with my HTTP. I _think_ it would be more efficient to
open a persistent connection and do a HEAD followed by a GET if necessary.
For a start, the server doesn't send a body in response to HEAD, and I
think there may be cases where the server-side script isn't fully run. I
don't know how reliable this is in the face of proxy bugs etc., though. Of
course, this is implementation-specific, and not something that needs to
be put in the spec.
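
A minimal sketch of the HEAD-then-GET idea, using Python's standard
http.client. The function name fetch_with_head_first and the header name
X-Pingback are assumptions for illustration (the header name is still
undecided in this thread), and it says nothing about how proxies behave.

```python
import http.client


def fetch_with_head_first(conn, path, header="X-Pingback"):
    """Try HEAD first; fall back to GET on the same connection object.

    Returns (header_value, body). body is None when the HEAD response
    already carried the header and no GET was needed.
    """
    conn.request("HEAD", path)
    resp = conn.getresponse()
    resp.read()  # a HEAD response has no body; this frees the connection
    value = resp.getheader(header)
    if value is not None:
        return value, None

    # Reusing the same object lets http.client keep the socket open if the
    # server allows it; if the server closed it, http.client reconnects.
    conn.request("GET", path)
    resp = conn.getresponse()
    return resp.getheader(header), resp.read()
```

With a real server, conn would be something like
http.client.HTTPConnection("example.org"), where example.org is a
placeholder host.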

[snip]
> At the end of the day how this is handled is up to the people
> implementing their own clients. My client will work like this, but it
> should not be a requirement that /all/ clients work like this:
>
> 1. Send the GET request
> 2. Read the headers line-by-line - if an X-PingBack header (or whatever
> we decide to call it) is found then stop receiving from the socket
> 3. Start reading the HTML
> 4. If a <link rel="pingback"> element is found, stop receiving.
> 5. If a </head> tag is found, stop receiving
> 6. If a <body> tag is found, stop receiving
> 7. If we've received 5 KB with no sign of a <link> tag, stop receiving
>
> That way I receive a maximum of 5 KB and I can be 99% certain I will spot
> the PingBack information, if it exists. Again, this is how I plan to
> implement my client but it is not the required (or even necessarily the
> recommended) way of doing things.
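
As a rough illustration of steps 3-7 in the quoted list, here is a minimal
Python sketch. It assumes the body arrives as an iterable of byte chunks
(for example, successive socket reads); the name scan_for_pingback and the
regular expressions are made up for this example, only quoted attribute
values are recognised, and the header check in steps 1-2 is left out.

```python
import re

# Deliberately simple patterns; a real client would want a proper parser.
LINK_RE = re.compile(rb'<link\b[^>]*\brel=["\']pingback["\'][^>]*>', re.I)
HREF_RE = re.compile(rb'href=["\']([^"\']*)["\']', re.I)
STOP_RE = re.compile(rb'</head|<body', re.I)
MAX_BYTES = 5 * 1024


def scan_for_pingback(chunks, limit=MAX_BYTES):
    """Read chunks until a pingback <link>, a stop marker, or the limit.

    Returns the href as a string, or None if nothing was found.
    """
    buf = b""
    for chunk in chunks:
        buf = (buf + chunk)[:limit]
        stop = STOP_RE.search(buf)
        # Only look for the <link> in the part before any stop marker.
        head = buf[:stop.start()] if stop else buf
        link = LINK_RE.search(head)
        if link:
            href = HREF_RE.search(link.group(0))
            return href.group(1).decode("ascii", "replace") if href else None
        if stop or len(buf) >= limit:
            return None  # stop receiving: no point reading further
    return None
```

Because the function returns as soon as it has an answer, the caller can
close the socket without pulling the rest of the page.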

That sounds like the best approach to me, perhaps with a slight change:

1. Open HTTP connection
2. HEAD
(if header found, close connection)
3. GET (with suitable Range: header)
4. Start reading HTML
5. Every time '<link' is found, parse it
(if pingback found, close connection)
6. Quit on certain strings ('</head', '<body', '<frame'), and when 5K is up.

It's similar, but with a few optimisations (which, for all I know, you've
implemented, and were just glossing over the details). As written, it also
doesn't deal with pingback <link> elements that are commented out, which
is an important case to handle.
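
One way to parse each <link> while skipping commented-out ones is to let a
real HTML parser do the tokenising. The sketch below uses Python's
standard html.parser, which reports comments separately from tags. The
names PingbackFinder, find_pingback and RANGE_HEADERS are assumptions for
illustration; chunks is assumed to be an iterable of already-decoded text,
and the 5K cap is still enforced locally because a server is free to
ignore a Range: header and send the whole page.

```python
from html.parser import HTMLParser

# Ask for the first 5K only (bytes 0 to 5119 inclusive).
RANGE_HEADERS = {"Range": "bytes=0-5119"}


class PingbackFinder(HTMLParser):
    """Record the first <link rel="pingback"> seen before the body starts."""

    STOP_TAGS = {"body", "frame"}

    def __init__(self):
        super().__init__()
        self.href = None
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag in self.STOP_TAGS:
            self.done = True
        elif tag == "link":
            a = dict(attrs)
            if (a.get("rel") or "").lower() == "pingback":
                self.href = a.get("href")
                self.done = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.done = True

    # Comments go to handle_comment (a no-op by default), so a <link>
    # inside <!-- ... --> never reaches handle_starttag.


def find_pingback(chunks, limit=5 * 1024):
    """Feed text chunks to the parser until it is done or the cap is hit."""
    finder = PingbackFinder()
    seen = 0
    for chunk in chunks:
        chunk = chunk[: limit - seen]
        seen += len(chunk)
        finder.feed(chunk)
        if finder.done or seen >= limit:
            break
    return finder.href
```

For simplicity the cap here counts characters of decoded text rather than
bytes on the wire.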

I guess I need to dig out my O'Reilly HTTP book and read it over again :)

[snip xml pingback description]
> This has been touched on before, and I think it worth some serious
> consideration. Pointing to an XML file describing a site's PingBack
> support is definitely a more "logical" use of the <link> tag than
> pointing to the XML-RPC server directly (especially considering the
> XML-RPC server can't be used by a normal browser anyway). It also means
> that sites can have all of their PingBack information in one file - this
> allows for PingBack clients to cache the file and also lets site authors
> update the PingBack information for all of the pages on their (possibly
> static) site at once - like having a central style sheet.
[snip]

...and allows you to set caching directives for the xml file - so you can
even set cache parameters differently for public "proxies" (your
centralised server), and private clients (individual blogs).

...and lets somebody use right-click > page properties to find out how to
notify you that they are linking to you.

I guess there are quite a few reasons.

- --
Jim Dabell

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE9ejbI3tJNldoQhi8RAqDlAKCJoEsqdISFlOZhNwyGEMyp8ghxaACdHnoZ
nBG0PZn5xadOYmX2x96JV4E=
=usGY
-----END PGP SIGNATURE-----

Message sent over the Blogite mailing list.
Archives: http://www.aquarionics.com/misc/archives/blogite/
Instructions: http://www.aquarionics.com/misc/blogite/


This archive was generated by hypermail 2.1.5 : Sat Sep 07 2002 - 18:05:00 BST