Re: Interesting page headers (Was: [blogite] Before we update the spec...)


From: Simon Willison (simon@incutio.com)
Date: Wed Sep 11 2002 - 11:23:31 BST


At 11:11 11/09/2002 +0100, Aquarion wrote:
>Secondly, from my perspective it is pointless. Yes, grabbing the head is
>faster, but not enough to justify it. For example:
>
>Example one, someone uses the X-Pingback:
>
> Open connection.
>
> Get head.
>
> Parse head for X-Pingback.
>
> Succeed. Link. Yay.
>
>
>Example two: Someone is using the <link> (Which I suspect will be the
>most popular idea)
>
> Open Connection
>
> Parse Head
>
> Fail
>
> Open connection, read each line:
> if I find link, succeed, Drop connection, Yay.
> Or:
> Read next line.
> (Read fifty lines || EOF) drop connection.
>
>Example Three: Ignoring X-Pingback.
>
> Open connection, read each line:
> if I find link, succeed, Drop connection, Yay.
> Or:
> Read next line.
> (Read fifty lines || EOF) drop connection.
>
> (Which is what I'm doing now. The In Production version scans for
> "X-Pingback" as it's looking for <link>)
>
>My problem with the header is that the advantages it brings do not
>outweigh the disadvantages, which is that if I try to take advantage of
>the advantages, it makes the "standard" method take longer, and that it
>involves playing with either headers or server configs, neither of which
>an ordinary web-page should have to do.

My PingBack client implementation has an "autodetect" method that does the
following:

1. Send GET request
2. Start parsing headers
3. For the first line, check the status code; if it is other than 200, return false
4. If X-Pingback header found, kill the connection and return the server
5. If Content-Type header found, save content-type
6. If end of headers found, check saved content-type - if it isn't
text/html or application/xhtml+xml, return false
7. Start parsing the HTML document
8. If <link rel="pingback"> element found, return server
9. If </head> tag found, return false
10. If <body> tag found, return false
11. If end of document found, return false

I should probably add a size or line-count limit as well, so that it gives
up after a certain number of lines or bytes downloaded. The code is
available to anyone who emails me and asks for it, and I plan to open
source it as soon as I'm confident it works well enough.
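
For illustration, here is a minimal sketch of the eleven steps above in Python. It is not the implementation described in this message. The stream-based interface, the `max_bytes` cap (standing in for the size limit mentioned above), the regular expressions, and the use of `None` in place of "false" are all assumptions made for the example.

```python
import re

# Hypothetical patterns for this sketch: a <link> tag with rel="pingback"
# and the href attribute inside it.
LINK_RE = re.compile(rb'<link\b[^>]*\brel=["\']pingback["\'][^>]*>', re.I)
HREF_RE = re.compile(rb'href=["\']([^"\']+)["\']', re.I)
HTML_TYPES = (b"text/html", b"application/xhtml+xml")


def autodetect(stream, max_bytes=16384):
    """Scan a raw HTTP response (binary file-like object) for a Pingback server.

    Returns the server URL as a string, or None where the steps say "false".
    """
    # Step 3: the status line must report 200.
    parts = stream.readline().split()
    if len(parts) < 2 or parts[1] != b"200":
        return None

    # Steps 4-6: walk the headers, stopping early on X-Pingback.
    content_type = b""
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # end of headers
        name, _, value = line.partition(b":")
        name = name.strip().lower()
        if name == b"x-pingback":
            return value.strip().decode("ascii", "replace")
        if name == b"content-type":
            content_type = value.split(b";")[0].strip().lower()
    if content_type not in HTML_TYPES:
        return None

    # Steps 7-11: read the body until a pingback <link>, </head>, <body>,
    # the byte cap, or the end of the document.
    body = b""
    for line in stream:
        body += line
        lowered = body.lower()
        stops = [p for p in (lowered.find(b"</head>"), lowered.find(b"<body"))
                 if p != -1]
        stop = min(stops) if stops else -1
        link = LINK_RE.search(body)
        if link and (stop == -1 or link.start() < stop):
            href = HREF_RE.search(link.group(0))
            return href.group(1).decode("ascii", "replace") if href else None
        if stop != -1 or len(body) > max_bytes:
            return None
    return None
```

With a real connection, the stream could come from `sock.makefile("rb")` after sending the GET request. Closing the socket as soon as `autodetect` returns corresponds to "kill the connection" in step 4.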

My point is that I can check for the X-Pingback header AND the <link>
element in a single request. If I find an X-Pingback header I can stop
early, saving on overhead as I don't have to download or parse any further
than the headers. X-Pingback is therefore (at least for my system) a help
rather than a hindrance.

Also remember that X-Pingback support is optional. If you don't want to
build a client implementation to detect it, you don't have to - you can
still be sure of picking up any HTML/XHTML documents that have Pingback
enabled from the <link> tag. You will miss out on non-HTML documents that
have Pingback enabled, but that probably isn't a concern.

Cheers,

Simon Willison

-- 
Web Developer, www.incutio.com
Weblog: http://simon.incutio.com/ 
Message sent over the Blogite mailing list.
Archives:     http://www.aquarionics.com/misc/archives/blogite/
Instructions: http://www.aquarionics.com/misc/blogite/


This archive was generated by hypermail 2.1.5 : Wed Sep 11 2002 - 12:05:01 BST