Re: [blogite] Pingback misc

From: Jim Dabell (jim-blogite@jimdabell.com)
Date: Sun Sep 08 2002 - 19:27:13 BST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday 07 September 2002 5:32 pm, Aquarion wrote:
> On Sat, Sep 07, 2002 at 04:38:04PM +0000, Jim Dabell wrote:
[snip]
> > Surely the overhead for HEAD in this context is negligible?
>
> Our greatest overhead, since we are on webservers with presumably decent
> connections, is looking up the domain. Looking it up twice is silly.

Caching domain lookups is fairly commonplace, and you are assuming that the
requests aren't done in the same TCP connection.
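A minimal Python sketch of the single-connection case, assuming the page advertises its pingback server either in an X-Pingback response header or in the HTML. The helper name `head_then_get` is hypothetical. Both requests go through one `http.client.HTTPConnection`, so the host is resolved and connected once, as long as the server keeps the HTTP/1.1 connection alive after the HEAD:

```python
import http.client


def head_then_get(host, port, path):
    """Hypothetical helper: try the X-Pingback header first, fall back to GET.

    Returns (pingback_url_or_None, body_bytes_or_None).
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # HEAD has no body; reading marks the response finished
        url = resp.getheader("X-Pingback")
        if url:
            return url, None  # header found: the page itself is never fetched
        # Same connection object: http.client keeps using the open socket
        # unless the server said it would close it, in which case it
        # transparently reconnects for this request.
        conn.request("GET", path)
        body = conn.getresponse().read()
        return None, body
    finally:
        conn.close()
```

If the server answers the HEAD with `Connection: close`, the GET still works; it just costs a second connection.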

> As for the static argument, MT - for example - turns archives into
> static HTML files so it doesn't have to work at /all/ to render them,
> which is a monumentally cool idea. Putting things in the headers of
> these is difficult without playing with Apache directives.

Good point.

> > > Also remember that you can read data from a socket line by line (or
> > > character by character if needs be). This means that you can close
> > > the socket connection from the GET request the moment you receive a
> > > <link rel="pingback" element OR you hit the </head> tag (as the link
> > > element will not appear past that point). This means the overhead
> > > isn't actually that bad.
> >
> > Think about all the screwed-up HTML without </head> or <body>, coupled
> > with thousands of keywords etc. Sure, you can drop the connection
> > after 32k or whatever, but that's a big difference compared with a few
> > lines of HTTP headers. You could also require that the <link> be the
> > first element of <head>, but I think adding parsing requirements on top
> > of standard HTML is a mistake.
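A sketch of the bounded read being argued about, in Python, assuming a binary file-like object with `readline(limit)` (for example `io.BytesIO`, or a socket wrapped with `makefile('rb')`). The name `read_head` and the 32 KiB default are illustrative; the cap is what stops a page with no `</head>` and a huge keyword list from being read in full:

```python
import re

# Stop at the first </head> or <body ...> tag, case-insensitively.
HEAD_END = re.compile(rb"</head\s*>|<body[\s>]", re.I)


def read_head(stream, max_bytes=32 * 1024):
    """Hypothetical helper: return the start of a document, up to and
    including the line that ends <head>, or at most max_bytes bytes."""
    chunks = []
    total = 0
    while total < max_bytes:
        # readline(limit) never returns more than the bytes still allowed
        line = stream.readline(max_bytes - total)
        if not line:  # EOF
            break
        chunks.append(line)
        total += len(line)
        if HEAD_END.search(line):
            break
    return b"".join(chunks)
```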
>
> Er, no.
>
> Epistula's implementation of pingback (in the pinging of servers) does
> this:
>
> Open socket, or die screaming.
> Start reading line by line. If we have a <link>, record it and drop out
> of the loop.
> Otherwise, read the next line and try again until:
> a) We have a <link> (/\<link rel\=\"pingback\" href\=\"http:\/\/(.*)\"(.*)(\/?)>/i)
> b) We have a <body> (not yet implemented, but planned)
> c) We have an EOF
> d) We have read 50 lines.
> Close the connection, and do whatever I want with what the loop found.
> (PHP code at http://www.aquarionics.com/src/admin/addentry.phps)
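A rough Python transliteration of the loop described above, for illustration only; it is not the linked PHP code, and `find_pingback_link` is a hypothetical name. The pattern is the quoted one with the href capture narrowed from a greedy `(.*)` to `[^"]*`, so attributes after href are not pulled into the URL. It still assumes rel comes before href, double quotes, and an http:// URL:

```python
import re

# Quoted pattern ported from PHP's /.../i syntax, href capture narrowed.
PINGBACK_LINK = re.compile(rb'<link rel="pingback" href="(http://[^"]*)"[^>]*>', re.I)
BODY_TAG = re.compile(rb"<body[\s>]", re.I)


def find_pingback_link(stream, max_lines=50):
    """Hypothetical port of the described loop; returns the URL or None."""
    for line_no, line in enumerate(stream, start=1):  # loop ends at EOF (c)
        match = PINGBACK_LINK.search(line)
        if match:  # (a) found the <link>
            return match.group(1).decode("ascii", "replace")
        if BODY_TAG.search(line):  # (b) the planned <body> check
            return None
        if line_no >= max_lines:  # (d) read 50 lines
            return None
    return None
```

Feeding it `<link href="..." rel="pingback" />` returns None, and a commented-out `<link rel="pingback" ...>` would still match; these are the cases raised in the reply.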

Well that breaks when:

a) you have commented out your pingback element

b) you don't have rel as the first attribute (for instance, the XSLT
processor I use reverses the order of the attributes in some instances)

I'd imagine that there are loads of corner cases where an HTML parser will
get it right but a fancy regexp won't, even with valid HTML.
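For comparison, a sketch of the same discovery using Python's standard `html.parser.HTMLParser` instead of a regexp. The class and function names are hypothetical. Comments never reach `handle_starttag`, attribute order is irrelevant because attributes arrive as (name, value) pairs with lowercased names, and `rel` is split into space-separated tokens; parsing stops at `</head>` or `<body>`:

```python
from html.parser import HTMLParser


class PingbackLinkFinder(HTMLParser):
    """Hypothetical finder for <link rel="pingback" href="..."> in <head>."""

    def __init__(self):
        super().__init__()
        self.url = None
        self.done = False

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... /> via handle_startendtag.
        if self.done:
            return
        if tag == "body":  # head is over even if </head> was never written
            self.done = True
            return
        if tag == "link":
            a = dict(attrs)
            rels = (a.get("rel") or "").lower().split()
            if "pingback" in rels and a.get("href"):
                self.url = a["href"]
                self.done = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.done = True


def find_pingback(chunks):
    """Feed decoded text chunks until the link is found or <head> ends."""
    finder = PingbackLinkFinder()
    for chunk in chunks:
        finder.feed(chunk)
        if finder.done:
            break
    finder.close()
    return finder.url
```

Because it accepts chunks, it can be fed from a socket as data arrives and stop reading as soon as `done` is set.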

- --
Jim Dabell

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE9e5aC3tJNldoQhi8RAoWbAJ4+ewmJcLwBRYb2iP+4mEBp4wG+QgCg1aAu
Uygduu+orPCclgoJDv1ybpA=
=WKGE
-----END PGP SIGNATURE-----

Message sent over the Blogite mailing list.
Archives: http://www.aquarionics.com/misc/archives/blogite/
Instructions: http://www.aquarionics.com/misc/blogite/


This archive was generated by hypermail 2.1.5 : Sun Sep 08 2002 - 22:05:00 BST