From: Ian Hickson (ian@hixie.ch)
Date: Mon Sep 09 2002 - 21:31:03 BST
On Mon, 9 Sep 2002, Stuart Langridge wrote:
>
> Ian Hickson spoo'd forth:
> > Having implemented it, I don't see that there is much overhead. (And I
> > haven't even bothered with optimisations -- I just slurp the whole target
> > file in one go and perform a regexp search on it to find a link tag.)
>
> Just out of interest, why specify regexps for this? My plan was to use
> Perl's HTML::Parser module (which uses regexps under the bonnet, I
> believe). Was it that then we could be assured of it working
> cross-language, even in the face of a language's native HTML parser
> being non-existent or (worse) wrong?

It was for three reasons.

First of all, to ensure interoperability without requiring that the
pingback author implement an HTML parser, which is distinctly non-trivial
(a LOT harder than an XML parser).

Second, because all of the pingback implementations at the time (except
mine) did not validate, and would therefore stump an HTML parser. (Note
that Simon's has since been corrected, yay.)

Third, because some parser implementations would require the entire file
be parsed before being able to walk the DOM, and that would be a
significant performance hit compared to just doing regexps line by line
until a hit is found.
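
The line-by-line regexp search in the third point could be sketched
roughly as follows. This is only an illustration in Python, not the
implementation discussed in this thread. The pattern assumes page authors
write the link element in one fixed form on a single line, and both the
pattern and the name find_pingback_url are assumptions made for the
example.

```python
import io
import re

# Assumed fixed form of the link element that page authors must follow.
# The exact pattern is illustrative, not taken from this message.
PINGBACK_LINK = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')


def find_pingback_url(lines):
    """Scan an iterable of lines and return the first pingback URL, or None.

    Reading stops at the first hit, so the rest of the page is never
    consumed and no DOM is ever built.
    """
    for line in lines:
        match = PINGBACK_LINK.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical page content, standing in for a fetched target file.
page = io.StringIO(
    '<html><head>\n'
    '<link rel="pingback" href="http://example.org/pingback">\n'
    '</head><body>...</body></html>\n'
)
print(find_pingback_url(page))
```

Any iterable of lines works here, so the same function can be fed an open
file or a streamed response and will stop reading once the link is found.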

Basically, since it was easy to add requirements on the page authors, and
simple for them to follow, while it would have been distinctly more
complicated for pingback implementers to parse HTML, I went for the
solution which was simpler overall.

Do you think we should change this?

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'

Message sent over the Blogite mailing list.
Archives:     http://www.aquarionics.com/misc/archives/blogite/
Instructions: http://www.aquarionics.com/misc/blogite/