Use PHP DOM Parser for more robust screen scraping

I’d just like to put this out there, since I just “failed” a “do-at-home” interview assignment, which was to implement a screen scraper using Java/PHP. I had written screen scrapers in PHP before (1-2 years ago), so I did this assignment the same way, using regexes. Little did I know that the regexes would be one of the weak points of my submission; they wanted me to use a DOM parser instead. In hindsight, I should have looked into that, but it never occurred to me because the regex approach had worked for me in the past.

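So the moral of the story: use a DOM parser when writing screen scrapers. It should be more robust than regex parsing in most cases, because it works on the parsed document structure rather than on the exact text of the markup, so small changes in whitespace, quoting, or attribute order are less likely to break it. Here is an example tutorial.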
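For what it's worth, here is a minimal sketch of what the DOM approach can look like in PHP, using the built-in DOMDocument and DOMXPath classes. The sample markup, the "posts" id, and the extractLinks() helper are all made up for illustration; this is not the page or the solution from the assignment.

```php
<?php
// Hypothetical sample markup. A real scraper would fetch this over HTTP.
// Note the inconsistent quoting, attribute order, and the unclosed <li>,
// which are the kinds of things that tend to trip up regexes.
$html = <<<'EOT'
<html><body>
  <ul id="posts">
    <li class="post"><a href="/a">First post</a></li>
    <li class="post featured"><a class='title' href='/b'>
        Second post
    </a>
  </ul>
  <a href="/about">About</a>
</body></html>
EOT;

// Hypothetical helper: returns the href and text of every link inside
// the list with id "posts".
function extractLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect libxml warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Query by structure (links inside the posts list), not by the
    // exact text of the markup.
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//ul[@id="posts"]//a[@href]') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

print_r(extractLinks($html));
```

The XPath expression only picks up the two links inside the list and skips the "About" link, without caring how the attributes are quoted or ordered. A regex doing the same job would need to account for each of those variations explicitly.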

Saturday, December 5th, 2009, 2:43 pm
