[UPHPU] Web site scraping

justin justin at justinhileman.info
Thu Sep 25 17:21:15 MDT 2008


On Thu, Sep 25, 2008 at 8:52 AM, Nathan Lane <nathamberlane at gmail.com> wrote:
> I want to make what in effect is a website scraper using PHP, but it isn't
> obvious how this would best be done. I've tried using DOMDocument and I'm
> not sure if that's the best option or not. I'd really like to use something
> where I could use XPath to get the elements out that I want. Recently I
> wrote a similar program in C# that I call HttpAnalyzer. Could I just use
> that with PHP (i.e. call it from PHP) to get what I'm looking for? Any
> suggestions?
>

I hate to sound like a heretic by mentioning it on a PHP mailing list,
but I always turn to BeautifulSoup (Python) for this sort of thing.
It copes with badly formed HTML without complaint and makes pulling
elements out of a page very easy.
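
For a rough idea of what that looks like, here is a minimal sketch. It
assumes the current bs4 package is installed and uses Python's built-in
"html.parser" backend. The markup, the "post" class name and the
extract_posts helper are all made up for illustration; a real scraper
would fetch the page first and adapt the lookups to the actual site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page that was already downloaded.
HTML = """
<html><body>
<div id="nav"><a href="/home">Home</a></div>
<div class="post"><h2>First post</h2><a href="/posts/1">Read more</a></div>
<div class="post"><h2> Second post </h2><a href="/posts/2">Read more</a></div>
</body></html>
"""


def extract_posts(html):
    """Return a list of (title, link) tuples, one per <div class="post">."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for div in soup.find_all("div", class_="post"):
        heading = div.find("h2")
        link = div.find("a")
        # Skip blocks that lack either piece instead of raising an error.
        if heading is None or link is None:
            continue
        posts.append((heading.get_text(strip=True), link.get("href")))
    return posts


if __name__ == "__main__":
    for title, href in extract_posts(HTML):
        print(title, href)
```

One caveat for the XPath requirement: as far as I know BeautifulSoup has
no XPath support of its own, so the lookups go through find_all() and
friends (or CSS selectors) rather than XPath expressions.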

You may all now return to your PHP scraping :)

justin
-- 
http://justinhileman.com

