[UPHPU] Web site scraping
Walt Haas
haas at xmission.com
Thu Sep 25 09:40:14 MDT 2008
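If you just want a mirror of the site, any Linux distro should include
wget. "wget -r http://example.com" will recursively download the
example.com site. By default -r stops five links deep; "wget -m"
(--mirror) removes that limit and gives you a full local mirror.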
If you want a tool to select and reformat parts of a page, XSLT might be
worth a look. It uses XPath expressions to pick out elements, so it fits
what you describe. It's a functional programming language, which makes
it unfamiliar to many, but it is powerful and worth learning.
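
Since the list is about PHP, here is a rough sketch of how an XSLT
stylesheet might be applied from PHP. It assumes the dom and xsl
extensions are enabled. The sample HTML, the stylesheet, and the
variable names are all made up for illustration, so treat it as a
starting point rather than a finished scraper.

```php
<?php
// Sketch only: assumes PHP's dom and xsl extensions are available.
// The HTML and the stylesheet below are invented for illustration.

$html = <<<'HTML'
<html><body>
  <h1>Example page</h1>
  <a href="/one.html">First</a>
  <a href="/two.html">Second</a>
</body></html>
HTML;

// The stylesheet selects every link with XPath and emits "text: href".
$xslt = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//a">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="@href"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

// loadHTML() is tolerant of the non-XML markup found on real sites.
$page = new DOMDocument();
$page->loadHTML($html);
libxml_clear_errors();

$style = new DOMDocument();
$style->loadXML($xslt);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);

// With method="text" the result comes back as a plain string.
$result = $proc->transformToXml($page);
echo $result;
```

Loading the page through DOMDocument::loadHTML() first is a design
choice: XSLT wants a well-formed tree, and most pages in the wild are
not valid XML on their own.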
-- Walt
Nathan Lane wrote:
> I want to make what in effect is a website scraper using PHP, but it isn't
> obvious how this would best be done. I've tried using DOMDocument and I'm
> not sure if that's the best option or not. I'd really like to use something
> where I could use XPath to get the elements out that I want. Recently I
> wrote a similar program in C# that I call HttpAnalyzer. Could I just use
> that with PHP (i.e. call it from PHP) to get what I'm looking for? Any
> suggestions?
>
>
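
For the DOMDocument-plus-XPath approach asked about above, a minimal
sketch might look like the following. The markup, the "headline" class,
and the variable names are hypothetical; for a live page the string
would come from something like file_get_contents() on the URL.

```php
<?php
// Sketch only: the markup and the class name "headline" are invented.

$html = <<<'HTML'
<html><body>
  <div class="headline"><a href="/news/1">First story</a></div>
  <div class="headline"><a href="/news/2">Second story</a></div>
  <div class="footer"><a href="/about">About</a></div>
</body></html>
HTML;

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// DOMXPath runs XPath queries against the parsed document.
$xpath = new DOMXPath($doc);

// Map link text to href for every link inside a "headline" div.
$links = [];
foreach ($xpath->query('//div[@class="headline"]/a') as $a) {
    $links[trim($a->textContent)] = $a->getAttribute('href');
}

print_r($links);
```

This keeps everything inside PHP, so there is no need to shell out to
a separate program just to run the XPath queries.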