[UPHPU] Web site scraping
Alvaro Carrasco
alvaro at epliant.com
Thu Sep 25 09:36:14 MDT 2008
In my experience, the easiest way is to run the page through Tidy, load the
result into a DOMDocument, and query it with XPath.
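A minimal sketch of that pipeline in PHP follows. It assumes PHP 7+ with the DOM extension. The tidy extension is treated as optional: if tidy_repair_string() isn't available, the code falls back to the raw HTML, which libxml's HTML parser will still usually cope with. The function name extract_links(), the sample markup and the div id "news" are made up for illustration. Fetching the page (e.g. with file_get_contents() or cURL) is left out.

```php
<?php
// Sketch of the Tidy -> DOMDocument -> XPath approach.
// Assumption: the page HTML is already in a string. extract_links() and the
// "news" div are hypothetical names used only for this example.

function extract_links(string $html): array
{
    // Step 1: clean up the markup with Tidy, if the extension is installed.
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string($html, ['wrap' => 0], 'utf8');
        if ($repaired !== false) {
            $html = $repaired;
        }
    }

    // Step 2: load it into a DOMDocument, collecting (not printing) any
    // libxml warnings about leftover malformed markup.
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Step 3: query with XPath: every link inside <div id="news">.
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//div[@id="news"]//a') as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Sloppy sample markup: unquoted attributes, unclosed <li> elements, and a
// link outside the div that the XPath query should ignore.
$page = '<html><body><div id="news"><ul>'
      . '<li><a href=/one>First</a><li><a href=/two>Second</a>'
      . '</ul></div><p>unrelated <a href=/x>link</a></body></html>';

print_r(extract_links($page));
```

The query is anchored on a structural feature of the page (the div's id) rather than on the exact text around the links, which is what tends to make XPath survive small layout changes better than a regex over the raw HTML.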
XPath patterns are SO much easier to read and write than regexes, and
more resistant to changes in the site's markup (if you write them well).
You can also use regular expressions within XPath if you ever need them.
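For the regex case, one way to do it is to expose preg_match() to XPath with DOMXPath::registerPhpFunctions() (PHP 5.3+) and call it through php:functionString(). The table markup and the SKU-style pattern below are hypothetical.

```php
<?php
// Sketch: calling preg_match() from inside an XPath predicate.
// The markup and the pattern are made up for illustration.

$html = '<table>'
      . '<tr><td class="sku">AB-1234</td></tr>'
      . '<tr><td class="sku">n/a</td></tr>'
      . '<tr><td class="sku">CD-5678</td></tr>'
      . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The "php" prefix must be bound to this URI for php:function/functionString.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Only preg_match may be called from XPath expressions.
$xpath->registerPhpFunctions('preg_match');

// Keep cells whose text is two capital letters, a dash, and four digits.
// "." passes the cell's string value as preg_match()'s subject.
$query = '//td[@class="sku"]'
       . '[php:functionString("preg_match", "/^[A-Z]{2}-\d{4}$/", .) = 1]';

$skus = [];
foreach ($xpath->query($query) as $td) {
    $skus[] = $td->textContent;
}
print_r($skus);
```

The explicit `= 1` is deliberate. Without it, the predicate's meaning depends on how the PHP return value is converted back into XPath (a bare number in a predicate is treated as a position test), so spelling out the comparison is safer.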
Alvaro
Nathan Lane wrote:
> I want to make what in effect is a website scraper using PHP, but it isn't
> obvious how this would best be done. I've tried using DOMDocument and I'm
> not sure if that's the best option or not. I'd really like to use something
> where I could use XPath to get the elements out that I want. Recently I
> wrote a similar program in C# that I call HttpAnalyzer. Could I just use
> that with PHP (i.e. call it from PHP) to get what I'm looking for? Any
> suggestions?
>