[UPHPU] PHP parser
Walt Haas
haas at xmission.com
Fri Oct 2 19:25:36 MDT 2009
Part of the problem is that a lot of the HTML out there on the net is
dirty and doesn't conform to standards, which makes a full parse slow
and fragile.
It sounds like you are building a screen scraper. If you only need to
pull a few values from a page, you might consider using regular
expressions to find them instead of parsing the whole document.
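As a rough sketch of the regular-expression approach, assuming the values
sit inside non-nested tables and that a literal search string is enough to
identify the right one. The function name find_tables_containing() and the
sample markup are made up for illustration:

```php
<?php
// Hypothetical helper: return every <table>...</table> block in $html
// that contains the literal text $needle (case-insensitive).
function find_tables_containing($html, $needle)
{
    // Lazy match of each table. 's' lets '.' span newlines, 'i' ignores
    // tag case. Assumes tables are not nested inside one another.
    $ok = preg_match_all('~<table\b[^>]*>.*?</table>~is', $html, $matches);
    if ($ok === false) {
        // PCRE gave up (for example, the backtrack limit was hit).
        return [];
    }

    $hits = [];
    foreach ($matches[0] as $table) {
        if (stripos($table, $needle) !== false) {
            $hits[] = $table; // keep the table together with its markup
        }
    }
    return $hits;
}

// Made-up sample input standing in for the posted page source.
$html = '<p>intro</p>'
      . '<table id="a"><tr><td>Widget 42</td></tr></table>'
      . '<TABLE><tr><td>other</td></tr></TABLE>';

$found = find_tables_containing($html, 'widget 42');
echo count($found), " matching table(s)\n";
```

This scans the text once and never builds a document tree, so it should
need far less memory than a full parser on a 5 MB page, but it will miss
or mangle results if the tables are nested or a closing tag is missing.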
-- Walt
Aaron Luman wrote:
> As a matter of introduction I should say that I struggle with PHP. I
> do most of my web work in PHP, but probably not how I should be doing
> it, haha. Anyway, I'm trying to learn better practices as well as how
> to use pre-made scripts and tools.
>
> My current struggle with PHP is that I am trying to figure out a
> simple way to parse an HTML file. My hope is that I will be able to
> copy the source and then submit it via a form using POST. I found two
> pre-made scripts that looked promising, but I am not sure how good
> an option they are.
>
> The first - http://sourceforge.net/projects/simplehtmldom/ - looked
> good but then when I tried using it with the entire file it returned
> memory usage errors while testing on my local machine.
>
> The second - http://php-html.sourceforge.net/ - works without error,
> but takes nearly a minute to parse the full file (nearly 5 MB of text).
>
> I am concerned that when I get my code working and posted to my host
> that they will freak out about the heavy workload imposed by parsing
> the large files.
>
> The end result of parsing the source is that I would like to be able
> to find particular values so that they can be filtered out along with
> their markup (a table of between 1500 and 2000 characters) and
> reposted in a final results page, which should be fairly easy once I
> get the parsing concerns worked out.
>
> Should I be worried about the memory/processor draw while using these
> (or similar) parsers? Do any of you have experience with another
> parsing tool that is more efficient? Is there a better way to go
> about doing this?
>
> Thanks in advance for any help that you might be able to offer.
>
> Aaron