Extraction of HTML tables with Perl
taiyo
I'm trying to write a script that extracts the HTML that makes up a particular table across many different HTML files.
I need the HTML that makes up the table itself, not just the contents of the table cells.
The common characteristic of the table that I need to extract is that the width of the table is 602.
I'm fairly certain that the width attribute is in the same position in each <TABLE> tag across the files. So I need to copy all of the HTML beginning with:
<table width="602" border="0" cellspacing="0" cellpadding="0" bgcolor="white" height="100%">
and ending with:
</table>
ignoring any nested tables along the way, and paste it into a new file.
I thought that HTML::TableExtract might be the way to go because I could pinpoint the location of the table, but my code just gives me the contents of the table cells. I'm thinking that it may be possible to use HTML::LinkExtor to get the table I want, but I'm not sure how to ignore all the nested tables in the file.
Thanks much!
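One possible approach, offered as a rough sketch rather than a tested solution for these files: scan for opening and closing table tags and keep a depth counter, so that the extract stops at the </table> that matches the width="602" tag instead of at the first nested one. The function name extract_tables_by_width and the sample HTML are made up for illustration. It uses only core Perl and assumes the markup is reasonably regular (no table tags inside comments or scripts, no ">" inside attribute values); a real parser such as HTML::Parser would be more robust on messy input.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw HTML of every table whose opening tag
# has the given width, from that tag through its matching </table>.
# Nested tables are carried along inside the extract, not cut short.
sub extract_tables_by_width {
    my ($html, $width) = @_;
    my @found;
    my ($depth, $start) = (0, undef);

    # Visit every opening or closing table tag in document order.
    while ($html =~ m{<(/?)table\b([^>]*)>}gi) {
        my ($closing, $attrs) = ($1, $2);
        my $tag_start = $-[0];    # offset where this tag begins
        my $tag_end   = $+[0];    # offset just past this tag

        if (!$closing) {
            if (!defined $start
                && $attrs =~ /\bwidth\s*=\s*["']?\Q$width\E["']?(?![\w%])/i) {
                # Found the wanted table: remember where it starts.
                $start = $tag_start;
                $depth = 1;
            }
            elsif (defined $start) {
                $depth++;         # a nested table opened inside it
            }
        }
        elsif (defined $start) {
            if (--$depth == 0) {
                # This </table> closes the wanted table.
                push @found, substr($html, $start, $tag_end - $start);
                undef $start;
            }
        }
    }
    return @found;
}

# Made-up sample: the wanted table sits inside a layout table and
# contains a nested table of its own.
my $html = <<'HTML';
<table width="100%"><tr><td>
<table width="602" border="0"><tr><td>
<table width="50"><tr><td>inner</td></tr></table>
</td></tr></table>
</td></tr></table>
HTML

my @tables = extract_tables_by_width($html, 602);
print "$_\n" for @tables;
```

For the sample above this prints the width="602" table with its nested table intact and leaves out the outer layout table. To run it across many files, read each file into a string, call the function, and write the returned strings to a new file.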