2012-10-05 57 views

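Extracting information from a table on a web page using Perl

I'm very new to Perl and need some help extracting a table from a web page. Today I was able to figure out how to get Perl to fetch and print the page. Now I just need to extract the information in the table that shows the temperatures. My current code is: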

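# perl
use strict;
use warnings;
# use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
$ua->timeout(120);
my $url = 'http://MyTempSite/';
my $request  = HTTP::Request->new(GET => $url);
my $response = $ua->request($request);
# Stop early if the fetch failed instead of printing an error page.
die 'Request failed: ' . $response->status_line . "\n" unless $response->is_success;
my $content = $response->content();
print $content;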

I want to get the temperature readings from this HTML document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"><html><body bgcolor="#C0C0C0" text="#000000" vlink="#800080" link="#0000FF"><P><h1>TempTrax Digital Thermometer</h1><BR><table cellpadding=0 cellspacing=0 border=0><TR><TD>Model:</TD><TD width=10 rowspan=7><BR></TD><TD>E4</TD><TR><TD>Manufacturer:</TD><TD>Sensatronics</TD> </TR><TR><TD>Website:</TD><TD><a href="http://www.sensatronics.com">http://www.sensatronics.com</a></TD></TR><TR><TD>Firmware Version:</TD><TD>1.2</TD></TR><TR><TD>Release Date:</TD><TD>December 16, 2003</TD></TR><TR><TD>Serial Number:</TD><TD>EA8E6L0T121</TD></TR><TR><TD>Unit name:</TD><TD>LiveTempMonitor</TD></TR></TABLE><P><h3>Current temperature readings:</h3><p><table><TR><TD width=200>Probe1:</TD><TD> 75.1</TD></TR> 
<TR><TD width=200>Probe2:</TD><TD>-99.9</TD></TR> 
<TR><TD width=200>Probe3:</TD><TD>-99.9</TD></TR> 
<TR><TD width=200>Probe4:</TD><TD>-99.9</TD></TR> 
</table></body></html> 

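How should I go about pulling only the information in the "Current temperature readings" table? Thanks in advance to everyone for taking a look at this.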


[HTML::TableExtract](http://search.cpan.org/perldoc?HTML::TableExtract), [HTML::TableParser](http://search.cpan.org/perldoc?HTML::TableParser) – ikegami
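
For anyone wondering what the HTML::TableExtract route might look like, here is a minimal sketch. It assumes the readings table is always the second top-level table on the page, and it uses a trimmed copy of the HTML from the question in place of the live response. The `%temperature` hash and the trimming steps are illustrative choices, not part of the module.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Trimmed copy of the page from the question. In the real script this
# string would come from the HTTP response instead of a literal.
my $html = <<'HTML';
<html><body><h1>TempTrax Digital Thermometer</h1>
<table><TR><TD>Model:</TD><TD>E4</TD></TR>
<TR><TD>Manufacturer:</TD><TD>Sensatronics</TD></TR></TABLE>
<h3>Current temperature readings:</h3><table><TR><TD width=200>Probe1:</TD><TD> 75.1</TD></TR>
<TR><TD width=200>Probe2:</TD><TD>-99.9</TD></TR>
<TR><TD width=200>Probe3:</TD><TD>-99.9</TD></TR>
<TR><TD width=200>Probe4:</TD><TD>-99.9</TD></TR>
</table></body></html>
HTML

# depth => 0 selects tables that are not nested inside another table;
# count => 1 selects the second such table (counting starts at 0).
my $te = HTML::TableExtract->new( depth => 0, count => 1 );
$te->parse($html);

my %temperature;    # probe name => reading (hypothetical name)
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        my ( $probe, $reading ) = map { defined $_ ? $_ : '' } @$row;
        s/^\s+|\s+$//g for $probe, $reading;    # trim padding such as " 75.1"
        $probe =~ s/:$//;                       # "Probe1:" becomes "Probe1"
        $temperature{$probe} = $reading;
    }
}

print "$_ = $temperature{$_}\n" for sort keys %temperature;
```

Selecting by `depth` and `count` depends on the page layout staying the same, so it may need adjusting if the device firmware changes the page.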

Answers

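use 5.010;
use strict;
use warnings;

use XML::LibXML qw();

# $response is the HTTP::Response object from the code in the question.
# charset => 'none' skips charset decoding so the parser receives the raw bytes.
my $html = $response->decoded_content(charset => 'none');
my $doc  = XML::LibXML->load_html(string => $html);
my $root = $doc->documentElement();

# Take the second cell of every row in the second table (the readings table).
my @readings =
    map $_->textContent(),
    $root->findnodes(
        '//table[ position() = 2 ]/tr/td[ position() = 2 ]'
    );

say for @readings;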
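
If the position of the table on the page is likely to change, one possible variation is to locate the table by the heading that precedes it and to keep each probe name with its reading. This is only a sketch: it runs against a trimmed copy of the HTML from the question, assumes the heading text stays "Current temperature readings", and the `%reading_for` hash is a name made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the page from the question, used here so the example
# runs without network access.
my $html = <<'HTML';
<html><body><h1>TempTrax Digital Thermometer</h1>
<table><TR><TD>Model:</TD><TD>E4</TD></TR>
<TR><TD>Manufacturer:</TD><TD>Sensatronics</TD></TR></TABLE>
<h3>Current temperature readings:</h3><table><TR><TD width=200>Probe1:</TD><TD> 75.1</TD></TR>
<TR><TD width=200>Probe2:</TD><TD>-99.9</TD></TR>
<TR><TD width=200>Probe3:</TD><TD>-99.9</TD></TR>
<TR><TD width=200>Probe4:</TD><TD>-99.9</TD></TR>
</table></body></html>
HTML

# recover and suppress_errors let the parser tolerate sloppy HTML quietly.
my $doc = XML::LibXML->load_html(
    string          => $html,
    recover         => 1,
    suppress_errors => 1,
);

# First table that appears after the readings heading, wherever it sits.
my ($table) = $doc->findnodes(
    '//h3[contains(., "Current temperature readings")]/following::table[1]'
);
die "readings table not found\n" unless $table;

my %reading_for;    # probe name => reading (hypothetical name)
for my $tr ( $table->findnodes('.//tr') ) {
    my ( $name, $value ) = map { $_->textContent } $tr->findnodes('./td');
    next unless defined $value;             # skip rows without two cells
    s/^\s+|\s+$//g for $name, $value;       # trim padding such as " 75.1"
    $name =~ s/:$//;                        # "Probe1:" becomes "Probe1"
    $reading_for{$name} = $value;
}

printf "%s: %s\n", $_, $reading_for{$_} for sort keys %reading_for;
```

Using `.//tr` rather than `tr` keeps the row lookup working whether or not the parser wraps the rows in a `tbody` element.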