Associating adjacent elements with Perl's Web::Scraper
#!/usr/bin/perl
use strict;
use Web::Scraper;
use Data::Dumper;
my $html = q[
<html>
<body>
<div class="mainContainer">
<div class="when">February 20, 2014</div>
<div class="name">Name 1</div>
<div class="desc">Desc 1</div>
<div class="when">February 21, 2014</div>
<div class="name">Name 2</div>
<div class="desc">Desc 2</div>
<div class="name">Name 3</div>
<div class="desc">Desc 3</div>
<div class="when">February 22, 2014</div>
<div class="name">Name 4</div>
<div class="desc">Desc 4</div>
</div>
</body>
</html>
];
my $scraper = scraper {
    process ".when", "events[]" => scraper {
        my $when = $_->content();
        my $hash = {};
        $hash->{$when}->{name} = "NAME";
        $hash->{$when}->{desc} = "DESC";
        return $hash;
    };
};
my $result = $scraper->scrape($html);
print Dumper($result);
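What I am trying to do is associate each date with the details of its events. As you can see, the divs are not nested, so this is not trivial (at least for me). Also, each event consists of a name and a desc. I have not found a way to use CSS selectors to associate adjacent elements into the required structure. I think I will need a custom callback subroutine to do the association. I would like to get back something like the following: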
[
    'February 20, 2014' => [
        {
            'name' => 'Name 1',
            'desc' => 'Desc 1'
        }
    ],
    'February 21, 2014' => [
        {
            'name' => 'Name 2',
            'desc' => 'Desc 2'
        },
        {
            'name' => 'Name 3',
            'desc' => 'Desc 3'
        }
    ],
    'February 22, 2014' => [
        {
            'name' => 'Name 4',
            'desc' => 'Desc 4'
        }
    ]
]
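
One possible direction, offered as a sketch rather than a definitive answer: since the divs are siblings, it may be simpler to scrape them as one flat list in document order and do the grouping in plain Perl afterwards. The example below uses a callback subroutine to record each div's class and text, then walks the list and attaches every name/desc pair to the most recent date. The key name items and the variables %events, @dates, $when and $current are made up for illustration, and the sketch assumes a hash keyed by date is acceptable in place of the outer array shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Web::Scraper;
use Data::Dumper;

my $html = q[
<html><body>
<div class="mainContainer">
<div class="when">February 20, 2014</div>
<div class="name">Name 1</div>
<div class="desc">Desc 1</div>
<div class="when">February 21, 2014</div>
<div class="name">Name 2</div>
<div class="desc">Desc 2</div>
<div class="name">Name 3</div>
<div class="desc">Desc 3</div>
<div class="when">February 22, 2014</div>
<div class="name">Name 4</div>
<div class="desc">Desc 4</div>
</div>
</body></html>
];

# Step 1: collect every direct child div as a flat list, in document order.
my $scraper = scraper {
    process '.mainContainer > div', 'items[]' => sub {
        my $el = shift;    # the matched HTML::Element
        return { class => $el->attr('class'), text => $el->as_text };
    };
};
my $result = $scraper->scrape($html);

# Step 2: walk the flat list and group events under the most recent date.
my ( %events, @dates, $when, $current );
for my $item ( @{ $result->{items} || [] } ) {
    my $class = $item->{class} // '';
    if ( $class eq 'when' ) {
        $when    = $item->{text};
        $current = undef;                       # a new date starts a new group
        push @dates, $when unless exists $events{$when};
        $events{$when} ||= [];
    }
    elsif ( $class eq 'name' && defined $when ) {
        $current = { name => $item->{text} };   # a name starts a new event
        push @{ $events{$when} }, $current;
    }
    elsif ( $class eq 'desc' && $current ) {
        $current->{desc} = $item->{text};       # a desc completes the event
    }
}

$Data::Dumper::Sortkeys = 1;
print Dumper( \%events );
```

Because a Perl hash does not preserve insertion order, @dates keeps the dates in the order they appear on the page; iterating over @dates and looking each one up in %events gives the events in page order.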