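WWW::Mechanize not handling apostrophes or dashes

I have been trying to extract information from Metacritic, but I have run into a problem: I cannot cleanly extract text that contains apostrophes or dashes.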
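use strict;
use warnings;
use WWW::Mechanize;

my $reviewspage = 'http://www.metacritic.com/movie/a-band-called-death/critic-reviews';
# Summary line that precedes the review body I want to extract
my $Review = 'In the end Death triumphs, but its allure and obsession remain a mystery.';

my $l = WWW::Mechanize->new();
$l->get($reviewspage);
my $k = $l->content;

# \Q...\E keeps the "." in $Review from acting as a regex metacharacter
my @Review = $k =~ m{\Q$Review\E.*?<div class="review_body">(.*?)</div>}s;
print "@Review\n";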
Output:
Too much of the doc takes our taste for granted; Alice Cooper, Henry Rollins and others won’t persuade you that Death could have been huge, nor does a clichéd last-act reunion show. But the film’s alternating inquiry â€” into family love, slow compromise and, yes, death â€” resonates strongly.
even though the markup on the site is:
<div class="review_body">
Too much of the doc takes our taste for granted; Alice Cooper, Henry Rollins and others won’t persuade you that Death could have been huge, nor does a clichéd last-act reunion show. But the film’s alternating inquiry — into family love, slow compromise and, yes, death — resonates strongly.
</div>
The code above illustrates the problem. I have written similar scripts that use WWW::Mechanize before, and none of them replaced characters like this.
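
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the garbled sequence in the output is what an em dash looks like when its UTF-8 bytes are displayed as Windows-1252. That would mean the page text itself is intact and only the output side is mismatched. The snippet below reproduces the effect offline with the core Encode module. The helper `as_seen_in_cp1252` is a hypothetical name introduced here for illustration and is not part of WWW::Mechanize. It also assumes the terminal or file receiving the output is read as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not part of WWW::Mechanize): simulate what a
# Windows-1252 viewer shows when it is handed UTF-8 bytes.
sub as_seen_in_cp1252 {
    my ($text) = @_;
    return decode('cp1252', encode('UTF-8', $text));
}

# Tell Perl that STDOUT expects UTF-8, so characters are encoded on output.
# Assumption: whatever displays the output is decoding it as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

my $dash    = "\x{2014}";                  # EM DASH, as used in the review text
my $garbled = as_seen_in_cp1252($dash);    # three characters instead of one

# Show the code points of the garbled form, then the clean form.
printf "garbled code points: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $garbled;
print "clean: inquiry $dash into family love\n";
```

If this is indeed the cause, adding the same `binmode STDOUT, ':encoding(UTF-8)';` line before the `print` in the script above may be enough, assuming `$l->content` is returning decoded character data and the display is set to UTF-8.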