Data extraction from HTML using Perl

-2

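I'm at the initial stage of my college mini project and I'm stuck on one point: data extraction from HTML using Perl.

Can anyone explain the basic and advanced concepts and methods, along with code, for extracting data from an HTML page using Perl?

If not, please point me to resources on these concepts so that I can learn them myself.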


2014-07-21 Nishi Bangar

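Possible duplicate of [grep and extract data in Perl](http://stackoverflow.com/questions/2886200/grep-and-extract-data-in-perl) – MarmiK

I don't think this question is too broad. Given the number of modules on CPAN, a question like "It's July 2014, where do I start?" is perfectly legitimate. An answer narrows the list of module documentation to read down to what is current, maintained, and generally accepted by the community. – mirod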

This should get you started.

#!/usr/bin/perl

use strict;
use warnings;
use autodie;
use LWP::Simple;        # For getting a website's HTML; also see LWP::UserAgent
use HTML::TreeBuilder;  # The parser class of the HTML::Tree distribution; read the docs on CPAN


# Use LWP to get a page's contents.
# We'll use the URL of this question: http://stackoverflow.com/questions/24858906/data-extraction-from-html-using-perl
my $url = "http://stackoverflow.com/questions/24858906/data-extraction-from-html-using-perl";


# All the HTML will be in $content; get() returns undef on failure
my $content = get($url);
die "Could not fetch $url\n" unless defined $content;

my $p = HTML::TreeBuilder->new();

# Parse the string in $content. You can also use parse_file, new_from_file or new_from_url,
# though for learning's sake you should get used to LWP.
$p->parse($content);
$p->eof();  # Signal end of input so the tree is finalized

# Check the HTML::Element documentation for the data manipulation part.
# In scalar context this returns the first element whose class is exactly 'post-text'.
my $post = $p->find_by_attribute('class', 'post-text');
die "No element with class 'post-text' found\n" unless defined $post;

# Should print your question out.
binmode STDOUT, ':encoding(UTF-8)';
print $post->as_text(), "\n";

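Now review the documentation for the modules used above: LWP::Simple, LWP::UserAgent, HTML::Tree and HTML::Element.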
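As a possible next step after reading the HTML::Element docs, here is a minimal sketch that pulls several fields out of a page with `look_down` instead of `find_by_attribute`. The sample markup, the class names `title` and `author`, and the helper `extract_fields` are all made up for illustration; a real page would be fetched with LWP and will use different class names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup; a real page would come from LWP instead.
my $html = <<'HTML';
<html><body>
  <div class="post">
    <h1 class="title">Data extraction from HTML using Perl</h1>
    <span class="author">Nishi Bangar</span>
    <div class="post-text question">I am stuck on my mini project.</div>
  </div>
</body></html>
HTML

# Hypothetical helper: returns a hashref of field name => trimmed text.
sub extract_fields {
    my ($source) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($source);

    my %fields;
    for my $name (qw(title author post-text)) {
        # A regex criterion matches the class name even when the attribute
        # holds several names, which an exact string comparison would miss.
        my $node = $tree->look_down(class => qr/(?:^|\s)\Q$name\E(?:\s|$)/);
        $fields{$name} = defined $node ? $node->as_trimmed_text : undef;
    }

    $tree->delete;  # Free the parse tree explicitly
    return \%fields;
}

my $fields = extract_fields($html);
for my $name (sort keys %$fields) {
    printf "%-10s %s\n", $name, $fields->{$name} // '(not found)';
}
```

The regex is there because `find_by_attribute('class', 'post-text')` only matches when the attribute value is exactly `post-text`, so an element with `class="post-text question"` would be skipped.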

2014-07-21 07:17:30 Gabs00

Thanks, Gabs. Let me try it out. What I actually want is to extract specific fields from an HTML page and put them into a database. –
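If "put them into a database" means storing the extracted fields in a SQL table, one way to sketch it is with DBI. Everything specific here is an assumption rather than something from the thread: an in-memory SQLite database (which needs DBD::SQLite installed), a hypothetical `posts` table, the helper `store_rows`, and sample data standing in for whatever the parser returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical extracted fields; in practice these would come from the parser.
my @rows = (
    { title => 'Data extraction from HTML using Perl', author => 'Nishi Bangar' },
);

# Hypothetical helper: creates the table if needed and inserts each row.
sub store_rows {
    my ($dbh, $rows) = @_;
    $dbh->do('CREATE TABLE IF NOT EXISTS posts (title TEXT, author TEXT)');

    # Placeholders let DBI handle quoting of the scraped text.
    my $sth = $dbh->prepare('INSERT INTO posts (title, author) VALUES (?, ?)');
    $sth->execute(@{$_}{qw(title author)}) for @$rows;
    return scalar @$rows;
}

# Assumption: an in-memory SQLite database, used only to keep the sketch self-contained.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

store_rows($dbh, \@rows);
my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM posts');
print "Stored $count row(s)\n";
$dbh->disconnect;
```

For another database, the DSN string passed to `DBI->connect` would change (and the matching DBD driver would be needed), while the prepare/execute part should stay much the same.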

@Nishi Do you mean putting data into input fields? Would you do that with a POST or a GET request? – Gabs00
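For reference, the difference between the two request types can be sketched with HTTP::Request::Common, which builds request objects without sending anything. The endpoint `http://example.com/submit`, the field names, and the helpers `build_get` and `build_post` are hypothetical, chosen only to show where the form data ends up in each case.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET POST);
use URI;

# Hypothetical form endpoint and field names, for illustration only.
my $endpoint = 'http://example.com/submit';
my @form     = (title => 'Perl', author => 'Nishi');

# GET: the fields travel in the URL's query string.
sub build_get {
    my ($url, @fields) = @_;
    my $uri = URI->new($url);
    $uri->query_form(@fields);
    return GET($uri);
}

# POST: the same fields travel in the request body instead.
sub build_post {
    my ($url, @fields) = @_;
    return POST($url, [@fields]);
}

# Print both requests as they would appear on the wire (nothing is sent).
print build_get($endpoint, @form)->as_string, "\n";
print build_post($endpoint, @form)->as_string, "\n";
```

Either request object could then be handed to `LWP::UserAgent->new->request(...)` to actually submit it, assuming the target form accepts that method.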