2014-12-03 43 views
-2

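Pattern does not remove special characters on the website

I am taking user input in the form of a URL, parsing the page, and then printing the other pages that the website links to. The package I am using is: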

LWP::Simple 

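I take the link from user input on the command line and store it in a variable, reading it from $ARGV[0]. I then make another variable and call get on the variable that stores the website. Next I create an array variable and apply the regular expression

/\shref="?([^\s>"]+)/gi; 

to the variable holding the result of calling get on the website string. Then I run a foreach loop over the array to print the results.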

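However, although it prints the links, it also ends up printing stand-alone special characters such as / and # when nothing comes after them.

So if there is something like /blabalbla, it prints it. But if there are only stand-alone special characters (for example /, \#), it prints those too. Is there any way I can modify the regex so that special characters that are not followed by a string are not printed? I am new to Perl and not good at regular expressions.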
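
The asker's program is not shown, so the following is only a reconstruction of the approach described above, with a hard-coded HTML string (an assumption made here) standing in for the page that get would return. It suggests why a lone / or # can show up: the capture group accepts any run of characters that are not whitespace, > or ", so href="/" and href="#" are captured like any other link. The helper name extract_hrefs and the final grep filter are hypothetical; the filter is one possible way to skip values made up only of / and # characters.

```perl
use strict;
use warnings;

# Hypothetical helper: return every href value the question's regex captures.
sub extract_hrefs {
    my ($html) = @_;
    # In list context with /g, the match returns all captures of group 1.
    my @links = $html =~ /\shref="?([^\s>"]+)/gi;
    return @links;
}

# Stand-in for the downloaded page (assumption: the real page has links like these).
my $html = '<a href="/">Home</a> <a href="#">Top</a> <a href="/blabalbla">More</a>';

my @all = extract_hrefs($html);

# One possible filter: drop values that consist only of "/" and "#" characters.
my @filtered = grep { !m{^[/#]+$} } @all;

print "captured: @all\n";
print "filtered: @filtered\n";
```

With this input the first line lists all three captures, including / and #, while the second keeps only /blabalbla.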

+0

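I can't help unless you show your code, a *real example* of a URL, and the corresponding output. Your regex certainly doesn't match isolated characters like that, and I think it is more likely that you are misusing the regex. – Borodin 2014-12-03 22:07:12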

+0

What do you mean by "followed by a string"? – ikegami 2014-12-03 22:11:49

+0

@Borodin - Here it is: http://www.google.com/imghp?hl=zh-CN&tab=wi http://maps.google.com/maps?hl=zh-CN&tab=wl https://play.google.com/?hl=en&tab=w8 \ There are more links in the output, but I removed them to fit the comment. This is using google.com. See the end. – user2128074 2014-12-03 22:14:58

Answers

1

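I can't help with your specific problem without further information, but in the meantime I suggest you take a look at HTML::LinkExtor, which was written for this purpose.

Here is some example code and its output. It lists only <a> elements that have an href attribute.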

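use strict; 
use warnings; 
use 5.010; 

use LWP; 
use HTML::LinkExtor; 

# Fetch the page; stop with the HTTP status line if the request fails 
my $ua = LWP::UserAgent->new; 
my $resp = $ua->get('http://www.bbc.co.uk/'); 
die $resp->status_line unless $resp->is_success; 

# No callback (undef), so links are collected; the base URL makes them absolute 
my $extor = HTML::LinkExtor->new(undef, $resp->base); 
$extor->parse($resp->decoded_content); 
$extor->eof; 

# Each link is an array reference: the tag name, then attribute => URL pairs 
for my $link ($extor->links) { 
    my ($tag, %attr) = @$link; 
    next unless $tag eq 'a' and $attr{href}; 
    say $attr{href}; 
} 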

Output

http://m.bbc.co.uk 
http://www.bbc.co.uk/ 
http://www.bbc.co.uk/#h4discoveryzone 
http://www.bbc.co.uk/accessibility/ 
https://ssl.bbc.co.uk/id/status 
http://www.bbc.co.uk/news/ 
http://www.bbc.com/news/ 
http://www.bbc.co.uk/sport/ 
http://www.bbc.co.uk/weather/ 
http://shop.bbc.com/ 
http://www.bbc.com/earth/ 
http://www.bbc.com/travel/ 
http://www.bbc.com/capital/ 
http://www.bbc.co.uk/iplayer/ 
http://www.bbc.com/culture/ 
http://www.bbc.com/autos/ 
http://www.bbc.com/future/ 
http://www.bbc.co.uk/tv/ 
http://www.bbc.co.uk/radio/ 
http://www.bbc.co.uk/cbbc/ 
http://www.bbc.co.uk/cbeebies/ 
http://www.bbc.co.uk/arts/ 
http://www.bbc.co.uk/ww1/ 
http://www.bbc.co.uk/food/ 
http://www.bbc.co.uk/history/ 
http://www.bbc.co.uk/learning/ 
http://www.bbc.co.uk/music/ 
http://www.bbc.co.uk/science/ 
http://www.bbc.co.uk/nature/ 
http://www.bbc.com/earth/ 
http://www.bbc.co.uk/local/ 
http://www.bbc.co.uk/travel/ 
http://www.bbc.co.uk/a-z/ 
http://www.bbc.co.uk/#orb-footer 
http://search.bbc.co.uk/search 
http://www.bbc.co.uk/privacy/cookies/managing/cookie-settings.html 
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F 
http://www.bbc.co.uk/# 
http://www.bbc.co.uk/# 
http://www.bbc.co.uk/weather/2643743?day=0 
http://www.bbc.co.uk/weather/2643743?day=0 
http://www.bbc.co.uk/weather/2643743?day=1 
http://www.bbc.co.uk/weather/2643743?day=1 
http://www.bbc.co.uk/weather/2643743?day=2 
http://www.bbc.co.uk/weather/2643743?day=2 
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F 
http://www.bbc.co.uk/weather/2643743 
http://www.bbc.co.uk/news/science-environment-30311816 
http://www.bbc.co.uk/news/science-environment-30311822 
http://www.bbc.co.uk/news/science-environment-30311818 
http://www.bbc.co.uk/news/magazine-30282261 
http://www.bbc.co.uk/news/science-environment-30311816 
http://www.bbc.co.uk/news/uk-politics-30291460 
http://www.bbc.co.uk/news/ 
http://www.bbc.co.uk/news/uk-england-kent-30319549 
http://www.bbc.co.uk/news/world-europe-30306106 
http://www.bbc.co.uk/news/world-europe-30306992 
http://www.bbc.co.uk/news/uk-30306145 
http://www.bbc.co.uk/news/local/ 
http://www.bbc.co.uk/news/england/london/ 
http://www.bbc.co.uk/news/uk-england-london-30308694 
http://www.bbc.co.uk/news/uk-england-london-30315650 
http://www.bbc.co.uk/news/uk-england-london-30321504 
http://www.bbc.co.uk/sport/live/football/29959148 
http://www.bbc.co.uk/sport/0/ 
http://www.bbc.co.uk/sport/live/snooker/29618359 
http://www.bbc.co.uk/sport/football/30204433 
http://www.bbc.co.uk/sport/cricket/30308980 
http://www.bbc.co.uk/sport/football/30204434 
http://www.bbc.co.uk/sport/0/football/ 
http://www.bbc.co.uk/sport/football/30204459 
http://www.bbc.co.uk/sport/football/30204511 
http://www.bbc.co.uk/sport/football/28647040 
http://www.bbc.co.uk/?dzf=sport 
http://www.bbc.co.uk/?dzf=entertainment 
http://www.bbc.co.uk/?dzf=bbcnow 
http://www.bbc.co.uk/?dzf=entertainment 
http://www.bbc.co.uk/?dzf=news 
http://www.bbc.co.uk/?dzf=lifestyle 
http://www.bbc.co.uk/?dzf=knowledge 
http://www.bbc.co.uk/?dzf=sport 
http://www.bbc.co.uk/news/ 
http://www.bbc.com/news/ 
http://www.bbc.co.uk/sport/ 
http://www.bbc.co.uk/weather/ 
http://shop.bbc.com/ 
http://www.bbc.com/earth/ 
http://www.bbc.com/travel/ 
http://www.bbc.com/capital/ 
http://www.bbc.co.uk/iplayer/ 
http://www.bbc.com/culture/ 
http://www.bbc.com/autos/ 
http://www.bbc.com/future/ 
http://www.bbc.co.uk/tv/ 
http://www.bbc.co.uk/radio/ 
http://www.bbc.co.uk/cbbc/ 
http://www.bbc.co.uk/cbeebies/ 
http://www.bbc.co.uk/arts/ 
http://www.bbc.co.uk/ww1/ 
http://www.bbc.co.uk/food/ 
http://www.bbc.co.uk/history/ 
http://www.bbc.co.uk/learning/ 
http://www.bbc.co.uk/music/ 
http://www.bbc.co.uk/science/ 
http://www.bbc.co.uk/nature/ 
http://www.bbc.com/earth/ 
http://www.bbc.co.uk/local/ 
http://www.bbc.co.uk/travel/ 
http://www.bbc.co.uk/a-z/ 
http://www.bbc.co.uk/ 
http://www.bbc.co.uk/terms/ 
http://www.bbc.co.uk/aboutthebbc/ 
http://www.bbc.co.uk/privacy/ 
http://www.bbc.co.uk/privacy/cookies/about 
http://www.bbc.co.uk/accessibility/ 
http://www.bbc.co.uk/guidance/ 
http://www.bbc.co.uk/contact/ 
http://www.bbc.co.uk/bbctrust/ 
http://www.bbc.co.uk/complaints/ 
http://www.bbc.co.uk/help/web/links/ 
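
The listing above still contains repeated addresses and entries such as http://www.bbc.co.uk/# where nothing follows the #. The sketch below shows one way the loop from the answer could be adapted to skip both. It is not part of the original answer: the helper name unique_hrefs is hypothetical, and a hand-built list (with one made-up image URL) stands in for what $extor->links returns, so that it runs without a network connection or the HTML::LinkExtor module.

```perl
use strict;
use warnings;

# Hand-built stand-in for what $extor->links returns: one array reference
# per element, holding the tag name followed by attribute => URL pairs.
my @links = (
    [ a   => href => 'http://www.bbc.co.uk/news/' ],
    [ img => src  => 'http://www.bbc.co.uk/logo.png' ],   # hypothetical URL
    [ a   => href => 'http://www.bbc.co.uk/#' ],          # bare fragment
    [ a   => href => 'http://www.bbc.co.uk/sport/' ],
    [ a   => href => 'http://www.bbc.co.uk/news/' ],      # duplicate
);

# Hypothetical helper: keep <a href> values, minus bare fragments and repeats.
sub unique_hrefs {
    my @candidates = @_;
    my (%seen, @out);
    for my $link (@candidates) {
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' and $attr{href};
        next if $attr{href} =~ /#$/;       # nothing follows the "#"
        next if $seen{ $attr{href} }++;    # already collected
        push @out, $attr{href};
    }
    return @out;
}

print "$_\n" for unique_hrefs(@links);
```

Here only the news and sport addresses are printed, each once. Whether a trailing # should be dropped or merely stripped from the URL is a design choice that depends on what the list is used for.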
+0

Thanks, I knew I could always count on you to solve my problems. I have also provided further details above; sorry, I put them at the beginning :) – user2128074 2014-12-03 22:21:01

+0

How would you use this to take user input? As in, the user has to supply a website themselves. – user2128074 2014-12-03 22:22:48

+0

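@user2128074: In the usual way: get the URL from the terminal with 'chomp(my $url = <>)', then use it in 'my $resp = $ua->get($url)'. Don't you want to get your original program working? I am sure it would help your understanding of Perl regexes, and I am fairly certain that the problem is in your Perl code rather than in the regex, if only you would show it. – Borodin 2014-12-03 22:24:59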
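
As a rough sketch of the input handling discussed in these comments, the snippet below takes the URL from the first command-line argument when one is given (as the question does with $ARGV[0]) and otherwise reads one line from a filehandle and chomps it (as the comment suggests). The helper name read_url is hypothetical, and an in-memory filehandle stands in for the terminal so that the example runs without typing anything.

```perl
use strict;
use warnings;

# Hypothetical helper: use the first argument if there is one,
# otherwise read a single line from the given filehandle.
sub read_url {
    my ($args, $fh) = @_;
    my $url = @$args ? $args->[0] : <$fh>;
    die "No URL supplied\n" unless defined $url;
    chomp $url;    # remove the trailing newline left by line input
    return $url;
}

# Demonstration: an in-memory filehandle plays the role of the terminal.
my $typed = "http://www.bbc.co.uk/\n";
open my $fh, '<', \$typed or die "Cannot open in-memory handle: $!";

my $url = read_url([], $fh);
print "Would fetch: $url\n";

# In the answer's program this value would replace the hard-coded address:
#   my $resp = $ua->get($url);
```

In a real script the call would presumably be read_url(\@ARGV, \*STDIN), which is an assumption about how the pieces would be wired together rather than something stated in the thread.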