Perl WWW::Mechanize: store a URL unless it has already been found

I have a problem that I hope you can help with.

foreach my $url (keys %{$newURLs}) {
    # first get the base URL and save its content length
    $mech->get($url);
    my $content_length = $mech->response->header('Content-Length');

    # now iterate all the 'child' URLs
    foreach my $child_url (@{ $newURLs->{$url} }) {
        # get the content
        $mech->get($child_url);

        # compare
        if ($mech->response->header('Content-Length') != $content_length) {
            print "$child_url: different content length: $content_length vs "
                . $mech->response->header('Content-Length') . "!\n";
            #HERE I want to store the urls that are found to have different content
            #lengths to the base url
            #only if the same url has not already been stored
        } elsif ($mech->response->header('Content-Length') == $content_length) {
            print "Content lengths are the same\n";
            #HERE I want to store the urls that are found to have the same content
            #length as the base url
            #only if the same url has not already been stored
        }
    }
}

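The problem I am having:

As you can see in the code above, I want to store each URL depending on whether its content length is the same as or different from that of its base URL. I should end up with one group of URLs whose content length differs from their base URL, and another group whose content length matches it.

I know how to do this easily with arrays: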

push (@differentContentLength, $url); 
push (@sameContentLength, $url);

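But how would I go about doing this with a hash (or another preferred method)?

I am still getting to grips with hashes, so your help would be much appreciated.

Many thanks

Source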

2013-02-13 perl-user

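You should add the closing braces to your loops. – simbabque 2013-02-13 12:18:09

@simbabque - you're right, apologies – 2013-02-13 12:25:07

You can create a hashref outside the loop to store all the URLs in. Let's call it $content_lengths. It is a scalar because it is a reference to a hash. Inside your $child_url loop, add the content length to that data structure. We first use the base URL, which gives us another hashref inside $content_lengths->{$url}. There we decide whether we need equal or different. Inside each of those two keys there is another hashref that holds the $child_url as a key. Those keys in turn have their content length as the value. Because hash keys are unique, storing the same child URL a second time just overwrites the existing entry. Of course, if you don't want to save the length, we could just say ++ here.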

my $content_lengths; # this is at the top 
foreach my $url (# ... more stuff 

# compare 
if ($mech->response->header('Content-Length') != $content_length) { 
    print "$child_url: different content length: $content_length vs " 
    . $mech->response->header('Content-Length') . "!\n"; 

    # store the urls that are found to have different content 
    # lengths to the base url only if the same url has not already been stored 
    $content_lengths->{$url}->{'different'}->{$child_url} = $mech->response->header('Content-Length'); 

} elsif ($mech->response->header('Content-Length') == $content_length) { 
    print "Content lengths are the same\n"; 

    # store the urls that are found to have the same content length as the base 
    # url only if the same url has not already been stored 
    $content_lengths->{$url}->{'equal'}->{$child_url} = $mech->response->header('Content-Length'); 
}
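To look at the resulting structure without making any HTTP requests, here is a minimal self-contained sketch. The URLs and the lengths in %fake_length are made up and stand in for $mech->get plus the Content-Length header; $newURLs is assumed to have the same shape as in the question (base URL mapped to a list of child URLs).

```perl
use strict;
use warnings;

# Hypothetical data: stands in for the value that
# $mech->response->header('Content-Length') would return per URL.
my %fake_length = (
    'http://example.com/'  => 100,
    'http://example.com/a' => 100,
    'http://example.com/b' => 250,
);

# Assumed shape of $newURLs: base URL => list of child URLs.
# The repeated child URL is deliberate.
my $newURLs = {
    'http://example.com/' => [
        'http://example.com/a',
        'http://example.com/b',
        'http://example.com/b',
    ],
};

my $content_lengths;    # hashref, autovivified on first assignment

foreach my $url (keys %{$newURLs}) {
    my $content_length = $fake_length{$url};

    foreach my $child_url (@{ $newURLs->{$url} }) {
        my $child_length = $fake_length{$child_url};
        my $bucket = $child_length == $content_length ? 'equal' : 'different';

        # Hash keys are unique, so storing the same child URL twice
        # simply overwrites the earlier entry.
        $content_lengths->{$url}{$bucket}{$child_url} = $child_length;
    }
}

# Read the stored URLs back with keys.
foreach my $url (sort keys %{$content_lengths}) {
    foreach my $bucket (sort keys %{ $content_lengths->{$url} }) {
        my @children = sort keys %{ $content_lengths->{$url}{$bucket} };
        printf "%s (%s): %s\n", $url, $bucket, join(', ', @children);
    }
}
```

The duplicate child URL ends up stored only once, which is the "only if not already stored" behaviour the question asks for, without any explicit check.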

Source

2013-02-13 12:25:34 simbabque

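When you say 'use ++ if you don't want the length stored', should that be written as `$content_lengths->{$url}->{'different'}->{$child_url}++;`? And to clarify, what exactly is the `++` doing? – 2013-02-13 14:04:56

@perl-user Yes, that is what I meant. It is the increment shorthand operator. It adds 1 to the variable on its left and assigns the result. So they will all have the value 1. If one of the sites is seen twice, its value will be 2. This is how counters, and 'remember the name but don't care how many', are implemented. You can access the names with `keys`. Think of it as `GROUP BY` in SQL. – simbabque 2013-02-13 15:53:45

Oh, I see now how this prevents duplicates (using Data::Dumper): rather than adding another copy of a duplicate URL, it just registers its presence by increasing the number assigned to that URL, which doesn't matter since we are not interested in that part. Thanks, your comment above explained it very well :) – 2013-02-13 16:40:17

Please check this solution: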
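The counting idiom discussed in these comments can be sketched on its own. This is only an illustration: the @found list and its URLs are made up, and %seen is a hypothetical name, not something from the original code.

```perl
use strict;
use warnings;

# Hypothetical list of URLs containing one repeat.
my @found = (
    'http://example.com/a',
    'http://example.com/b',
    'http://example.com/a',
);

my %seen;
foreach my $found_url (@found) {
    # A missing key starts out undefined; ++ treats that as 0, so the
    # first sighting stores 1 and every repeat adds 1 more.
    $seen{$found_url}++;
}

# keys returns each URL once, however often it was counted.
my @unique = sort keys %seen;
print "$_ seen $seen{$_} time(s)\n" for @unique;
```

If the counts are of no interest, only the keys need to be read; the values can be ignored.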

my %content_length;

foreach my $url (keys %{$newURLs}) {
    # first get the base URL and save its content length
    $mech->get($url);
    my $content_length = $mech->response->header('Content-Length');

    # now iterate all the 'child' URLs
    foreach my $child_url (@{ $newURLs->{$url} }) {
        # get the content
        $mech->get($child_url);
        my $new_content_length = $mech->response->header('Content-Length');

        if (!exists $content_length{$child_url}) {
            # first sighting: nothing stored yet, so there is nothing to compare
            print "New URL! url: $child_url\n";
        } elsif ($new_content_length != $content_length{$child_url}) {
            # seen before, but the length differs from the stored one
            print "Different content_length! url: $child_url, old_content_length: $content_length, new_content_length: $new_content_length\n";
        }

        # store in hash
        $content_length{$child_url} = $new_content_length;
    }
}

Source

2013-02-13 11:11:24 user1126070
