2010-11-21

I am very new to programming and have just been reading documentation. For small projects I have read through some Perl books and the PHP Cookbook. But I picked out a few recipes and, believe it or not, they looked awful on screen. I think I need some help now: tiny, runnable WWW::Mechanize beginner examples.

With my limited knowledge it is hard to get things working... I need some Mechanize recipes that actually work, because some of the examples, like the following, are outdated:

see the cpan-site for the mechanize examples

I would love to learn more - with living examples - if you have any....

I look forward to hearing from you.


Try asking a specific question about the programming problem you are trying to solve. It is hard to answer a general request for recipes. – 2010-11-22 21:25:06

Answer


You might get better answers if you were a bit more specific about what exactly you are after... As an example, here is a simple site crawler that extracts HTML comments from every page it visits:

use strict;
use warnings;
use feature 'say';
use WWW::Mechanize;

# create the mechanize object with autocheck switched off,
# so we don't die when a bad/malformed url is requested
my $mech = WWW::Mechanize->new( autocheck => 0 );
my %comments;
my %links;
my @comment;

my $target = "http://google.com";
# store the first target url as not yet checked
$links{$target} = 0;
# initiate the search
my $url = get_url();

# start the main loop
while ( $url ne "" ) {
    # fetch the target url
    $mech->get($url);
    # search the source for any html comments
    my $res = $mech->content;
    @comment = $res =~ /<!--[^>]*-->/g;
    # store comments in the 'comments' hash and print them, if any were found
    if (@comment) {
        $comments{$url} = "@comment";
        say "\n$url \n---------------->\n $comments{$url}";
    }

    # loop through all the links on the current page
    # (only urls that appear in an html anchor)
    foreach my $link ( $mech->links() ) {
        $link = $link->url();
        # exclude irrelevant stuff such as javascript functions or external links;
        # you might want to add a domain-name check so relevant links aren't excluded
        if ( $link !~ /^(#|mailto:|(f|ht)tp(s)?:|www\.|javascript:)/ ) {
            # check whether the link has a leading slash so we can build the full url properly
            $link = $link =~ /^\// ? $target . $link : $target . "/" . $link;
            # store it in our hash of links to be searched, unless it's already present
            $links{$link} = 0 unless exists $links{$link};
        }
    }

    # mark this url as searched and start over
    $links{$url} = 1;
    $url = get_url();
}

sub get_url {
    # loop over the links hash and return the next url that hasn't been searched yet;
    # (a plain "keys" loop is used here because adding keys while iterating with
    # "each" between calls is unsafe)
    # if all urls have been searched, return empty, ending the main loop
    for my $key ( keys %links ) {
        return $key if $links{$key} == 0;
    }
    return "";
}

And here is a script that logs in to a website:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url  = "http://www.test.com";

# pre-set a cookie before fetching the page
$mech->cookie_jar->set_cookie( 0, "start", 1, "/", ".test.com" );
$mech->get($url);
# select the login form by name and fill in the credentials
$mech->form_name("frmLogin");
$mech->set_fields( user => 'test', passwrd => 'test' );
$mech->click();
# save the page returned after login
$mech->save_content("logged_in.html");

And this is a script that performs a Google search and prints the result links from every results page:

use strict;
use warnings;
use 5.10.0;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;

# maximum result offset, taken from the command line
my $option = $ARGV[$#ARGV];

# you may customize your google search by editing this url (always end it with "q=" though)
my $google = 'http://www.google.co.uk/search?q=';

my @dork = ( "inurl:dude", "cheese" );

# start the main loop, one iteration for every google search
for my $i ( 0 .. $#dork ) {
    # reset the result offset for each new search term
    my $max = 0;

    # loop until the chosen maximum number of results is reached
    while ( $max <= $option ) {
        $mech->get( $google . $dork[$i] . "&start=" . $max );

        # print all the result links, skipping google's own relative links
        foreach my $link ( $mech->links() ) {
            my $google_url = $link->url;
            if ( $google_url !~ /^\// && $google_url !~ /google/ ) {
                say $google_url;
            }
        }
        $max += 10;
    }
}

The first script above is the simple site crawler that extracts information (HTML comments) from each page.

It really depends on what you are after, but if you want more examples I would recommend perlmonks.org, where you can find plenty of material to get you going.

Definitely bookmark the mechanize module man page though; it is the ultimate resource...
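Since the question asked for a tiny, runnable beginner example, here is a minimal sketch of the basic WWW::Mechanize workflow: fetch one page, print its title and its links. The URL is just a placeholder; substitute any site you like.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# create the browser object; autocheck => 1 makes every request die on failure,
# which is the friendliest setting while you are learning
my $mech = WWW::Mechanize->new( autocheck => 1 );

# fetch a page (placeholder url - replace with your own target)
$mech->get('http://example.com/');

# print the page title and the url of every link on the page
print $mech->title, "\n";
for my $link ( $mech->links ) {
    print $link->url, "\n";
}
```

Run it as `perl script.pl`; it needs WWW::Mechanize installed from CPAN and a working network connection.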


Hi Cyber-Guard Design! What an answer!!! This is more than I expected. Really. I will take a closer look at the examples later in the day. Again - many thanks for this excellent help. You are the man of the day! Many regards, Apolloman – zero 2010-11-22 00:29:20