Getting a random link from a page using the shell

I'm trying to write a very basic benchmarking script that starts at a site's home page and then loads random pages from that site.

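I'll use curl to grab the page's content, but then I want to load a random next page from it. Can someone give me some shell code that picks a random href URL out of the output of the curl command?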

2012-07-12 user1497049

This is what I came up with:

curl -s "<url>" | grep 'a href=' | sed 's/.*<a href="//' | \
cut -d '"' -f 1 | while IFS= read -r i; do echo "$((RANDOM % 1000)):$i"; done | \
sort -n | sed 's/^[0-9]*://' | head -n 1

Replace <url> with the URL of the page you want to pick a link from.
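One limitation of the pipeline above: sed 's/.*<a href="//' is greedy, so when several links share one line of HTML only the last of them survives the cut step. A minimal sketch that keeps every double-quoted href of an <a> tag, assuming bash (for arrays and $RANDOM) and a grep that supports -o (GNU and BSD grep both do). The names extract_hrefs and pick_random_line are hypothetical, and single-quoted or unquoted hrefs are ignored:

```shell
#!/usr/bin/env bash
# Sketch: list every double-quoted href of an <a> tag, then choose one at random.
# Assumes bash (arrays, $RANDOM) and grep -o.

# Print each href on its own line. grep -o emits one line per match,
# so several links on the same input line are all kept.
extract_hrefs() {
    grep -o '<a [^>]*href="[^"]*"' | sed 's/.*href="//; s/"$//'
}

# Read lines from stdin and print one of them at random.
pick_random_line() {
    local -a lines=()
    local line
    while IFS= read -r line; do
        lines+=("$line")
    done
    # No input: report failure instead of computing RANDOM % 0
    (( ${#lines[@]} > 0 )) || return 1
    printf '%s\n' "${lines[RANDOM % ${#lines[@]}]}"
}
```

Usage would look like: curl -s http://stackoverflow.com | extract_hrefs | pick_random_line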

Edit: it might be easier to put it in a script called getrandomurl.sh:

#!/bin/bash
# bash rather than sh: $RANDOM is a bash feature that some sh implementations (e.g. dash) lack

curl -s "$1" | grep 'a href=' | sed 's/.*<a href="//' | \
cut -d '"' -f 1 | while IFS= read -r i; do echo "$((RANDOM % 1000)):$i"; done | \
sort -n | sed 's/^[0-9]*://' | head -n 1

and run it with something like ./getrandomurl.sh http://stackoverflow.com.
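getrandomurl.sh prints the href exactly as written in the page, so the result is often a relative path such as /questions that cannot be passed straight back to curl. For a benchmark that keeps walking from page to page, each href has to be made absolute first. A rough sketch in bash, where resolve_url is a hypothetical helper that handles only the common URL forms:

```shell
#!/usr/bin/env bash
# Sketch of a hypothetical helper: make an href absolute relative to the page it came from.
# Handles absolute http(s) URLs, protocol-relative //host/path, root-relative /path
# and plain relative paths. It does not resolve ../, strip #fragments, or cope with
# base URLs whose query string contains a slash.
resolve_url() {
    local base=$1 href=$2
    local scheme=${base%%://*}
    # scheme://host part of the base, e.g. http://stackoverflow.com
    local origin
    origin=$(printf '%s\n' "$base" | sed -E 's#^([A-Za-z][A-Za-z0-9+.-]*://[^/]*).*#\1#')
    case $href in
        http://*|https://*) printf '%s\n' "$href" ;;
        //*) printf '%s:%s\n' "$scheme" "$href" ;;
        /*) printf '%s%s\n' "$origin" "$href" ;;
        *)
            # Drop the last path segment of the base; a bare origin has none to drop
            local dir=${base%/*}
            if [[ $base == "$origin" ]]; then
                dir=$origin
            fi
            printf '%s/%s\n' "$dir" "$href"
            ;;
    esac
}
```

In a random walk it could be used as: next=$(./getrandomurl.sh "$url") && url=$(resolve_url "$url" "$next")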


2012-07-12 18:55:37

This is great, thanks! The trouble is that it sometimes seems to load JavaScript as well. Is there a way to make it load only links to actual pages? – user1497049 2012-07-12 19:01:54

What do you mean by "load JavaScript"? – 2012-07-12 19:09:19

Try running it on http://www.google.co.uk/, for example. Sometimes the output is JavaScript code. – user1497049 2012-07-12 19:14:23
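A likely cause: grep keeps every line that contains a href=, including lines of inline JavaScript. When such a line has no literal <a href=" (for example, the markup is assembled from string pieces or uses single quotes), the sed substitution leaves it untouched and cut passes script code through. One workaround is to filter the candidates after the cut step. A sketch with a hypothetical keep_page_links filter; the rules are heuristics chosen for this example, not a standard:

```shell
#!/usr/bin/env bash
# Sketch: keep only candidate hrefs (one per line on stdin) that plausibly point at a page.
# Heuristic rules (assumptions, not a standard):
#   - spaces, quotes, braces, brackets or ; suggest leaked script code
#   - empty lines and in-page anchors (#...) are not new pages
#   - any scheme other than http/https (javascript:, mailto:, ...) is dropped
keep_page_links() {
    awk '
        /[ \t<>{}()";]/ { next }
        $0 == "" || /^#/ { next }
        tolower($0) ~ /^[a-z][a-z0-9+.-]*:/ && tolower($0) !~ /^https?:/ { next }
        { print }
    '
}
```

It would slot into the original pipeline right after the cut step: ... | cut -d '"' -f 1 | keep_page_links | while ... Note that legitimate URLs containing parentheses are dropped too.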

Using lynx together with a bash array:

hrefs=($(lynx -dump http://www.google.com |
sed -e '0,/^References/{d;n};s/.* \(http\)/\1/'))   # 0,/re/ requires GNU sed
echo "${hrefs[RANDOM % ${#hrefs[@]}]}"
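The sed address 0,/^References/ is a GNU sed extension, so the command above may fail with other sed implementations such as the BSD sed on macOS. A sketch of the same idea using awk, assuming lynx -dump ends with a References section whose entries look like "   1. http://..."; lynx_refs and random_ref are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: pull http(s) URLs out of the "References" list at the end of lynx -dump output.
# Assumed entry format: "   1. http://www.example.com/" (number, dot, URL).
lynx_refs() {
    awk '
        /^References/ { in_refs = 1; next }
        in_refs && $1 ~ /^[0-9]+\.$/ && $2 ~ /^https?:/ { print $2 }
    '
}

# Read lynx -dump output on stdin and print one referenced URL at random.
# The explicit ( ) subshell keeps the array and the exit status local to it.
random_ref() {
    lynx_refs | (
        hrefs=()
        while IFS= read -r url; do
            hrefs+=("$url")
        done
        (( ${#hrefs[@]} > 0 )) || exit 1
        printf '%s\n' "${hrefs[RANDOM % ${#hrefs[@]}]}"
    )
}
```

Usage: lynx -dump http://www.google.com | random_ref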


2012-07-12 19:26:59 fork0

Not a curl solution, but I think it's more effective for the given task.

I'd suggest using the Perl module WWW::Mechanize. For example, to dump all the links from a page, use something like this:

#!/usr/bin/perl
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get("URL");
# Print every link on the page to STDOUT (undef), as absolute URLs (true)
$mech->dump_links(undef, 1);

Note: replace URL with the page you want.

Then either stay in Perl and follow a random link on that page:

my @links = $mech->links();
# follow_link's n counts from 1, so pick a number from 1 to the number of links
$mech->follow_link(n => 1 + int(rand(scalar @links)));

or use the dump_links version above from the shell to get further URLs and process them, for example (if the script above is called get_urls.pl):

./get_urls.pl | shuf | while read -r; do
    # The URL is now in the $REPLY variable
    echo "$REPLY"
done
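Since the goal is a benchmark, each URL coming out of that loop could be timed with curl's --write-out (-w) option. A sketch, with time_url and time_urls as hypothetical names; %{time_total} (total seconds) and %{http_code} are standard curl write-out variables:

```shell
#!/usr/bin/env bash
# Sketch: time how long each URL takes to fetch.
# Prints "<total seconds> <HTTP status> <url>"; the body is discarded.

time_url() {
    local stats
    # -s hides the progress meter, -o /dev/null drops the body,
    # -w prints the timing and status after the transfer.
    stats=$(curl -s -o /dev/null -w '%{time_total} %{http_code}' "$1")
    printf '%s %s\n' "$stats" "$1"
}

# Read URLs, one per line, from stdin and time each of them in turn.
time_urls() {
    local url
    while IFS= read -r url; do
        time_url "$url"
    done
}
```

For example, to time ten random links from the page: ./get_urls.pl | shuf -n 10 | time_urls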


2012-07-12 19:35:48 Thor
