使用urlopen打開網址列表

我有一個python腳本來獲取網頁並對其進行鏡像。它適用於一個特定的頁面，但我不能讓它工作多個。我以爲我可以把多個網址變成一個列表，然後哺養的功能，但我得到這個錯誤：使用urlopen打開網址列表

Traceback (most recent call last): 
    File "autowget.py", line 46, in <module> 
    getUrl() 
    File "autowget.py", line 43, in getUrl 
    response = urllib.request.urlopen(url) 
    File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen 
    return opener.open(url, data, timeout) 
    File "/usr/lib/python3.2/urllib/request.py", line 361, in open 
    req.timeout = timeout 
AttributeError: 'tuple' object has no attribute 'timeout'

這裏是有問題的代碼：

url = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com'] 
def getUrl(*url): 
    response = urllib.request.urlopen(url) 
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file: 
     shutil.copyfileobj(response, out_file) 
getUrl()

我已經用盡谷歌試圖找到如何用urlopen（）打開一個列表。我找到了一種這樣的作品。它需要一個.txt文檔並逐行閱讀，將每一行作爲URL提供，但我正在使用Python 3編寫此文檔，並且由於某種原因twillcommandloop將不會導入。另外，這種方法很笨拙，需要（據說）不必要的工作。

無論如何，任何幫助將不勝感激。

來源

2014-04-24 Eric Lagergren

你爲什麼不簡單地用'for'循環迭代你的URL列表？ –

回覆sheng的評論時纔想起來！它會將特定部分作爲字符串返回，對嗎？ –

在代碼中存在一些誤區：

你定義可變參數列表（在錯誤的元組）getUrls;
您管理getUrls參數作爲一個變量（列表，而不是）

您可以使用此代碼

import urllib2 
import shutil 

urls = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com'] 
def getUrl(urls): 
    for url in urls: 
     #Only a file_name based on url string 
     file_name = url.replace('https://', '').replace('.', '_').replace('/','_') 
     response = urllib2.urlopen(url) 
     with open(file_name, 'wb') as out_file: 
     shutil.copyfileobj(response, out_file) 
getUrl(urls)

來源

2014-04-24 20:34:32

謝謝 - 在我的代碼的早些時候，我將文件名設置爲'/ path/to/directory''加上'domain'，其中'domain'是http：// www'和'.com'之間的字符串。 URL。腳本既FTP和推（通過Git到GitHub頁），所以我必須有一個設置文件路徑或否則Git部分將無法正常工作。無論如何，再次感謝！欣賞它！ –

它不支持的元組：

urllib.request.urlopen(url[, data][, timeout]) 
Open the URL url, which can be either a string or a Request object.

而且您的電話是不正確。它應該是：

getUrl(url[0],url[1],url[2])

而在函數內部，使用一個類似於「for u in url」的循環來遍歷所有的URL。

來源

2014-04-24 20:21:34 Sheng

所以我將不得不創建多個變量？不幸的。謝謝。我很感激。但是有沒有更容易通過一個列表？使用for循環而不是var [0]，var [0]等？ –

不，你應該使用'for'循環。 –

@LukasGraf是的，你是對的。忘了吧。 – Sheng

你應該使用for循環只是遍歷你的URL嘗試：

import shutil 
import urllib.request 


urls = ['https://www.example.org/', 'https://www.foo.com/'] 

file_name = 'foo.txt' 

def fetch_urls(urls): 
    for i, url in enumerate(urls): 
     file_name = "page-%s.html" % i 
     response = urllib.request.urlopen(url) 
     with open(file_name, 'wb') as out_file: 
      shutil.copyfileobj(response, out_file) 

fetch_urls(urls)

我假設你想把內容保存到單獨的文件中，所以我用enumerate這裏創建一個uniqe文件名，但是您明顯可以使用hash(),uuid模塊中的任何內容創建slugs。

來源

2014-04-24 20:39:19

使用urlopen打開網址列表

回答

相關問題