Web crawler raises AttributeError

When I run the code below:

import urllib 
import re 
from urllib import request 
import webbrowser 

#email pattern 
r'[\w._(),:;<>][email protected][\w._(),:;<>][.]\w+' 

# url pattern 
r'\w\w\w[.]\w+[.]\w+' 

html = urllib.request.urlopen('somelinkthatistoolongforstackoverflow') 

#find all websites 

websites = re.findall(r'http://www[.]\w+[.]\w+',str(html.read())) 
print(websites) 

#find all emails 

emails = re.findall(r'[\w._(),:;<>][email protected][\w._(),:;<>][.]\w+',str(html.read())) 
print(emails) 

#sort through websites and find other links 

for i in websites: 
    y = urllib.request.urlopen(i) 
    x = re.findall(r'http://www[.]\w+[.]\w+',str(y.read())) 
    websites.append(x)

I get this error:

AttributeError: 'list' object has no attribute 'timeout'

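Note the AttributeError. What can I do about it? I am using the urllib module and the re (regular expression) module, on Python 3.3.0. This is a web crawler that is meant to find as many links and email addresses as it can. Can anyone help me with this? If you can, please post below. Thanks to everyone who is able to help.

Source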

2013-04-18 user2070615

Please include the *full* traceback.

You want to extend websites:

websites.extend(x)

because x is itself a list.
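A quick illustration of the difference, using made-up URLs. The names found and new_links are hypothetical and do not appear in the question's code:

```python
# Hypothetical data standing in for the crawler's results.
found = ['http://www.example.com']
new_links = ['http://www.example.org', 'http://www.example.net']

nested = list(found)
nested.append(new_links)  # adds the whole list as ONE element

flat = list(found)
flat.extend(new_links)    # adds each string individually

print(nested)  # the last element is a list, not a URL string
print(flat)    # every element is a URL string
```

With append, a later loop over the list would eventually hand a list to urlopen(); with extend, every item stays a string.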

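At the moment you are appending whole lists of matched websites, so at some point the for loop hands one of those lists to urllib.request.urlopen() as i. Since it is certainly not a string, which is the only other valid option, urlopen() tries to treat it as a Request object. That is where the complaint about the missing timeout attribute comes from.

Source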
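Putting this together, here is a minimal sketch of a crawl loop that only ever queues strings. It is not the asker's exact program. It assumes a hypothetical fetch callable that returns a page's text, and the seen set and max_pages limit are additions of mine so that pages linking to each other cannot keep the loop running forever.

```python
import re
from collections import deque

# Same URL pattern as in the question.
LINK_PATTERN = re.compile(r'http://www[.]\w+[.]\w+')


def crawl(start_url, fetch, max_pages=50):
    """Collect links breadth-first.

    `fetch` is a hypothetical callable: fetch(url) -> page text as str.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages_fetched = 0
    while queue and pages_fetched < max_pages:
        url = queue.popleft()
        pages_fetched += 1
        try:
            text = fetch(url)
        except OSError:
            continue  # skip pages that fail to load
        # Keep only links not seen before, then add them as individual strings.
        new_links = set(LINK_PATTERN.findall(text)) - seen
        seen.update(new_links)
        queue.extend(sorted(new_links))  # extend, not append
    return sorted(seen)
```

For real pages, fetch could be something like lambda url: urllib.request.urlopen(url).read().decode('utf-8', errors='replace'). The sketch also reads each page only once. In the question's code the second html.read() call on the same response returns empty bytes, so the email search there would find nothing.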

2013-04-18 21:01:52
