Python - passing a URL vs. an HTTPResponse object
I have a list of URLs that I want to scrape attributes from. I'm new to Python, so please bear with me. Windows 7, 64-bit. Python 3.2.
The following code works. pblist is a list of dictionaries, each containing the key 'short_url'.
    for j in pblist[0:10]:
        base_url = j['short_url']
        if hasattr(BeautifulSoup(urllib.request.urlopen(base_url)), 'head') and \
           hasattr(BeautifulSoup(urllib.request.urlopen(base_url)).head, 'title'):
            print("Has head, title attributes.")
            try:
                j['title'] = BeautifulSoup(urllib.request.urlopen(base_url)).head.title.string.encode('utf-8')
            except AttributeError:
                print("Encountered attribute error on page, ", base_url)
                j['title'] = "Attribute error."
                pass
The following code does not. For example, it claims that the BeautifulSoup object has no head and title attributes.
    for j in pblist[0:10]:
        base_url = j['short_url']
        page = urllib.request.urlopen(base_url)
        if hasattr(BeautifulSoup(page), 'head') and \
           hasattr(BeautifulSoup(page).head, 'title'):
            print("Has head, title attributes.")
            try:
                j['title'] = BeautifulSoup(urllib.request.urlopen(base_url)).head.title.string.encode('utf-8')
            except AttributeError:
                print("Encountered attribute error on page, ", base_url)
                j['title'] = "Attribute error."
                pass
Why? What is the difference between calling urllib.request.urlopen on the URL inside each BeautifulSoup call and passing in the HTTPResponse object that urllib.request.urlopen already returned?
Got it. Thanks, Amber. – Zack 2012-03-27 22:16:17
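A likely explanation, offered as a sketch rather than a definitive diagnosis: the HTTPResponse returned by urllib.request.urlopen is a file-like stream, and once it has been read to the end, a second read returns nothing. As far as I understand, BeautifulSoup calls read() on any file-like object it is given, so the second BeautifulSoup(page) would be parsing an empty document. The first snippet avoids this because every urlopen call opens a fresh response. The example below reproduces the effect without any network access. Note the assumptions: io.BytesIO stands in for the HTTPResponse, and TitleParser and title_from_stream are hypothetical helpers built on the standard library's html.parser, used here in place of BeautifulSoup.

```python
import io
from html.parser import HTMLParser

HTML = b"<html><head><title>Example</title></head><body></body></html>"


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title = (self.title or "") + data


def title_from_stream(stream):
    """Read a file-like object to the end and return its title, or None."""
    parser = TitleParser()
    parser.feed(stream.read().decode("utf-8"))
    parser.close()
    return parser.title


# Stand-in for page = urllib.request.urlopen(base_url)
page = io.BytesIO(HTML)

first = title_from_stream(page)   # consumes the whole stream
second = title_from_stream(page)  # stream is exhausted, nothing to parse

# Reading the body once and reusing the bytes avoids the problem.
body = io.BytesIO(HTML).read()
reused_a = title_from_stream(io.BytesIO(body))
reused_b = title_from_stream(io.BytesIO(body))

print(first, second, reused_a, reused_b)
```

Applied to the loop in the question, the same idea would be to build the soup once, for example soup = BeautifulSoup(page), and then test and read soup.head and soup.head.title from that single object instead of constructing a new BeautifulSoup from the same response each time.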