JSON KeyError in a Wayback Machine scraper

I'm building a simple program to fetch the page title for each URL in a list and write the results to a CSV file. I've finished and understand most of it, except for one thing: no matter how I change the code, I keep getting a KeyError over and over. Please take a look and tell me what's wrong with this code:
import requests
import json
import urllib2
import csv
from BeautifulSoup import BeautifulSoup

def getsnapshot(domain):
    base = 'http://archive.org/wayback/available?url='
    r = requests.get(base + domain, verify=False)
    j = json.loads(r.text)
    if j['archived_snapshots'] == {}:
        pass
    else:
        archive_url = j['archived_snapshots']['closest']['url']
    return archive_url

def gettitle(url):
    soup = BeautifulSoup(urllib2.urlopen(getsnapshot(url)))
    return soup.title.string

def writecsv(domain):
    c = csv.writer(open("output.csv", "wb"))
    snapshoturl = getsnapshot(domain)
    title = gettitle(snapshoturl)
    c.writerow([domain, title])

with open('input.txt', 'r') as f:
    for line in f.read().splitlines():
        writecsv(line)
My input is a list of URLs, specifically domain names. I'm checking each domain's history to see whether it hosted spam in the past.

Here's the JSON:
{
    "archived_snapshots": {
        "closest": {
            "available": true,
            "url": "http://web.archive.org/web/20050408030822/http://www.001music.net:80/",
            "timestamp": "20050408030822",
            "status": "200"
        }
    }
}
Hello, thanks for the suggestion. I've updated the question with the JSON. Could you take another look? –

Which line is the error coming from? And what's the error message? – taesu
Actually, the conditional in `getsnapshot(domain)` doesn't seem to return the right URL. If I change it to `archive_url = j['archived_snapshots']['closest']['url'] if j['archived_snapshots'] else ''`, it raises `KeyError: 'closest'` –
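A minimal sketch of one way to avoid the KeyError, assuming Python 3 and the standard library only (the original mixes Python 2's `urllib2` and the old `BeautifulSoup` package with `requests`). The key idea is to read the nested keys with `dict.get()` so a response without a `closest` entry yields `None` instead of raising; the `extract_closest` helper and `TitleParser` class are names introduced here for illustration, not part of the original code:

```python
import csv
import json
from html.parser import HTMLParser
from urllib.request import urlopen

API = 'http://archive.org/wayback/available?url='

def extract_closest(payload):
    """Return the closest snapshot URL from the availability API response,
    or None when the domain has no archived snapshots."""
    # .get() guards both levels, so a missing 'closest' key never raises KeyError
    closest = payload.get('archived_snapshots', {}).get('closest')
    return closest['url'] if closest else None

def getsnapshot(domain):
    with urlopen(API + domain) as r:
        return extract_closest(json.load(r))

class TitleParser(HTMLParser):
    """Minimal stand-in for BeautifulSoup's soup.title.string."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''
    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

def gettitle(snapshot_url):
    parser = TitleParser()
    with urlopen(snapshot_url) as r:
        parser.feed(r.read().decode('utf-8', errors='replace'))
    return parser.title

def writecsv(domains, outfile='output.csv'):
    # open the file once so each row is appended, not overwritten per domain
    with open(outfile, 'w', newline='') as out:
        writer = csv.writer(out)
        for domain in domains:
            snapshot = getsnapshot(domain)
            if snapshot:  # skip domains with no archived snapshot
                writer.writerow([domain, gettitle(snapshot)])
```

Note that the original `writecsv` also reopened `output.csv` in write mode for every domain (clobbering earlier rows) and `gettitle` called `getsnapshot` a second time on a URL that was already a snapshot URL; this sketch passes the snapshot URL straight through instead.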