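I'm trying to iterate over the tables in an HTML file by searchLabel, update a dictionary with the values found, and then write those values to a CSV. The current output works for url and headline, but the name column comes out empty or shows "None". However, if I print blog["name"] inside the loop, it correctly pulls the information I want. I suspect an indentation error, but I can't work out where things should line up, and nothing I've tried gets the name assignment to work inside the loop. How do I add a list to a dictionary and then output it to a .csv?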
import os
from bs4 import BeautifulSoup
import my_csv_writer

def td_finder(tr, searchLabel):
    value = ""
    index = tr.text.find(searchLabel)
    if index>-1:
        tds = tr.findAll('td')
        if len(tds)>1:
            value = tds[1].text
    return value

def main():
    topdir = 'some_directory'
    writer = my_csv_writer.CsvWriter("output.csv")
    writer.writeLine(["url", "headline", "name"])
    """Main Function"""
    blog = []
    for root, dirs, files in os.walk(topdir):
        for f in files:
            url = os.path.join(root, f)
            url = os.path.dirname(url).split('some_file')[1]
            if f.lower().endswith((".html")):
                file_new = open(os.path.join(root, f), "r").read()
                soup = BeautifulSoup(file_new)
                blog = {}
                #Blog Title
                blog["title"] = soup.find('title').text
                for table in soup.findAll("table"):
                    for tr in table.findAll("tr"):
                        #name
                        blog["name"] = td_finder(tr, "name:")
                seq = [url, unicode(blog["title"]), unicode(blog.get("name"))]
                writer.writeLine(seq)
    #return ""

if __name__ == '__main__':
    main()
    print "Finished main"
What's going on with 'blog['name'] = td_finder(tr, name:'')'? – Noelkd
Sorry, that was a typo. I've corrected it; it is just a generic placeholder for searchLabel in the td_finder function. – user2338089
Your 'main' function is not indented correctly (there should be an indent after 'def main():'). – TobiMarg
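One possible explanation for the empty name column, assuming the write happens after the row loop: blog["name"] is reassigned for every tr, so a later row that does not contain the label overwrites an earlier match with "". A minimal sketch of collecting the matches in a list instead is shown here. It is written for Python 3, uses the standard csv module rather than my_csv_writer, and models each table row as a plain list of cell strings instead of a BeautifulSoup tag, so collect_names, write_rows, the "names" key, and the sample data are all hypothetical stand-ins.

```python
import csv
import io


def td_finder(cells, search_label):
    """Return the second cell if the row mentions search_label, else ""."""
    row_text = "".join(cells)
    if search_label in row_text and len(cells) > 1:
        return cells[1]
    return ""


def collect_names(rows, search_label="name:"):
    """Keep every non-empty match instead of overwriting a single key."""
    names = []
    for cells in rows:
        value = td_finder(cells, search_label)
        if value:  # rows without the label return "" and are skipped
            names.append(value.strip())
    return names


def write_rows(out, records):
    """Write one CSV line per record, joining the list of names into one field."""
    writer = csv.writer(out)
    writer.writerow(["url", "headline", "name"])
    for record in records:
        writer.writerow([record["url"], record["title"], "; ".join(record["names"])])


# Hypothetical stand-in for the rows of one parsed HTML table.
rows = [
    ["date:", "2013-05-01"],
    ["name:", " Alice "],
    ["tags:", "python"],
    ["name:", "Bob"],
]

# The dictionary now holds a list under "names" rather than one string.
blog = {"url": "/posts/1", "title": "Example post", "names": collect_names(rows)}

buffer = io.StringIO()
write_rows(buffer, [blog])
print(buffer.getvalue())
```

Joining the list with "; " keeps one CSV line per file. If one line per name is preferred, the loop in write_rows could emit a row for each entry in record["names"] instead.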