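Python + BeautifulSoup: scraping NYTimes articles
I want to extract the content of any NYTimes article and put it into a single string so that I can count certain words. All of the article content is found in HTML 'p' tags. I can get the paragraphs one at a time (see the comments in the code), but I can't iterate over the variable paragraphs because I keep getting the following error: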
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-ccc2f7cf5763> in <module>()
16
17 for i in paragraphs:
---> 18 article = article + paragraphs[i].get_text()
19
20 print(article)
TypeError: list indices must be integers, not Tag
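The error comes from how Python's `for` loop works: `for i in paragraphs` binds `i` to each element of the list (here, each `Tag`), not to its position, so `paragraphs[i]` tries to use a `Tag` as a list index. The minimal sketch below reproduces the same kind of `TypeError` with a plain list of strings so it runs without BeautifulSoup or network access; the helper names are hypothetical. Python 3 words the message slightly differently ("list indices must be integers or slices, not str").

```python
def loop_variable_types(items):
    """Return the type name of what a plain for-loop binds, one per element."""
    # The loop variable is the element itself, never its index.
    return [type(item).__name__ for item in items]


def index_with_element(items):
    """Use the first element as a list index and return the resulting error message."""
    try:
        items[items[0]]  # same mistake as paragraphs[i] when i is a Tag
    except TypeError as exc:
        return str(exc)
    return None


def indexed_pairs(items):
    """If the position really is needed, enumerate yields (index, element) pairs."""
    return [(i, item) for i, item in enumerate(items)]


print(loop_variable_types(["first", "second"]))
print(index_with_element(["first", "second"]))
print(indexed_pairs(["first", "second"]))
```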
Here is the code:
import requests
from bs4 import BeautifulSoup
session = requests.Session()
url = "http://www.nytimes.com/2015/01/02/world/europe/turkey-police-thwart-attack-on-prime-ministers-office.html"
req = session.get(url)
soup = BeautifulSoup(req.text)
paragraphs = soup.find_all('p', class_='story-body-text story-content')
#article = paragraphs[0].get_text()
#article = article + paragraphs[1].get_text()
#article = article + paragraphs[2].get_text()
#article = article + paragraphs[3].get_text()
#article = article + paragraphs[4].get_text()
#article = article + paragraphs[5].get_text()
#article = article + paragraphs[6].get_text()
for i in paragraphs:
    article = article + paragraphs[i].get_text()
print(article)
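A minimal sketch of a corrected version, assuming `paragraphs` is the list returned by `soup.find_all(...)` as in the question: loop over the `Tag` objects directly and call `get_text()` on each, so no indexing (and no pre-existing `article` variable) is needed. The function names are hypothetical, and the default class string is copied from the question; NYT page markup may have changed since 2015, so that selector is an assumption that may need updating.

```python
def join_paragraphs(paragraphs, sep="\n"):
    """Concatenate the text of each paragraph element.

    `paragraphs` is assumed to be an iterable of objects with a get_text()
    method, such as the Tag objects returned by soup.find_all('p', ...).
    """
    # Iterate over the elements themselves; each p is one <p> Tag.
    return sep.join(p.get_text() for p in paragraphs)


def fetch_article_text(url, css_class="story-body-text story-content"):
    """Download a page and join its article paragraphs (needs network access)."""
    # Imported here so join_paragraphs works even without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    # Naming the parser explicitly avoids bs4 guessing one (and warning about it).
    soup = BeautifulSoup(response.text, "html.parser")
    paragraphs = soup.find_all("p", class_=css_class)
    return join_paragraphs(paragraphs)
```

With the question's variables this reduces to `article = join_paragraphs(paragraphs)`. Joining with a newline rather than plain `+` keeps the last word of one paragraph from fusing with the first word of the next, which matters once you start counting words.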
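Thank you very much for your help. I'm an economist who is just starting to learn how to code, so I appreciate your patience in helping me with this.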
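For the word-counting goal stated at the start of the question, a minimal sketch once the article text is in one string: lowercase it, split it into word tokens, and tally them with `collections.Counter`. The function name `count_words` is hypothetical, and the tokenizing regex `\w+` is a simplifying assumption (it splits "minister's" into "minister" and "s", for example).

```python
import re
from collections import Counter


def count_words(text, targets):
    """Count case-insensitive occurrences of each target word in text."""
    # \w+ treats any run of letters, digits, or underscores as one word.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear.
    return {word: counts[word.lower()] for word in targets}


print(count_words("Police said the police acted quickly.", ["police", "attack"]))
```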
Don't forget to check the NYT terms of service as well, especially if you are using their articles for more than a learning exercise.