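Replacing specific words
This script grabs headlines from several news sites and counts how many times each word appears in them.
I'm getting words like "to", "for", and similar ones that I don't want this script to pick up.
I tried str.translate(None, "to") to remove the word, but it was "greedy": it also stripped parts of "Washington", when all I wanted to delete was the standalone word "to".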
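The behaviour described here follows from `str.translate` working on individual characters rather than on words. A minimal sketch, assuming Python 3 (where the deletion table is built with `str.maketrans`) and a made-up headline:

```python
# Python 3 sketch; the headline text is invented for illustration.
headline = "washington to vote on budget"

# translate() works per character: this deletes every 't' and every 'o',
# wherever they occur, not the word "to".
delete_t_and_o = str.maketrans("", "", "to")
char_stripped = headline.translate(delete_t_and_o)

# Splitting first and comparing whole words leaves "washington" intact.
words = [w for w in headline.split() if w != "to"]

print(char_stripped)
print(words)
```

Comparing whole tokens after `split()` is what avoids the partial matches inside longer words.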
import pprint
import feedparser
from collections import Counter

def feedGrabber(feed):
    # Parse the RSS feed and collect a handful of entry titles.
    parsed = feedparser.parse(feed)
    feed1 = []
    feed1.append(parsed.entries[0].title)
    feed1.append(parsed.entries[1].title)
    feed1.append(parsed.entries[3].title)
    feed1.append(parsed.entries[4].title)
    feed1.append(parsed.entries[5].title)
    feed1.append(parsed.entries[6].title)
    feed1.append(parsed.entries[7].title)
    feed1.append(parsed.entries[8].title)
    feed1.append(parsed.entries[9].title)
    # str() of the list gives its repr, e.g. "[u'...', u'...']" (Python 2).
    feed1 = str(feed1)
    feedsplit = feed1
    # Python 2 only: str.translate(None, chars) deletes each listed character.
    feedsplit = feedsplit.translate(None, '\'')
    # Note: this deletes every 'u' in the text, not just the u'' prefix.
    feedsplit = feedsplit.translate(None, 'u')
    feedsplit = feedsplit.translate(None, '[')
    feedsplit = feedsplit.translate(None, ']')
    feedsplit = str.lower(feedsplit)
    feedsplit = str.split(feedsplit)
    return(feedsplit)

reddit = feedGrabber("https://www.reddit.com/r/news/.rss")
cnn = feedGrabber('http://rss.cnn.com/rss/cnn_topstories.rss')
nyt = feedGrabber('http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml')

# Count the words per feed, then combine the three counters.
one = Counter(reddit)
two = Counter(cnn)
three = Counter(nyt)
pprint.pprint(one + two + three)
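Why not just delete the unwanted words from the `Counter` object after it's created? That's easier than building a regular expression. Also, you might want to learn about `for` loops. – TigerhawkT3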
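One way that suggestion might look, as a rough sketch assuming Python 3; the sample text and the `unwanted` tuple are made up for illustration and are not from the original script:

```python
from collections import Counter

# Stand-in for the combined counter (one + two + three) in the question.
counts = Counter("senate to vote on aid to states for schools".split())

# Hypothetical list of words to drop after counting.
unwanted = ("to", "for", "on")
for word in unwanted:
    # pop() with a default does nothing if the word was never counted.
    counts.pop(word, None)

print(counts)
```

Because keys are removed by exact match, a word such as "states" is untouched even though it contains the letters of "to".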
You're looking for *stopword filtering*; [see this post](http://stackoverflow.com/questions/5486337/how-to-remove-stop-words-using-nltk-or-python). – rebeling
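A small sketch of stopword filtering applied before counting, assuming Python 3. `STOPWORDS` and `count_words` are hypothetical names, and the hand-written set below is only a placeholder; a fuller list, such as the one NLTK ships, could be swapped in.

```python
import string
from collections import Counter

# Illustrative stopword set, not a complete list.
STOPWORDS = {"to", "for", "the", "a", "an", "of", "in", "on", "and"}

def count_words(titles, stopwords=STOPWORDS):
    """Count the words in an iterable of titles, skipping stopwords."""
    counts = Counter()
    for title in titles:
        for raw in title.lower().split():
            # Trim surrounding punctuation such as commas or quotes.
            word = raw.strip(string.punctuation)
            if word and word not in stopwords:
                counts[word] += 1
    return counts

# Made-up titles standing in for the parsed feed entries.
titles = ["Washington to Vote on Budget", "Budget Talks Stall for Now"]
counts = count_words(titles)
print(counts.most_common(3))
```

Filtering before counting keeps the stopwords out of the `Counter` entirely, and looping over the titles removes the need to build and clean up a string version of the list.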