2015-09-23 114 views

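Replacing specific words

This script grabs headlines from several news sites and counts how many times each word appears in those headlines.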

I'm getting words like "to", "for" and similar, which I don't want this script to pick up.

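I tried writing a str.translate(None, "to") to remove the word, but it removes "greedily": it strips letters out of words such as Washington, when all I want removed is the word "to".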
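
To make the behavior described above concrete, here is a minimal sketch. It assumes Python 3, where the deletion table is built with str.maketrans (the script in this question uses the Python 2 form of translate). The sample string is made up for illustration. The point is that translate deletes individual characters, not whole words, while comparing whole words after splitting leaves "washington" alone.

```python
# Hypothetical sample text, not taken from any feed.
text = "washington to"

# Python 3 spelling of translate(None, "to"): the third argument to
# str.maketrans lists single characters to delete, so every "t" and
# every "o" disappears, including the ones inside "washington".
deleted_chars = text.translate(str.maketrans("", "", "to"))

# Splitting into words first and comparing whole words removes only
# the word "to" itself.
kept_words = [word for word in text.split() if word != "to"]

print(repr(deleted_chars))
print(kept_words)
```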

import pprint 
import feedparser 
from collections import Counter 

def feedGrabber(feed): 
    parsed = feedparser.parse(feed) 
    feed1 = [] 
    feed1.append(parsed.entries[0].title) 
    feed1.append(parsed.entries[1].title) 
    feed1.append(parsed.entries[3].title) 
    feed1.append(parsed.entries[4].title) 
    feed1.append(parsed.entries[5].title) 
    feed1.append(parsed.entries[6].title) 
    feed1.append(parsed.entries[7].title) 
    feed1.append(parsed.entries[8].title) 
    feed1.append(parsed.entries[9].title) 
    feed1 = str(feed1) 
    feedsplit = feed1 
    feedsplit = feedsplit.translate(None, '\'') 
    feedsplit = feedsplit.translate(None, 'u') 
    feedsplit = feedsplit.translate(None, '[') 
    feedsplit = feedsplit.translate(None, ']') 
    feedsplit = str.lower(feedsplit) 
    feedsplit = str.split(feedsplit) 
    return(feedsplit) 

reddit = feedGrabber("https://www.reddit.com/r/news/.rss") 
cnn = feedGrabber('http://rss.cnn.com/rss/cnn_topstories.rss') 
nyt = feedGrabber('http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml') 

one = Counter(reddit) 
two = Counter(cnn) 
three = Counter(nyt) 
pprint.pprint(one + two + three) 

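Why not just delete the words you don't want, such as "to", from the `Counter` object after you create it? That's easier than writing a regular expression. Also, you may want to read up on `for` loops. – TigerhawkT3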
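
One way to read the suggestion above, as a rough sketch rather than a definitive fix: build the Counter as usual, then drop the unwanted keys in a `for` loop. The sample words and the `unwanted` list are made up for illustration; `pop` with a default is used so that a word missing from the counts does not raise an error.

```python
from collections import Counter

# Hypothetical words standing in for the list feedGrabber returns.
words = "senate to vote on bill for veterans to get aid".split()
counts = Counter(words)

# Hypothetical list of words to drop from the tally.
unwanted = ["to", "for", "on", "the"]
for word in unwanted:
    # pop with a default is a no-op when the word was never counted.
    counts.pop(word, None)

print(counts)
```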


You are looking for *stopword filtering*; [see this post](http://stackoverflow.com/questions/5486337/how-to-remove-stop-words-using-nltk-or-python) – rebeling

Answers


Here is a list of common words; you can use a list comprehension to filter them out of the text:

def isCommon(word):
    # Return True if word is one of the common words to filter out.
    commonWords = ["the", "be", "and", "of", "a", "in", "to", "have", "it",
        "i", "that", "for", "you", "he", "with", "on", "do", "say", "this",
        "they", "is", "an", "at", "but", "we", "his", "from", "that", "not",
        "by", "she", "or", "as", "what", "go", "their", "can", "who", "get",
        "if", "would", "her", "all", "my", "make", "about", "know", "will",
        "as", "up", "one", "time", "has", "been", "there", "year", "so",
        "think", "when", "which", "them", "some", "me", "people", "take",
        "out", "into", "just", "see", "him", "your", "come", "could", "now",
        "than", "like", "other", "how", "then", "its", "our", "two", "more",
        "these", "want", "way", "look", "first", "also", "new", "because",
        "day", "more", "use", "no", "man", "find", "here", "thing", "give",
        "many", "well"]
    return word in commonWords

# text is assumed to be a list of lowercase words, such as the list feedGrabber returns.
text = [x for x in text if not isCommon(x)]
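
For completeness, here is a hedged end-to-end sketch of how this kind of filter could be combined with the counting step from the question. It assumes Python 3 and works on a plain list of title strings instead of calling feedparser, so it can be run offline. The names `STOP_WORDS`, `count_headline_words`, and `sample_titles` are made up for illustration, and the stop word set is deliberately short. A set is used in place of a list so membership checks do not scan the whole collection.

```python
import re
from collections import Counter

# Assumption: a short illustrative stop word set; extend it as needed.
STOP_WORDS = frozenset(["the", "a", "to", "for", "of", "in", "on", "and"])

def count_headline_words(titles, stop_words=STOP_WORDS):
    """Count words across headline strings, skipping stop words."""
    counts = Counter()
    for title in titles:
        # Lowercase the title, then pull out runs of letters and
        # apostrophes as words; this avoids str(list) and translate.
        words = re.findall(r"[a-z']+", title.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

# Hypothetical headlines standing in for parsed.entries[i].title values.
sample_titles = [
    "Senate to vote on aid for Washington",
    "Washington prepares for the vote",
]
result = count_headline_words(sample_titles)
print(result.most_common(3))
```

Because whole words are compared, "washington" is counted intact while "to" and "for" are skipped.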