Count occurrences of elements from a list in a string?

I'm trying to count how many times verbal contractions appear in some speeches I've collected. One particular speech looks like this:

speech = "I've changed the path of the economy, and I've increased jobs in our own home state. We're headed in the right direction - you've all been a great help."

So, in this case, I'd like to count four (4) contractions. I have a list of contractions; here are the first few terms:

contractions = {"ain't": "am not; are not; is not; has not; have not", 
"aren't": "are not; am not", 
"can't": "cannot",...}

My code looks like this, to start:

count = 0
for word in speech:
    if word in contractions:
        count = count + 1
print(count)

I'm not getting anywhere with this, though, because the code iterates over every letter rather than over whole words.


2015-10-06 blacksite

for word in speech.split(' '): – Monkpit

I don't get what the values in your dictionary are doing. Btw, you have a dictionary, not a list. –

I added a lot of stuff to my answer that should give you something extra. – colidyre

Use str.split() to split the string on whitespace:

for word in speech.split():

This splits on arbitrary whitespace: spaces, tabs, newlines and some more exotic whitespace characters, and any number of them in a row.
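A quick check of that behavior, using a made-up string with a tab, a blank line and leading/trailing spaces: split() with no argument also drops the outer whitespace and never yields empty strings.

```python
# Made-up input mixing tabs, newlines and extra spaces.
text = "  I've\tbeen\n\nhere  "

# With no argument, split() treats any run of whitespace as one separator
# and ignores whitespace at the start and end.
words = text.split()
print(words)  # ["I've", 'been', 'here']
```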

You probably want to lowercase your words with str.lower() (otherwise Ain't won't be found, for example) and strip off punctuation:

from string import punctuation

count = 0
for word in speech.lower().split():
    word = word.strip(punctuation)
    if word in contractions:
        count += 1

I'm using the str.strip() method here; it removes every character found in the string.punctuation string from the start and end of the word.
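Because strip() only touches the two ends of a word, the apostrophe in the middle of a contraction survives. A small sketch with a few hand-picked words from the speech, plus one caveat worth knowing (string.punctuation contains the apostrophe itself):

```python
from string import punctuation

# A few tokens as they come out of speech.split().
raw_words = ["economy,", "state.", "We're", "help."]

# Lowercase, then trim punctuation from both ends only.
cleaned = [w.lower().strip(punctuation) for w in raw_words]
print(cleaned)  # ['economy', 'state', "we're", 'help']

# Caveat: a leading or trailing apostrophe is stripped too.
print("'tis".strip(punctuation))  # tis
```

For the speech in the question this is harmless, but a contraction that starts or ends with an apostrophe would no longer match its dict key.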


2015-10-06 20:28:23

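You're iterating over a string, so the items are characters. To get words out of a string you can use a naive method such as str.split(), which creates a list for you; you can then iterate over that list of strings (the words, split on the argument passed to str.split(); by default it splits on whitespace). There is even re.split(), which is more powerful, but I don't think you need to split the text with a regular expression.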
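If you do want re.split(), one possible pattern (an assumption, not the only reasonable choice) splits on runs of anything that is not an ASCII letter or a straight apostrophe, so contractions stay in one piece. It does not handle typographic apostrophes (’), and it keeps the original case, so lower() is still needed before looking words up.

```python
import re

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

# Split on any run of characters other than A-Z, a-z and the straight apostrophe,
# then drop the empty string re.split() leaves after the final period.
words = [w for w in re.split(r"[^A-Za-z']+", speech) if w]
print(words[:4])  # ["I've", 'changed', 'the', 'path']

# Only the contractions still contain an apostrophe.
print([w for w in words if "'" in w])  # ["I've", "I've", "We're", "you've"]
```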

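At a minimum, you have to either lowercase the string with str.lower() or put every possible spelling (including capitalized ones) into your dict. I strongly recommend the first alternative; the latter isn't really feasible. Removing punctuation is also necessary. This is all still naive, though: if you need a more sophisticated approach, you have to split the text with a word tokenizer. NLTK is a good place to start; see the nltk tokenizer. But I strongly suspect this isn't your main problem, or that it doesn't really affect solving it. :)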
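Putting lowercasing and punctuation stripping together, here is a sketch that also reports which contractions occurred. collections.Counter is a substitution of mine, not something from the answers, and the four-key dict is a hypothetical subset of the asker's (only the keys matter):

```python
from collections import Counter
from string import punctuation

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

# Hypothetical subset of the contractions dict; the values are never used.
contractions = {"i've": "i have", "we're": "we are", "you've": "you have", "aren't": "are not"}

# Lowercase first so "We're" matches the key "we're", then trim edge punctuation.
words = (word.strip(punctuation) for word in speech.lower().split())
found = Counter(word for word in words if word in contractions)

total = sum(found.values())
print(total)          # 4
print(found["i've"])  # 2
```

This gives the four contractions the question expects, and a Counter returns 0 for keys that never appeared, so found["aren't"] is 0 rather than a KeyError.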

speech = """I've changed the path of the economy, and I've increased jobs in our own home state. We're headed in the right direction - you've all been a great help."""
# Maybe this dict makes more sense (list items as values). But for your question it doesn't matter.
contractions = {"ain't": ["am not", "are not", "is not", "has not", "have not"], "aren't": ["are not", "am not"], "i've": ["i have"]} # ...

# With re you can define advanced regexes, but maybe
# `from string import punctuation` (suggestion from Martijn Pieters' answer)
# is still enough for you.
import re

def abbreviation_counter(input_text, abbreviation_dict):
    count = 0
    # What you want is a list of words; str.split() does this job for you.
    # split(" ") splits on single spaces only; split() with no argument splits
    # on any run of whitespace. If you really need better methods (see the
    # answer text above), you have to use a word tokenizer tool or write your own.
    for word in input_text.split(" "):
        # Then clean each word (remove ',', ';', ...). The advantage of using re
        # over `from string import punctuation` is that you have more control
        # over what you remove: you can easily add or remove any punctuation
        # mark. That can be very handy, but it can also be overpowered. If the
        # latter is the case, just stick to Martijn Pieters' solution.
        if re.sub(',|;', '', word).lower() in abbreviation_dict:
            count += 1

    return count

print(abbreviation_counter(speech, contractions))
# Output: 2 (yeah, it worked: I've included "I've" in your list :)

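It's a tiny bit frustrating to answer at the same time as Martijn Pieters :), but I hope I still added some value for you. That's why I edited my answer to give you some hints for future work.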


2015-10-06 20:24:21 colidyre

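Thanks for your input, but I've already moved on from this problem. Your solution does work, though! I just didn't want to go back and reformat my whole 'contractions' dictionary :) – blacksite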

Sure, it's just a suggestion. If it helped in any way, I'd be happy to get credit for my work. :) – colidyre

Got you :) – blacksite

A for loop in Python iterates over all elements of an iterable. For a string, the elements are characters.
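A minimal demonstration of why the loop in the question finds nothing; the shortened speech and the one-entry dict are illustrative stand-ins for the asker's data:

```python
speech = "I've changed the path of the economy."
contractions = {"i've": "i have"}  # hypothetical one-entry dict

# Iterating over a str yields one-character strings...
print(list(speech[:4]))  # ['I', "'", 'v', 'e']

# ...so a multi-character key can never match, and the count stays 0.
count = sum(1 for element in speech if element in contractions)
print(count)  # 0
```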

You need to split the string into a list (or tuple) of strings that contains the words. You can use .split(delimiter) for that.

Your problem is very common, so Python has a shortcut: speech.split() splits on any number of spaces/tabs/newlines, so you get only your words in the list.

So your code should look like this:

count = 0
for word in speech.split():
    if word in contractions:
        count = count + 1
print(count)

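speech.split(" ") would work too, but it splits only on spaces, not on tabs or newlines, and if there are double spaces you'll get empty elements in the resulting list.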
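To make that concrete, here is a made-up string with a double space and a tab. Splitting on " " leaves an empty string and keeps the tab inside a token; filtering out the empty strings fixes only the first problem:

```python
text = "We're  headed\tthere"  # two spaces, then a tab

parts = text.split(" ")
print(parts)  # ["We're", '', 'headed\tthere']

# Dropping empty strings handles the double space, but the tab is still there.
print([p for p in parts if p])  # ["We're", 'headed\tthere']
```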


2015-10-06 20:38:18 cg909
