How do I read the words in a line?

Here is what I have done so far; the question is at the end.

1) I first open a .txt file with open().read() and run the following function on the result:

def clean_text_passage(a_text_string):
    new_passage = []
    p = [line + '\n' for line in a_text_string.split('\n')]
    passage = [w.lower().replace('</b>\n', '\n') for w in p]

    if len(passage[0].strip()) > 0:
        if len(passage[1].strip()) > 0:
            new_passage.append(passage[0])
    return new_passage

2) Using the returned new_passage, I join the lines into a single string with the following command:

newone = "".join(new_passage)

3) Then I run another function, as follows:

import re

def replace(filename):
    match = re.sub(r'[^\s^\w+]risk', 'risk', filename)
    match2 = re.sub(r'risk[^\s^\-]+', 'risk', match)
    match3 = re.sub(r'risk\w+', 'risk', match2)
    return match3

So far, so good. Now here is the problem. When I print match3:

i agree to the following terms regarding my employment or continued employment 
with dell computer corporation or a subsidiary or affiliate of dell computer 
corporation (collectively, "dell").

it looks like the words are arranged in lines. However,

4) I ran the last function with convert = count_words(match3), as follows:

def count_words(newstring):
    from collections import defaultdict
    word_dict = defaultdict(int)
    for line in newstring:
        words = line.lower().split()
        for word in words:
            word_dict[word] += 1

When I print word_dict, it shows the following:

defaultdict(<type 'int'>, {'"': 2, "'": 1, '&': 4, ')': 3, '(': 3, '-': 4, ',': 4, '.': 9, '1': 7, '0': 8, '3': 2, '2': 3, '5': 2, '4': 2, '7': 2, '9': 2, '8': 1, ';': 4, ':': 2, 'a': 67, 'c': 34, 'b': 18, 'e': 114, 'd': 44, 'g': 15, 'f': 23, 'i': 71, 'h': 22, 'k': 10, 'j': 2, 'm': 31, 'l': 43, 'o': 79, 'n': 69, 'p': 27, 's': 56, 'r': 72, 'u': 19, 't': 81, 'w': 4, 'v': 3, 'y': 16, 'x': 3})

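Since the purpose of my code is to count a specific word, I need whole words such as 'risk' (as in "i like to take a risk") rather than single characters such as 'i', 'l', 'i'.

Question: how can I make match3 contain lines of words, the same way the result of readlines() does, so that I can count the words in each line?

When I save match3 as a .txt file, reopen it with readlines(), and then run the count function, it works correctly. What I want to know is how to make it work without saving the text and reopening it with readlines().

Thanks. I hope I can figure this out so that I can go to sleep.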

Source

2012-09-02 Jimmy

Try this.

for line in newstring iterates over the string one character at a time, so split it into lines first:

def count_words(newstring):
    from collections import defaultdict
    word_dict = defaultdict(int)
    for line in newstring.split('\n'):
        words = line.lower().split()
        for word in words:
            word_dict[word] += 1
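
As a quick illustration of the difference (a sketch using a made-up sample string, not the asker's data): iterating over a str yields single characters, whereas the list returned by readlines() yields whole lines. Calling splitlines(True) should give the same kind of list without a round trip through a file.

```python
sample = "i like risk\nrisk is fine\n"  # hypothetical sample text

# Iterating over a str yields one character at a time.
chars = [item for item in sample]

# splitlines(True) keeps the line endings, as file.readlines() does.
lines = sample.splitlines(True)

print(chars[:3])  # ['i', ' ', 'l']
print(lines)      # ['i like risk\n', 'risk is fine\n']
```

This is why the original loop ended up counting letters and punctuation marks instead of words.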

Source

2012-09-02 15:40:37 lucemia

tl;dr: the question is how to split a text into lines?

Then it is quite simple:

>>> text = '''This is a
longer text going
over multiple lines
until the string
ends.'''
>>> text.split('\n')
['This is a', 'longer text going', 'over multiple lines', 'until the string', 'ends.']
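
One caveat worth sketching, under the assumption that the input may end with a newline or use Windows line endings (the sample string below is made up): split('\n') keeps a stray '\r' and produces a trailing empty string, while str.splitlines() handles both cases.

```python
text = "first line\r\nsecond line\n"  # hypothetical sample with mixed line endings

by_split = text.split('\n')
by_splitlines = text.splitlines()

print(by_split)       # ['first line\r', 'second line', '']
print(by_splitlines)  # ['first line', 'second line']
```

For word counting the difference is mostly harmless, because split() with no argument discards surrounding whitespace anyway, but it matters if the lines themselves are kept.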

Source

2012-09-02 15:40:56 poke

Your match3 is a string, so

for line in newstring:

iterates over the characters in newstring, not the lines. You can simply write

words = newstring.lower().split()
for word in words:
    word_dict[word] += 1

or, if you prefer,

for line in newstring.splitlines():
    words = line.lower().split()
    for word in words:
        word_dict[word] += 1

or whatever. [I would use a Counter myself, but defaultdict(int) is almost as good.]
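
For reference, a Counter-based variant might look like the sketch below. It reuses the asker's function name count_words, but the sample text is made up, and punctuation is not stripped, so a token such as "risk." would be counted separately from "risk".

```python
from collections import Counter

def count_words(newstring):
    """Return a Counter mapping each lowercased word to its frequency."""
    # split() with no argument splits on any whitespace, newlines included.
    return Counter(newstring.lower().split())

counts = count_words("I like risk\nrisk is fine")  # hypothetical sample text
print(counts["risk"])  # 2
print(counts["i"])     # 1
```

Because the function returns the Counter, the caller can look up a single word directly, and a word that never appears gives 0 rather than raising KeyError.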

Note:

def replace(filename):

filename isn't a filename! (It holds the text itself, so a name such as text would be clearer.)

Source

2012-09-02 15:41:24 DSM

I see. Thank you very much!! – Jimmy
