How to group sentences by label?

I have a set of sentences in a file, such as:

1 let's go shopping 
1 what a wonderful day 
1 let's party tonight 
2 nobody went there 
2 it was a deserted place 
3 lets go tomorrow 
4 what tomorrow 
4 ok sure let's see

I want to group these sentences: all the sentences with label '1' should be in one group, and those with label '2' should be in another group.

So I load the file like this:

result = []
with open("sentences.txt", "r") as filer:
    for line in filer:
        result.append(line.strip().split())

so I get this:

[['1', "let's", 'go', 'shopping'],
['1', 'what', 'a', 'wonderful', 'day'],
['1', "let's", 'party', 'tonight'],
['2', 'nobody', 'went', 'there']]

Now, I want something like this:

for line in result:
    if line[0] == '1':
        process(line)
    elif line[0] == '2':
        process(line)
    elif line[0] == '4':
        process(line)
    elif line[0] == '3':
        process(line)

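But the problem is that this considers only one sentence at a time. I want all the '1' sentences in one group, and then to run process (the function) on that group.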

File 1:

[['1', 'in', 'seattle', 'today', 'the', 'secretary', 'of', 'education', 'richard', 'riley', 'delivered', 'his', 'address', 'on', 'the', 'state', 'of', 'american', 'education'], ['1', 'one', 'of', 'the', 'things', 'he', 'focused', 'on', 'as', 'the', 'president', 'had', 'done', 'in', 'his', 'state', 'of', 'the', 'union', 'was', 'the', 'goal', 'to', 'reduce', 'the', 'size', 'of', 'the', 'average', 'class']]

File 2:

[['1', 'in', 'seattl', 'today', 'the', 'secretari', 'of', 'educ', 'richard', 'riley', 'deliv', 'hi', 'address', 'on', 'the', 'state', 'of', 'american', 'educ'], ['1', 'one', 'of', 'the', 'thing', 'he', 'focus', 'on', 'a', 'the', 'presid', 'had', 'done', 'in', 'hi', 'state', 'of', 'the', 'union', 'wa', 'the', 'goal', 'to', 'reduc', 'the', 'size', 'of', 'the', 'averag', 'class']]

Source

2016-03-08 minks

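from collections import defaultdict

# Map each label to the list of sentences that carry it.
result = defaultdict(list)
with open("sentences.txt", "r") as filer:
    for line in filer:
        # Split on the first space only: the label, then the rest of the line.
        label, sentence = line.strip().split(' ', 1)
        result[label].append(sentence)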

Then you can process them:

for label, sentences in result.items():
    # sentences holds every sentence that shares this label
    process(sentences)
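
To keep the tokenized form used in the question (lists of words rather than whole strings), the same grouping could be sketched on in-memory lines as below. This is only an illustration and not part of the original answer: the sample lines come from the question, group_by_label is a hypothetical helper name, and process is a stand-in for the asker's own function, whose real behavior is not shown in the thread.

```python
from collections import defaultdict


def group_by_label(lines):
    """Group tokenized sentences by their leading label (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        label, tokens = parts[0], parts[1:]
        groups[label].append(tokens)
    return groups


def process(sentences):
    # Placeholder for the asker's function; here it only counts the sentences.
    return len(sentences)


lines = [
    "1 let's go shopping",
    "1 what a wonderful day",
    "1 let's party tonight",
    "2 nobody went there",
    "2 it was a deserted place",
    "3 lets go tomorrow",
    "4 what tomorrow",
    "4 ok sure let's see",
]

groups = group_by_label(lines)
# process is called once per label, with all of that label's sentences together.
counts = {label: process(sentences) for label, sentences in groups.items()}
```

Because each label's sentences are collected before process is called, the function sees the whole group at once rather than one sentence at a time.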

Source

2016-03-08 10:09:25 wong2

Hi, may I ask why there is a 1 in line.strip().split(' ', 1)? Does it refer to label 1? If so, I have several labels, so would it have to be modified for each label? – minks

@minks No, it is the maximum number of splits: https://docs.python.org/2/library/stdtypes.html#str.split – wong2

'a b c'.split(' ', 1) gives ['a', 'b c'], while 'a b c'.split(' ', 2) gives ['a', 'b', 'c'] – wong2
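
As a small illustration of the point in these comments, the sketch below applies str.split with and without the second argument to one of the sample lines from the question. The second argument (maxsplit) limits how many splits are made and has nothing to do with the label value; the variable names are illustrative only.

```python
line = "1 let's go shopping"

# maxsplit=1: split once, at the first space, so the label is separated
# from the rest of the sentence regardless of which label it is.
label, sentence = line.split(' ', 1)

# No maxsplit: split at every space, giving the label plus each word.
tokens = line.split(' ')
```

The same call therefore works unchanged for labels '1', '2', '3', and '4'.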
