Recursively grouping a list of sentences based on conjunctions
I have my sentences as a list:
Sentence 1.
And Sentence 2.
Or Sentence 3.
New Sentence 4.
New Sentence 5.
And Sentence 6.
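I am trying to group these sentences by a "common criterion". For example, if a sentence starts with a common opener (currently only "And" or "Or"), I want to group it with the sentence before it, so that: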
Group 1:
Sentence 1.
And Sentence 2.
Or Sentence 3.
Group 2:
New Sentence 4.
Group 3:
New Sentence 5.
And Sentence 6.
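I wrote the code below. It detects consecutive sentences to some extent, but not all of them.
How can I code this recursively? I tried to code it iteratively, but it fails in some cases, and I can't figure out how to write it as a recursion.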
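import nltk  # word_tokenize needs NLTK's tokenizer data to be downloaded

# Openers that attach a sentence to the previous one (currently only "and" / "or")
conjucture_list = ["and", "or"]

tokens = ["Sentence 1.", "And Sentence 2.", "Or Sentence 3.", "New Sentence 4.", "New Sentence 5.", "And Sentence 6."]
already_selected = []  # indices of sentences already attached to an earlier one
attachlist = {}        # sentence -> list of sentences attached to it
for i in tokens:
    attachlist[i] = []
for i in range(len(tokens)):
    if i in already_selected:
        pass
    else:
        for j in range(i + 1, len(tokens)):
            if j not in already_selected:
                first_word = nltk.tokenize.word_tokenize(tokens[j].lower())[0]
                if first_word in conjucture_list:
                    attachlist[tokens[i]].append(tokens[j])
                    already_selected.append(j)
                else:
                    break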
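One possible reading of the grouping rule is that a sentence joins the current group when its first word is a conjunction, and starts a new group otherwise. Under that assumption a single pass may be enough. The sketch below is only an illustration: `CONJUNCTIONS`, `starts_with_conjunction` and `group_sentences` are made-up names, and the first word is taken with `str.split` instead of `nltk` so the example has no dependencies.

```python
# Assumption: "and" and "or" are the only openers, as stated in the question.
CONJUNCTIONS = {"and", "or"}


def starts_with_conjunction(sentence):
    """Return True if the first word of the sentence is a known conjunction."""
    words = sentence.split()
    if not words:
        return False
    # Lowercase and drop trailing punctuation so "And," still matches "and".
    return words[0].lower().strip(".,;:!?") in CONJUNCTIONS


def group_sentences(sentences):
    """Attach each conjunction-led sentence to the group that precedes it."""
    groups = []
    for sentence in sentences:
        if groups and starts_with_conjunction(sentence):
            groups[-1].append(sentence)   # continue the current group
        else:
            groups.append([sentence])     # start a new group
    return groups


tokens = ["Sentence 1.", "And Sentence 2.", "Or Sentence 3.",
          "New Sentence 4.", "New Sentence 5.", "And Sentence 6."]

for number, group in enumerate(group_sentences(tokens), start=1):
    print(f"Group {number}:")
    for sentence in group:
        print(sentence)
```

Returning a list of lists rather than a dict keyed by sentence keeps the order of the groups and avoids collisions when the same sentence appears twice. A conjunction-led sentence at the very start of the list has nothing to attach to, so this sketch lets it open its own group.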
Why do you need it to be recursive? Honestly, that's a fool's errand. – Veedrac
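For completeness, here is a rough sketch of what a recursive version might look like if one is still wanted. It assumes the same rule as the question (a sentence whose first word is "and" or "or" joins the previous group); `first_word` and `group_recursive` are hypothetical names, and the first word is extracted with `str.split` rather than `nltk`.

```python
# Assumption: only "and" / "or" count as openers, as in the question.
CONJUNCTIONS = ("and", "or")


def first_word(sentence):
    """Return the lowercased first word without trailing punctuation."""
    words = sentence.lower().split()
    return words[0].strip(".,;:!?") if words else ""


def group_recursive(sentences, groups=None):
    """Place the first sentence into a group, then recurse on the rest."""
    if groups is None:
        groups = []
    if not sentences:                     # base case: nothing left to place
        return groups
    head, rest = sentences[0], sentences[1:]
    if groups and first_word(head) in CONJUNCTIONS:
        groups[-1].append(head)           # extend the most recent group
    else:
        groups.append([head])             # open a new group
    return group_recursive(rest, groups)


tokens = ["Sentence 1.", "And Sentence 2.", "Or Sentence 3.",
          "New Sentence 4.", "New Sentence 5.", "And Sentence 6."]
print(group_recursive(tokens))
```

This recurses once per sentence and copies the remaining list at every step, so a long input can exceed Python's recursion limit (1000 frames by default in CPython) and raise `RecursionError`. That cost is the practical argument for preferring a plain loop here.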