Bigram in Python


I want to split a sentence into bigrams (pairs of adjacent words). For example:

"My name is really nice. This is so awesome."

->

["My name","name is", "is really", "really nice.", "This is", "is so", "so awesome."]

Any help?

Source

2014-09-21 Abhirup Ghosh

This is in no way related to "Rolling or sliding window iterator in Python", inspectorG4dget. – 2014-09-21 14:05:42

You can do this with a positive lookahead:

>>> import re 
>>> s = "My name is really nice. This is so awesome." 
>>> m = re.findall(r'(?=(\b\w+\b \S+))', s) 
>>> m 
['My name', 'name is', 'is really', 'really nice.', 'This is', 'is so', 'so awesome.']

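Pattern explanation:

(?=...) A lookahead is a zero-length assertion, just like the anchors for the start and end of a line or the start and end of a word. It does not consume characters in the string; it only asserts whether a match is possible.
() A capturing group, which captures the characters matched by the pattern inside the parentheses.
\b A word boundary. It matches the position between a word character and a non-word character.
\w+ Matches one or more word characters.
 \S+ Matches a space followed by one or more non-space characters.
The findall function returns the characters inside the capturing group when the pattern has one. If there is no capturing group, it returns the whole match. In our case it returns the characters in group index 1. To match overlapping substrings, you need to put the pattern inside a lookahead.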
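To see why the lookahead matters, here is a small sketch comparing the same pattern with and without it. The variable names `consumed` and `overlapping` are only for illustration and are not part of the answer above.

```python
import re

s = "My name is really nice. This is so awesome."

# Without the lookahead, each match consumes its text, so the next search
# starts after the second word and every other pair is skipped.
consumed = re.findall(r'\b\w+\b \S+', s)

# Inside (?=(...)) the match itself is zero-length, so the scan resumes at
# the very next position and overlapping pairs are captured by group 1.
overlapping = re.findall(r'(?=(\b\w+\b \S+))', s)

print(consumed)     # ['My name', 'is really', 'This is', 'so awesome.']
print(overlapping)  # all seven bigrams, as in the session above
```

Note that "nice. This" is never produced, because the pattern requires a space directly after the first word and "nice" is followed by a period.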

Source

2014-09-21 13:54:51

It would be great if you could explain your answer, sir – 2014-09-21 14:07:56

Thanks a lot! Awesome! – 2014-09-21 14:17:41

First split your string into sentences with split('.'), then split each sentence into words and pair each word with the next one using zip()!

>>> s = "My name is really nice. This is so awesome."
>>> [' '.join(i) for s2 in s.split('.') for i in zip(s2.split(), s2.split()[1:])]
['My name', 'name is', 'is really', 'really nice', 'This is', 'is so', 'so awesome']
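The same idea can be unrolled into a function, which may be easier to read than the one-liner. This is only a sketch; the name `sentence_bigrams` is made up for illustration, and it assumes that '.' is the only sentence separator.

```python
def sentence_bigrams(text):
    """Return bigram strings that never cross a '.' sentence break."""
    bigrams = []
    for sentence in text.split('.'):
        words = sentence.split()
        # zip stops at the shorter input, so each word is paired
        # with its successor and the last word has no partner.
        for first, second in zip(words, words[1:]):
            bigrams.append(first + ' ' + second)
    return bigrams


s = "My name is really nice. This is so awesome."
result = sentence_bigrams(s)
print(result)
```

Because the text is split on '.', the periods are dropped, so this gives 'really nice' and 'so awesome' rather than the 'really nice.' and 'so awesome.' shown in the question.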

Source

2014-09-21 13:54:27 Kasramvd

Yes, I edited it and split the string on '.' first! – Kasramvd 2014-09-21 14:04:08

def ngrams(words, n): 
    return [words[i:i+n] for i in range(len(words)-n+1)]

Output:

In [67]: ngrams("My name is really nice".split(),2) 
Out[67]: [['My', 'name'], ['name', 'is'], ['is', 'really'], ['really', 'nice']]
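The function above returns lists of words, while the question asks for strings. A possible wrapper, assuming whitespace tokenization is good enough (`ngram_strings` is a hypothetical helper name, not part of the answer):

```python
def ngrams(words, n):
    # Slide a window of length n over the word list.
    return [words[i:i+n] for i in range(len(words)-n+1)]


def ngram_strings(text, n):
    # Join each window back into a single space-separated string.
    return [' '.join(gram) for gram in ngrams(text.split(), n)]


bigrams = ngram_strings("My name is really nice", 2)
trigrams = ngram_strings("My name is really nice", 3)
print(bigrams)   # ['My name', 'name is', 'is really', 'really nice']
print(trigrams)  # ['My name is', 'name is really', 'is really nice']
```

If n is larger than the number of words, the range is empty and the result is an empty list. Unlike the other answers, this does not stop at sentence boundaries, so split the text into sentences first if that matters.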

Source

2014-09-21 13:55:36 inspectorG4dget
