2014-04-17 101 views
0

I am currently trying to split a string based on a specific word. Here is an example of what I want to achieve:

string =" Total number of boys is 2020 , Total number of states could be 19? Total number of votes is 400" 

I want the string to be split whenever the word Total is encountered. The result I want looks like this:

results=['Total number of boys is 2020 ,' , 'Total number of states could be 19? ', 'Total number of votes is 400'] 

Answers

1
def word_splitter(string, word):
    my_list = []
    for phrase in string.split(word):
        if len(phrase.strip()) > 0:
            my_list.append('%s%s' % (word, phrase))
    return my_list

So

string =" Total number of boys is 2020 , Total number of states could be 19? Total number of votes is 400" 
word_splitter(string, 'Total ') 

returns

['Total number of boys is 2020 , ', 'Total number of states could be 19? ', 'Total number of votes is 400'] 
+0

I would use 'Total ' in both places, so you make sure you match the whole word and your append looks nicer. A matter of taste, though... – Diegomanas

+0

Wrote a function. Much better, don't you think? :) – Bonifacio2
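
One caveat with word_splitter: if the string has text before the first occurrence of the word, that leading text also gets the word prepended to it. Below is a minimal sketch of one way around this, using str.find instead of str.split. The name split_before_word is hypothetical (it is not from the answer above), and the choice to keep non-blank leading text as its own piece is an assumption about what you would want.

```python
def split_before_word(text, word):
    """Split text so that each piece starts at an occurrence of word.

    Hypothetical helper for illustration. Non-blank text that comes
    before the first occurrence is kept as its own piece, unchanged.
    """
    if not word:
        raise ValueError("word must be a non-empty string")
    pieces = []
    start = text.find(word)
    if start == -1:
        # The word never occurs: return the text itself unless it is blank.
        return [text] if text.strip() else []
    if text[:start].strip():
        pieces.append(text[:start])
    while start != -1:
        # Look for the next occurrence after the current one.
        nxt = text.find(word, start + len(word))
        end = nxt if nxt != -1 else len(text)
        pieces.append(text[start:end])
        start = nxt
    return pieces


print(split_before_word("Intro. Total a Total b", "Total"))
# ['Intro. ', 'Total a ', 'Total b']
```

For the string in the question this gives the same three pieces as word_splitter, since the only text before the first 'Total ' is a single space.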

2

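The following finds sentences that begin with 'Total' and end with a punctuation mark (one of , ? .) or at the end of the string. You didn't mention a requirement that the extracted strings be delimited by punctuation, but I suspect you will find it handy.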

>>> import re
>>> [m[0] + m[2] for m in re.findall('(Total(.*?))([,?.]|$)', string)] 
['Total number of boys is 2020 ,', 'Total number of states could be 19?', 'Total number of votes is 400'] 
+0

From what I can see, he wants to keep the punctuation in the substrings – Diegomanas

+0

OK, I have left only the version that keeps the punctuation. – piokuc
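
If the word to search for is not fixed, the same idea can be wrapped in a function. This is only a sketch: sentences_starting_with is a hypothetical name, and the pattern assumes the word begins and ends with a letter, digit or underscore (otherwise the \b word boundaries will not behave as intended). It uses a single match per sentence rather than joining groups, and re.escape so that special characters in the word are matched literally.

```python
import re


def sentences_starting_with(text, word):
    """Return pieces that start at word and run to the next , ? . or the end.

    Hypothetical helper for illustration; the closing punctuation mark
    is kept as part of each piece.
    """
    pattern = re.compile(r'\b' + re.escape(word) + r'\b.*?(?:[,?.]|$)')
    return [m.group(0) for m in pattern.finditer(text)]


string = " Total number of boys is 2020 , Total number of states could be 19? Total number of votes is 400"
print(sentences_starting_with(string, 'Total'))
# ['Total number of boys is 2020 ,', 'Total number of states could be 19?', 'Total number of votes is 400']
```

Note that any '.' ends a piece, so a number such as 3.5 inside a sentence would cut it short.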

0

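The following splits the string as required. First we split the string on 'Total', then we add 'Total' back to the front of each piece as the delimiter.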

['Total' + item for index, item in enumerate(string.split('Total')) if index!=0 and item] 

Result:

['Total number of boys is 2020 , ', 'Total number of states could be 19? ', 'Total number of votes is 400'] 
1

Another solution:

re.findall('(?:Total|^).*?(?=(?:Total)|$)', string) 

Result:

[' ', 'Total number of boys is 2020 , ', 'Total number of states could be 19? ', 'Total number of votes is 400']
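
The first element, ' ', is the whitespace before the first 'Total', which the ^ alternative picks up. If that entry is not wanted, one possible fix is to filter out whitespace-only pieces afterwards. A small sketch (the names pieces and cleaned are just for illustration):

```python
import re

string = " Total number of boys is 2020 , Total number of states could be 19? Total number of votes is 400"

# Same pattern as above, written as a raw string.
pieces = re.findall(r'(?:Total|^).*?(?=Total|$)', string)

# Drop pieces that contain nothing but whitespace.
cleaned = [p for p in pieces if p.strip()]

print(cleaned)
# ['Total number of boys is 2020 , ', 'Total number of states could be 19? ', 'Total number of votes is 400']
```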