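String splitting problem

Problem: split a string into a list of words, using separators that are passed in as a list.

String: "After the flood ... all the colors came out."

Desired output: ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

I wrote the function below. Note that I know there are better ways to split a string using Python's built-in functionality, but for the sake of learning I thought I would proceed this way: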

def split_string(source, splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            tmp = source[end+1:]
            for f in tmp:
                if f not in splitlist:
                    start = tmp.find(f)
                    break
            source = tmp[start:]
    return result

out = split_string("After the flood ... all the colors came out.", " .") 

print out 

['After', 'the', 'flood', 'all', 'the', 'colors', 'came out', '', '', '', '', '', '', '', '', '']

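I can't figure out why "came out" is not split into "came" and "out" as two separate words. It is as if the whitespace character between the two words is being ignored. I think the rest of the output is garbage that stems from the same problem as "came out".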

EDIT:

I followed @lvc's suggestion and came up with the following code:

def split_string(source, splitlist):
    result = []
    lasti = -1
    for i, e in enumerate(source):
        if e in splitlist:
            tmp = source[lasti+1:i]
            if tmp not in splitlist:
                result.append(tmp)
            lasti = i
        if e not in splitlist and i == len(source) - 1:
            tmp = source[lasti+1:i+1]
            result.append(tmp)
    return result

out = split_string("This is a test-of the,string separation-code!"," ,!-") 
print out 
#>>> ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code'] 

out = split_string("After the flood ... all the colors came out.", " .") 
print out 
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out'] 

out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",") 
print out 
#>>>['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code'] 

out = split_string(" After the flood ... all the colors came out...............", " .")
print out 
#>>>['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']


2012-05-30 codingknob

You seem to be expecting:

source = tmp[start:]

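to modify the source that the outer for loop is iterating over. It won't: that loop keeps going over the string you originally gave it, not whatever object now uses that name. This can mean the character you are up to might no longer be in what is left of source.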
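A minimal sketch of that behaviour, written for Python 3 (the thread itself uses Python 2 print statements). The function name iterate_while_rebinding is made up for this illustration and is not part of the question's code:

```python
def iterate_while_rebinding(source):
    """Record which characters the loop visits while `source` is rebound inside it."""
    seen = []
    for ch in source:        # the iterator is created once, from the original string
        seen.append(ch)
        source = source[1:]  # rebinds the local name only; the loop is unaffected
    return seen, source


seen, remaining = iterate_while_rebinding("a b.c")
print(seen)       # every character of the original string is still visited
print(remaining)  # the name now refers to an empty string
```

The loop visits all five characters even though the name source points at a shorter string after every pass.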

Instead of trying to do that, keep track of the current index in the string like this:

for i, e in enumerate(source): 
    ...

and what you want to append will always be source[lasti+1:i]; you just need to keep track of lasti.
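One way the index-tracking idea might look when filled in, as a Python 3 sketch rather than the answerer's exact code. The name split_by_index is hypothetical. It is also an assumption here that empty pieces should be skipped and that text after the final separator should be kept:

```python
def split_by_index(source, splitlist):
    """Split `source` on any character in `splitlist`, tracking the last separator index."""
    result = []
    lasti = -1                     # index of the most recent separator seen
    for i, ch in enumerate(source):
        if ch in splitlist:
            piece = source[lasti + 1:i]
            if piece:              # skip empty pieces between adjacent separators
                result.append(piece)
            lasti = i
    tail = source[lasti + 1:]      # whatever follows the final separator
    if tail:
        result.append(tail)
    return result


print(split_by_index("After the flood ... all the colors came out.", " ."))
```

Because source is never reassigned, the index i always refers to the string being iterated, which avoids the problem in the original function.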


2012-05-30 02:59:40 lvc

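Thanks everyone for the great solutions. I went with this one because it forced me to learn the logic rather than use a pre-built function. Obviously, if I were writing commercial code I would not reinvent the wheel, but for learning purposes I will go with this answer. Thanks to everyone for the help. – codingknob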

You don't need the inner loop. Just this is enough:

def split_string(source, splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            source = source[end+1:]
    return result

You can eliminate the "garbage" (i.e. the empty strings) by checking whether source[:end] is an empty string before you add it to the list.
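A sketch of that check added to the shortened function, in Python 3. The name split_string_no_empties is made up here, and keeping a trailing word that has no separator after it is an extra assumption that the answer does not mention:

```python
def split_string_no_empties(source, splitlist):
    """Shortened version of the function, skipping empty pieces."""
    result = []
    for ch in source:                # iterates over the original string
        if ch in splitlist:
            end = source.find(ch)
            if source[:end]:         # only keep non-empty pieces
                result.append(source[:end])
            source = source[end + 1:]
    if source:                       # assumption: keep a final word with no separator after it
        result.append(source)
    return result


print(split_string_no_empties("After the flood ... all the colors came out.", " ."))
```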


2012-05-30 02:49:38

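Why do so much for something so simple? Try:
str.split(strSplitter, intMaxSplitCount), where intMaxSplitCount is optional.
In your case you also have to do some housekeeping if you want to avoid the "...". One option is to replace it, like str.replace(".", "", 3), where the 3 is optional and means that only the first 3 dots are replaced.

So, in short, you have to do the following:
print ((str.replace(".", "",3)).split(" ")) will print nearly what you want: splitting on a single space still leaves an empty string where the dots were, and the final period stays attached to "out.".

I did run it, Just Check Here,...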
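To make the behaviour of the replace-then-split approach concrete, here is a small Python 3 sketch. The variable names are chosen for this illustration only. The second variant, which drops the count and calls split() with no argument, is a suggested adjustment and not part of the answer:

```python
text = "After the flood ... all the colors came out."

# Removing the first three dots and splitting on a single space keeps an
# empty string (from the doubled space) and the final period on "out.".
single_space = text.replace(".", "", 3).split(" ")
print(single_space)

# Removing every dot and calling split() with no argument splits on runs of
# whitespace, so no empty strings are produced.
whitespace_runs = text.replace(".", "").split()
print(whitespace_runs)
```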


2012-05-30 03:29:10

[x for x in a.replace('.', '').split(' ') if len(x)>0]

Here "a" is your input string.
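A runnable form of the one-liner in Python 3, as a sketch. The names words and words_truthy are made up for illustration, and the second comprehension is only an equivalent spelling of the same filter:

```python
a = "After the flood ... all the colors came out."

# split(' ') yields '' wherever two spaces are adjacent; the filter drops those.
words = [x for x in a.replace('.', '').split(' ') if len(x) > 0]
print(words)

# Same filter, relying on the fact that empty strings are falsy.
words_truthy = [x for x in a.replace('.', '').split(' ') if x]
```

Note that this handles only the "." and " " separators of this particular input; other separators would need their own replace call.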


2012-05-30 03:45:06 thavan

An easier way, or at least one that looks easier..

import string

def split_string(source, splitlist):
    # Python 2: map every separator character to a space, then split on whitespace
    table = string.maketrans(splitlist, ' ' * len(splitlist))
    return string.translate(source, table).split()

You can check out string.maketrans and string.translate.
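The module-level string.maketrans and string.translate functions belong to Python 2. On Python 3 the same idea can be sketched with the str.maketrans and str.translate methods. The function name split_with_translate is made up for this illustration:

```python
def split_with_translate(source, splitlist):
    """Map every separator character to a space, then split on whitespace."""
    # str.maketrans with two equal-length strings maps characters pairwise.
    table = str.maketrans(splitlist, " " * len(splitlist))
    return source.translate(table).split()


print(split_with_translate("This is a test-of the,string separation-code!", " ,!-"))
```

One consequence of this design is that whitespace always acts as a separator, even when it is not in splitlist, so "First Name,Last Name" split on "," would come back as four words rather than two.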


2012-05-30 04:49:39 xvatar

I think that if you use a regular expression you can get this easily, provided you only want the words in the string above.

>>> import re 
>>> string="After the flood ... all the colors came out." 
>>> re.findall('\w+',string) 
['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
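The \w+ pattern treats every non-word character as a separator, so it ignores the separator list from the question. A possible variant that honours the list, sketched for Python 3 with the made-up name split_with_regex, builds a character class from splitlist. It assumes splitlist is not empty, since an empty character class is not a valid pattern:

```python
import re


def split_with_regex(source, splitlist):
    """Split on runs of any character in `splitlist` (assumed non-empty)."""
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    pattern = "[" + re.escape(splitlist) + "]+"
    # re.split can return '' at either end when the string starts or ends
    # with a separator, so empty pieces are filtered out.
    return [piece for piece in re.split(pattern, source) if piece]


print(split_with_regex("First Name,Last Name,Street Address,City,State,Zip Code", ","))
```

Unlike \w+, this keeps "First Name" together when only "," is given as a separator.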


2012-06-01 11:53:40
