Python splitting string by parentheses

A while ago I asked a question (Python splitting unknown string by spaces and parentheses), and the answer worked great until I had to change my way of thinking. I still haven't grasped regular expressions, so I need some help.

If the user types this:

new test (test1 test2 test3) test "test5 test6"

I would like the output, stored in a variable, to look like this:

["new", "test", "test1 test2 test3", "test", "test5 test6"]

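In other words, if it is a single word separated by a space, split it from the next word; if it is in parentheses, keep the whole group of words inside the parentheses together and remove the parentheses. The same goes for quotes.

The code I'm currently using, which doesn't meet the criteria above (from the answer to the question linked above):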

>>> import re
>>> strs = "Hello (Test1 test2) (Hello1 hello2) other_stuff"
>>> [", ".join(x.split()) for x in re.split(r'[()]', strs) if x.strip()]
['Hello', 'Test1, test2', 'Hello1, hello2', 'other_stuff']

This works well, but there is a problem if you have something like this:

strs = "Hello Test (Test1 test2) (Hello1 hello2) other_stuff"

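It combines Hello and Test into one item instead of splitting them into two.

It also doesn't allow splitting by parentheses and quotes at the same time.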
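The behaviour described above can be reproduced with a short script. This is only a sketch: the function name `split_by_parens` is made up for illustration, and the body is the list comprehension from the question, unchanged.

```python
import re

def split_by_parens(text):
    # Same expression as in the question: split on '(' and ')' only,
    # then join the words of each piece with ", ".
    return [", ".join(x.split()) for x in re.split(r'[()]', text) if x.strip()]

result = split_by_parens("Hello Test (Test1 test2) (Hello1 hello2) other_stuff")
print(result)
# ['Hello, Test', 'Test1, test2', 'Hello1, hello2', 'other_stuff']
```

The text outside the parentheses is never split into separate list items; its words are only re-joined with ", ", which is why Hello and Test end up in a single item.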

2013-06-27 TrevorPeyton

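Have a look at greedy versus non-greedy matching. – XORcist

@möter Do you have a link that could point me to a learning guide? Most of what I found are questions about it that don't really help me, and I can't make sense of the Python docs. If that's all there is, then I'll go with it. – TrevorPeyton

Sorry, I misunderstood the question. But here is a link to the official documentation: http://docs.python.org/2/library/re.html – XORcist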
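As a small illustration of the greedy versus non-greedy distinction mentioned in the comments above, the snippet below runs both forms against the sample string from the question. The variable names are only for illustration.

```python
import re

text = "Hello (Test1 test2) (Hello1 hello2) other_stuff"

# Greedy: .* runs to the LAST ')' in the string, swallowing both groups.
greedy = re.findall(r'\((.*)\)', text)

# Non-greedy: .*? stops at the FIRST ')' it can, so each group matches separately.
lazy = re.findall(r'\((.*?)\)', text)

print(greedy)  # ['Test1 test2) (Hello1 hello2']
print(lazy)    # ['Test1 test2', 'Hello1 hello2']
```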

The answer turned out to be quite simple:

re.findall(r'\[[^\]]*\]|\([^\)]*\)|\"[^\"]*\"|\S+', strs)
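Note that this pattern returns each group with its delimiters still attached, for example '(test1 test2 test3)'. A minimal sketch of one way to strip them afterwards follows; the helper name `split_groups` and the stripping step are assumptions added here, not part of the original answer.

```python
import re

# Same alternation as the answer: [...] group, (...) group, "..." group, or a bare word.
TOKEN_RE = re.compile(r'\[[^\]]*\]|\([^)]*\)|"[^"]*"|\S+')

def split_groups(text):
    result = []
    for tok in TOKEN_RE.findall(text):
        # Drop one pair of surrounding delimiters, if the token has a matching pair.
        if len(tok) >= 2 and tok[0] + tok[-1] in ('()', '[]', '""'):
            tok = tok[1:-1]
        result.append(tok)
    return result

print(split_groups('new test (test1 test2 test3) test "test5 test6"'))
# ['new', 'test', 'test1 test2 test3', 'test', 'test5 test6']
```

Because `re.findall` scans from left to right, the tokens come back in the order in which they appear in the input.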

2013-06-28 20:26:01 TrevorPeyton

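Your problem is not well defined.

Your description of the rules is

In other words, if it is a word separated by a space, then split it from the next word; if it is in parentheses, then split off the whole group of words in the parentheses and remove the parentheses. Same goes for the commas.

I guess that by commas you mean inverted commas, i.e. quotation marks.

Then with this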

strs = "Hello (Test1 test2) (Hello1 hello2) other_stuff"

you should get

["Hello (Test1 test2) (Hello1 hello2) other_stuff"]

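because everything is surrounded by quotation marks. Most likely, you want to work without caring about the outermost quotation marks.

I propose this, although it is a bit ugly: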

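import re, itertools

strs = input("enter a string list ")

# Split on the parenthesised group first, then split each piece on the quoted
# group; the capture groups keep the grouped text in the result.
pieces = itertools.chain(*[re.split(r'"(.*)"', x)
                           for x in re.split(r'\((.*)\)', strs)])
print([y for y in pieces if y != ''])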

which gives

>>> 
enter a string list here there (x y) thereagain "there there" 
['here there ', 'x y ', ' thereagain ', 'there there']

2013-06-27 21:31:58 octoback

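Yes, sorry about the commas versus quotes and the fact that my wording wasn't great; it was a long night. The code above works except for one thing, which I tried to explain with 'In other words, if it is a single word separated by a space, then split it from the next word'. That applies to the 'here there ' in your output, which should be split into two separate words, 'here' and 'there', rather than kept as 'here there '. – TrevorPeyton

This gets you closer: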

import re, itertools

strs = input("enter a string list ")

res1 = [y for y in itertools.chain(*[re.split(r'"(.*)"', x)
                                     for x in re.split(r'\((.*)\)', strs)])
        if y != '']

# Text captured inside the quotes and inside the parentheses.
# Note: re.search returns None (so .groups() fails) if the input has no such group.
set1 = re.search(r'"(.*)"', strs).groups()
set2 = re.search(r'\((.*)\)', strs).groups()

# Grouped pieces first, then the remaining pieces split on whitespace.
print([k for k in res1 if k in set1 or k in set2]
      + list(itertools.chain(*[k.split() for k in res1
                               if k not in set1 and k not in set2])))

2013-06-28 07:30:11 octoback

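However, if I enter 'new test test (test1 test2) word word "test1 test2 tet3" te st', the output is ['test1 test2', 'test1 test2 tet3', 'new', 'test', 'test', 'word', 'word', 'te', 'st'], which is almost right, but the words are out of place. – TrevorPeyton

Sorry, I missed that the order actually matters. – octoback

I thought that would be a given; next time I'll specify it. Is there a simple fix for this code? – TrevorPeyton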
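One possible way to keep the tokens in their original order while staying close to the `re.split` approach of the answer above is sketched below. It is an assumption-laden sketch rather than the answerer's own fix: the function name `split_keep_order` is hypothetical, and the pattern uses `[^)]*` and `[^"]*` instead of the greedy `.*` so that several groups in one string are handled separately.

```python
import re

# Two capture groups: text inside (...) or text inside "...".
GROUP_RE = re.compile(r'\(([^)]*)\)|"([^"]*)"')

def split_keep_order(text):
    # With two capture groups, re.split returns
    # [plain, group1, group2, plain, group1, group2, ..., plain],
    # where the group that did not take part in a match is None.
    parts = GROUP_RE.split(text)
    result = []
    for i, part in enumerate(parts):
        if part is None:
            continue
        if i % 3 == 0:
            result.extend(part.split())  # plain text: one item per word
        else:
            result.append(part)          # grouped text: kept as a single item
    return result

print(split_keep_order('new test test (test1 test2) word word "test1 test2 tet3" te st'))
# ['new', 'test', 'test', 'test1 test2', 'word', 'word', 'test1 test2 tet3', 'te', 'st']
```

Unlike the `re.search(...).groups()` calls above, this version does not fail when the input contains no parentheses or no quotes.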

This is pushing the limits of what regexps can do. Consider using pyparsing instead. It does recursive descent. For this task, you could use:

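from pyparsing import *
import string, re

# A raw word is any run of printable characters other than parentheses, quotes and whitespace.
RawWord = Word(re.sub(r'[()"\s]', '', string.printable))
Token = Forward()  # forward declaration, so that Token can refer to itself
Token << (RawWord |
          Group('"' + OneOrMore(RawWord) + '"') |
          Group('(' + OneOrMore(Token) + ')'))
Phrase = ZeroOrMore(Token)

s = 'new test (test1 test2 test3) test "test5 test6"'  # example input from the question
print(Phrase.parseString(s, parseAll=True))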

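This is robust to odd whitespace and handles nested parentheses. It is also more readable than one big regular expression, and therefore easier to tweak.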
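For readers who would rather not add a dependency, the same recursive-descent idea can be sketched by hand with the standard library only. This is an illustrative sketch, not a replacement for the pyparsing grammar: the function name `parse_tokens` is hypothetical, parenthesised groups become nested lists, and quoted text is kept as a single string.

```python
def parse_tokens(text):
    """Parse text into words, strings for "..." groups and nested lists for (...) groups."""
    pos = 0

    def parse_sequence(closer):
        nonlocal pos
        items = []
        while pos < len(text):
            ch = text[pos]
            if ch.isspace():
                pos += 1
            elif ch == closer:
                pos += 1
                return items
            elif ch == '(':
                pos += 1
                items.append(parse_sequence(')'))  # recurse for a nested group
            elif ch == '"':
                end = text.find('"', pos + 1)
                if end == -1:
                    raise ValueError('unterminated quote')
                items.append(text[pos + 1:end])
                pos = end + 1
            elif ch == ')':
                raise ValueError('unbalanced closing parenthesis')
            else:
                start = pos
                while (pos < len(text) and not text[pos].isspace()
                       and text[pos] not in '()"'):
                    pos += 1
                items.append(text[start:pos])
        if closer is not None:
            raise ValueError('missing closing parenthesis')
        return items

    return parse_sequence(None)

print(parse_tokens('new test (test1 test2 test3) test "test5 test6"'))
# ['new', 'test', ['test1', 'test2', 'test3'], 'test', 'test5 test6']
print(parse_tokens('a (b (c d) e) f'))
# ['a', ['b', ['c', 'd'], 'e'], 'f']
```

Unbalanced input raises `ValueError` instead of being silently accepted, which is one of the things a single regular expression cannot easily report.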

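I know you solved your problem long ago, but this is one of the highest-ranked Google results for this kind of problem, and pyparsing is a lesser-known library.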

2017-04-07 18:16:56 dspeyer
