2013-02-04 118 views
1

Group lines by line pattern

I have a file with lines that look like this:

saldkfjaslk 
    asdlkfja 
    alsdkfjlk 
aslkda;kdfsdlkfaj 
sladkfjalskdfjlaskd 
    sldkfaj 
    lsadkfj 
qwewrewst 
se0polkjlkj 
lpoerlwoej 
    alskdjf 
    asldkfjljlkjlk 
sadlkfa 

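I would like to group each line that starts with a non-whitespace character together with the consecutive lines after it that start with whitespace. I also want to omit any line whose next line does not start with whitespace. The desired output for the example above is: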

[('saldkfjaslk', 'asdlkfja', 'alsdkfjlk'), 
('sladkfjalskdfjlaskd', 'sldkfaj', 'lsadkfj'), 
('lpoerlwoej', 'alskdjf', 'asldkfjljlkjlk')] 

How can I parse this file in Python?


This looks like config parsing. If it is, consider using an existing library. –

Answer

6
>>> import re
>>> # s holds the whole file contents as a single string
>>> regex = re.compile(r"^\S.*(?:\n\s.*)+", re.MULTILINE) 
>>> [tuple(match.split()) for match in regex.findall(s)] 
[('saldkfjaslk', 'asdlkfja', 'alsdkfjlk'), 
('sladkfjalskdfjlaskd', 'sldkfaj', 'lsadkfj'), 
('lpoerlwoej', 'alskdjf', 'asldkfjljlkjlk')] 

Explanation:

^ # Start of line 
\S # Match a non-whitespace character 
.* # Match the rest of the line 
(?: # Match... 
\n # a newline character 
\s # a whitespace character 
.* # and the rest of the line 
)+ # once or more 
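For anyone who wants to run this outside the interpreter, here is a minimal self-contained sketch of the same idea. The sample string is copied from the question, and the helper name `group_lines` is made up for illustration. Two things differ from the one-liner above, and both are my own substitutions: the continuation line is matched with `[ \t]` instead of `\s` (since `\s` also matches a newline, a blank line could otherwise be taken as a continuation), and each line is stripped individually instead of calling `split()` on the whole match, so a line containing internal spaces stays in one piece.

```python
import re

# Sample input copied from the question.
s = """\
saldkfjaslk
    asdlkfja
    alsdkfjlk
aslkda;kdfsdlkfaj
sladkfjalskdfjlaskd
    sldkfaj
    lsadkfj
qwewrewst
se0polkjlkj
lpoerlwoej
    alskdjf
    asldkfjljlkjlk
sadlkfa
"""

# Same structure as the pattern above, written with re.VERBOSE so the
# explanation lives next to each piece of the regex.
GROUP_RE = re.compile(
    r"""
    ^\S.*          # a line starting with a non-whitespace character
    (?:            # followed by ...
        \n[ \t].*  # ... a line starting with a space or tab (assumption: not \s)
    )+             # one or more times
    """,
    re.MULTILINE | re.VERBOSE,
)


def group_lines(text):
    """Return one tuple per header line plus its indented continuation lines."""
    return [
        tuple(line.strip() for line in match.splitlines())
        for match in GROUP_RE.findall(text)
    ]


print(group_lines(s))
```

If the continuation lines in your file can be indented with something other than spaces or tabs, adjust the character class accordingly.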

Thanks. I was trying to use groupby and for some reason didn't think of a regular expression. – user1728853
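
For comparison, a `groupby`-based version is also possible. This is only a sketch of one way it could be done, not what the asker actually tried; the function name `group_by_indent` is hypothetical, and it assumes blank lines can simply be skipped. It groups the lines into alternating runs of indented and non-indented lines, and pairs each indented run with the last non-indented line before it.

```python
from itertools import groupby


def group_by_indent(lines):
    """Pair each run of indented lines with the non-indented line just before it."""
    # Assumption: blank lines carry no meaning and are dropped up front.
    lines = [line for line in lines if line.strip()]
    groups = []
    header = None
    # groupby yields alternating runs of indented / non-indented lines.
    for indented, run in groupby(lines, key=lambda line: line[:1].isspace()):
        stripped = [line.strip() for line in run]
        if indented:
            # Indented lines at the very top of the file have no header; skip them.
            if header is not None:
                groups.append((header, *stripped))
            header = None
        else:
            # Only the last non-indented line of a run can own the next indented run.
            header = stripped[-1]
    return groups


sample = [
    "saldkfjaslk", "    asdlkfja", "    alsdkfjlk",
    "aslkda;kdfsdlkfaj",
    "sladkfjalskdfjlaskd", "    sldkfaj", "    lsadkfj",
    "qwewrewst", "se0polkjlkj",
    "lpoerlwoej", "    alskdjf", "    asldkfjljlkjlk",
    "sadlkfa",
]
print(group_by_indent(sample))
```

Unlike the regex, this works on any iterable of lines, so an open file object can be passed in directly without reading the whole file into one string.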