Split a string on new lines, tabs, and some number of spaces

I'm trying to perform a string split on a set of somewhat irregular data that looks something like this:

\n\tName: John Smith 
\n\t Home: Anytown USA 
\n\t Phone: 555-555-555 
\n\t Other Home: Somewhere Else 
\n\t Notes: Other data 
\n\tName: Jane Smith 
\n\t Misc: Data with spaces

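I'd like to convert this into a tuple/dict, which I will later split on the colon :, but first I need to get rid of all the extra whitespace. I'm guessing a regex is the best way, but I can't seem to get one that works. Below is my attempt.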

data_string.split('\n\t *')

2012-09-21 PopeJohnPaulII

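Just use .strip(); it removes all leading and trailing whitespace for you, including tabs and newlines, while splitting. The splitting itself can then be done with data_string.splitlines():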

[s.strip() for s in data_string.splitlines()]

Output:

>>> [s.strip() for s in data_string.splitlines()] 
['Name: John Smith', 'Home: Anytown USA', 'Phone: 555-555-555', 'Other Home: Somewhere Else', 'Notes: Other data', 'Name: Jane Smith', 'Misc: Data with spaces']

You can even inline the split on : as well now:

>>> [s.strip().split(': ') for s in data_string.splitlines()] 
[['Name', 'John Smith'], ['Home', 'Anytown USA'], ['Phone', '555-555-555'], ['Other Home', 'Somewhere Else'], ['Notes', 'Other data'], ['Name', 'Jane Smith'], ['Misc', 'Data with spaces']]
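
For comparison, here is a small sketch of why the attempt in the question returns the string unsplit: str.split treats its argument as literal text, so '\n\t *' is never read as a pattern, whereas re.split does read it as one. The sample string is a shortened stand-in for the question's data (an assumption for illustration), and the blank-line filter is an addition that the lines above do not include.

```python
import re

# Shortened sample in the shape of the question's data (assumed for illustration).
data_string = "\n\tName: John Smith\n\t Home: Anytown USA\n\t Phone: 555-555-555"

# str.split searches for the literal characters newline, tab, space, asterisk.
# That exact sequence never occurs, so the string comes back in one piece.
literal_parts = data_string.split("\n\t *")

# re.split treats the same text as a pattern: newline, tab, then any number of spaces.
regex_parts = re.split(r"\n\t *", data_string)

# The strip()/splitlines() approach, additionally skipping blank lines.
stripped_parts = [s.strip() for s in data_string.splitlines() if s.strip()]

print(literal_parts == [data_string])  # True: nothing was split
print(regex_parts)     # ['', 'Name: John Smith', 'Home: Anytown USA', 'Phone: 555-555-555']
print(stripped_parts)  # ['Name: John Smith', 'Home: Anytown USA', 'Phone: 555-555-555']
```

Note the leading empty string in the re.split result: the pattern matches at the very start of the string, which is one reason the strip()/splitlines() version tends to be easier to work with.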

2012-09-21 15:56:07

Wonderful. The [List comprehension](http://docs.python.org/tutorial/datastructures.html#list-comprehensions) syntax isn't one I'd seen before, so I guess I'll have to read up on it. – PopeJohnPaulII

Works like a charm! Awesome! Thanks –

You can use this:

string.strip().split(":")

2012-09-21 15:59:37 Rakesh

>>> ary = []
>>> for line in s.splitlines():
...     line = line.strip()
...     if not line:
...         continue
...     ary.append(line.split(":"))
...
>>> ary
[['Name', ' John Smith'], ['Home', ' Anytown USA'], ['Misc', ' Data with spaces']]
>>> dict(ary) 
{'Home': ' Anytown USA', 'Misc': ' Data with spaces', 'Name': ' John Smith'} 
>>>
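
The values in the session above keep the space that follows the colon, and split(":") would cut a value in two if it contained a colon itself. A minimal sketch of one way around both, assuming every non-blank line has the form "key: value" (to_pairs is a made-up helper name, and the "Time" line is invented to show a colon inside a value):

```python
def to_pairs(text):
    """Return (key, value) tuples, one per non-blank 'key: value' line."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1 splits only at the first colon; a line without any
        # colon would raise ValueError when unpacking.
        key, value = line.split(":", 1)
        pairs.append((key.strip(), value.strip()))
    return pairs


sample = "\n\tName: John Smith \n\n\t Time: 15:56 \n"
print(to_pairs(sample))  # [('Name', 'John Smith'), ('Time', '15:56')]
```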

2012-09-21 16:01:14

You can kill two birds with one regex stone:

>>> r = """ 
... \n\tName: John Smith 
... \n\t Home: Anytown USA 
... \n\t Phone: 555-555-555 
... \n\t Other Home: Somewhere Else 
... \n\t Notes: Other data 
... \n\tName: Jane Smith 
... \n\t Misc: Data with spaces 
... """ 
>>> import re 
>>> print(re.findall(r'(\S[^:]+):\s*(.*\S)', r))
[('Name', 'John Smith'), ('Home', 'Anytown USA'), ('Phone', '555-555-555'), ('Other Home', 'Somewhere Else'), ('Notes', 'Other data'), ('Name', 'Jane Smith'), ('Misc', 'Data with spaces')] 
>>>

2012-09-21 16:03:04 georg

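+1 for your phrasing :) – Yamaneko

Pretty good, but your '[ \t]*' isn't doing anything; if anything, the '(.+)' will always eat the trailing whitespace. You could do this: '(.+?)[ \t]*$'. The reluctant quantifier allows it to stop early, while the '$' ensures it still consumes the whole line. –

@AlanMoore: Correct, posted an edit. – georg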
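
To illustrate the point made in the comments, here is a small sketch comparing the greedy and reluctant forms on a single line with trailing spaces. The patterns are simplified stand-ins, not the regex from the answer, and the sample line is made up.

```python
import re

line = "\t Home: Anytown USA   "

# Greedy: (.+) runs to the end of the line, so [ \t]* is left with nothing to match.
greedy = re.search(r":\s*(.+)[ \t]*$", line).group(1)

# Reluctant: (.+?) grows one character at a time and stops as soon as
# [ \t]*$ can match the rest, which leaves the trailing spaces outside the group.
reluctant = re.search(r":\s*(.+?)[ \t]*$", line).group(1)

print(repr(greedy))     # 'Anytown USA   '
print(repr(reluctant))  # 'Anytown USA'
```

On multi-line input, '$' only matches at the end of each line when re.MULTILINE is passed; without it, '$' matches only at the end of the string (or just before a final newline).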

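Regular expressions aren't really the best tool for the job here. As others have said, using a combination of str.strip() and str.split() is the way to go. Here's a one-liner to do it: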

>>> data = '''\n\tName: John Smith 
... \n\t Home: Anytown USA 
... \n\t Phone: 555-555-555 
... \n\t Other Home: Somewhere Else 
... \n\t Notes: Other data 
... \n\tName: Jane Smith 
... \n\t Misc: Data with spaces''' 
>>> {line.strip().split(': ')[0]: line.strip().split(': ')[1] for line in data.splitlines() if line.strip() != ''}
{'Name': 'Jane Smith', 'Other Home': 'Somewhere Else', 'Notes': 'Other data', 'Misc': 'Data with spaces', 'Phone': '555-555-555', 'Home': 'Anytown USA'}
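
Note that the dict above holds only Jane Smith's name: the key Name occurs twice in the data, and in a dict a later value overwrites an earlier one. If each Name line can be taken to start a new record (an assumption about the data, not something the question states), one way to keep both people is a list of dicts. parse_records and its first_key parameter are hypothetical names used only for this sketch.

```python
def parse_records(data_string, first_key="Name"):
    """Group 'key: value' lines into one dict per record.

    Assumes a line whose key equals first_key starts a new record.
    """
    records = []
    for line in data_string.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line; ignore it
        key, value = key.strip(), value.strip()
        if key == first_key or not records:
            records.append({})  # start a new record
        records[-1][key] = value
    return records


sample = ("\n\tName: John Smith \n\t Home: Anytown USA "
          "\n\tName: Jane Smith \n\t Misc: Data with spaces")
records = parse_records(sample)
print(records)
# [{'Name': 'John Smith', 'Home': 'Anytown USA'},
#  {'Name': 'Jane Smith', 'Misc': 'Data with spaces'}]
```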

2012-09-21 16:05:55

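If you look at the documentation for str.split:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].

In other words, if you're trying to figure out what to pass to split to get from '\n\tName: Jane Smith' to ['Name:', 'Jane', 'Smith'], just pass nothing (or None).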
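
A quick check of that behaviour, as a minimal sketch using the string from the paragraph above:

```python
s = "\n\tName: Jane Smith"

# No separator: runs of whitespace (newline, tab, spaces) count as one
# separator, and leading/trailing whitespace produces no empty strings.
no_sep = s.split()
print(no_sep)  # ['Name:', 'Jane', 'Smith']

# A whitespace-only string splits to an empty list...
print("   ".split())  # []

# ...whereas an explicit separator keeps the empty strings between separators.
print("   ".split(" "))  # ['', '', '', '']
```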

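This almost solves your whole problem. There are two parts left.

First, you've only got two fields, and the second one can contain spaces. So you want just one split, not as many as possible: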

s.split(None, 1)

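Next, you've still got those pesky colons. But you don't need to split on them. At least given the data you've shown us, the colon always appears at the end of the first field, with no space before it and always a space after it, so you can just remove it: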

key, value = s.split(None, 1) 
key = key[:-1]

There are of course a million other ways to do this. This just seems like the one closest to what you were already trying.
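
One caveat worth checking against your own data: split(None, 1) cuts at the first run of whitespace, so a key that itself contains a space (such as "Other Home" in the question's sample) ends up split in the wrong place. The sketch below shows the approach from this answer next to a variant based on str.partition(":"), which is a different technique from the one described above. Both helper names are made up for illustration.

```python
def parse_line(s):
    """The split(None, 1) approach: fine when the key has no spaces."""
    key, value = s.strip().split(None, 1)
    return key[:-1], value  # drop the trailing colon from the key


def parse_line_partition(s):
    """Alternative: cut at the first colon, then trim both halves."""
    key, _, value = s.partition(":")
    return key.strip(), value.strip()


print(parse_line("\n\tName: John Smith "))
# ('Name', 'John Smith')
print(parse_line("\n\t Other Home: Somewhere Else "))
# ('Othe', 'Home: Somewhere Else')  <- the key was cut at the space
print(parse_line_partition("\n\t Other Home: Somewhere Else "))
# ('Other Home', 'Somewhere Else')
```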

2012-09-21 17:43:51 abarnert
