Cleaning up a nested list

I have a huge mess of a nested list that looks something like this, only longer:

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]

In the end, I want something that looks like this:

neat_fruit = [['watermelon',0,1.0], ['apple',0,1.0], ['pineapple',0,1.0], ['strawberry, banana',0,1.0], ['peach plum pear',0,1.0], ['orange, grape',0,1.0]]

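But I don't know how to deal with the double quotes, or how to split the numbers from the fruit, especially since some of the fruit names themselves contain commas. I've tried a bunch of things, but everything just seems to make it messier. Any suggestions would be greatly appreciated.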


2011-07-20 user808545

Use the csv module (standard library) to handle the double-quoted fruit names that contain commas:

import csv
import io

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]

# Flatten the list of lists into one newline-separated string:
data = '\n'.join(item[0].strip() for item in fruit_mess)
# csv.reader needs a text stream on Python 3 (the original Python 2 code used io.BytesIO):
reader = csv.reader(io.StringIO(data))
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in reader]

print(neat_fruit)
# [['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
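
As a side note, `csv.reader` accepts any iterable that yields strings, so the join and the in-memory file object may not be strictly necessary. A minimal sketch, assuming every inner list holds exactly one string and every row has exactly three fields:

```python
import csv

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'],
              ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'],
              ['"orange, grape",0,1.0\n']]

# Feed the single string from each inner list straight to the reader.
reader = csv.reader(item[0] for item in fruit_mess)
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in reader]

print(neat_fruit)
```

This variant avoids the `io` module altogether, which may help on older Python versions.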


2011-07-20 13:21:30 unutbu

Clever. Makes me wonder whether this started out as a CSV file. – Wilduck

This looks good, but unfortunately I have Python 2.5, which doesn't have the io module. – user808545

@user808545: In that case, use 'cStringIO.StringIO' instead of 'io.BytesIO'. – unutbu

A regex-based solution:

>>> import re 
>>> regex = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)') 
>>> neat_fruit = [] 
>>> for item in fruit_mess: 
...  match = regex.match(item[0]) 
...  result = [match.group(1).strip('"'), int(match.group(2)), float(match.group(3))] 
...  neat_fruit.append(result) 
... 
>>> neat_fruit 
[['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]


2011-07-20 13:23:55

Hmm, for some reason this gives me: result = [match.group(1).strip('"'), int(match.group(2)), float(match.group(3))] AttributeError: 'NoneType' object has no attribute 'group'. Not sure what I'm doing wrong. – user808545

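That probably means the match failed on one of the strings. The regex works for the sample data in your question, but if your actual data contains lines in other formats, the regex may fail.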
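
One way to find the offending lines is to check for `None` before calling `.group()`. A minimal sketch, assuming the same pattern as in the answer; the `kiwi;0;1.0` line and the `unmatched` list are hypothetical additions for illustration:

```python
import re

regex = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')

# Two sample lines from the question plus one hypothetical malformed line.
fruit_mess = [['watermelon,0,1.0\n'], ['"strawberry, banana",0,1.0\n'],
              ['kiwi;0;1.0\n']]

neat_fruit = []
unmatched = []  # lines the pattern could not parse
for item in fruit_mess:
    match = regex.match(item[0])
    if match is None:
        # Record the line instead of raising AttributeError.
        unmatched.append(item[0])
        continue
    neat_fruit.append([match.group(1).strip('"'),
                       int(match.group(2)), float(match.group(3))])

print(neat_fruit)
print(unmatched)
```

Printing `unmatched` shows which lines of the real data differ from the sample format.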

A simpler solution:

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']] 
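for i, x in enumerate(fruit_mess):
    # Split from the right so commas inside the fruit name are left alone.
    data = x[0].rstrip('\n').rsplit(',', 2)
    # Strip the surrounding double quotes from quoted names.
    fruit_mess[i] = [data[0].strip('"'), int(data[1]), float(data[2])]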
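
Splitting from the right works here because the two numeric fields never contain commas. One trade-off worth knowing about: unlike the csv module, a plain split does not un-escape doubled quotes inside a quoted name. A small comparison, assuming a hypothetical row that is not in the original data:

```python
import csv

# Hypothetical row: the quoted name contains an escaped (doubled) quote.
line = '"so-called ""blood"" orange, grape",0,1.0\n'

# rsplit peels the two numeric fields off the right-hand side.
name, num1, num2 = line.rstrip('\n').rsplit(',', 2)
by_rsplit = [name.strip('"'), int(num1), float(num2)]

# csv.reader parses the quoting rules, including the doubled quote.
name, num1, num2 = next(csv.reader([line]))
by_csv = [name, int(num1), float(num2)]

print(by_rsplit)
print(by_csv)
```

If the real data never contains escaped quotes, the two approaches should give the same result.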


2011-07-20 13:54:13
