2015-06-07 188 views

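Splitting numeric strings read line by line into lists in Python

I have a multi-line CSV file with numeric string values in the following format (2 lines shown):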

CSV sample:

[['ASA00211063', '2005'], [-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)], [0.354615, -0.108102,nan,...(365 values)]]

[['AFR02516075', '1998'], [-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)], [0.3546153, -0.1081022, nan,...(365 values)]]

How can I split my CSV file and append the pieces to lists so that the output is:

list[0] = ['ASA00211063', '2005'], ['AFR02516075', '1998']... 
list[1] = [-0.434358, -0.793407, -1.070576, nan, nan,..., 0.354615, -0.108102,nan,...(**730** values)] 
list[2] = [-0.434358, -0.7934039, -1.0705767, nan, nan,..., 0.3546153, -0.1081022, nan,...(**730** values)] 

Does the CSV really contain the '[[' and ']]' symbols? –


Yes, it does contain the [[ and ]] symbols, and they are read as part of the string – ASG

Answers


I think this code meets your requirements:

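#!/usr/bin/python
import re

data = [[]]  # data[0] collects the ID/year pairs; each later entry holds one row

with open('in') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines (re.match would return None)
        # Drop the outermost [ ... ] brackets
        line = re.match(r'\[?(.*)\]', line).group(1)

        # Split on ", " only where the next character is "[" (between sublists)
        res = re.split(r', (?=\[)', line)

        data[0].append(res[0])
        # Concatenate the two value lists as text (the result is still a string)
        string = res[1] + res[2]
        data.append([string])

for i, v in enumerate(data):
    print("data[{}]:\n{}\n".format(i, v))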

Input:

[['ASA00211063', '2005'], [-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)], [0.354615, -0.108102,nan,...(365 values)]] 
[['AFR02516075', '1998'], [-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)], [0.3546153, -0.1081022, nan,...(365 values)]] 
[['XXX02516075', '1998'], [-1.434358, -1.7934039, -1.1705767, nan, nan,...(365 values)], [0.7546153, -0.7081022, nan,...(365 values)]] 

Output:

data[0]: 
["['ASA00211063', '2005']", "['AFR02516075', '1998']", "['XXX02516075', '1998']"] 

data[1]: 
['[-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)][0.354615, -0.108102,nan,...(365 values)]'] 

data[2]: 
['[-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)][0.3546153, -0.1081022, nan,...(365 values)]'] 

data[3]: 
['[-1.434358, -1.7934039, -1.1705767, nan, nan,...(365 values)][0.7546153, -0.7081022, nan,...(365 values)]'] 
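Note that the regex answer above joins the two value lists as text, so each data[i] is a one-element list holding a string rather than the 730 numbers the question asks for. A minimal sketch of a regex-only variant that produces real floats, assuming every value list is a flat, comma-separated run of numbers or nan with no nested brackets (the "...(365 values)" placeholders in the sample would not parse). The names parse_line and build_rows are hypothetical:

```python
import re

def parse_line(line):
    """Split one line into (id_pair, values) using only regular expressions."""
    # Contents of every innermost [...] group: the ID pair, then the two value lists
    groups = re.findall(r'\[([^\[\]]*)\]', line)
    id_pair = re.findall(r"'([^']*)'", groups[0])
    values = []
    for group in groups[1:]:
        # float() tolerates surrounding whitespace and turns 'nan' into a NaN float
        values.extend(float(tok) for tok in group.split(',') if tok.strip())
    return id_pair, values

def build_rows(lines):
    """Return [id_pairs, row1_values, row2_values, ...] for an iterable of lines."""
    result = [[]]
    for line in lines:
        if line.strip():
            id_pair, values = parse_line(line)
            result[0].append(id_pair)
            result.append(values)
    return result

line = "[['ASA00211063', '2005'], [-0.434358, -0.793407, nan], [0.354615, nan]]"
id_pair, values = parse_line(line)
print(id_pair)  # ['ASA00211063', '2005']
print(values)   # [-0.434358, -0.793407, nan, 0.354615, nan]
```

With a real file, build_rows(open('in')) would give the requested layout: row 0 holds every ID pair and each following row holds both value lists joined into one list of floats.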

Thanks @Stevieb. I will try this code to learn how to use regular expressions better, since I have been struggling with them. Many thanks – ASG


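To read Python structures from a text file, always use ast.literal_eval(). It only evaluates Python literals (strings, numbers, lists, tuples and the like), which prevents anyone from embedding anything nasty in the input file.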

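This code iterates over each line of the input file, parses it, and appends the result to a list, from which you can decide what to do next.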

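import ast

l = []
with open('inputfile.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # literal_eval only accepts literals, so quote bare nan tokens first
        edited_line = line.replace('nan', '"nan"')
        l.append(ast.literal_eval(edited_line.strip()))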
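To see why the nan to "nan" replacement is needed, and what the safety claim about ast.literal_eval() means in practice, here is a small sketch (the sample strings are made up for illustration; try_eval is a hypothetical helper). A bare nan is a name, not a literal, so literal_eval rejects it; once quoted it parses as a string; and an expression such as a function call is refused outright:

```python
import ast

def try_eval(text):
    """Return the parsed value, or the exception type name if literal_eval refuses it."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        return type(exc).__name__

print(try_eval("[1.5, nan]"))                          # ValueError: nan is a name
print(try_eval("[1.5, nan]".replace('nan', '"nan"')))  # [1.5, 'nan']
print(try_eval("__import__('os').getcwd()"))           # ValueError: calls are not run
```

This is why the bare nan values in the file have to be quoted (or otherwise replaced) before the line is handed to literal_eval.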

This version also replaces every "nan" string with the numpy.nan object:

import ast
from numpy import nan

l = []
with open('inputfile.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # Quote bare nan tokens so literal_eval treats them as strings
        edited_line = line.replace('nan', '"nan"')
        edited_line = ast.literal_eval(edited_line.strip())
        # Replace the 'nan' strings with numpy.nan in every sublist
        edited_line = [[nan if v == 'nan' else v for v in vals] for vals in edited_line]
        l.append(edited_line)

# combine elements [1] and [2] in the sublist to a list of len = 730
# element l[0] is list of ['code', 'yyyy']
# element l[1 ... n] is list of data by row of length 730
l = [[subl[0] for subl in l]] + [subl[1] + subl[2] for subl in l]

Which gives the output:

for row in l:
    print(row)

[['ASA00211063', '2005'], ['AFR02516075', '1998']]
[-0.434358, -0.793407, -1.070576, nan, nan, 0.354615, -0.108102, nan]
[-0.434358, -0.7934039, -1.0705767, nan, nan, 0.3546153, -0.1081022, nan]
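If numpy is not otherwise needed, float('nan') gives the same IEEE NaN value. A minimal sketch of the same parse-and-combine steps on two short made-up lines, assuming each line holds exactly one ID pair followed by two value lists. Keep in mind that NaN never compares equal to anything, itself included, so later checks should use math.isnan() rather than ==:

```python
import ast
import math

lines = [
    "[['ASA00211063', '2005'], [-0.434358, nan], [0.354615, nan]]",
    "[['AFR02516075', '1998'], [-0.4343, -0.7934], [nan, 0.3546]]",
]

parsed = []
for line in lines:
    row = ast.literal_eval(line.replace('nan', '"nan"'))
    # Swap the placeholder strings for real NaN floats in every sublist
    row = [[float('nan') if v == 'nan' else v for v in vals] for vals in row]
    parsed.append(row)

# l[0]: all ID pairs; l[1:]: one row per line with both value lists joined
l = [[row[0] for row in parsed]] + [row[1] + row[2] for row in parsed]

print(l[0])                     # [['ASA00211063', '2005'], ['AFR02516075', '1998']]
print(len(l[1]))                # 4
print(math.isnan(l[1][1]))      # True
print(l[1][1] == float('nan'))  # False: NaN is never equal to anything
```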

Thanks for the guidance. I'm getting the following TypeError; any further guidance? TypeError: literal_eval() takes exactly 1 argument (0 given) – ASG


@ASG try it now... –


Thanks again. Yes, now I'm getting "ValueError: malformed string", which I suspect may be caused by the nan values in my strings – ASG
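A "malformed string" ValueError from literal_eval means some token on the line is not a Python literal. A bare nan is one cause; other possibilities (assumptions, not confirmed in this thread) include a different spelling such as NaN or -inf, or leftover placeholders like the "...(365 values)" in the sample. A hedged diagnostic sketch that quotes nan/inf spellings as whole words in any case and reports which lines still fail; the names NON_FINITE, restore and parse_lines are hypothetical:

```python
import ast
import re

# Assumed spellings of non-finite values: nan / inf / infinity, any case, optional minus
NON_FINITE = re.compile(r'-?\b(?:nan|inf(?:inity)?)\b', re.IGNORECASE)

def restore(value):
    """Turn a quoted placeholder such as 'NaN' or '-inf' back into a float."""
    if isinstance(value, str) and NON_FINITE.fullmatch(value):
        return float(value)  # float() accepts 'nan', 'NaN', '-inf', 'Infinity', ...
    return value

def parse_lines(lines):
    """Parse each non-blank line; collect (line_number, error_type) for lines that fail."""
    rows, errors = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        # Quote every non-finite token so literal_eval sees a string literal
        quoted = NON_FINITE.sub(lambda m: '"%s"' % m.group(0), line)
        try:
            parsed = ast.literal_eval(quoted.strip())
        except (ValueError, SyntaxError) as exc:
            errors.append((number, type(exc).__name__))
            continue
        rows.append([[restore(v) for v in vals] for vals in parsed])
    return rows, errors
```

Running parse_lines(open('inputfile.txt')) and printing errors would point at the exact line numbers to inspect, instead of a single ValueError for the whole file.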