2014-06-18 40 views
0

Reading a CSV file in which one element is a list/array rather than a single value

Below is myfile.csv:

1st  2nd  3rd  4th      5th 
2061100 10638650 -8000  25   [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 
2061800 10639100 -8100  26   [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] 
2061150 10638750 -8250  25   [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0] 
2061650 10639150 -8200  25   [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0] 
2061350 10638800 -8250  3   [5.0, 5.0, 5.0] 
2060950 10638700 -8000  1   [1.0] 
2061700 10639100 -8100  11   [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] 
2061050 10638800 -8250  6   [3.0, 3.0, 3.0, 3.0, 3.0, 3.0] 
2061500 10639150 -8200  1   [4.0] 
2061250 10638850 -8150  16   [5.0, 5.0, 5.0, 5.0] 

My current code:

import numpy as np
from numpy import genfromtxt

mydata = genfromtxt('myfile.csv', delimiter=',')
arr = np.array(mydata)
col5 = arr[:, 4]

However, I want to read the 5th column as a list, and then read all the elements of each list for further calculation. How can I do that?

+0

Check this out: http://stackoverflow.com/questions/20685567/convert-python-string-to-list – Korem

+1

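The 'myfile.csv' you show does not match the format you are reading it with (delimiter=','). If the actual file is comma-delimited in columns 1-4, then numpy.genfromtxt on its own will have trouble determining where column 5 actually begins and ends, because the list in that column contains commas too. –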
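To illustrate the point made in this comment, here is a small sketch. It assumes a hypothetical comma-delimited version of one data row (the file shown in the question is whitespace-delimited, so the string below is an assumption, not the real file content). Splitting on every comma tears the bracketed list apart, while limiting the number of splits keeps it in one piece:

```python
import ast

# Hypothetical comma-delimited row; not taken from the actual file.
row = "2061100,10638650,-8000,25,[1.0, 1.0, 1.0]"

# A naive split on every comma breaks the list into separate fields.
naive_fields = row.split(",")
print(len(naive_fields))  # more fields than the 5 intended columns

# Splitting at most 4 times leaves the bracketed list as a single field.
fields = row.split(",", 4)
values = [int(x) for x in fields[:4]] + [ast.literal_eval(fields[4])]
print(values)
```

This only works because the list is the last column; a list in the middle of a row would need a different strategy.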

Answers

1

I think I would be tempted to just do this manually:

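fn = 'myfile.csv'  # path to the data file

with open(fn) as f:
    header = next(f).strip()  # first line holds the column names
    print(header)
    for row in f:
        row = row.rstrip()
        # Split at the first '[': integers on the left, list contents on the right.
        lp, _, rp = row.partition('[')
        rp = rp.strip(']')
        lp_data = list(map(int, lp.split()))
        rp_data = list(map(float, rp.split(',')))
        print(lp_data + [rp_data])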

Prints:

1st  2nd  3rd  4th      5th 
[2061100, 10638650, -8000, 25, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]] 
[2061800, 10639100, -8100, 26, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]] 
[2061150, 10638750, -8250, 25, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]] 
[2061650, 10639150, -8200, 25, [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]] 
[2061350, 10638800, -8250, 3, [5.0, 5.0, 5.0]] 
[2060950, 10638700, -8000, 1, [1.0]] 
[2061700, 10639100, -8100, 11, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]] 
[2061050, 10638800, -8250, 6, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]] 
[2061500, 10639150, -8200, 1, [4.0]] 
[2061250, 10638850, -8150, 16, [5.0, 5.0, 5.0, 5.0]] 
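Since the question asks for the list elements "for further calculation", the manual approach could be wrapped in a generator along these lines. This is only a sketch: the name parse_rows is made up for illustration, and the data is fed from an in-memory io.StringIO holding two rows copied from the sample rather than from the real file.

```python
import io

def parse_rows(lines):
    """Yield (ints, floats) for each line shaped like 'a b c d [x, y, ...]'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        left, _, right = line.partition('[')
        ints = [int(tok) for tok in left.split()]
        floats = [float(tok) for tok in right.rstrip(']').split(',')]
        yield ints, floats

# Two rows copied from the sample data, wrapped in StringIO instead of a real file.
sample = io.StringIO(
    "1st 2nd 3rd 4th 5th\n"
    "2061350 10638800 -8250 3 [5.0, 5.0, 5.0]\n"
    "2060950 10638700 -8000 1 [1.0]\n"
)
next(sample)  # skip the header line
for ints, floats in parse_rows(sample):
    # Example of a follow-up calculation on the list elements.
    print(ints, sum(floats), len(floats))
```

Keeping the integers and the floats in separate lists avoids a mixed-type row and makes it easy to pass the floats on to numpy later if needed.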
1

Pandas can read fixed-width files (as opposed to tab- or comma-delimited files) like yours:

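import ast
import pandas as pd

# Read only the character range that holds the 5th column (the offsets depend
# on the actual file), then parse each '[...]' string into a Python list.
df = pd.read_fwf('test.txt', colspecs=[(41, 100)])['5th'].apply(ast.literal_eval)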

You get:

>>> df 

0   [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 
1   [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] 
2   [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0] 
3   [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0] 
4        [5.0, 5.0, 5.0] 
5          [1.0] 
6 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] 
7    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0] 
8          [4.0] 
9      [5.0, 5.0, 5.0, 5.0] 
Name: 5th, dtype: object 
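Once the column holds real lists, per-row calculations can be done with further apply calls. The sketch below assumes pandas is installed and uses a small hand-built Series as a stand-in for the '5th' column, with three values copied from the sample; the variable names are illustrative only.

```python
import ast
import pandas as pd

# Stand-in for the '5th' column as read from the file: each cell is still a string.
raw = pd.Series(["[5.0, 5.0, 5.0]", "[1.0]", "[5.0, 5.0, 5.0, 5.0]"], name="5th")

lists = raw.apply(ast.literal_eval)  # each cell becomes a Python list of floats
lengths = lists.apply(len)           # number of elements per row
totals = lists.apply(sum)            # sum of the elements per row

print(lengths.tolist())
print(totals.tolist())
```

Because the lists have different lengths, the column stays at dtype object; it cannot be turned into a regular 2-D numeric array without padding.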
2

If the whitespace between the columns is all tabs:

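import ast
import csv
import pprint

result = []
with open('in.txt') as in_file:
    # Assumes 'in.txt' is tab-delimited and has no header row.
    reader = csv.reader(in_file, delimiter='\t')
    for line in reader:
        line[:4] = map(int, line[:4])        # first four columns are integers
        line[4] = ast.literal_eval(line[4])  # parse '[...]' into a list
        result.append(line)

pprint.pprint(result)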

>>> 
[[2061100, 10638650, -8000, 25, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]], 
[2061800, 10639100, -8100, 26, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]], 
[2061150, 10638750, -8250, 25, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]], 
[2061650, 10639150, -8200, 25, [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]], 
[2061350, 10638800, -8250, 3, [5.0, 5.0, 5.0]], 
[2060950, 10638700, -8000, 1, [1.0]], 
[2061700, 10639100, -8100, 11, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]], 
[2061050, 10638800, -8250, 6, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]], 
[2061500, 10639150, -8200, 1, [4.0]], 
[2061250, 10638850, -8150, 16, [5.0, 5.0, 5.0, 5.0]]] 
>>> 

A variation on this theme:

with open('in.txt') as in_file: 
    reader = csv.reader(in_file, delimiter = '\t') 
    result = [[ast.literal_eval(item) for item in line] for line in reader] 
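One caveat with this variation: ast.literal_eval raises an error on a header cell such as 1st, so a header row, if present, has to be consumed first. A minimal sketch, assuming a tab-delimited input with a header (the in-memory text below is a stand-in built from two sample rows, not the real in.txt):

```python
import ast
import csv
import io

# Tab-delimited stand-in for 'in.txt', with a header row (an assumption).
text = (
    "1st\t2nd\t3rd\t4th\t5th\n"
    "2061350\t10638800\t-8250\t3\t[5.0, 5.0, 5.0]\n"
    "2060950\t10638700\t-8000\t1\t[1.0]\n"
)

reader = csv.reader(io.StringIO(text), delimiter="\t")
header = next(reader)  # consume the header so literal_eval never sees '1st'
result = [[ast.literal_eval(item) for item in line] for line in reader]
print(result)
```

literal_eval turns the integer columns into int and the bracketed column into a list in one pass, so no per-column conversion is needed.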