2017-10-04 58 views

How do I do boolean slicing on an array of tuples in NumPy?

I have a .csv file like this:

vehicle,speed,datetime,x,y 
61C22276,0.0,1.4926212E9,106.33695,11.12652 
60C28912,0.0,1.4926212E9,106.84327166666667,10.90424 
51D06538,0.0,1.4926212E9,106.7806,10.765768333333334 
50LD08650,0.0,1.4926212E9,106.91705,10.746173333333333 
50LD08519,41.0,1.4926212E9,106.95493,10.739623333333334 
50LD07182,0.0,1.4926212E9,106.917225,10.746073333333333 

I import this data into NumPy with:
my_data = genfromtxt('data/2017-04-20.csv',names=True,delimiter=',') 

The output is:

[(b'61C22276', 0., 1.49262120e+09, 106.33695 , 11.12652 ) 
(b'60C28912', 0., 1.49262120e+09, 106.84327167, 10.90424 ) 
(b'51D06538', 0., 1.49262120e+09, 106.7806 , 10.76576833) ..., 
(b'61C18919', 0., 1.49265726e+09, 106.77865833, 11.03690667) 
(b'61C18919', 0., 1.49265729e+09, 106.77865833, 11.03690667) 
(b'61C18919', 0., 1.49265732e+09, 106.77865833, 11.036905 )] 

This is an array of tuples (because my data consists of multiple types).

How can I slice my_data based on the value of a column? (For example: list all rows where vehicle is 61C2226.)

Answers


What you have is a structured array. To select 'rows' from it, index it with a boolean mask like this:

boolindex=my_data['vehicle']=='50LD08519' 
selection=my_data[boolindex] 

#array([('50LD08519', 0.0, 1492621184.0, 106.91705322265625, 10.746172904968262), 
#  ('50LD08519', 41.0, 1492621184.0, 106.9549331665039, 10.739623069763184)], 
#  dtype=[('vehicle', '<U'), ('speed', '<f4'), ('datetime', '<f4'), 
#  ('x', '<f4'), ('y', '<f4')]) 
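
For reference, here is a minimal self-contained sketch of the same idea. It assumes the sample rows from the question are pasted into an in-memory string (`csv_text` is only an illustrative name), and it passes `dtype=None` with `encoding='utf-8'` so that the vehicle column is read as `str` and not as bytes. If your array holds bytes such as `b'61C22276'`, compare against `b'50LD08519'` instead.

```python
import io

import numpy as np

# Sample rows copied from the question, held in memory for illustration.
csv_text = """vehicle,speed,datetime,x,y
61C22276,0.0,1.4926212E9,106.33695,11.12652
60C28912,0.0,1.4926212E9,106.84327166666667,10.90424
51D06538,0.0,1.4926212E9,106.7806,10.765768333333334
50LD08650,0.0,1.4926212E9,106.91705,10.746173333333333
50LD08519,41.0,1.4926212E9,106.95493,10.739623333333334
50LD07182,0.0,1.4926212E9,106.917225,10.746073333333333
"""

# dtype=None lets genfromtxt infer one type per column;
# encoding='utf-8' makes text columns come back as str, not bytes.
my_data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    names=True,
    dtype=None,
    encoding='utf-8',
)

# Indexing by field name returns that column as a plain 1-D array,
# so an element-wise comparison yields a boolean mask over the rows.
mask = my_data['vehicle'] == '50LD08519'
selection = my_data[mask]

# Masks can be combined with & and | (parentheses are required).
moving_east = my_data[(my_data['speed'] > 0) & (my_data['x'] > 106.9)]

print(selection)
print(moving_east['vehicle'])
```

The mask has one entry per row, so `my_data[mask]` keeps whole records and the result is still a structured array with the same fields.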

pandas gives you friendlier I/O and a more intuitive syntax:

In [521]: my_data=pd.read_csv('data.csv') 

     vehicle  speed       datetime    x   y
0   61C22276      0  1,492,621,200  106  11
1   60C28912      0  1,492,621,200  107  11
2   51D06538      0  1,492,621,200  107  11
3  50LD08519      0  1,492,621,200  107  11
4  50LD08519     41  1,492,621,200  107  11
5  50LD07182      0  1,492,621,200  107  11

In [522]: my_data[my_data['vehicle']=='50LD08519'] 
Out[522]: 
     vehicle  speed       datetime    x   y
3  50LD08519      0  1,492,621,200  107  11
4  50LD08519     41  1,492,621,200  107  11
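
If you want to try the pandas route without a file on disk, a small sketch along these lines should work. It assumes the question's sample rows are held in a string (`csv_text` and `df` are illustrative names, not from the original post), and it also shows `.loc`, which filters rows and picks columns in one step.

```python
import io

import pandas as pd

# Sample rows copied from the question, held in memory for illustration.
csv_text = """vehicle,speed,datetime,x,y
61C22276,0.0,1.4926212E9,106.33695,11.12652
60C28912,0.0,1.4926212E9,106.84327166666667,10.90424
51D06538,0.0,1.4926212E9,106.7806,10.765768333333334
50LD08650,0.0,1.4926212E9,106.91705,10.746173333333333
50LD08519,41.0,1.4926212E9,106.95493,10.739623333333334
50LD07182,0.0,1.4926212E9,106.917225,10.746073333333333
"""

# read_csv infers a suitable dtype for each column on its own.
df = pd.read_csv(io.StringIO(csv_text))

# Comparing a column to a value gives a boolean Series, one entry per row.
mask = df['vehicle'] == '50LD08519'
selected = df[mask]

# .loc takes a row mask and a column list together.
moving = df.loc[df['speed'] > 0, ['vehicle', 'speed']]

print(selected)
print(moving)
```

The selected rows keep their original index labels; call `reset_index(drop=True)` on the result if you want them renumbered from zero.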