2016-11-12 46 views

How do I import dstat output into pandas?

I have some data that looks like this:

----system---- ---load-avg--- ----total-cpu-usage---- ------memory-usage----- -dsk/total- --io/total- ---paging-- -net/total- 
    date/time | 1m 5m 15m |usr sys idl wai hiq siq| used buff cach free| read writ| read writ| in out | recv send 
10-11 00:00:01|0.67 0.42 0.31| 2 0 98 0 0 0|25.0G 16.9M 6331M 189M|2101k 901k|30.4 28.3 | 63B 75B| 0  0 
10-11 00:00:03|0.67 0.42 0.31| 4 0 95 0 0 0|25.0G 16.9M 6332M 190M| 50k 1142k|4.00 18.0 | 0  0 | 310k 6765B 
10-11 00:00:05|0.62 0.41 0.31| 4 0 95 0 0 0|25.0G 16.9M 6333M 189M| 116k 2534k|3.50 113 | 0  0 | 484k 27k 
10-11 00:00:07|0.62 0.41 0.31| 7 1 92 0 0 0|25.0G 16.9M 6335M 187M| 154k 2372k|4.00 128 | 0  0 |1159k 24k 
10-11 00:00:09|0.62 0.41 0.31| 5 0 95 0 0 0|25.0G 16.9M 6336M 185M| 0 1556k| 0 38.5 | 0  0 | 396k 4172B 
10-11 00:00:11|0.73 0.44 0.32| 4 1 95 0 0 0|25.0G 16.9M 6336M 184M| 136k 2732k|3.50 139 | 0  0 | 270k 28k 

You can generate test data with dstat.

I want to import it into a DataFrame like this (Python 3.5.2, pandas 0.18.1):

date/time  1m 5m 15m usr sys idl wai hiq siq used buff cach free read writ read writ in out recv send 
10-11 00:00:01 0.67 0.42 0.31 2 0 98 0 0 0 25.0G 16.9M 6331M 189M 2101k 901k 30.4 28.3 63B 75B 0  0 
10-11 00:00:03 0.67 0.42 0.31 4 0 95 0 0 0 25.0G 16.9M 6332M 190M 50k 1142k 4.00 18.0  0  0 310k 6765B 
10-11 00:00:05 0.62 0.41 0.31 4 0 95 0 0 0 25.0G 16.9M 6333M 189M 116k 2534k 3.50 113  0  0 484k 27k 
10-11 00:00:07 0.62 0.41 0.31 7 1 92 0 0 0 25.0G 16.9M 6335M 187M 154k 2372k 4.00 128  0  0 1159k 24k 
10-11 00:00:09 0.62 0.41 0.31 5 0 95 0 0 0 25.0G 16.9M 6336M 185M 0 1556k 0 38.5  0  0 396k 4172B 
10-11 00:00:11 0.73 0.44 0.32 4 1 95 0 0 0 25.0G 16.9M 6336M 184M 136k 2732k 3.50 139  0  0 270k 28k 

This is the expression I tried, but it does not work:

path='/opt/dstat.2016-11-10'  
dstat=pd.read_table(path,skiprows=1,header=0,sep=r"\|{\s}*|\s+") 

I don't want to edit the text file.

Answer


Try this:

import io

import pandas as pd

fn = r'D:\temp\.data\data.fwf'

# Read the file into memory and turn the '|' group separators into spaces
with open(fn) as f:
    data = f.read().replace('|', ' ')

cols = ('date time 1m 5m 15m usr sys idl wai hiq siq used buff cach free '
        'dsk.read dsk.writ io.read io.writ in out recv send').split()
# Skip the two header lines and supply unique column names instead
df = pd.read_csv(io.StringIO(data), sep=r'\s+', skiprows=2,
                 header=None, names=cols)


In [85]: df 
Out[85]: 
    date  time 1m 5m 15m usr sys idl wai hiq ...  cach free dsk.read dsk.writ io.read io.writ in out recv send 
0 10-11 00:00:01 0.67 0.42 0.31 2 0 98 0 0 ... 6331M 189M 2101k  901k 30.4 28.3 63B 75B  0  0 
1 10-11 00:00:03 0.67 0.42 0.31 4 0 95 0 0 ... 6332M 190M  50k 1142k  4.0 18.0 0 0 310k 6765B 
2 10-11 00:00:05 0.62 0.41 0.31 4 0 95 0 0 ... 6333M 189M  116k 2534k  3.5 113.0 0 0 484k 27k 
3 10-11 00:00:07 0.62 0.41 0.31 7 1 92 0 0 ... 6335M 187M  154k 2372k  4.0 128.0 0 0 1159k 24k 
4 10-11 00:00:09 0.62 0.41 0.31 5 0 95 0 0 ... 6336M 185M  0 1556k  0.0 38.5 0 0 396k 4172B 
5 10-11 00:00:11 0.73 0.44 0.32 4 1 95 0 0 ... 6336M 184M  136k 2732k  3.5 139.0 0 0 270k 28k 

[6 rows x 23 columns] 

PS: IMO a more proper solution would be to use pd.read_fwf() and specify the colspecs parameter, but I was too lazy for that ;-) ...
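
For reference, the read_fwf() route might look roughly like the sketch below. It is only an illustration: the two-line `sample`, the column offsets in `colspecs`, and the shortened column list are assumptions made up for this example, so a real dstat log would need offsets measured from its own layout.

```python
import io

import pandas as pd

# Hypothetical, hand-aligned excerpt (first three column groups only).
# Real dstat output is wider and its offsets may differ.
sample = (
    "10-11 00:00:01|0.67 0.42 0.31|  2   0  98\n"
    "10-11 00:00:03|0.67 0.42 0.31|  4   0  95\n"
)

# Half-open (start, end) character offsets of each field in the sample above.
# The '|' characters sit at offsets 14 and 29 and are simply left out.
colspecs = [(0, 14), (15, 19), (20, 24), (25, 29), (30, 33), (34, 37), (38, 41)]
names = ["date/time", "1m", "5m", "15m", "usr", "sys", "idl"]

# header=None because the sample contains data rows only
df = pd.read_fwf(io.StringIO(sample), colspecs=colspecs, header=None, names=names)
print(df)
```

Because the fields are cut by position, the date and the time stay together in a single date/time column, which is closer to the layout asked for in the question, and the file on disk is never modified.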