2017-08-28 69 views
0

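Python - Filter rows from a file before computing

The following code computes the average of each column in an input file. It works until the file contains nan values, which distort the averages.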

Here is my code:

with open(biasfile, 'r') as f: 
    data = [map(float, line.split()) for line in f] 

num_rows = len(data) 
num_cols = len(data[0]) 

totals = num_cols * [0.0] 

for line in data: 
    for index in xrange(num_cols): 
        totals[index] += line[index]

averages = [total/num_rows for total in totals] 
print averages 

This is part of the file:

22.7061 5.4303 
32.2040 5.4364 
22.9982 5.4426 
nan 5.4487 
nan 5.4548 
nan 5.4610 

This is the output:

[nan, 3.1446607421875] 

I want to ignore the nan values and compute the average of the remaining values. How can I do that?

+0

You should definitely check out [pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) and [numpy](https://docs.scipy.org/doc/numpy/index.html) – Quickbeam2k1

Answers

1

You can use a Python list comprehension to filter the data:

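with open('file.txt') as file:
    # Split each line into its string fields
    data = [line.split() for line in file]
    # Drop every row that contains a 'nan' field
    data = [item for item in data if 'nan' not in item]
    # list(...) keeps the rows indexable in Python 3, where map() is lazy
    data = [list(map(float, item)) for item in data]

totals = len(data[0]) * [0.0]

for item in data:
    for k, n in enumerate(item):
        totals[k] += n

print([total/len(data) for total in totals])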

Another approach:

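with open('file.txt') as file:
    data = [line.split() for line in file]
    # Drop every row that contains a 'nan' field
    data = [item for item in data if 'nan' not in item]
    # list(...) keeps the rows indexable in Python 3, where map() is lazy
    data = [list(map(float, item)) for item in data]

# Average column k over the remaining rows
print([sum(d[k] for d in data)/len(data) for k in range(len(data[0]))])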
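Note that both versions above discard a whole row as soon as any field is nan, so the second column's average also loses those rows. If each column should instead skip only its own nan entries, a minimal sketch could look like the following. The `column_means` helper and the inline `sample` string are illustrative assumptions, not part of the original answer.

```python
import math

def column_means(rows):
    """Average each column, skipping NaN entries in that column only."""
    num_cols = len(rows[0])
    totals = [0.0] * num_cols
    counts = [0] * num_cols
    for row in rows:
        for k, value in enumerate(row):
            if not math.isnan(value):
                totals[k] += value
                counts[k] += 1
    # A column made up entirely of NaN has no meaningful average
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]

# Sample rows copied from the question; float() parses the text "nan" directly
sample = """22.7061 5.4303
32.2040 5.4364
22.9982 5.4426
nan 5.4487
nan 5.4548
nan 5.4610
"""

rows = [[float(x) for x in line.split()]
        for line in sample.splitlines() if line.strip()]
means = column_means(rows)
print(means)
```

Here the first column is averaged over three values and the second over all six.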
+0

Thanks! It seems to work now! –

+0

What if I want to add another condition, but only on the second column? For example, stop counting when x > 2 and ignore the rest of the file? @DanilSperansky –

+0

OK, I solved this by putting the condition on the data rather than in the average calculation! –
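The commenter's actual fix is not shown. One way to put a "stop at the first row whose second column exceeds a limit" condition on the data is `itertools.takewhile`, sketched below. The `rows_until` helper and the cutoff `5.445` are hypothetical, chosen only so the sample rows from the question trigger the stop.

```python
from itertools import takewhile

def rows_until(lines, limit):
    """Return split rows up to (not including) the first row whose
    second column exceeds `limit`; later rows are never examined."""
    rows = (line.split() for line in lines if line.strip())
    # A nan in the second column also ends the scan, because nan <= limit is False
    return takewhile(lambda item: float(item[1]) <= limit, rows)

# Sample rows copied from the question
sample_lines = [
    "22.7061 5.4303",
    "32.2040 5.4364",
    "22.9982 5.4426",
    "nan 5.4487",
    "nan 5.4548",
    "nan 5.4610",
]

# Hypothetical cutoff of 5.445 on the second column, then the usual nan filter
kept = [item for item in rows_until(sample_lines, 5.445) if 'nan' not in item]
data = [[float(x) for x in item] for item in kept]
averages = [sum(col) / len(data) for col in zip(*data)]
print(averages)
```

With a real file, the open file object could be passed in place of `sample_lines`, since `rows_until` only iterates over lines.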

0

Can't you use the DataFrame API and do something like:

dataFrame.map(x => if (!x.isNaN) x).avg 
+0

Why so complicated? [mean](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html) skips NaN by default – Quickbeam2k1

+0

Oh yes, I didn't see that! Thanks – belka
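For reference, the pandas route mentioned in the comments could look roughly like this. It is a sketch that assumes pandas is installed and that the file is whitespace-separated with no header row. The inline `sample` string stands in for the real file.

```python
import io

import pandas as pd

# Sample rows copied from the question; read_csv treats the text "nan" as missing
sample = """22.7061 5.4303
32.2040 5.4364
22.9982 5.4426
nan 5.4487
nan 5.4548
nan 5.4610
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

# DataFrame.mean() uses skipna=True by default, so each column ignores its own NaN
averages = df.mean().tolist()
print(averages)
```

Unlike the row filter in the first answer, this averages the second column over all six rows.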