Averaging blocks of cells in a pandas DataFrame

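For reasons that are hard to explain, I want to average blocks of cells in a pandas DataFrame that is sparsely filled with random values. The DataFrame will always have sqrt(number of columns × number of indices) values; all the rest are NaN. The values are roughly uniformly distributed, so if I average blocks of cells of the right size, I expect to find one value in each block.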

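Here is my example. With 100 columns and 100 indices, I have 100 values randomly distributed across the DataFrame. I expect each 10x10 block to contain ~1 value, with everything else NaN. How can I collapse each 10x10 block into a single cell, averaging its 10 column labels, its 10 index labels, and its value(s)?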

My code:

import math

import numpy as np
import pandas as pd

number_of_planes = 100

thicknesses = np.empty(number_of_planes)
cos_thetas = np.empty(number_of_planes)
phis = np.empty(number_of_planes)
for i in range(number_of_planes):
    # Random orientation (phi, theta) and thickness for each plane
    phi = np.random.uniform(0, 2 * math.pi)
    theta = math.acos(2 * np.random.uniform(0.5, 1) - 1)
    thickness = np.random.uniform(0, 0.4)

    phis[i] = phi
    cos_thetas[i] = math.cos(theta)
    thicknesses[i] = thickness

# Float-typed frame of NaNs: one row per cos(theta), one column per phi
thick_df = pd.DataFrame(np.nan, index=cos_thetas, columns=phis)

# Put each thickness in its (cos_theta, phi) cell.
# DataFrame.set_value was removed in pandas 1.0; .at is its replacement.
for i in range(len(thicknesses)):
    thick_df.at[cos_thetas[i], phis[i]] = thicknesses[i]

# Rows in descending cos(theta), columns in ascending phi
thick_df = thick_df.sort_index(axis=0, ascending=False)
thick_df = thick_df.sort_index(axis=1)


2016-12-07 Arnold

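IIUC, you can reshape into a 4D array, splitting each axis into two axes of length sqrt(len of each axis), and then compute the mean along the second and fourth axes with np.nanmean, which ignores NaNs: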

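arr = thick_df.values.astype(float)   # ensure float dtype so NaN-aware math works
n = int(np.sqrt(number_of_planes))    # block edge length (10 for 100 planes)

# (N, N) -> (n, n, n, n): axis 0 = row block, 1 = row within block,
# 2 = column block, 3 = column within block; average over axes 1 and 3
out = np.nanmean(arr.reshape(n, n, n, n), axis=(1, 3))

# New labels: mean of the original index/column labels in each block
indx = thick_df.index.values.reshape(-1, n).mean(1)
coln = thick_df.columns.values.reshape(-1, n).mean(1)
df_out = pd.DataFrame(out, index=indx, columns=coln)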

Sample run:

In [174]: thick_df  # number_of_planes = 4
Out[174]:
          4.550477  5.138694  5.411510  6.123163
0.981987       NaN       NaN  0.393233       NaN
0.565861  0.186647       NaN       NaN       NaN
0.193190       NaN       NaN       NaN   0.11626
0.088382       NaN  0.166189       NaN       NaN

In [175]: df_out
Out[175]:
          4.844586  5.767337
0.773924  0.186647  0.393233
0.140786  0.166189  0.116260
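The reshape trick above can be wrapped in a small reusable function and checked against the 4 x 4 sample run. This is a minimal sketch, assuming square n x n blocks that tile the frame exactly; `block_nanmean` is a hypothetical helper name, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def block_nanmean(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Average each n x n block of df, ignoring NaN (sketch; assumes both
    dimensions are exact multiples of n)."""
    arr = df.to_numpy(dtype=float)
    rows, cols = arr.shape
    if rows % n or cols % n:
        raise ValueError("frame shape must be a multiple of the block size")
    # Axis 0: row block, 1: row within block, 2: column block, 3: column within block
    blocks = arr.reshape(rows // n, n, cols // n, n)
    out = np.nanmean(blocks, axis=(1, 3))
    # Block labels: mean of the original labels inside each block
    idx = df.index.to_numpy(dtype=float).reshape(-1, n).mean(axis=1)
    col = df.columns.to_numpy(dtype=float).reshape(-1, n).mean(axis=1)
    return pd.DataFrame(out, index=idx, columns=col)


# The 4 x 4 frame from the sample run above (number_of_planes = 4, so n = 2)
nan = np.nan
sample = pd.DataFrame(
    [[nan, nan, 0.393233, nan],
     [0.186647, nan, nan, nan],
     [nan, nan, nan, 0.11626],
     [nan, 0.166189, nan, nan]],
    index=[0.981987, 0.565861, 0.193190, 0.088382],
    columns=[4.550477, 5.138694, 5.411510, 6.123163],
)
result = block_nanmean(sample, 2)   # same values and labels as df_out above
```

For the 100 x 100 frame in the question, `block_nanmean(thick_df, 10)` would apply the same computation with 10 x 10 blocks.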


2016-12-07 08:02:19 Divakar

OK, I think I understand, but when I tried to apply it to my code it didn't work. What do I need to change to make this work for my 100x100 DataFrame? – Arnold

@Rebecca Could you elaborate on what didn't work? Were there NaNs, or did the values not match? It should work without any changes. – Divakar

I get a runtime warning saying "Mean of empty slice", and the resulting DataFrame looks unchanged. – Arnold
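np.nanmean raises the "Mean of empty slice" RuntimeWarning whenever a slice it averages contains only NaN, and it returns NaN for that slice; the other blocks are still computed. When values are scattered at random, some 10x10 blocks will usually be empty even though the average is about one value per block. A minimal sketch for checking this and silencing the warning, assuming square n x n blocks that tile the frame exactly; `values_per_block` and `quiet_block_nanmean` are hypothetical helper names.

```python
import warnings

import numpy as np
import pandas as pd


def values_per_block(df: pd.DataFrame, n: int) -> np.ndarray:
    """Count the non-NaN cells in each n x n block (sketch)."""
    present = ~np.isnan(df.to_numpy(dtype=float))
    rows, cols = present.shape
    return present.reshape(rows // n, n, cols // n, n).sum(axis=(1, 3))


def quiet_block_nanmean(df: pd.DataFrame, n: int) -> np.ndarray:
    """np.nanmean over n x n blocks with the empty-slice warning silenced.

    Empty blocks still come back as NaN; only the warning is hidden.
    """
    arr = df.to_numpy(dtype=float)
    rows, cols = arr.shape
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(arr.reshape(rows // n, n, cols // n, n), axis=(1, 3))


# Toy 4 x 4 frame whose bottom-right 2 x 2 block has no values
toy = pd.DataFrame([[1.0, np.nan, 2.0, np.nan],
                    [np.nan, np.nan, np.nan, 4.0],
                    [np.nan, 3.0, np.nan, np.nan],
                    [np.nan, np.nan, np.nan, np.nan]])
counts = values_per_block(toy, 2)    # [[1, 2], [1, 0]]
means = quiet_block_nanmean(toy, 2)  # [[1.0, 3.0], [3.0, nan]]
```

Any zero in `counts` marks a block that will be NaN in the averaged output.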

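m, n = 10, 10   # block height (rows) and width (columns)

# Label every row/column with the number of the block it falls in
row_groups = np.arange(len(thick_df.index)) // m
col_groups = np.arange(len(thick_df.columns)) // n

# Same values, indexed by block number instead of the float labels
grpd = pd.DataFrame(thick_df.values, row_groups, col_groups)

# Long format -> numeric -> mean per (row block, column block) -> back to 2D
val = (pd.to_numeric(grpd.stack(), errors='coerce')
       .groupby(level=[0, 1]).mean()
       .unstack()
       .values)

# New labels: mean of the original labels within each block
idx = thick_df.index.to_series().groupby(row_groups).mean().values
col = thick_df.columns.to_series().groupby(col_groups).mean().values

pd.DataFrame(val, idx, col)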
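A variant of the groupby idea above that avoids `stack()` entirely, sketched under the assumption that rows and columns are grouped by position. Averaging per-row means would weight cells unevenly, so it sums values and counts non-NaN cells per block separately and divides at the end. Unlike the 4D reshape, the block sizes need not divide the frame evenly; the last block in each direction is simply smaller. `block_mean_groupby` is a hypothetical name, not a pandas API.

```python
import numpy as np
import pandas as pd


def block_mean_groupby(df: pd.DataFrame, m: int, n: int) -> pd.DataFrame:
    """Mean of the non-NaN cells in each m-row x n-column block (sketch)."""
    vals = df.to_numpy(dtype=float)
    present = ~np.isnan(vals)                  # True where a value exists
    row_groups = np.arange(vals.shape[0]) // m
    col_groups = np.arange(vals.shape[1]) // n

    def block_sum(a: np.ndarray) -> np.ndarray:
        # Sum rows within each row block, then columns within each column block
        by_rows = pd.DataFrame(a).groupby(row_groups).sum()
        return by_rows.T.groupby(col_groups).sum().T.to_numpy()

    sums = block_sum(np.where(present, vals, 0.0))
    counts = block_sum(present.astype(float))
    with np.errstate(invalid="ignore"):        # empty block: 0 / 0 -> NaN
        means = sums / counts

    # Block labels: mean of the original labels, as in the answer above
    idx = df.index.to_series().groupby(row_groups).mean().to_numpy()
    col = df.columns.to_series().groupby(col_groups).mean().to_numpy()
    return pd.DataFrame(means, index=idx, columns=col)


# The 4 x 4 frame from the first answer's sample run
nan = np.nan
sample = pd.DataFrame(
    [[nan, nan, 0.393233, nan],
     [0.186647, nan, nan, nan],
     [nan, nan, nan, 0.11626],
     [nan, 0.166189, nan, nan]],
    index=[0.981987, 0.565861, 0.193190, 0.088382],
    columns=[4.550477, 5.138694, 5.411510, 6.123163],
)
square = block_mean_groupby(sample, 2, 2)   # same values as df_out in that run
uneven = block_mean_groupby(sample, 3, 4)   # 3-row blocks: the last has 1 row
```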


2016-12-07 08:29:55 piRSquared

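Same problem as with the solution above: I need the column and index labels to be the mean of their previous values! – Arnold

@Rebecca I've updated my post. – piRSquared

Perfect, thanks! – Arnold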
