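Join or merge values calculated on a grouped pandas DataFrame

I have a DataFrame containing density values. I want to group by the 'hours' value, classify the densities into bins, and then add a new column to my original df containing the bin number. However, this is failing: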
import numpy as np
import pandas as pd
import pysal

df = pd.DataFrame({
    'hours': np.random.randint(0, 24, 10000),
    'density': np.random.sample(10000)})

def func(df):
    """calculates equal intervals of a series or array"""
    intervals = pysal.esda.mapclassify.Equal_Interval(df.density, 5)
    # yb is an ndarray containing the bin indices, 0 - 4 in this case
    return intervals.yb

df['bins'] = df.groupby(df.hours).transform(func)
This gives AssertionError: length of join_axes must not be equal to 0

If I just group the object and apply the interval function, the result looks like this:
grp = df.groupby(df.hours).apply(func)
grp
Out[106]:
hours
0 [2, 4, 3, 4, 0, 4, 2, 2, 0, 1, 0, 0, 2, 2, 0, ...
1 [4, 1, 0, 4, 0, 2, 2, 3, 2, 3, 0, 3, 4, 3, 2, ...
2 [4, 1, 0, 2, 3, 4, 1, 1, 0, 3, 4, 4, 2, 4, 0, ...
3 [3, 0, 0, 4, 0, 0, 0, 1, 2, 2, 0, 2, 2, 2, 1, ...
4 [0, 1, 1, 2, 1, 3, 1, 3, 2, 2, 1, 4, 0, 4, 2, ...
5 [2, 0, 2, 1, 3, 1, 1, 0, 4, 4, 2, 1, 4, 1, 2, ...
6 [1, 2, 3, 3, 3, 2, 4, 1, 2, 1, 2, 0, 3, 2, 0, ...
7 [3, 0, 3, 1, 3, 1, 2, 1, 4, 2, 1, 2, 1, 1, 1, ...
8 [0, 1, 4, 3, 0, 1, 0, 0, 1, 0, 2, 1, 0, 1, 1, ...
9 [4, 2, 0, 4, 1, 3, 2, 3, 4, 1, 1, 4, 4, 4, 4, ...
10 [4, 4, 3, 3, 1, 2, 3, 0, 2, 4, 2, 4, 0, 2, 2, ...
11 [0, 1, 3, 0, 1, 1, 1, 1, 2, 1, 2, 0, 3, 3, 4, ...
12 [3, 1, 1, 0, 4, 4, 3, 0, 1, 2, 1, 1, 4, 2, 0, ...
13 [1, 1, 0, 2, 0, 1, 4, 1, 2, 2, 3, 1, 2, 0, 3, ...
14 [2, 4, 0, 2, 1, 2, 0, 4, 4, 2, 3, 4, 2, 1, 1, ...
15 [2, 4, 3, 4, 1, 0, 3, 1, 2, 0, 3, 4, 2, 2, 3, ...
16 [0, 4, 2, 3, 3, 4, 0, 3, 2, 0, 1, 0, 0, 2, 0, ...
17 [3, 1, 4, 4, 0, 4, 1, 0, 4, 3, 3, 2, 3, 1, 4, ...
18 [4, 3, 0, 2, 4, 2, 2, 0, 2, 2, 1, 2, 1, 0, 1, ...
19 [3, 0, 3, 1, 1, 0, 1, 1, 3, 3, 2, 3, 4, 0, 0, ...
20 [3, 0, 1, 4, 0, 0, 4, 2, 4, 2, 2, 0, 4, 0, 0, ...
21 [4, 2, 3, 3, 1, 2, 0, 4, 2, 0, 2, 2, 1, 2, 2, ...
22 [0, 4, 1, 1, 3, 1, 4, 1, 3, 4, 4, 0, 4, 4, 4, ...
23 [4, 1, 2, 0, 2, 0, 0, 0, 2, 3, 1, 1, 3, 0, 1, ...
dtype: object
Is there a standard way to join or merge values calculated from a grouped object, or should I be using transform differently?
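I don't have 'pysal', but you should be able to return a 'pd.Series' and have better luck: 'return pd.Series(intervals.yb)'. – Justin
@Justin That gives me 'ValueError: could not broadcast input array from shape (431) into shape (431,2)' (431 is the number of values in the '0' group) – urschrei
Try doing the transform on the column, like this: df['bins'] = df.groupby(df.hours).density.transform(func) – user1827356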
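
A minimal sketch of the column-wise transform suggested in the last comment, written so it runs without pysal. Two things here are assumptions and not part of the original thread: pd.cut with labels=False stands in for Equal_Interval (both split each group's min-max range into equal-width bins, though edge handling may differ slightly), and equal_interval_bins is a hypothetical helper name. Note that when transform is called on the density column, the function receives each group's density Series, not the whole DataFrame, so it cannot refer to df.density the way func does.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'hours': rng.integers(0, 24, 10000),
    'density': rng.random(10000),
})

def equal_interval_bins(s, k=5):
    """Return bin indices 0..k-1 for a Series, using k equal-width bins.

    Assumption: pd.cut is used in place of pysal's Equal_Interval.
    """
    # labels=False makes pd.cut return integer bin indices instead of intervals
    return pd.cut(s, bins=k, labels=False)

# Select the single column before transform, so the function is called
# once per group with that group's density values as a Series.
df['bins'] = df.groupby('hours')['density'].transform(equal_interval_bins)
```

Because transform returns a result aligned to the original index, the assignment needs no explicit join or merge.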