2017-04-10
1

How do I flatten two columns in a pandas DataFrame and then refer back to the original rows?

Task 1 : 

company-asset company-debt wealth 
    GOLD   SILVER  2000.0 
    BRONZE  IRON  4000.0 
    IRON   GOLD  1500.0 

Now I want the following, where wealth counts as positive for the asset metal and as negative for the debt metal:

GOLD SILVER BRONZE IRON 
500 -2000 4000 -2500 

Task 2: 

Now I want to get the rows of the original dataframe that mention a metal 
whose value in dataframe 2 is greater than -1000 and less than +1000. 
In the case above that is only GOLD, so we get this DF:

company-asset company-debt wealth 
    GOLD   SILVER  2000.0 
    IRON   GOLD  1500.0 

Answers

4

Try this:

s = (df.set_index('wealth').stack() 
     .rename('metal') 
     .rename_axis(('wealth', 'type')) 
     .reset_index() 
     .pipe(lambda l: l.assign(wealth=l.wealth.where(l.type.str.endswith('asset'), 
                 -l.wealth))) 
     .groupby('metal').wealth.sum()) 

s 
#metal 
#BRONZE 4000.0 
#GOLD  500.0 
#IRON  -2500.0 
#SILVER -2000.0 
#Name: wealth, dtype: float64 

metals = s[(s > -1000) & (s < 1000)].index 
df[df['company-asset'].isin(metals) | df['company-debt'].isin(metals)] 

# company-asset company-debt wealth 
#0   GOLD   SILVER 2000.0 
#2   IRON   GOLD 1500.0 
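
As a cross-check on the chain above, the per-metal totals can also be built from two plain groupby sums, one per column, with debts subtracted from assets. This is only a minimal sketch that assumes the question's three-row frame; the names `assets`, `debts`, `net`, and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    'company-asset': ['GOLD', 'BRONZE', 'IRON'],
    'company-debt': ['SILVER', 'IRON', 'GOLD'],
    'wealth': [2000.0, 4000.0, 1500.0],
})

# Total wealth per metal when it appears as an asset, and when it appears as a debt.
assets = df.groupby('company-asset')['wealth'].sum()
debts = df.groupby('company-debt')['wealth'].sum()

# Assets count as positive, debts as negative; fill_value=0 covers
# metals that appear in only one of the two columns.
net = assets.sub(debts, fill_value=0)

# Keep rows of the original frame that mention a metal whose net total
# lies strictly between -1000 and 1000.
metals = net[(net > -1000) & (net < 1000)].index
result = df[df['company-asset'].isin(metals) | df['company-debt'].isin(metals)]

print(net)
print(result)
```

Whether this reads better than the stack-based chain is a matter of taste; it avoids the reshape at the cost of grouping twice.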
+1

*whistles* That is impressive.

1

I'm not sure what your first question is.

Here is an answer to the second question:

import pandas as pd 
# Build the frame from a list of rows so that 'wealth' stays numeric; 
# passing the rows through np.array would turn every value into a string. 
dd = [['GOLD', 'SILVER', 2000.0], ['BRONZE', 'IRON', 4000.0], ['IRON', 'GOLD', 1500.0]] 
col = ['company-asset', 'company-debt', 'wealth'] 
a = pd.DataFrame(data=dd, columns=col) 
# Note: this filters on each row's own wealth, not on per-metal totals. 
a[(a['wealth'] > -1000) & (a['wealth'] < 4000)] 

This is the output:

Out[1]: 
  company-asset company-debt  wealth 
0          GOLD       SILVER  2000.0 
2          IRON         GOLD  1500.0
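
The filter above returns the expected rows for this sample, but it tests each row's own wealth rather than the per-metal totals from Task 1. The sketch below tries to show where the two approaches can diverge. It adds one hypothetical row (SILVER / BRONZE / 500.0) that is not in the question, and the names `by_row`, `long_df`, and `by_metal` are illustrative only.

```python
import pandas as pd

# The question's rows plus one hypothetical row (index 3) added for illustration.
a = pd.DataFrame(
    [['GOLD', 'SILVER', 2000.0],
     ['BRONZE', 'IRON', 4000.0],
     ['IRON', 'GOLD', 1500.0],
     ['SILVER', 'BRONZE', 500.0]],
    columns=['company-asset', 'company-debt', 'wealth'],
)

# Row-level filter, as in the answer above.
by_row = a[(a['wealth'] > -1000) & (a['wealth'] < 4000)]

# Long format: one line per (metal, signed wealth) pair,
# positive for the asset column and negative for the debt column.
long_df = pd.concat([
    pd.DataFrame({'metal': a['company-asset'], 'signed': a['wealth']}),
    pd.DataFrame({'metal': a['company-debt'], 'signed': -a['wealth']}),
])
net = long_df.groupby('metal')['signed'].sum()

# Per-metal filter: rows that mention a metal whose net total is in range.
metals = net[(net > -1000) & (net < 1000)].index
by_metal = a[a['company-asset'].isin(metals) | a['company-debt'].isin(metals)]

print(list(by_row.index))    # [0, 2, 3]
print(list(by_metal.index))  # [0, 2]
```

With the extra row, the row-level filter keeps row 3 because 500.0 is in range, while the per-metal version drops it because neither SILVER (-1500) nor BRONZE (3500) has a net total between -1000 and 1000.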