Operations

2016-11-04 34 views

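I am working with a multi-index DataFrame and am struggling with several operations:

a) I would like to apply several operations to a list (element-wise) without using for loops.

b) I would like to extract the index values of my DataFrame and compare them; before that, they have to be converted from object to integer or float.

c) I would like to compare values within the DataFrame (without using for loops) and select the value from either column depending on the result of that comparison.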

import pandas as pd 
import numpy as np 

idx = pd.IndexSlice 
ix = pd.MultiIndex.from_product(
    [['2015', '2016', '2017', '2018'], 
    ['2016', '2017', '2018', '2019', '2020'], 
    ['A', 'B', 'C']], 
    names=['SimulationStart', 'ProjectionPeriod', 'Group'] 
) 

df = pd.DataFrame(np.random.randn(60, 1), index=ix, columns=['Origin']) 
origin = df.loc[idx[:, :, :], 'Origin'].values 

increase_over_base_percent = 0.3 
increase_over_base_abs = 10 
abs_level = 1 
min_increase = 0.001 

'Is there a way to do this comparison without using for loops?' 
# The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() 
change = pd.Series(np.nan) 
i = 0 
for element in origin: 
    change[i] = max(
     min(element * (1 + increase_over_base_percent), 
      element + increase_over_base_abs, 
      abs_level), 
     element + min_increase) 
    i += 1 

print(change) 


# Write results to a new column in the DataFrame ('Change') 
df.loc[idx[:, :, :], 'Change'] = change 

# Add data on 'Group' level 
group_qualifier = [0, 0, 1] 

# Is there a way to apply the group_qualifier to the group level without having to slice each index? 
# Note: the formula does not work yet (results are to be reported in a new column of the DataFrame) 
df.loc[idx[:], 'GroupQA'] = group_qualifier 

'This is the part I am struggling with most (my index values are objects, not integers or floats;' 
'and the comparison of values within the DataFrame does not work either)' 
# Create new column 'Selected'; use origin values for all combinations where 
# projectionPeriod < simulationStart & group_qualifier value == 0; 
# use change values for all other combinations 
values = df.index.get_level_values 
mask = (values('ProjectionPeriod') - values('SimulationStart')) <= 1 
mask = mask * df.loc[idx[:], 'GroupQA'].values 
selected = df.loc[mask] 
df.loc[idx[:, :, :], 'Selected'] = selected 

Regarding (a), I don't see the for loop you want to avoid. – IanS

@IanS - Sorry for the confusion. I have edited the code to show the for loop I am talking about. – Andreas

Thanks for accepting my answer. Do you need help with the other items? – IanS

Answers

A partial answer, for a):

df['Change'] = pd.concat([ 
    pd.concat([ 
     df.loc[:, 'Origin'] * (1 + increase_over_base_percent), 
     df.loc[:, 'Origin'] + increase_over_base_abs, 
    ], axis=1).min(axis=1).clip(upper=abs_level), 
    df.loc[:, 'Origin'] + min_increase 
], axis=1).max(axis=1) 

The idea is to use pandas' min and max directly on the Origin series (with a slight twist: clip is used for abs_level).
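As a cross-check on that idea, here is a minimal sketch of the same formula written with NumPy's element-wise np.minimum and np.maximum instead of concat. The sample values in origin are made up for illustration; the parameter values are the ones from the question. The result is compared against the scalar formula from the question's loop.

```python
import numpy as np
import pandas as pd

# Parameter values copied from the question.
increase_over_base_percent = 0.3
increase_over_base_abs = 10
abs_level = 1
min_increase = 0.001

# Hypothetical sample data standing in for df['Origin'].
origin = pd.Series([-2.0, 0.5, 0.9, 3.0], name="Origin")

# Element-wise minimum of the two candidate increases, capped at abs_level.
capped = np.minimum(
    np.minimum(origin * (1 + increase_over_base_percent),
               origin + increase_over_base_abs),
    abs_level,
)
# Element-wise maximum with the guaranteed minimum increase.
change = np.maximum(capped, origin + min_increase)

# Reference result from the scalar formula used in the question's loop.
expected = [
    max(min(x * (1 + increase_over_base_percent),
            x + increase_over_base_abs,
            abs_level),
        x + min_increase)
    for x in origin
]
print(np.allclose(change.to_numpy(), expected))
```

Both variants avoid the Python-level loop; which one reads better is largely a matter of taste.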

Since pandas operations preserve the index, the result can be assigned directly to a column.
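A small sketch of what index preservation means in practice, using a made-up two-level frame rather than the one from the question: the arithmetic result carries the same MultiIndex, and column assignment aligns on index labels rather than on position.

```python
import pandas as pd

# Small MultiIndex frame shaped like the one in the question (made-up values).
ix = pd.MultiIndex.from_product(
    [["2015", "2016"], ["A", "B"]],
    names=["SimulationStart", "Group"],
)
df = pd.DataFrame({"Origin": [1.0, 2.0, 3.0, 4.0]}, index=ix)

# An arithmetic result keeps the MultiIndex of its input.
shifted = df["Origin"] + 0.5
same_index = shifted.index.equals(df.index)

# Assignment aligns on index labels, so even a reordered Series
# ends up in the matching rows.
df["Shifted"] = shifted.iloc[::-1]
print(df)
```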


Edit: If you prefer, you could use the combine method, as explained at the end of this question.
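A rough sketch of how the combine variant might look, assuming Series.combine with Python's built-in min and max as the pairwise functions. The sample values in origin are invented; the parameters are those from the question.

```python
import pandas as pd

# Parameter values copied from the question.
increase_over_base_percent = 0.3
increase_over_base_abs = 10
abs_level = 1
min_increase = 0.001

# Hypothetical sample data standing in for df['Origin'].
origin = pd.Series([-2.0, 0.5, 0.9, 3.0], name="Origin")

# Series.combine applies a two-argument function to aligned pairs of elements.
lower = (origin * (1 + increase_over_base_percent)).combine(
    origin + increase_over_base_abs, min
)
# Cap at abs_level, then take the pairwise maximum with the minimum increase.
change = lower.clip(upper=abs_level).combine(origin + min_increase, max)
print(change)
```

Because combine calls the function once per pair of elements, it is likely to be slower than the concat-based version on large frames.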