2017-03-27 38 views
3

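Applying a lookup table to bins or ranges of a DataFrame

I have a DataFrame like the one shown below. Assume these are the sales amounts for a group of salespeople.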


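I also have a lookup table that gives the commission rate for each range of sales amounts. It looks like the one below. So $0 - $50,000 = 5%, $50,001 - $250,000 = 4%, and so on.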


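What I want to do is apply the lookup table to the sales table to produce a DataFrame of commission rates, one for each sales figure.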


Attempt 1:

In [66]: a 
Out[66]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [67]: b 
Out[67]: 
            Commission
Sales
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [68]: c = b['Commission'][a <= b.index.values] 
Traceback (most recent call last): 

    File "<ipython-input-68-d229bce29f01>", line 1, in <module> 
    c = b['Commission'][a <= b.index.values] 

    File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py", line 1184, in f 
    res = self._combine_const(other, func, raise_on_error=False) 

    File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 3555, in _combine_const 
    raise_on_error=raise_on_error) 

    File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2911, in eval 
    return self.apply('eval', **kwargs) 

    File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2890, in apply 
    applied = getattr(b, f)(**kwargs) 

    File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1132, in eval 
    result = get_result(other) 

    File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1103, in get_result 
    result = func(values, other) 

ValueError: operands could not be broadcast together with shapes (3,3) (4,) 

Attempt 2:

In [59]: a 
Out[59]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [60]: b 
Out[60]: 
            Commission
Sales
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [61]: c = b.lookup(a['Sales_1'],['Commission']) 
Traceback (most recent call last): 

    File "<ipython-input-61-99e8134e826c>", line 1, in <module> 
    c = b.lookup(a['Sales_1'],['Commission']) 

    File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2649, in lookup 
    raise ValueError('Row labels must have same size as column labels') 

ValueError: Row labels must have same size as column labels 

Can anyone help me apply a lookup table to a DataFrame? It doesn't have to be exactly like this, but this illustrates my general need.

Answers

8

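For working with ranges, pd.cut is your friend. With your current b DataFrame, you only need to amend the list of bins you pass as a parameter so that it defines the lowest range. Here I put 0, since negative sales don't exist, but you could use any negative number if needed, or even use -np.inf and np.inf rather than 1E14 for your lower and upper bounds: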

pd.cut(a.stack(), [0] + b.Sales.tolist(), labels=b.Commission).unstack() 
Out[39]: 
   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
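
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds a and b from the question, where Sales is the index of b rather than a column, so it reads the upper bounds from b.index instead of b.Sales. The names edges, binned and rates are only illustrative, and the cast to float is an assumption about what you want downstream, since pd.cut returns categorical data.

```python
import pandas as pd

# Sales figures and lookup table as printed in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# Bin edges: 0 as the lowest bound, then every upper bound from b's index.
edges = [0] + b.index.tolist()

# Bins are right-inclusive by default: (0, 50000], (50000, 250000], ...
binned = pd.cut(a.stack(), bins=edges, labels=b["Commission"].tolist())

# pd.cut returns a categorical Series; cast to float, then restore a's layout.
rates = binned.astype(float).unstack()
print(rates)
```

Note that with 0 as the lowest edge, a sale of exactly 0 falls outside every bin and comes back as NaN.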

I would find b clearer if it were arranged like this instead:

    Sales  Commission
0    -inf         NaN
1   50000        0.05
2  250000        0.04
3  750000        0.03
4     inf        0.02

The call then becomes:

pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack() 
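
A runnable sketch of that variant follows, assuming b is rebuilt by hand in the rearranged layout with Sales as an ordinary column holding all five bin edges. Five edges define four bins, which is why the leading NaN commission is skipped. The cast to float is again an assumption about the desired output type.

```python
import numpy as np
import pandas as pd

# Sales figures as printed in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})

# Rearranged lookup table: every bin edge in Sales, infinite outer bounds.
b = pd.DataFrame({
    "Sales": [-np.inf, 50000, 250000, 750000, np.inf],
    "Commission": [np.nan, 0.05, 0.04, 0.03, 0.02],
})

# Skip the first commission (NaN): it only pairs with the lower bound.
binned = pd.cut(
    a.stack(),
    bins=b["Sales"].to_numpy(),
    labels=b["Commission"][1:].tolist(),
)
rates = binned.astype(float).unstack()
print(rates)
```

With infinite bounds no sale can fall outside the bins, so zero or negative amounts get the first rate instead of NaN.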
+0

Great use of `pd.cut`! +1 :-) – pansen

+0

Thanks. PS: I would actually rearrange `b` so that I can pass `b.Sales` and `b.Commission[1:]` for better clarity – Boud

+0

How should `b` be arranged, @Boud? I'd be happy to make this better/easier/clearer. – Kyle

2

@Boud already knocked this one out of the park using cut. But here is my take using searchsorted:

pd.DataFrame(
    b.Commission.values[ 
     b.index.values.searchsorted(a.values.ravel()) 
    ].reshape(a.values.shape), 
    a.index, a.columns) 

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
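
Here is the same idea as a self-contained sketch, with a and b rebuilt from the question and the intermediate steps named. upper_bounds and positions are illustrative names only. With the default side="left", searchsorted returns the position of the first upper bound that is greater than or equal to each sale, which matches the right-inclusive ranges in the question.

```python
import pandas as pd

# Sales figures and lookup table as printed in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# Sorted upper bound of each commission range.
upper_bounds = b.index.to_numpy()

# For each sale, position of the first upper bound >= that sale.
positions = upper_bounds.searchsorted(a.to_numpy().ravel())

# Use the positions to pick rates, then restore a's shape and labels.
rates = pd.DataFrame(
    b["Commission"].to_numpy()[positions].reshape(a.shape),
    index=a.index,
    columns=a.columns,
)
print(rates)
```

Two caveats: a sale above the last upper bound would produce a position past the end of the array and raise an IndexError, and sales of zero or below silently land in the first range rather than coming back as NaN.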

pandas

Using pd.merge_asof
I would stack a and also shift the boundary definitions as needed:

numpy

a_ = a.stack().sort_values().to_frame('Sales') 
b_ = pd.DataFrame(dict(
     Sales=np.append(0, b.index[:-1]), 
     Commissions=b.Commission.values 
    )) 

print(a_) 
print() 
print(b_) 

             Sales
0 Sales_3   100000
1 Sales_1   100000
0 Sales_1   200000
2 Sales_3   200000
0 Sales_2   300000
2 Sales_1   400000
1 Sales_2   500000
  Sales_3   500000
2 Sales_2  1000000

   Commissions   Sales
0         0.05       0
1         0.04   50000
2         0.03  250000
3         0.02  750000

Now we can use pd.merge_asof:

pd.merge_asof(a_, b_).set_index(a_.index).Commissions.unstack() 

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
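
The merge_asof steps can be put together into one runnable sketch, again with a and b rebuilt from the question. The join key is passed explicitly as on="Sales", and the last two lines are an added check, not part of the answer above, showing how a sale that sits exactly on a boundary is treated.

```python
import numpy as np
import pandas as pd

# Sales figures and lookup table as printed in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# Long format, sorted by the key, as merge_asof requires.
a_ = a.stack().sort_values().to_frame("Sales")

# Shift the bounds so each row holds the lower edge of its range.
b_ = pd.DataFrame({
    "Sales": np.append(0, b.index[:-1]),
    "Commissions": b["Commission"].to_numpy(),
})

# Backward search (the default): last row of b_ whose Sales <= the sale.
merged = pd.merge_asof(a_, b_, on="Sales")
rates = merged.set_index(a_.index)["Commissions"].unstack()
print(rates)

# Boundary check: a sale of exactly 50000 matches the 50000 lower edge.
edge = pd.DataFrame({"Sales": [50000]})
edge_rate = pd.merge_asof(edge, b_, on="Sales")["Commissions"].iloc[0]
```

Because the shifted bounds are matched with "less than or equal", a sale of exactly 50000 gets 0.04 here, whereas the pd.cut and searchsorted versions give it 0.05. If boundary values matter, passing allow_exact_matches=False to merge_asof should bring the three approaches into line.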

Naive timing test


+0

Yes, numpy is really fast ;) – jezrael

+0

:D - Thanks ;) – jezrael