2017-03-01 60 views
-1
subset

How can I subset in pandas by multiple conditions?

Cat      INVOICE_REF_NUMBER  OPEN_ITEM_AMOUNT(Netted Amt)  AMOUNT_COLLECTED(Original Amt)  COMPANY_CODE  OPERATING_UNIT  count
invoice  0992541158          115606.38                     578031.91                       4380          6238            2
payment  0992541158          0                             -462425.53                      4380          6238            2
invoice  0090010917          1519                          87803.4                         2700          4315            2
payment  0090010917          0                             -86284.4                        2700          4315            2
invoice  0090007022          2039.55                       13517                           2700          4315            2

I need to separate out the 5th line, as it does not have any payment.

+1

What have you tried so far? – DyZ

+0

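I did this in Excel with COUNTIFS based on "Cat", checking whether a key has only an invoice or both an invoice and a payment, so now I need to do the same in Python. –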

+0

You may need to clarify what you want to do. But you can always do `df2 = df1[df1['Column_Name'] == 'Condition']`. For multiple conditions, you should use `~` for NOT, `|` for OR and `&` for AND. – MattR
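
To illustrate the operators mentioned in the comment above, here is a minimal sketch. It assumes a frame `df1` built from three columns of the question's sample rows; the mask names `is_invoice` and `is_2700` are illustrative only and do not come from the thread.

```python
import pandas as pd

# Sample rows copied from the question (three columns kept for brevity).
df1 = pd.DataFrame({
    "Cat": ["invoice", "payment", "invoice", "payment", "invoice"],
    "INVOICE_REF_NUMBER": ["0992541158", "0992541158", "0090010917",
                           "0090010917", "0090007022"],
    "COMPANY_CODE": [4380, 4380, 2700, 2700, 2700],
})

# Build each condition as a boolean Series first; this avoids
# operator-precedence surprises when the conditions are combined.
is_invoice = df1["Cat"] == "invoice"
is_2700 = df1["COMPANY_CODE"] == 2700

both = df1[is_invoice & is_2700]    # AND: invoices of company 2700
either = df1[is_invoice | is_2700]  # OR: invoices, or any row of company 2700
not_invoice = df1[~is_invoice]      # NOT: everything that is not an invoice
```

If the comparisons are written inline instead, each one needs its own parentheses, e.g. `df1[(df1["Cat"] == "invoice") & (df1["COMPANY_CODE"] == 2700)]`, because `&` and `|` bind more tightly than `==`.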

Answers

0

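Start by grouping all rows that refer to the same invoice. The combined status will differ depending on whether the invoice has been paid or not: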

status = df.groupby("INVOICE_REF_NUMBER")['Cat'].sum() 
#INVOICE_REF_NUMBER 
#0090007022   invoice 
#0090010917 invoicepayment 
#0992541158 invoicepayment 
#Name: Cat, dtype: object 

Now extract the original rows for the unpaid invoices:

unpayed = df.join(status[status=='invoice'], rsuffix='_', how='right', 
        on='INVOICE_REF_NUMBER') 
#  Cat INVOICE_REF_NUMBER OPEN_ITEM_AMOUNT(Netted Amt)  Cat_ 
#4 invoice   0090007022      2039.55 invoice 

You can delete the duplicate "Cat_" column if you want:

del unpayed['Cat_'] 
#  Cat INVOICE_REF_NUMBER OPEN_ITEM_AMOUNT(Netted Amt) 
#4 invoice   0090007022      2039.55 
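
For anyone who wants to run the steps above end to end, here is a self-contained sketch. The frame is rebuilt from three columns of the question's sample rows, which is an assumption about how `df` was loaded. It uses `drop(columns=...)` in place of `del`.

```python
import pandas as pd

# Sample rows copied from the question (three columns kept for brevity).
df = pd.DataFrame({
    "Cat": ["invoice", "payment", "invoice", "payment", "invoice"],
    "INVOICE_REF_NUMBER": ["0992541158", "0992541158", "0090010917",
                           "0090010917", "0090007022"],
    "OPEN_ITEM_AMOUNT(Netted Amt)": [115606.38, 0, 1519, 0, 2039.55],
})

# Summing strings concatenates them, so an invoice with a payment row
# becomes "invoicepayment" and an invoice without one stays "invoice".
status = df.groupby("INVOICE_REF_NUMBER")["Cat"].sum()

# Right-join the original rows onto the invoice numbers whose status is
# exactly "invoice"; the joined status column gets the "_" suffix.
unpayed = df.join(status[status == "invoice"], rsuffix="_", how="right",
                  on="INVOICE_REF_NUMBER")

# Drop the helper column added by the join.
unpayed = unpayed.drop(columns="Cat_")
print(unpayed)
```

Note that the comparison with `'invoice'` only matches keys that have exactly one invoice row and nothing else; a key with two invoice rows would concatenate to "invoiceinvoice" and be skipped.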

0

Here is my best attempt:

# Assume nothing has a payment 
df['payment_count'] = 0 

# For each invoice, count the related payments by applying 
# a lambda function on each row (hence the axis=1) 
df.loc[df.Cat == 'invoice', 'payment_count'] = df.loc[df.Cat == 'invoice'].apply(
    lambda x: df.loc[(df['INVOICE_REF_NUMBER'] == x['INVOICE_REF_NUMBER'])
                     & (df.Cat == 'payment'), 'Cat'].count(),
    axis=1)

# Filter on the invoices without payments 
print(df[(df.Cat == 'invoice') & (df.payment_count == 0)])
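
The same payment count can probably be obtained without a row-wise `apply`. The sketch below is a swapped-in technique, not the code above: it uses a grouped `transform("sum")` over a boolean mask. The frame is rebuilt from the question's sample rows, and the names `is_payment` and `no_payment` are illustrative only. Unlike the code above, it fills `payment_count` on payment rows as well.

```python
import pandas as pd

# Sample rows copied from the question (two columns kept for brevity).
df = pd.DataFrame({
    "Cat": ["invoice", "payment", "invoice", "payment", "invoice"],
    "INVOICE_REF_NUMBER": ["0992541158", "0992541158", "0090010917",
                           "0090010917", "0090007022"],
})

# Mark the payment rows, then sum the marks within each invoice number so
# that every row carries the number of payments recorded for its invoice.
is_payment = df["Cat"].eq("payment")
df["payment_count"] = is_payment.groupby(df["INVOICE_REF_NUMBER"]).transform("sum")

# Keep the invoices that have no payment at all.
no_payment = df[(df["Cat"] == "invoice") & (df["payment_count"] == 0)]
print(no_payment)
```

This does one grouped pass instead of scanning the whole frame once per invoice, which should matter on larger files.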