Unable to subset within df.apply

I have a DataFrame, let's call it trim_df, indexed by user_id like this:

  d_timestamp_dt    flagged 
user_id           
1234567890  2015-04-30     False 
0987654321  2015-04-30     False

I'm trying to create an "ACCUM" variable using df.apply(), like this:

df['new_col'] = df.apply(lambda row: my_func(row, time_period1), axis=1)

Here is how my_func is defined. The comments show what is printed when I run apply():

def my_func(row, time_period): 
    print type(row) # <class 'pandas.core.series.Series'> 

    user_id   = row['user_id'] # 123456789 
    row_time  = row['d_timestamp_dt'] # 2015-04-16 23:05:00 
    user_rows  = trim_df.loc[user_id] 
    print type(user_rows) # <class 'pandas.core.series.Series'> WHY??? shouldn't it be a DataFrame? 

    user_rows_of_interest = user_rows[((user_rows['flagged'] == True) & 
             ((row_time - user_rows['d_timestamp_dt']) > time_period0) & 
             ((row_time - user_rows['d_timestamp_dt']) < time_period))] 
    print type(user_rows_of_interest) # <class 'pandas.tslib.Timestamp'> ...expecting this to be a DataFrame 
    return len(user_rows_of_interest) # breaks, because Timestamp doesn't have len()

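What really confuses me is that when I step through the function by hand (without apply), I get the DataFrames I expect, rather than a Series and then a Timestamp. I'd really appreciate any insight into what is going on!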

Source: 2015-04-30, Natalie Arellano

What is time_period1? Also, your function depends on the global time_period0. – Alexander

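time_period1 is defined as datetime.timedelta(days=1). Also, I left out something very important: the DataFrame df that I apply this function to is not trim_df. It has a user_id column and is indexed by row_id rather than user_id. –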

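I believe you need to set user_id to the row's index value. The lambda expression passes each row of the DataFrame as a Series, and the DataFrame does not have 'user_id' as a column (it is the index).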

user_id = row.name # the row's index label; row.index would return the column labels

Source: 2015-04-30 22:43:01, Alexander
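
To make concrete what apply(axis=1) passes to the function, here is a minimal sketch. The data are made up to mirror the question's trim_df, and describe_row is a hypothetical helper used only for illustration. It shows that each row arrives as a Series whose .name is the row's index label, while its .index holds the column labels.

```python
import pandas as pd

# Made-up data shaped like the question's trim_df (indexed by user_id).
trim_df = pd.DataFrame(
    {
        "d_timestamp_dt": pd.to_datetime(["2015-04-30", "2015-04-30"]),
        "flagged": [False, False],
    },
    index=pd.Index(["1234567890", "0987654321"], name="user_id"),
)


def describe_row(row):
    # Hypothetical helper: row.name is the row's index label,
    # row.index is the Index of column labels.
    return "{} | {}".format(row.name, list(row.index))


described = trim_df.apply(describe_row, axis=1)
print(described.tolist())
```

Since the asker's df carries user_id as an ordinary column, row['user_id'] is the right lookup there; row.name only applies when user_id is the index of the frame being iterated.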

It looks like trim_df.loc[user_id] is the culprit... it is returning a Series. I'm not quite sure why, maybe because user_id (although it is the index) is not unique? trim_df.loc[trim_df.index == user_id] seems to work.

Source: 2015-05-01 03:05:36
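
The behaviour described above can be reproduced with a small sketch. In pandas, .loc with a scalar label returns a Series when exactly one row carries that label and a DataFrame when several rows do, whereas a list of labels or a boolean mask always yields a DataFrame. The data below, the function name count_flagged, and the min_gap and max_gap parameters (standing in for the question's time_period0 and time_period, whose real values are not fully given) are assumptions made for illustration.

```python
import datetime

import pandas as pd

# Made-up lookup table: user "A" has two rows, user "B" has one.
trim_df = pd.DataFrame(
    {
        "d_timestamp_dt": pd.to_datetime(
            ["2015-04-28 12:00", "2015-04-29 12:00", "2015-04-29 18:00"]
        ),
        "flagged": [True, True, False],
    },
    index=pd.Index(["A", "A", "B"], name="user_id"),
)

many = trim_df.loc["A"]            # label matches two rows -> DataFrame
one = trim_df.loc["B"]             # label matches one row  -> Series
one_as_frame = trim_df.loc[["B"]]  # list of labels         -> DataFrame
one_by_mask = trim_df.loc[trim_df.index == "B"]  # mask     -> DataFrame


def count_flagged(row, min_gap, max_gap):
    # A boolean mask keeps user_rows a DataFrame even for a single match.
    user_rows = trim_df.loc[trim_df.index == row["user_id"]]
    gap = row["d_timestamp_dt"] - user_rows["d_timestamp_dt"]
    mask = user_rows["flagged"] & (gap > min_gap) & (gap < max_gap)
    return int(mask.sum())


# Made-up frame to apply over: user_id is a column, as the asker clarified.
df = pd.DataFrame(
    {
        "user_id": ["A", "B"],
        "d_timestamp_dt": pd.to_datetime(["2015-04-30", "2015-04-30"]),
    }
)
df["new_col"] = df.apply(
    lambda row: count_flagged(
        row, datetime.timedelta(0), datetime.timedelta(days=1)
    ),
    axis=1,
)
print(df)
```

Summing the mask instead of taking len() of a filtered frame sidesteps the type question entirely: the count is an integer whether zero, one, or many rows match.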
