How to slice a pandas DataFrame based on values from another DataFrame without using a for loop?

I have a DataFrame df1:

df1.head() =  
      id type position 
dates 
2000-01-03 17378 600  400 
2000-01-03 4203 600  150 
2000-01-03 18321 600  5000 
2000-01-03 6158 600  1000 
2000-01-03 886 600  10000 
2000-01-03 17127 600  800 
2000-01-03 18317 1300  110 
2000-01-03 5536 600  207 
2000-01-03 5132 600  20000 
2000-01-03 18191 600  2000

and a second DataFrame df2:

df2.head() = 

       dt_f  dt_l 
id_y id_x 
670 715 2000-02-14 2003-09-30 
704 2963 2000-02-11 2004-01-13 
886 18350 2000-02-09 2001-09-24 
1451 18159 2005-11-14 2007-03-06 
2175 8648 2007-02-28 2007-09-19 
2236 18321 2001-04-05 2002-07-02 
2283 2352 2007-03-07 2007-09-19 
     6694 2007-03-07 2007-09-17 
     13865 2007-04-19 2007-09-19 
     14348 2007-08-10 2007-09-19 
     15415 2007-03-07 2007-09-19 
2300 2963 2001-05-30 2007-09-26

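For each value of id_x, I need to slice df1 to the interval dt_f:dt_l and count the rows. The same must then be done for the values of id_y. Finally, the results must be merged into df2, giving the following DataFrame as output: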

df_result.head() = 

       dt_f  dt_l  n_x n_y 
id_y id_x 
670 715 2000-02-14 2003-09-30 8  10 
704 2963 2000-02-11 2004-01-13 13 25 
886 18350 2000-02-09 2001-09-24 32 75 
1451 18159 2005-11-14 2007-03-06 48 6

where n_x (n_y) is the number of rows of df1 for each value of id_x (id_y) that fall within the interval dt_f:dt_l.
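To make the definition concrete, here is a minimal sketch that counts the rows for one id and one interval. `count_in_window` is a hypothetical helper, not part of pandas; it assumes df1 has an `id` column and a DatetimeIndex named `dates`, and it treats both endpoints as inclusive, which matches label slicing such as `df1[dt_f:dt_l]`.

```python
import pandas as pd


def count_in_window(df1: pd.DataFrame, key: int, start, end) -> int:
    """Count rows of df1 with id == key whose 'dates' index lies in [start, end]."""
    # Boolean array: True where the row's date is inside the closed interval
    in_window = (df1.index >= pd.Timestamp(start)) & (df1.index <= pd.Timestamp(end))
    # Combine with the id condition and count the matching rows
    return int(((df1["id"] == key) & in_window).sum())
```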

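Below is the for loop I use:

idx_list = df2.index.tolist()
n_x, n_y = [], []
for k, j in enumerate(idx_list):  # k must advance with j (a fixed k = 1 reuses one interval)
    start, end = df2['dt_f'].iloc[k], df2['dt_l'].iloc[k]
    # j is an (id_y, id_x) tuple; label slicing on df1's sorted 'dates' index is inclusive
    n_y.append(df1[df1.id == j[0]][start:end]['id'].count())
    n_x.append(df1[df1.id == j[1]][start:end]['id'].count())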

Is it possible to do this without a for loop? df1 contains about 30,000 rows, and I'm afraid a loop would slow things down, since this is only a small part of the whole script.


2016-06-10 Michael_O

Why is `n_y` different from `n_x`? Can you show us your `for` loop? – IanS

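Are you satisfied with the current answer? As a side note, you should read up on how to post an [MCVE]. Ideally, your input would produce your desired output, which would make it easier for people to check their answers (and to understand the question). – IanS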
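As an illustration of that advice, here is a minimal sketch of how the rows printed in the question could be rebuilt as copy-pasteable code. It uses only values shown above (the ten rows of df1 and the first six rows of df2) and assumes that `dt_f`/`dt_l` are meant to be datetime columns.

```python
import pandas as pd

# The rows of df1 shown in the question, indexed by 'dates'
df1 = pd.DataFrame(
    {
        "id": [17378, 4203, 18321, 6158, 886, 17127, 18317, 5536, 5132, 18191],
        "type": [600, 600, 600, 600, 600, 600, 1300, 600, 600, 600],
        "position": [400, 150, 5000, 1000, 10000, 800, 110, 207, 20000, 2000],
    },
    index=pd.DatetimeIndex(["2000-01-03"] * 10, name="dates"),
)

# The first six rows of df2 shown in the question, with an (id_y, id_x) MultiIndex
df2 = pd.DataFrame(
    {
        "id_y": [670, 704, 886, 1451, 2175, 2236],
        "id_x": [715, 2963, 18350, 18159, 8648, 18321],
        "dt_f": pd.to_datetime(["2000-02-14", "2000-02-11", "2000-02-09",
                                "2005-11-14", "2007-02-28", "2001-04-05"]),
        "dt_l": pd.to_datetime(["2003-09-30", "2004-01-13", "2001-09-24",
                                "2007-03-06", "2007-09-19", "2002-07-02"]),
    }
).set_index(["id_y", "id_x"])
```

Every date shown for df1 is 2000-01-03, which is earlier than every `dt_f` in df2, so these rows alone would give zero counts everywhere; that is one reason the posted input cannot reproduce the posted output.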

Thanks for your comments, IanS –

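You want something like this:

# Merge the tables together, keeping df1's 'dates' index as a column
# (df2.reset_index() turns its id_y/id_x index levels into ordinary columns)
mg = df1.reset_index().merge(df2.reset_index(), left_on='id', right_on='id_x')

# Select only the rows whose date lies within [dt_f, dt_l] (endpoints included)
mg = mg[(mg['dates'] >= mg['dt_f']) & (mg['dates'] <= mg['dt_l'])]

# Finally count per (id_y, id_x) pair; grouping by id_x alone would mix
# intervals such as 704/2963 and 2300/2963 that share the same id_x
mg.groupby(['id_y', 'id_x']).size()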

You'll need to tidy up the columns afterwards and repeat the same steps for id_y.
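A minimal sketch of the full pipeline, assuming a recent pandas, unique (id_y, id_x) pairs in df2, and datetime `dt_f`/`dt_l` columns. `add_counts` is a hypothetical name: it runs the merge-and-filter step above once for `id_x` and once for `id_y` and writes the results back onto a copy of df2 as `n_x` and `n_y`. The only loop is over those two keys, not over the rows.

```python
import pandas as pd


def add_counts(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df2 with n_x and n_y columns.

    Assumes df1 has an 'id' column and a DatetimeIndex named 'dates', and that
    df2 has datetime columns 'dt_f', 'dt_l' and a unique (id_y, id_x) MultiIndex.
    """
    left = df1.reset_index()[["dates", "id"]]
    right = df2.reset_index()  # id_y and id_x become ordinary columns
    out = df2.copy()
    for key, name in (("id_x", "n_x"), ("id_y", "n_y")):
        # Pair every df1 row with every df2 row whose key equals its id
        mg = left.merge(right, left_on="id", right_on=key)
        # Keep only pairs whose date falls inside the closed interval [dt_f, dt_l]
        mg = mg[(mg["dates"] >= mg["dt_f"]) & (mg["dates"] <= mg["dt_l"])]
        counts = mg.groupby(["id_y", "id_x"]).size()
        # Intervals with no matching rows are absent from counts; report them as 0
        out[name] = counts.reindex(out.index, fill_value=0)
    return out
```

The merge creates one row per (df1 row, matching df2 row) pair, so memory use grows with how often each id appears in df2; for 30,000 rows of df1 this is usually manageable, but it is worth checking on the real data.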


2016-06-10 12:27:34 Matthew

Thank you very much! It works perfectly! (I adapted it to my code by using mg['dates'] instead of mg['index'] for the reset index of df.) –
