Select rows of a data frame based on a vector with duplicate values

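What I want can be described as follows: I am given a data frame that contains all the case-control pairs. In the example below, y is the ID of the case-control pair, and there are 3 pairs in my data set. I am resampling over the distinct values of y (both members of a pair are selected together, or neither is).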

> sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))
> sample_df
  x y
1 1 1
2 2 1
3 3 2
4 4 2
5 5 3
6 6 3
> select_y = c(1,3,3)
> select_y
[1] 1 3 3

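Now I have computed a vector containing the pairs to be resampled, which is select_y above. This means that case-control pair number 1 will be in my new sample, and pair number 3 will also be in my new sample but will appear twice, because there are two 3s. The desired output therefore contains both rows of pair 1 once and both rows of pair 3 twice.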

I can't find an efficient way to do this other than writing a for loop...
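For reference, the loop-based baseline mentioned above might look roughly like the following. This is only a sketch of one way to write it, not the asker's actual code; the names pieces and loop_result are made up for illustration.

```r
sample_df <- data.frame(x = 1:6, y = c(1, 1, 2, 2, 3, 3))
select_y <- c(1, 3, 3)

# One iteration per selected pair ID: pull out the rows of that pair.
pieces <- vector("list", length(select_y))
for (i in seq_along(select_y)) {
  pieces[[i]] <- sample_df[sample_df$y == select_y[i], ]
}

# Stack the pieces; a pair that was selected twice appears twice.
loop_result <- do.call(rbind, pieces)
rownames(loop_result) <- NULL
loop_result
```

Each iteration scans the whole y column, so the cost grows with both the number of rows and the length of select_y, which is presumably why a vectorized alternative is wanted.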

Solution: based on @HubertL's answer, with some modifications, a "vectorized" approach looks like this:

> sel_y <- as.data.frame(table(select_y))
> sel_y
  select_y Freq
1        1    1
2        3    2
> sub_sample_df = sample_df[sample_df$y %in% select_y,]
> sub_sample_df
  x y
1 1 1
2 2 1
5 5 3
6 6 3
> match_freq = sel_y[match(sub_sample_df$y, sel_y$select_y),]
> match_freq
    select_y Freq
1          1    1
1.1        1    1
2          3    2
2.1        3    2
> sub_sample_df$Freq = match_freq$Freq
> rownames(sub_sample_df) = NULL
> sub_sample_df
  x y Freq
1 1 1    1
2 2 1    1
3 5 3    2
4 6 3    2
> selected_rows = rep(1:nrow(sub_sample_df), sub_sample_df$Freq)
> selected_rows
[1] 1 2 3 3 4 4
> sub_sample_df[selected_rows,]
    x y Freq
1   1 1    1
2   2 1    1
3   5 3    2
3.1 5 3    2
4   6 3    2
4.1 6 3    2
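The same counting idea can probably be condensed into a per-row repeat count, skipping the intermediate Freq column. The following is a sketch under that assumption; the names pair_counts, row_counts and expanded are illustrative and not from the original post.

```r
sample_df <- data.frame(x = 1:6, y = c(1, 1, 2, 2, 3, 3))
select_y <- c(1, 3, 3)

# Count how often each pair ID was selected; IDs that were never
# selected get a count of 0 because they are still factor levels.
pair_counts <- table(factor(select_y, levels = unique(sample_df$y)))

# Look up the count for every row by its pair ID (as a name).
row_counts <- as.integer(pair_counts[as.character(sample_df$y)])

# Repeat each row index by its count; rows with count 0 are dropped.
expanded <- sample_df[rep(seq_len(nrow(sample_df)), row_counts), ]
rownames(expanded) <- NULL
expanded
```

As in the transcript above, the copies of a row end up next to each other (x is 1, 2, 5, 5, 6, 6) rather than grouped pair by pair.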

Source

2016-05-20 Jiang Du

Another way to do the same thing without a loop:

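sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))

# row indices of sample_df grouped by pair ID (list names are "1", "2", "3")
row_names <- split(1:nrow(sample_df), sample_df$y)

select_y = c(1,3,3)

# look up each selected ID by name; a repeated ID contributes its rows again
row_num <- unlist(row_names[as.character(select_y)])

# selects rows 1, 2, 5, 6, 5, 6
ans <- sample_df[row_num,]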
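Since the question is about repeated resampling, the split approach could be wrapped in a small helper so the grouping logic lives in one place. This is a minimal sketch; the function name resample_pairs and its arguments are hypothetical, not part of the answer.

```r
# Hypothetical helper: return the rows of df for each ID in selected_ids,
# repeating a pair's rows once per occurrence of its ID.
resample_pairs <- function(df, pair_col, selected_ids) {
  # Row indices grouped by pair ID; list names are the IDs as strings.
  rows_by_pair <- split(seq_len(nrow(df)), df[[pair_col]])
  # Index the list by name, in the order (and multiplicity) requested.
  row_idx <- unlist(rows_by_pair[as.character(selected_ids)], use.names = FALSE)
  out <- df[row_idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

sample_df <- data.frame(x = 1:6, y = c(1, 1, 2, 2, 3, 3))
res <- resample_pairs(sample_df, "y", c(1, 3, 3))
res
```

A bootstrap draw over pairs could then be written as resample_pairs(sample_df, "y", sample(unique(sample_df$y), replace = TRUE)). Note that an ID that does not occur in the data is silently dropped by this sketch.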

Source

2016-05-21 01:59:24

I have to say this is a better solution. It is a nice use of split. –

I couldn't find a way to do it without a loop, but at least this is not a for loop, and it needs only one iteration per frequency:

> sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))
> select_y = c(1,3,3)
> sel_y <- as.data.frame(table(select_y))
> do.call(rbind,
+     lapply(1:max(sel_y$Freq),
+         function(freq) sample_df[sample_df$y %in%
+             sel_y[sel_y$Freq>=freq, "select_y"],]))
   x y
1  1 1
2  2 1
5  5 3
6  6 3
51 5 3
61 6 3

Source

2016-05-20 22:42:01 HubertL

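Yes, that is similar to what I am doing now. For large data sets, speed is still a concern. I think a better way would be to create a frequency variable that indicates how often each pair occurs. For example, I could first decide that pairs 1 and 3 need to be selected, then decide that 3 needs to be selected twice, and then use freq = rep(c(1,3), c(1,2)) or something like it. sample_df[freq,] would then do the job. However, I don't have an efficient way of doing this. –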

See my edit, @JiangDu. Following your hint, this should be a faster way. – HubertL

I think I came up with a better one. See above. Thanks, it was a team effort. @HubertL –
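One possible reading of the rep idea from the comments above: rep(ids, counts) rebuilds the vector of selected pair IDs, which still has to be mapped from IDs to rows. The sketch below does that mapping with base R's merge, which is a swapped-in technique rather than anything proposed in the thread; the names ids, counts and picked are illustrative.

```r
sample_df <- data.frame(x = 1:6, y = c(1, 1, 2, 2, 3, 3))

ids <- c(1, 3)     # pairs chosen for the new sample
counts <- c(1, 2)  # how many times each chosen pair is drawn

# Rebuild the vector of selected pair IDs: 1 3 3
freq <- rep(ids, counts)

# Join on the pair ID: every occurrence of an ID in freq matches
# every row of that pair, so pair 3 contributes its two rows twice.
picked <- merge(data.frame(y = freq), sample_df, by = "y")
picked
```

Here freq holds pair IDs rather than row positions, so it is joined against y instead of being used directly as a row index.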
