2015-07-28

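Approximate join with random selection from the reference table

I have a dataset N that I want to join to a reference table REF. The problem is that the dataset has no proper primary key. My idea is to use a workaround and accept its shortcomings: I would use a numeric variable to find an approximate match and join on that. I tried "Merging two datasets on approximate values" and attempted to adapt it, but failed. The tricky part seems to be my data, and the random selection in the reference table when many rows share a similar value such as 1: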

library(data.table)

N <- data.table(NR = rep("999", 15),
    year = rep("2012", 15),
    los = rep(1, 15))

REF <- data.table(nr =c("A60D", "A91Z", "B70H", "B78C", "E64D", "F49F", "I66E", "I68E", "J68Z", "K63C", "L70A", "L70B", "L71Z", "O64B", "P60A", "P60C", "R65A", "R65B", "S60Z", "U60A", "U60B", "W60Z", "Y63Z"), 
    alos = c(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.5, 1.4, 1.0, 1.0, 1.0, 1.0, 1.0, 1.3, 1.0)) 

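The attempt below necessarily yields more rows than N has, and I cannot work out the most important part: picking a reference at random for los = 1.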

REF[, los := alos] 
setkey(N, los) 
setkey(REF, alos) 
NEW <- N[REF, roll='nearest'] 

Desired output (excerpt), with one row per row in N:

NR year los nr alos 
999 2012 1  A60D 1.0 
999 2012 1  A91Z 1.0 
999 2012 1  A91Z 1.0 
999 2012 1  W60Z 1.3 
999 2012 1  P60C 1.4 
999 2012 1  A91Z 1.0 

How would you describe your desired output?


I added a possible excerpt. The assignment can be random. – chrischi

Answer


This might work for you. I tried playing around with a rolling join, but I don't think you can get the random behaviour that way:

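setkey(REF, alos) 

# Row id, so that each row of N is handled (and sampled) on its own 
N[, row := .I] 

# Distance from each los to the closest alos in REF 
N[, dif := min(abs(los - REF$alos)), by = row] 

set.seed(123) 
# Draw one nr at random from the REF rows matched at that distance 
N[, nr := REF[J(los - dif, los + dif), list(sample(nr, 1))], by = row] 
N 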

    NR year los row dif nr 
1: 999 2012 1 1 0 F49F 
2: 999 2012 1 2 0 R65B 
3: 999 2012 1 3 0 J68Z 
4: 999 2012 1 4 0 U60A 
5: 999 2012 1 5 0 U60B 
6: 999 2012 1 6 0 A60D 
7: 999 2012 1 7 0 L70A 
8: 999 2012 1 8 0 U60A 
9: 999 2012 1 9 0 L70B 
10: 999 2012 1 10 0 K63C 
11: 999 2012 1 11 0 Y63Z 
12: 999 2012 1 12 0 K63C 
13: 999 2012 1 13 0 O64B 
14: 999 2012 1 14 0 L70B 
15: 999 2012 1 15 0 B70H 

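All this code does is find which values in REF[, alos] are closest to the los value of each row in N, and then draw a random sample from nr among the rows with that value. I have left the row and dif columns in, but you can drop them afterwards.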
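If the per-row grouping turns out to be slow, one possible variation is to group by the distinct los values instead and draw all samples for a group in one go. The following is only a sketch under assumptions: `sample_nearest` is a hypothetical helper written for this example (it is not part of data.table), it compares distances directly rather than using a keyed lookup, and it has only been reasoned through on the small data from the question.

```r
library(data.table)

# Data from the question
N <- data.table(NR = rep("999", 15), year = rep("2012", 15), los = rep(1, 15))
REF <- data.table(
  nr = c("A60D", "A91Z", "B70H", "B78C", "E64D", "F49F", "I66E", "I68E",
         "J68Z", "K63C", "L70A", "L70B", "L71Z", "O64B", "P60A", "P60C",
         "R65A", "R65B", "S60Z", "U60A", "U60B", "W60Z", "Y63Z"),
  alos = c(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
           1.0, 1.0, 1.5, 1.4, 1.0, 1.0, 1.0, 1.0, 1.0, 1.3, 1.0)
)

# Hypothetical helper: for a single los value, return n random nr codes
# drawn (with replacement) from the REF rows whose alos is closest to it.
sample_nearest <- function(value, n, ref) {
  d <- abs(ref$alos - value)
  candidates <- ref$nr[d == min(d)]  # every row tied at the minimum distance
  candidates[sample.int(length(candidates), n, replace = TRUE)]
}

set.seed(123)
# One distance computation per distinct los value, .N draws per group
N[, nr := sample_nearest(.BY$los, .N, REF), by = los]
N
```

With many repeated los values this does far fewer lookups than one group per row. Whether sampling with replacement is acceptable depends on what the join is meant to represent.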


Thanks for your effort. I played around with it and struggled with performance. As soon as I find an improvement, I will post it. – chrischi