2012-02-09

Long if/else loop and recoding in R

I know my question is simple, but I can't get it to work. Here is a small dataset:

mark1 <- c("AB", "BB", "AB", "BB", "BB", "AB", "--", "BB") 
mark2 <- c("AB", "AB", "AA", "BB", "BB", "AA", "--", "BB") 
mark3 <- c("BB", "AB", "AA", "BB", "BB", "AA", "--", "BB") 
mark4 <- c("AA", "AB", "AA", "BB", "BB", "AA", "--", "BB") 
mark5 <- c("AB", "AB", "AA", "BB", "BB", "AA", "--", "BB") 
mark6 <- c("--", "BB", "AA", "BB", "BB", "AA", "--", "BB") 
mark7 <- c("AB", "--", "AA", "BB", "BB", "AA", "--", "BB") 
mark8 <- c("BB", "AA", "AA", "BB", "BB", "AA", "--", "BB") 
mymark <- data.frame (mark1, mark2, mark3, mark4, mark5, mark6, mark7, mark8) 
tmymark <- data.frame (t(mymark)) 
names (tmymark) <- c("P1", "P2","I1", "I2", "I3", "I4", "KL", "MN") 

So the dataset becomes:

      P1 P2 I1 I2 I3 I4 KL MN
mark1 AB BB AB BB BB AB -- BB 
mark2 AB AB AA BB BB AA -- BB 
mark3 BB AB AA BB BB AA -- BB 
mark4 AA AB AA BB BB AA -- BB 
mark5 AB AB AA BB BB AA -- BB 
mark6 -- BB AA BB BB AA -- BB 
mark7 AB -- AA BB BB AA -- BB 
mark8 BB AA AA BB BB AA -- BB 

I want to classify mark1 to mark8 by comparing P1 and P2, and I need code that creates a new variable:

loctype <- NULL

if (tmymark$P1 == "AB" & tmymark$P2 == "AB") {
  loctype = "<hkxhk>"
} else if (tmymark$P1 == "AB" & tmymark$P2 == "BB") {
  loctype = "<lmxll>"
} else if (tmymark$P1 == "AA" & tmymark$P2 == "AB") {
  loctype = "<nnxnp>"
} else if (tmymark$P1 == "AA" & tmymark$P2 == "BB") {
  loctype = "MN"
} else if (tmymark$P1 == "BB" & tmymark$P2 == "AA") {
  loctype = "MN"
} else if (tmymark$P1 == "--" & tmymark$P2 == "AA") {
  loctype = "NR"
} else if (tmymark$P1 == "AA" & tmymark$P2 == "--") {
  loctype = "NR"
} else {
  cat("error wrong input in P1 or P2")
}

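Here I am trying to compare the P1 and P2 values and generate a new variable. For example, if tmymark$P1 == "AB" & tmymark$P2 == "AB", loctype should be "<hkxhk>". If not, the second condition is applied, and so on.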

These are the warning messages I get:

Warning messages: 
1: In if (tmymark$P1 == "AB" & tmymark$P2 == "AB") { : 
    the condition has length > 1 and only the first element will be used 
2: In if (tmymark$P1 == "AB" & tmymark$P2 == "BB") { : 
    the condition has length > 1 and only the first element will be used 
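To see where the warning comes from, here is a minimal check. It is a sketch that rebuilds only the P1 and P2 columns of tmymark from the table above (the stand-in data frame and the name cond are illustrative). The comparison inside if() is evaluated for all eight rows at once, so if() receives eight logical values and looks only at the first one:

```r
# Stand-in for the P1 and P2 columns of tmymark (values copied from the table)
tmymark <- data.frame(
  P1 = c("AB", "AB", "BB", "AA", "AB", "--", "AB", "BB"),
  P2 = c("BB", "AB", "AB", "AB", "AB", "BB", "--", "AA"),
  stringsAsFactors = FALSE
)

# == and & work element-wise, so this is one logical value per row
cond <- tmymark$P1 == "AB" & tmymark$P2 == "AB"
print(length(cond))  # one value per row, not a single TRUE/FALSE
print(cond)          # TRUE only for rows 2 and 5 (mark2 and mark5)
```

Since R 4.2.0, a condition of length greater than one in if() is an error rather than a warning, so the same code stops outright on current R versions.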

Once the loctype vector is generated, I want to recode tmymark using the information in that variable:

tmymark1 <- data.frame(loctype, tmymark)
require(car)
for (i in 2:length(tmymark)) {
  # compare with ==; a single = inside if() is a syntax error
  if (loctype == "<hkxhk>") {
    tmymark[[i]] <- recode(x, "AB" = "hk", "BA" = "hk", "AA" = "hh", "BB" = "kk")
  } else if (loctype == "<lmxll>") {
    tmymark[[i]] <- recode(x, "AB" = "lm", "BA" = "lm", "AA" = "--", "BB" = "kk")
  } else if (loctype == "<nnxnp>") {
    tmymark[[i]] <- recode(x, "AB" = "np", "BA" = "np", "AA" = "nn", "BB" = "--")
  } else if (loctype == "MN") {
    tmymark[[i]] <- "--"
  } else if (loctype == "NR") {
    tmymark[[i]] <- "NA"
  } else {
    cat("error wrong input code")
  }
}

Am I on the right track?

Edit: expected output

      loctype P1 P2 I1 I2 I3 I4 KL MN
mark1 <lmxmm> lm mm lm mm mm lm -- mm 
mark2 <hkxhk> hk hk hh kk kk hh -- kk 
mark3 <nnxnp> nn np nn -- -- nn -- -- 
and so on 

Answers


match is definitely the way to go. I would build two lookup keys, like this:

key <- data.frame(
      P1=c("AB", "AB", "AA", "AA", "BB", "--", "AA"), 
      P2=c("AB", "BB", "AB", "BB", "AA", "AA", "--"), 
      loctype=c("<hkxhk>", "<lmxll>", "<nnxnp>", "MN", "MN", "NR", "NR")) 

key2 <- cbind(
    `<hkxhk>` = c("hk","hk","hh","kk"), 
    `<lmxll>` = c("lm", "lm", "--", "kk"), 
    `<nnxnp>` = c("np", "np", "nn", "--"), 
    MN = rep("--", 4), 
    NR = rep("NA", 4)) 
rownames(key2) = c("AB","BA", "AA", "BB") 

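Then use match on key to get the loctype (as Justin also suggested), and on the row names and column names of key2 to get the desired replacement, using matrix indexing to pull the desired values out of the key.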
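Matrix indexing with a two-column matrix, shown in isolation: each row of cbind(ii, jj) selects one cell, and an NA index yields NA, which is where the blanks in the final result come from. This is a sketch that rebuilds key2 as defined above; the lookup values and the names vals and miss are illustrative.

```r
# key2 as in the answer: rows are genotype codes, columns are loctypes
key2 <- cbind(
  `<hkxhk>` = c("hk", "hk", "hh", "kk"),
  `<lmxll>` = c("lm", "lm", "--", "kk"),
  `<nnxnp>` = c("np", "np", "nn", "--"),
  MN = rep("--", 4),
  NR = rep("NA", 4))
rownames(key2) <- c("AB", "BA", "AA", "BB")

# Convert names to positions with match()
ii <- match(c("AB", "BB", "AA"), rownames(key2))                # rows 1, 4, 3
jj <- match(c("<hkxhk>", "<lmxll>", "<nnxnp>"), colnames(key2)) # cols 1, 2, 3

# One cell per (row, column) pair: (1,1), (4,2), (3,3)
vals <- key2[cbind(ii, jj)]
print(vals)  # "hk" "kk" "nn"

# "--" is not a row name, so match() gives NA and so does the lookup
miss <- key2[cbind(match("--", rownames(key2)), 1L)]
print(miss)
```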

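# loctype for each row: match the "P1<sep>P2" pairs against the key
loctype <- key$loctype[match(with(tmymark, paste(P1, P2, sep="\b")),
          with(key, paste(P1, P2, sep="\b")))]
# row index into key2 for every cell (the table is flattened column by column)
ii <- match(as.vector(as.matrix(tmymark)), rownames(key2))
# column index into key2: each row's loctype, repeated once per column
jj <- rep(match(loctype, colnames(key2)), ncol(tmymark))
out <- as.data.frame(matrix(key2[cbind(ii,jj)], nrow=nrow(tmymark)))
colnames(out) <- colnames(tmymark)
rownames(out) <- rownames(tmymark)
out$loctype <- loctype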

The result then looks like this; the missing values are combinations for which my keys have no entry.

> print(out, na="") 
      P1 P2 I1 I2 I3 I4 KL MN loctype
mark1 lm kk lm kk kk lm    kk <lmxll>
mark2 hk hk hh kk kk hh    kk <hkxhk>
mark3
mark4 nn np nn -- -- nn    -- <nnxnp>
mark5 hk hk hh kk kk hh    kk <hkxhk>
mark6
mark7
mark8 -- -- -- -- -- --    --      MN
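On the jj line in the code above: as.vector(as.matrix(tmymark)) flattens the table column by column, so the per-row loctype indices have to be recycled once per column to line up with the flattened cells. With an 8 x 8 table the row and column counts happen to coincide. The toy 2 x 3 matrix below (hypothetical values; m, row_code and back are illustrative names) makes the alignment visible:

```r
# A 2 x 3 toy table; matrix() fills column by column
m <- matrix(c("AB", "BB",    # column 1
              "AA", "AB",    # column 2
              "BB", "BB"),   # column 3
            nrow = 2)
flat <- as.vector(m)         # "AB" "BB" "AA" "AB" "BB" "BB"

# One code per row, repeated ncol(m) times to match the flattened order
row_code <- c("r1", "r2")
jj <- rep(row_code, ncol(m)) # "r1" "r2" "r1" "r2" "r1" "r2"

# Refilling with nrow = nrow(m) restores the shape: every cell in row 1
# carries "r1" and every cell in row 2 carries "r2"
back <- matrix(paste(flat, jj), nrow = nrow(m))
print(back)
```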

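The first problem occurs because if() expects a single logical value (an expression that evaluates to one TRUE or FALSE). You can use ifelse() instead, which is a "vectorized" if: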

loctype <- ifelse(tmymark$P1 == "AB" & tmymark$P2 == "AB", "<hkxhk>",
                  NA)  # replace NA with nested ifelse() calls for the other cases
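A fully nested version covering all the rules in the question might look like this. It is a sketch: the P1/P2 stand-in for tmymark and the p1/p2 shorthands are assumptions, and NA replaces the question's error message for pairs that match no rule.

```r
# Stand-in for the P1 and P2 columns of tmymark from the question
tmymark <- data.frame(
  P1 = c("AB", "AB", "BB", "AA", "AB", "--", "AB", "BB"),
  P2 = c("BB", "AB", "AB", "AB", "AB", "BB", "--", "AA"),
  stringsAsFactors = FALSE)
p1 <- tmymark$P1
p2 <- tmymark$P2

# Each ifelse() works element-wise; a row that fails one test falls through
# to the next, and rows matching no rule end up as NA
loctype <- ifelse(p1 == "AB" & p2 == "AB", "<hkxhk>",
           ifelse(p1 == "AB" & p2 == "BB", "<lmxll>",
           ifelse(p1 == "AA" & p2 == "AB", "<nnxnp>",
           ifelse((p1 == "AA" & p2 == "BB") | (p1 == "BB" & p2 == "AA"), "MN",
           ifelse((p1 == "--" & p2 == "AA") | (p1 == "AA" & p2 == "--"), "NR",
                  NA_character_)))))
print(loctype)
```

The result agrees with the loctype column in the first answer's output. It works, but every new rule means another level of nesting.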

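To avoid a long if() else() structure (or an equally long chain of ifelse()), you can use match. Make a data frame of your expected combinations of P1 and P2 and add a column with the desired loctype: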

matches <- data.frame(p1p2 = c('AB AB', 'AB BB', 'AA AB', 'AA BB', 'BB AA', '-- AA', 'AA --'), 
         loctype = c('<hkxhk>', '<lmxll>', '<nnxnp>', 'MN', 'MN', 'NR', 'NR')) 
# paste() joins with a single space by default, matching the keys above
loctype <- matches$loctype[match(paste(tmymark$P1, tmymark$P2), matches$p1p2)]
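In the question, the final else branch printed an error for unexpected pairs. With match() those rows simply come back as NA, so they can be reported in one step. A self-contained sketch (the P1/P2 stand-in for tmymark, its row names, and the names idx and unmatched are illustrative):

```r
# Two-column stand-in for tmymark, rows named as in the question
tmymark <- data.frame(
  P1 = c("AB", "AB", "BB", "AA", "AB", "--", "AB", "BB"),
  P2 = c("BB", "AB", "AB", "AB", "AB", "BB", "--", "AA"),
  row.names = paste0("mark", 1:8),
  stringsAsFactors = FALSE)

matches <- data.frame(
  p1p2 = c("AB AB", "AB BB", "AA AB", "AA BB", "BB AA", "-- AA", "AA --"),
  loctype = c("<hkxhk>", "<lmxll>", "<nnxnp>", "MN", "MN", "NR", "NR"),
  stringsAsFactors = FALSE)

# Position in `matches` for each row of tmymark; NA if the pair is not listed
idx <- match(paste(tmymark$P1, tmymark$P2), matches$p1p2)
loctype <- matches$loctype[idx]

# Report the rows that the question's final else branch would have flagged
unmatched <- which(is.na(idx))
if (length(unmatched) > 0) {
  message("unexpected P1/P2 combination in: ",
          paste(rownames(tmymark)[unmatched], collapse = ", "))
}
```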

The second part can be done in a number of ways, but I'm drawing a blank on a neat, tidy one.
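One possible approach, offered only as a sketch: store each recode table from the question as a named character vector, then, row by row, index the table chosen by that row's loctype with the genotype codes. The recode_map list, the three-row sample and the hard-coded loctype values are illustrative assumptions. Codes that are not names in a table (such as "--") become NA.

```r
# Hypothetical recode tables, one named vector per loctype, transcribed from
# the question's recode() calls ("NA" kept as a literal string, as in key2)
recode_map <- list(
  "<hkxhk>" = c(AB = "hk", BA = "hk", AA = "hh", BB = "kk"),
  "<lmxll>" = c(AB = "lm", BA = "lm", AA = "--", BB = "kk"),
  "<nnxnp>" = c(AB = "np", BA = "np", AA = "nn", BB = "--"),
  MN        = c(AB = "--", BA = "--", AA = "--", BB = "--"),
  NR        = c(AB = "NA", BA = "NA", AA = "NA", BB = "NA"))

# Three rows and four columns of the question's data, with the loctype values
# both answers produce for them (NA = no matching P1/P2 rule)
tmymark <- data.frame(
  P1 = c("AB", "AB", "BB"),
  P2 = c("BB", "AB", "AB"),
  I1 = c("AB", "AA", "AA"),
  KL = c("--", "--", "--"),
  row.names = c("mark1", "mark2", "mark3"),
  stringsAsFactors = FALSE)
loctype <- c("<lmxll>", "<hkxhk>", NA)

out <- tmymark
for (r in seq_len(nrow(tmymark))) {
  if (is.na(loctype[r])) {
    out[r, ] <- NA  # no rule for this P1/P2 pair: blank the whole row
    next
  }
  map <- recode_map[[loctype[r]]]
  # Index the named vector by the row's genotype codes; unknown codes give NA
  out[r, ] <- unname(map[unlist(tmymark[r, ])])
}
out$loctype <- loctype
print(out)
```

A row-by-row loop is easy to read at this size; for large tables, the single vectorized key2 lookup in the first answer avoids the per-row overhead.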