2015-09-08 32 views
0

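Assign values to a matrix based on a condition

I want to create a matrix with 3 columns and many rows, assigning 1 or 0 depending on whether a condition is met.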

I have data stored in 3 variables.

Data:
df1 <- data.frame(names=c("A","B","C","D","E","F")) 
df2 <- data.frame(names=c("A","B","C","F")) 
df3 <- data.frame(names=c("E","F","H")) 

The expected output would be:

      df1 df2 df3 
    A 1 1 0 
    B 1 1 0 
    C 1 1 0 
    D 1 0 0 
    E 1 0 1 
    F 1 1 1 
    H 0 0 1 

In the first row, if A is present in a data set, I assign 1 under that data set's column, and 0 if A is not present in it.

This is what I have tried:

DF <- rbind(df1,df2,df3) 
for (i in DF) { 
    for (j in 1:length(df1$names)) { 
       if(i == df1$names[j]){ 
        A3 <-data.frame(paste0("",i),paste0(1),paste0(0),paste0(0)) 
        names(A3) <- NULL 
       } 
       else{ 
        A3 <-data.frame(paste0("",i),paste0(0),paste0(0),paste0(0)) 

       } 
    } 
} 

I have written this code for df1, but it is very slow because a single data set of mine has more than 1500 rows. What is the fastest way to do this?

Answers

3

Add a grouping variable to each data frame:

df1 <- data.frame(names=c("A","B","C","D","E","F"),group="df1") 
df2 <- data.frame(names=c("A","B","C","F"),group="df2") 
df3 <- data.frame(names=c("E","F","H"),group="df3") 
DF <- rbind(df1,df2,df3) 

Then do this:

res <- table(DF) 

> res 
    group 
names df1 df2 df3 
    A 1 1 0 
    B 1 1 0 
    C 1 1 0 
    D 1 0 0 
    E 1 0 1 
    F 1 1 1 
    H 0 0 1 

Or, if you want a data frame:

library(reshape2) 
dcast(names~group, data=DF,fun.aggregate = length) 
+0

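This approach works for the example data you provided; I don't know how it will behave if there are multiple instances of the same string in your data. – Heroka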

+0

Thanks a lot, Heroka. There are no duplicates in my data. – hash
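To illustrate the point raised in the comments: when a name is repeated within one data set, table() returns counts rather than 0/1 indicators. The following is a minimal sketch, assuming a hypothetical data frame df2_dup in which "A" appears twice (it is not part of the question's data). Comparing the counts with 0 turns them back into indicators.

```r
# Hypothetical input: "A" appears twice in df2_dup (not from the question)
df1 <- data.frame(names = c("A", "B", "C", "D", "E", "F"), group = "df1")
df2_dup <- data.frame(names = c("A", "A", "B", "C", "F"), group = "df2")
df3 <- data.frame(names = c("E", "F", "H"), group = "df3")
DF_dup <- rbind(df1, df2_dup, df3)

# Contingency table of names by group; the duplicate yields a count of 2
counts <- table(DF_dup)

# Logical comparison, then adding 0L converts TRUE/FALSE back to 1/0
indicator <- (counts > 0) + 0L

# Plain data frame with the names as row names, using base R only
indicator_df <- as.data.frame.matrix(indicator)
```

With no duplicates in the data, as in the question, counts and indicator are identical and the extra step is unnecessary.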

0

The %in% operator lets you check whether a string is present in a vector of strings. It is also vectorized, so this works quite simply:

x=c(LETTERS[c(1:6,8)]) 
df=data.frame(x=x,df1=as.numeric(x %in% df1$names), 
      df2=as.numeric(x %in% df2$names), 
      df3=as.numeric(x %in% df3$names)) 
df 

If speed is critical, the data.table package provides the %chin% operator, which gives a small speed boost:

library(data.table) 
x=c(LETTERS[c(1:6,8)]) 
dt=data.table(x=x,df1=as.numeric(x %chin% as.character(df1$names)), 
      df2=as.numeric(x %chin% as.character(df2$names)), 
      df3=as.numeric(x %chin% as.character(df3$names))) 
dt 
0

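The code below is slightly more general than the other answers. Also, I think it is useful to know how to create commands dynamically... I used the data frames as you prepared them: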

df1 <- data.frame(names = c("A", "B", "C", "D", "E", "F")) 
df2 <- data.frame(names = c("A", "B", "C", "F")) 
df3 <- data.frame(names = c("E", "F", "H")) 

DF <- rbind(df1, df2, df3) 
nDF <- unique(DF) #we don't want to duplicate tests. 

The main loop then looks like this:

n_ <- 3 
for (ii in 1:n_) { 
    # dynamically create a new column in the data frame 
    nDF[paste0("df", ii)] <- as.logical(NA) 

    # dynamically build the command, e.g. "nDF$names %in% df1$names" 
    cmnd <- paste0("nDF$names %in% df", ii, "$names") 

    # evaluate the dynamically created command and store the result in the new column 
    nDF[paste0("df", ii)] <- eval(parse(text = cmnd)) 
} 

This should be fairly fast. However, if there are no duplicates in your data, Heroka's suggestion for this question is probably the best option.
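A similar result can be reached without eval(parse()) by keeping the data frames in a named list and looping over it. This is only a sketch under that assumption; the names dfs, all_names and membership are illustrative and do not come from any of the answers.

```r
df1 <- data.frame(names = c("A", "B", "C", "D", "E", "F"))
df2 <- data.frame(names = c("A", "B", "C", "F"))
df3 <- data.frame(names = c("E", "F", "H"))

# Keep the inputs in a named list so they can be iterated over directly
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Every distinct name across all inputs, as plain character strings
all_names <- sort(unique(unlist(lapply(dfs, function(d) as.character(d$names)))))

# One column per data frame: 1 if the name occurs in it, 0 otherwise
membership <- sapply(dfs, function(d) as.integer(all_names %in% d$names))
rownames(membership) <- all_names
```

Adding a fourth data set then only requires another entry in dfs, and because %in% tests membership rather than counting, repeated names still produce 0/1 values.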

1

When you use the idcol argument of rbindlist from the data.table package, there is no need to create a grouping column for each data frame separately:

library(data.table) # I used v1.9.5 for this 
DT <- rbindlist(list(df1, df2, df3), idcol="id") 
dcast(DT[, .N , by=.(id,names)], names ~ id, fill=0) 

which gives:

names 1 2 3 
1:  A 1 1 0 
2:  B 1 1 0 
3:  C 1 1 0 
4:  D 1 0 0 
5:  E 1 0 1 
6:  F 1 1 1 
7:  H 0 0 1