Performance issue

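What I am trying to do:

1. Read the contents of each file into a matrix (two features/columns: ID and text).
2. Collapse the rows that share the same ID or, if that is not possible, create a new matrix with the collapsed data.
3. Write the output to .txt files in the working directory, with the ID as the file name and the text as the content.

Here is what I have done so far: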

# set working directory and get file_list
myvar <- matrix(0, nrow = 0, ncol = 2)
colnames(myvar) <- c("PID", "Seq")

for (file in file_list) {
    print(file)
    Mymatrix <- as.matrix(read.table(file))

    for (i in seq_len(nrow(Mymatrix))) {
        if (Mymatrix[i, 1] %in% myvar[, 1]) {
            # ID already seen: append this row's text to the existing entry
            myvar[which(myvar[, 1] == Mymatrix[i, 1]), 2] <- paste(myvar[which(myvar[, 1] == Mymatrix[i, 1]), 2], Mymatrix[i, 2])
        } else {
            # new ID: grow the result matrix by one row
            myvar <- rbind(myvar, c(Mymatrix[i, 1], Mymatrix[i, 2]))
        }
    }
}

The performance problem shows up in this profvis output: profvis results

Here is a reproducible example:

#Input: 
a <- matrix(0,ncol=2, nrow=0) 
colnames(a) <- c("id","text") 

#possible data in the matrix after reading one file 
a <- rbind(a,c(1,"4 5 7 7 8 1")) 
a <- rbind(a,c(1,"5 5 1 3 7 5 1")) 
a <- rbind(a,c(7,"5 5 1 3 7 5 1")) 
a <- rbind(a,c(5,"1 3 2 25 5 1 3 7 5 1")) 

#expected output after processing 

    > a 
    id text      
[1,] "1" "4 5 7 7 8 1 5 5 1 3 7 5 1" 
[2,] "7" "5 5 1 3 7 5 1"    
[3,] "5" "1 3 2 25 5 1 3 7 5 1"

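Note: the order of the text is preserved after the rows are collapsed (4 5 7 7 8 1 followed by 5 5 1 3 7 5 1 for ID = 1).

As mentioned above, the biggest problem is performance: the way I am doing it now takes a lot of time. Is there a solution based on something like aggregate or apply?

Source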

2016-06-21 Imlerith

See [this general Q&A](http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-lapply-vs-by-vs-aggrega); it looks like you need to apply `paste(text, collapse = " ")` with `id` as the group.

Here is a method using aggregate with paste and collapse=" ", as suggested by @alexis-laz:

# convert matrix to data.frame and aggregate by id
dfAgg <- aggregate(text ~ id, data=data.frame(a), FUN=paste, collapse=" ") 

# coerce dfAgg to matrix 
as.matrix(dfAgg) 
    id text      
[1,] "1" "4 5 7 7 8 1 5 5 1 3 7 5 1" 
[2,] "5" "1 3 2 25 5 1 3 7 5 1"  
[3,] "7" "5 5 1 3 7 5 1"

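Note that the explicit conversion to a data frame is not necessary in this example, as R will perform the coercion automatically. Still, it seems like good programming practice to make the coercion explicit. Also note that aggregate returns the groups sorted by id (1, 5, 7) rather than in order of first appearance (1, 7, 5).

Source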
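If the order of first appearance matters, as in the expected output of the question, one possible variation is sketched below. It is not part of the original answer: it swaps aggregate for base R's tapply, with a factor whose levels are fixed in the order the ids first occur. The names `ids`, `collapsed` and `result` are illustrative only.

```r
# Example data from the question, as a character matrix
a <- rbind(
  c("1", "4 5 7 7 8 1"),
  c("1", "5 5 1 3 7 5 1"),
  c("7", "5 5 1 3 7 5 1"),
  c("5", "1 3 2 25 5 1 3 7 5 1")
)
colnames(a) <- c("id", "text")

# Fix the group levels in order of first appearance (1, 7, 5)
ids <- factor(a[, "id"], levels = unique(a[, "id"]))

# Collapse the text within each id; rows within a group keep their order
collapsed <- tapply(a[, "text"], ids, paste, collapse = " ")

# Rebuild a two-column character matrix
result <- cbind(id = names(collapsed), text = as.vector(collapsed))
result
```

Because the grouping is done once on whole columns instead of row by row, this should avoid the repeated lookups and rbind calls of the original loop, although the actual gain would need to be measured on the real files.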

2016-06-21 13:15:02 lmo

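I guess aggregate does not accept a matrix as input, and that is why you used data = data.frame(a)? I will give it a try and see whether it improves performance. – Imlerith

I had never used `aggregate` on a matrix, but I just tried it and it works. – lmo

One problem is that you are growing an object inside a loop. This tends to have a large impact on performance, because on each iteration R has to copy the object to a new location in order to add the extra row (or column, or element). When using a loop, it is better to preallocate the object with zeros or empty strings first and then fill it in. – lmo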
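To make the preallocation advice concrete, here is a minimal sketch that contrasts the two patterns. The `chunks` list is a hypothetical stand-in for the matrices that read.table would return for each file; `grown`, `filled`, `n_total` and `pos` are illustrative names as well.

```r
# Hypothetical stand-ins for the per-file matrices returned by read.table()
chunks <- list(
  cbind(PID = c("1", "7"), Seq = c("4 5 7", "5 5 1")),
  cbind(PID = c("1", "5"), Seq = c("8 1", "1 3 2"))
)

# Slow pattern: grow the result with rbind() on every iteration
grown <- matrix("", nrow = 0, ncol = 2,
                dimnames = list(NULL, c("PID", "Seq")))
for (chunk in chunks) {
  grown <- rbind(grown, chunk)  # copies everything accumulated so far
}

# Preallocated pattern: size the result once, then fill rows in place
n_total <- sum(vapply(chunks, nrow, integer(1)))
filled <- matrix("", nrow = n_total, ncol = 2,
                 dimnames = list(NULL, c("PID", "Seq")))
pos <- 0
for (chunk in chunks) {
  rows <- pos + seq_len(nrow(chunk))
  filled[rows, ] <- chunk
  pos <- pos + nrow(chunk)
}

# Both loops produce the same rows; only the amount of copying differs
all(grown == filled)
```

When the pieces are already held in a list, a single `do.call(rbind, chunks)` is another way to bind them in one step, after which the collapsing by ID can be done once on the combined matrix.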
