For loop to plyr function

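I have a character array that contains the column names and values for a single row of a data frame. Unfortunately, if the value for a particular entry is zero, neither the column name nor the value is listed in the array. I can build the data frame I want from this information, but I rely on a for loop to do it.

I would like to use plyr to avoid the for loop in the working code below.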

types <- c("one", "two", "three")  # My data 
entry <- c("one(1)", "three(2)")  # My data 


values <- function(entry, types)
{
    # one-row data frame of zeros, one column per type
    frame <- setNames(as.data.frame(matrix(0, ncol = length(types), nrow = 1)), types)

    for (s1 in seq_along(entry))    # seq_along() is safe when entry is empty
    {
        name <- gsub("\\(\\w*\\)", "", entry[s1])                       # get name
        quantity <- as.numeric(unlist(strsplit(entry[s1], "[()]"))[2])  # get value

        frame[1, which(colnames(frame) == name)] <- quantity            # store
    }
    return(frame)
}

values(entry, types)    # This is how I want the output to look

I tried splitting the array as follows, but I can't figure out how to get adply to return a single row.

library(plyr)  # provides adply

types <- c("one", "two", "three")  # data
entry <- c("one(1)", "three(2)")   # data 

frame<- setNames(as.data.frame(matrix(0, ncol = length(types), nrow = 1)), types)  

array_split <- function(entry, frame){ 

    name <- gsub("\\(\\w*\\)", "", entry)       # get name 
    quantity <- as.numeric(unlist(strsplit(entry, "[()]"))[2]) # get value 
    frame[1, which(colnames(frame)==name)] <- quantity   # store 
    return(frame) 
} 

adply(entry, 1, array_split, frame)

Is there something like cumsum that I should be considering? I want the operation to be fast.

Source

2014-11-21 Walter

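Choosing 'plyr' for speed is usually not advisable. If you have heard that loops in R are inefficient, you have been listening to the wrong advisers. Hadley has had performance in mind while developing 'dplyr' recently, but I don't think that was a primary design goal of 'plyr'. As I understand it, 'plyr' was an effort to develop a unified syntax for transformations. – 2014-11-21 18:44:07

I'm afraid I often fall into writing poor loops, and perhaps using plyr would make me rethink them. http://cran.r-project.org/doc/Rnews/Rnews_2008-1.pdf has a fairly good section on writing efficient code, and I will try to keep those points in mind more often. Thanks for the dplyr tip, it looks interesting. – Walter 2014-11-21 19:07:44

I'm not sure why you wouldn't just do something more like this: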

frame <- setNames(rep(0,length(types)),types) 
a <- as.numeric(sapply(strsplit(entry,"[()]"),`[[`,2)) 
names(a) <- gsub("\\(\\w*\\)", "", entry) 
frame[names(a)] <- a

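Both gsub and strsplit are already vectorized, so there is no real need for an explicit loop anywhere. You only need sapply to extract the second element of each strsplit result. The rest is just regular indexing.

Source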
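To make the vectorized steps easier to follow, here is a small sketch that prints each intermediate result for the example data from the question. The variable names `parts`, `nms`, and `vals` are only illustrative, and `vapply` is swapped in for `sapply` purely as a stricter variant; it is not part of the answer above.

```r
types <- c("one", "two", "three")
entry <- c("one(1)", "three(2)")

# strsplit works on the whole vector at once and returns a list,
# one character vector per element of entry
parts <- strsplit(entry, "[()]")

# gsub is also vectorized: strip the "(...)" part from every element
nms <- gsub("\\(\\w*\\)", "", entry)

# take the second piece of each split; vapply checks that each
# result is a single string (assumption: every entry looks like "name(value)")
vals <- as.numeric(vapply(parts, `[[`, character(1), 2))

print(nms)
print(vals)
```

Because each step operates on the full vector, the same code should handle any number of entries without an explicit loop.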

2014-11-21 17:14:38 joran

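This is great. It didn't occur to me because I'm not very comfortable with all the different data types. Thanks! Also, if anyone else wants to use this solution: to get the result into my final format, I needed to add "as.data.frame(t(frame))". – Walter 2014-11-21 18:36:06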
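For anyone who wants the whole thing in one place, the following is a minimal sketch that wraps the vectorized approach and the as.data.frame(t(...)) step into a single function. The function name `values_vec` is hypothetical, and the sketch assumes every entry has the form "name(value)" with a name that appears in types.

```r
# Hypothetical helper: builds a one-row data frame without an explicit loop
values_vec <- function(entry, types) {
  # named numeric vector of zeros, one element per type
  row <- setNames(rep(0, length(types)), types)

  # names and values extracted for all entries at once
  nms  <- gsub("\\(\\w*\\)", "", entry)
  vals <- as.numeric(vapply(strsplit(entry, "[()]"), `[[`, character(1), 2))

  # fill in the non-zero entries by name
  row[nms] <- vals

  # t() turns the named vector into a 1 x n matrix, which becomes one row
  as.data.frame(t(row))
}

types <- c("one", "two", "three")
entry <- c("one(1)", "three(2)")
res <- values_vec(entry, types)
print(res)
```

Types that do not appear in entry keep their initial value of zero, which matches the behaviour of the original loop.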
