R split function - do not include the grouping variable in the new datasets

I usually prefer to use lapply() instead of a for loop:

lx <- split(x, x$hr) # with the next step being lapply(lx, function(x) ...)

However, each element of lx still includes the column hr, which is inefficient because that information is already in names(lx).

So now I have to do:

lx <- lapply(lx, function(X) select(X, -hr))
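For illustration, here is a minimal sketch of this two-step pattern on a made-up data frame. The data values and the column val are assumptions, and base R indexing stands in for select() so that the sketch runs without any package:

```r
# Toy data; the values and the column `val` are made up for illustration.
x <- data.frame(hr = c(1, 1, 2, 3), val = c(10, 20, 30, 40))

# Step 1: split by the grouping column; every piece still carries `hr`.
lx <- split(x, x$hr)
names(lx)          # the group labels "1", "2", "3"
names(lx[["1"]])   # "hr" and "val"

# Step 2: drop the now redundant column from each piece
# (base R stand-in for select(X, -hr)).
lx <- lapply(lx, function(piece) piece[setdiff(names(piece), "hr")])
names(lx[["1"]])   # only "val" is left
```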

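(An alternative would be:

HR <- x$hr           # keep the full grouping vector, not just its unique values
x <- select(x, -hr)  # drop the column from the data frame, not from the list
lx <- split(x, HR)

)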

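The whole point of using lapply() over a for loop is efficiency, so these extra lines bother me. This seems like a common use case, and in my experience R usually offers a more efficient way, or else I am missing something.

Can this be done in a single function call or a one-liner?

Edit: a concrete example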

DF <- data.frame(A = 1:2, B = 2:3, C = 3:4) 
DF <- split(DF, factor(DF$A)) # but each list element still contains the column A which is 
           # redundant (because the names() of the list element equals A 
           # as well), so I have to write the following line if I want 
           # to be efficient especially with large datasets 
DF <- lapply(DF, function(x) select(x, -A)) # I hate always writing this line!

Source

2014-11-22 StatSandwich

OK, done. I am not sure whether there is a better solution. This seems very common. – StatSandwich 2014-11-22 02:48:10

Remove the splitting column first:

split(DF[-1], DF[[1]])

or

split(subset(DF, select = -A), DF$A)

Update: added the last line.
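As a rough check, the sketch below compares both one-call forms with the two-step version from the question, using the question's example data. The helper split_drop() is hypothetical (it is not part of the answer or of base R); it drops the column by name, which avoids relying on the grouping column being the first one, as DF[-1] does:

```r
# Example data from the question.
DF <- data.frame(A = 1:2, B = 2:3, C = 3:4)

# Two-step version from the question, written with base R indexing.
two_step <- lapply(split(DF, DF$A), function(piece) piece[names(piece) != "A"])

# One-call versions from the answer.
by_position <- split(DF[-1], DF[[1]])
by_name     <- split(subset(DF, select = -A), DF$A)

# Hypothetical helper: drop the grouping column by name, then split on it.
split_drop <- function(df, col) {
  split(df[setdiff(names(df), col)], df[[col]])
}
with_helper <- split_drop(DF, "A")

# Compare the results with the two-step version.
all.equal(two_step, by_position)
all.equal(two_step, by_name)
all.equal(two_step, with_helper)
```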

Source

2014-11-22 02:48:47

Awesome, thanks! – StatSandwich 2014-11-22 02:50:10

It is obvious now: of course modifying DF in the first argument does not affect the second argument. – StatSandwich 2014-11-22 02:56:50
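A quick way to see this, using the example data frame from the question: DF[-1] evaluates to a new data frame and nothing is assigned back, so DF still has column A when DF[[1]] is evaluated.

```r
DF <- data.frame(A = 1:2, B = 2:3, C = 3:4)

# DF[-1] builds a new data frame; DF itself is left untouched.
pieces <- split(DF[-1], DF[[1]])

names(DF)             # still "A", "B", "C"
names(pieces[["1"]])  # only "B" and "C"
```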
