2012-02-14

How can I split a data frame into a list of data frames based on column names in R?

Suppose I have the following data frame:

df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10), 
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10)) 

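I want to create a list of data frames, split by the first part of the column names. That is, the columns starting with "BR" would be one element of the list, the columns starting with "USA" would be another, and so on.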

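I can get the column names and split them with strsplit. However, I'm not sure what the best way is to iterate over the result and split the data frame.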

strsplit(names(df), "\\.") 

gives me a list with one element per column name; each element is a character vector holding the pieces of that name split on ".".
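A small illustration of that structure, using a handful of names in the same style; `cols`, `parts`, and `first` are illustrative names, not from the post. `vapply()` with the `` `[` `` function picks the first piece of each split name, which is the prefix to group on.

```r
# A few column names in the same style as the question's df
cols <- c("BR.a", "BR.b", "USA.a", "FRA.b")

# strsplit() returns a list: one character vector of pieces per name
parts <- strsplit(cols, "\\.")
str(parts)

# Take the first piece of each name, i.e. the prefix used for grouping
first <- vapply(parts, `[`, character(1), 1)
print(first)
```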

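How can I iterate over this list to get the index numbers of the columns that start with the same substring, and group those columns together as elements of another list?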

Answers


Dason beat me to it, but here's a different flavor of the same conceptual approach:

library(plyr) 

# Use a regex to get the prefixes.
# Captures any run of word characters ("\\w*": letters, digits, underscore)
# from the beginning of the string ("^") up to the first period ("\\.") as a
# group, then matches all the remaining characters (".*"). The whole match is
# replaced with the first group ("\\1" = "(\\w*)").
# In other words, it matches the whole name but keeps only the prefix.

prefixes <- unique(gsub(pattern = "^(\\w*)\\..*", 
                        replacement = "\\1", 
                        x = names(df))) 

# Subset to the variables that match each prefix.
# Iterates over the prefixes and keeps the variables whose names start with
# that prefix followed by a period (so "BR" cannot also match "BRA.x")
llply(prefixes, .fun = function(x) { 
  subset(df, select = grep(paste0("^", x, "\\."), names(df), value = TRUE)) 
}) 
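`llply()` returns an unnamed list, so the groups can only be reached by position. A hedged extension, not part of the original answer, is to name the result after the prefixes. This sketch uses base `lapply()` in place of `llply()` so it runs without plyr, rebuilds `df` so it is self-contained, and `by_prefix` is an illustrative name.

```r
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Same prefix extraction as above: keep only the part before the first "."
prefixes <- unique(gsub("^(\\w*)\\..*", "\\1", names(df)))

# One data frame per prefix; drop = FALSE keeps one-column groups as data frames
by_prefix <- lapply(prefixes, function(p) {
  df[, grep(paste0("^", p, "\\."), names(df)), drop = FALSE]
})
names(by_prefix) <- prefixes

# Groups can now be looked up by prefix
print(names(by_prefix$USA))
```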

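I think these regular expressions should still give you the correct result even if there are more periods later in the variable names: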

unique(gsub(pattern = "^(\\w*)\\..*", 
            replacement = "\\1", 
            x = c(names(df), "FRA.c.blahblah"))) 
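A quick self-contained check of that claim, with illustrative names: because `\\w` does not match a period, the captured group stops at the first `.`, so a name containing several periods still yields only its prefix.

```r
nms <- c("BR.a", "USA.a", "FRA.a", "FRA.c.blahblah")

# "\\w*" cannot cross a period, so the capture ends at the first "."
res <- unique(gsub("^(\\w*)\\..*", "\\1", nms))
print(res)
```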

Or if a prefix appears later in a variable name:

# Add a USA variable with "FRA" in its name 
df2 <- data.frame(df, USA.FRANKLINS = rnorm(10)) 

prefixes2 <- unique(gsub(pattern = "^(\\w*)\\..*", 
                         replacement = "\\1", 
                         x = names(df2))) 

# USA.FRANKLINS lands in the USA group, not FRA, because of the "^" anchor
llply(prefixes2, .fun = function(x) { 
  subset(df2, select = grep(paste0("^", x, "\\."), names(df2), value = TRUE)) 
}) 
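The `^` anchor is what keeps `USA.FRANKLINS` out of the `FRA` group. The sketch below, using illustrative names only, shows the anchor at work and also a case a bare `^prefix` pattern does not cover: a prefix that is the start of a longer prefix, such as `BR` versus a hypothetical `BRA.x` column. Requiring `\\.` right after the prefix handles that case.

```r
nms <- c("BR.a", "BRA.x", "USA.FRANKLINS", "FRA.a")

# "^FRA" only matches at the start, so USA.FRANKLINS is excluded
fra <- grep("^FRA", nms, value = TRUE)

# "^BR" also catches the hypothetical BRA.x column...
br_loose <- grep("^BR", nms, value = TRUE)
# ...while requiring a period right after the prefix does not
br_strict <- grep("^BR\\.", nms, value = TRUE)

print(fra)
print(br_loose)
print(br_strict)
```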

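This will only work if the column names always have the form you showed (so splitting on "." makes sense) and you want to group on the identifier before the first ".".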

df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10), 
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10)) 

## Grab the component of the names we want 
nm <- do.call(rbind, strsplit(colnames(df), "\\."))[,1] 
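## Create the list with a custom function using lapply;
## drop = FALSE keeps a single-column group as a data frame instead of a vector
datlist <- lapply(unique(nm), function(x){df[, nm == x, drop = FALSE]})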
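For comparison, a base-R sketch that is not from either answer and also returns the column index numbers the question asked about: `split()` on `seq_along(nm)` groups the indices by prefix, and calling `split.default()` on the data frame treats it as a list of columns, giving one data frame per prefix. Both name the groups in sorted factor-level order (`BR`, `FRA`, `USA`), not in order of first appearance. Like the answer above, this assumes every name carries its prefix before the first `.`.

```r
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Prefix = everything before the first "."
nm <- sub("\\..*$", "", names(df))

# Column index numbers grouped by prefix
idx <- split(seq_along(nm), nm)
str(idx)

# split.default() splits the columns (not the rows) by prefix
datlist <- split.default(df, nm)
print(names(datlist))
```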