2012-06-24 54 views
2

Data frame column replacement and summary in R

I have a data frame:

names <- c("doe.jane", "doe.john", "smith.bob") 
number <- c(3, 5, 1) 
site <- c("A1", "A1", "A2") 
df <- as.data.frame(matrix(c(site, names, number), 3)) 
names(df) <- c("site", "names", "number") 

I just need to replace each full name with the last name, and then collapse the data frame so that the output is:

names <- c("doe", "smith") 
number <- c(8, 1) 
site <- c("A1", "A2") 
df <- as.data.frame(matrix(c(site, names, number), 2)) 
names(df) <- c("site", "names", "number") 

Answers

3

You'd want to do something like this:

last.names <- function(names) { 
    names <- as.character(names) 
    split.names <- strsplit(names, split='.', fixed=TRUE) 
    sapply(split.names, function(x) x[1]) 
} 

df <- within(df, names <- last.names(names)) 
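# convert via as.character() first: as.numeric() on a factor returns level codes, not values 
df <- with(df, aggregate(as.numeric(as.character(number)), by=list(site=site, names=names), sum)) 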

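I'll point out that your definition of df is a bit misleading. You really only need to say df <- data.frame(names, number, site). Going through matrix() coerces everything to character, so the way you did it results in three factor columns in the data.frame (three character columns in R 4.0.0 and later), and number is no longer numeric.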
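
To see the difference in column types, here is a minimal sketch. The names df_matrix and df_direct are made up for this illustration and do not appear in the question.

```r
names <- c("doe.jane", "doe.john", "smith.bob")
number <- c(3, 5, 1)
site <- c("A1", "A1", "A2")

# Going through matrix() coerces every value to character first,
# so number is not numeric in the resulting data frame.
df_matrix <- as.data.frame(matrix(c(site, names, number), 3))
names(df_matrix) <- c("site", "names", "number")

# Building the data frame directly keeps number numeric.
df_direct <- data.frame(names, number, site)

# Check which columns are numeric in each version.
sapply(df_matrix, is.numeric)
sapply(df_direct, is.numeric)
```

Only df_direct has a numeric number column, which is why sum() can be applied to it without any conversion.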

1

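Here is a version that uses a regular expression to extract the first part of the name. Because the data was stored as factors, I recreated the data; thanks to mplourde for pointing that out.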

#set up the data 
names <- c("doe.jane","doe.john","smith.bob") 
number <- c(3,5,1) 
site <- c("A1","A1","A2") 
df <- data.frame(site,names,number) 

#get the first part of the name 
df$names <- gsub("([[:alpha:]]+)\\.([[:alpha:]]+)","\\1",df$names) 
#aggregate the data by site and name 
dfnew <- aggregate(df["number"],df[c("site","names")],sum) 

> dfnew 
  site names number 
1   A1   doe      8 
2   A2 smith      1
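
For comparison, the same two steps can also be sketched with sub() and the formula interface of aggregate(). This is an alternative not taken from either answer; the name dfsum is made up for the illustration, and it assumes the last name is everything before the first dot.

```r
# Recreate the data with a numeric number column.
df <- data.frame(
  site = c("A1", "A1", "A2"),
  names = c("doe.jane", "doe.john", "smith.bob"),
  number = c(3, 5, 1)
)

# Drop everything from the first dot onward, leaving the last name.
df$names <- sub("\\..*$", "", as.character(df$names))

# Sum number within each site/name combination.
dfsum <- aggregate(number ~ site + names, data = df, FUN = sum)
dfsum
```

The formula form keeps the summed column named number, so no renaming is needed afterwards.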