2014-06-18
dput(x) 

structure(list(State = structure(c(1L, 1L, 2L, 3L, 2L, 4L, 2L, 
5L, 5L, 2L), .Label = c("Illinois", "Texas", "California", "Louisiana", 
"Michigan"), class = "factor"), Lat = structure(1:10, .Label = c("41.627", 
"41.85", "32.9588", "33.767", "33.0856", "30.4298", "29.7633", 
"42.4687", "43.0841", "29.6919"), class = "factor"), 
Long = structure(1:10, .Label = c("-88.204", 
"-87.65", "-96.9812", "-118.1892", "-96.6115", "-90.8999", "-95.3633", 
"-83.5235", "-82.4905", "-95.6512"), class = "factor")), .Names = c("State", 
"Lat", "Long"), row.names = c(NA, 10L), class = "data.frame") 

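I need another column, Total, giving the number of rows for each state. How do you count the occurrences of each factor level and insert those counts into the same data frame in R?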

x$Total <- 1

Then

library(data.table)
x <- data.table(x)
# Sum the helper column of 1s within each State
x <- x[, total := sum(Total), by = State]

Is there a better, shorter, or more efficient way to count factor levels in a data frame than creating an extra Total column like this?

Have you tried 'x <- x[, total := .N, by = State]'? (There is no need to initialize Total first.)

'tabulate(x$State)[x$State]' also looks efficient.
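The tabulate() trick works because a factor is stored as integer codes: tabulate() returns one count per code, and indexing that count vector with each row's code looks up the count for that row's State. A minimal sketch, assuming the sample data is rebuilt by hand from the dput() output (with Lat kept as numeric rather than factor for simplicity); the names counts and total are illustrative only:

```r
# Sample data rebuilt from the dput() output in the question; Lat is stored
# as numeric here for simplicity (the original stores it as a factor).
x <- data.frame(
  State = factor(
    c("Illinois", "Illinois", "Texas", "California", "Texas",
      "Louisiana", "Texas", "Michigan", "Michigan", "Texas"),
    levels = c("Illinois", "Texas", "California", "Louisiana", "Michigan")
  ),
  Lat = c(41.627, 41.85, 32.9588, 33.767, 33.0856,
          30.4298, 29.7633, 42.4687, 43.0841, 29.6919)
)

# tabulate() counts the factor's integer codes, giving one count per level
# in level order; nbins is set explicitly so every level gets a slot.
counts <- tabulate(x$State, nbins = nlevels(x$State))

# Each row's integer code picks out the count for its own level.
x$total <- counts[as.integer(x$State)]
print(x)
```

Indexing with as.integer(x$State) is equivalent to indexing with the factor itself, since R indexes by a factor's codes, but it makes the lookup explicit.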

Answers

You can use dplyr like this (no need to create a Total column):

(Edit: thanks to @beginneR for pointing me to n(), which makes this more concise.)

library('dplyr') 
mutate(group_by(x, State), total = n()) 

@beginneR's solution, group_by(x, State) %>% mutate(total = n()), is also especially handy if you need to carry on with other operations on your data. Likewise,

x %>% 
    group_by(State) %>% 
    mutate(total = n()) 

also works.
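For comparison, the same "count rows per group and attach the count to every row" idea can be sketched in base R with ave(), which is a swapped-in technique rather than part of this answer. This is a minimal sketch assuming only the State column from the question's data:

```r
x <- data.frame(
  State = factor(c("Illinois", "Illinois", "Texas", "California", "Texas",
                   "Louisiana", "Texas", "Michigan", "Michigan", "Texas"))
)

# ave() splits its first argument by the grouping factor, applies FUN to each
# piece and writes the results back in the original row order. Row indices
# are passed (not the factor itself) so that FUN = length counts rows per State.
x$total <- ave(seq_len(nrow(x)), x$State, FUN = length)
print(x)
```

Like mutate(total = n()) after group_by(State), this keeps every row and its original order.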

I'd suggest rewriting it as 'group_by(x, State) %>% mutate(total = n())'. (Note that in your example you would get a column named 'sum(length(State))'.) By the way, 'dplyr' also works on 'data.table's.

@KaraWoo, thanks – user1471980

I edited my answer to name the new total column, but I prefer @beginneR's solution. I didn't know about 'n()'; it's really handy!

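You can also use base R's aggregate, which returns one row per State (with the counts in the Lat and Long columns) rather than adding a column to x:

aggregate(. ~ State, FUN = length, data = x)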
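Because aggregate() produces a per-State summary, getting the count back into the same data frame, as the question asks, needs an extra join. A minimal sketch, assuming the merge() step is an addition of ours rather than part of the answer, and rebuilding the data by hand with Lat as numeric; the names counts and total are illustrative:

```r
x <- data.frame(
  State = factor(c("Illinois", "Illinois", "Texas", "California", "Texas",
                   "Louisiana", "Texas", "Michigan", "Michigan", "Texas")),
  Lat = c(41.627, 41.85, 32.9588, 33.767, 33.0856,
          30.4298, 29.7633, 42.4687, 43.0841, 29.6919)
)

# One row per State, holding the number of Lat values (i.e. rows) in that State.
counts <- aggregate(Lat ~ State, data = x, FUN = length)
names(counts)[names(counts) == "Lat"] <- "total"

# merge() attaches each State's count to every matching row of x.
# Note that merge() sorts the result by State, so the row order changes.
x <- merge(x, counts, by = "State")
print(x)
```

If the original row order matters, the grouped approaches (ave(), data.table's .N, or dplyr's n()) avoid the reordering that merge() introduces.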