2016-11-13

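Getting the average rating for a metropolitan area

I have data on several metropolitan areas, along with other data, and one of the columns is the rating for that area. The problem I am running into is the NA values in that column.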

The data looks something like this:

"ID", "Name", "Type", "Amount", "Rating", "Date" 
1,"Location A", "SomeType", 8000, 9.2, "2015-04-10" 
2,"Location B", "SomeType", 2300, 7.4, "2015-04-10" 
3,"Location C", "SomeType", 5400, NA, "2015-04-10" 
4,"Location A", "SomeType", 4300, 8.5, "2015-04-10" 
5,"Location B", "SomeType", 8670, 6.9, "2015-04-10" 
6,"Location A", "SomeType", 7600, NA, "2015-04-10" 
7,"Location A", "SomeType", 3400, 8.2, "2015-04-10" 
8,"Location B", "SomeType", 6500, NA, "2015-04-10" 
9,"Location C", "SomeType", 7800, 9.2, "2015-04-10" 

In the end, I would like to have something like this:

Name   Average Rating 
Location A {average rating} 
Location B {average rating} 
Location C {average rating} 
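with the average rating filled in for each location. However, the result keeps coming back as NULL because of the NA values. The data is read directly from a CSV file. How can I get the average rating per location, excluding the NA values?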

I tried it with plyr, but this currently returns NULL:

mean_ratings = ddply(data, .(Name), summarize, Rating=mean(Rating)) 

There is an argument called na.rm in mean(). Set it to TRUE.
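As a rough illustration of the na.rm suggestion, here is a minimal base R sketch. It assumes a data frame named data holding only the Name and Rating columns from the sample above, built by hand instead of read from the CSV, and it uses aggregate() as a stand-in for the plyr call, so it is one possible approach and not necessarily what the commenter had in mind.

```r
# Sample data mirroring the Name and Rating columns from the question
# (typed in by hand here; the original data comes from a CSV file)
data <- data.frame(
  Name = c("Location A", "Location B", "Location C", "Location A", "Location B",
           "Location A", "Location A", "Location B", "Location C"),
  Rating = c(9.2, 7.4, NA, 8.5, 6.9, NA, 8.2, NA, 9.2)
)

# Without na.rm, a single NA makes the whole mean NA
overall_with_na <- mean(data$Rating)

# With na.rm = TRUE, NA values are dropped before averaging
overall_without_na <- mean(data$Rating, na.rm = TRUE)

# Per-location average: extra arguments given to aggregate()
# are passed on to FUN, so na.rm = TRUE reaches mean()
mean_ratings <- aggregate(data["Rating"], by = data["Name"],
                          FUN = mean, na.rm = TRUE)
print(mean_ratings)
```

The same na.rm = TRUE argument can be added inside the mean() call of the ddply() attempt from the question.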

Answer

library(data.table) 
dt = data.table("Name"=c("Location A","Location B","Location C","Location A","Location B", 
        "Location A","Location A","Location B","Location C"), 
      "Rating"=c(9.2, 7.4, NA, 8.5,6.9,NA,8.2,NA,9.2)) 

> dt
         Name Rating
1: Location A    9.2
2: Location B    7.4
3: Location C     NA
4: Location A    8.5
5: Location B    6.9
6: Location A     NA
7: Location A    8.2
8: Location B     NA
9: Location C    9.2

dt[, mean(Rating, na.rm = TRUE), by = "Name"]
         Name       V1
1: Location A 8.633333
2: Location B 7.150000
3: Location C 9.200000

The plyr solution:

library(plyr)
ddply(dt, "Name", function(x) mean(x$Rating, na.rm = TRUE))