2015-03-08

Including NAs in summary tables

For this example data frame:

library(data.table)

migration <- structure(list(area.old = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 
               2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 
               3L, NA, NA, NA), .Label = c("leeds", "london", "plymouth"), class = "factor"), 
         area.new = structure(c(7L, 13L, 3L, 2L, 4L, 7L, 6L, 7L, 6L, 
               13L, 5L, 8L, 7L, 11L, 12L, 9L, 1L, 10L, 11L, NA, NA, NA, 
               NA, 7L, 6L, 6L), .Label = c("bath", "bristol", "cambridge", 
                      "glasgow", "harrogate", "leeds", "london", "manchester", 
                      "newcastle", "oxford", "plymouth", "poole", "york"), class = "factor"), 
         persons = c(6L, 3L, 2L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 3L, 4L, 
            1L, 1L, 2L, 3L, 4L, 9L, 4L, 5L, 7L, 9L, 10L, 15L, 4L, 7L)), .Names = c("area.old", 
                              "area.new", "persons"), class = "data.frame", row.names = c(NA, 
                                                 -26L)) 

# The original dput() output carried an .internal.selfref pointer, which R cannot parse.
# Build a plain data frame instead and convert it to a data.table by reference.
setDT(migration)

I want to summarise the data into a couple of data frames using this code:

moved.from <- migration[as.character(area.old)!=as.character(area.new), 
       .(persons = sum(persons)), 
       by=.(moved.from = area.old)] 

moved.to <- migration[as.character(area.old)!=as.character(area.new), 
       .(persons = sum(persons)), 
       by=.(moved.to = area.new)] 

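This produces two summary tables. The first gives the total number of persons who moved away from each area in 'area.old'. The second gives the total number of persons who moved to each destination in 'area.new'. The code was suggested here (Producing summary tables for very large datasets).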

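When I tested this on my own data a problem arose, because I had not told R how to handle NAs in the 'area.old' or 'area.new' columns. How can I modify this code to add up all the NAs, i.e. include them as a row at the bottom of the moved.from and moved.to data frames that gives the total number of persons with an NA area?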

Any help with this would be much appreciated.

Answer


Just add | is.na(...) as an additional condition in each filter:

migration[as.character(area.old) != 
      as.character(area.new) | 
      is.na(area.old), 
      .(persons = sum(persons)), 
      by = .(moved.from = area.old)] 

#    moved.from persons
# 1:     london      24
# 2:      leeds      17
# 3:   plymouth      19
# 4:         NA      26

And

migration[as.character(area.old) != 
      as.character(area.new) | 
      is.na(area.new), 
      .(persons = sum(persons)), 
      by = .(moved.to = area.new)] 

#       moved.to persons
#  1:       york       9
#  2:  cambridge       2
#  3:    bristol       5
#  4:    glasgow       6
#  5:      leeds       8
#  6:     london       5
#  7:  harrogate       3
#  8: manchester       4
#  9:      poole       2
# 10:  newcastle       3
# 11:       bath       4
# 12:     oxford       9
# 13:         NA      31
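
To see why the extra is.na() condition is needed, it may help to look at how R evaluates comparisons that involve NA. The sketch below uses two short made-up vectors (old and new are hypothetical stand-ins for the two area columns, not the real data) and base R only. As far as I know, data.table treats an NA in a logical i expression as FALSE, so which() is used here to mimic that row selection.

```r
# Hypothetical stand-ins for area.old and area.new (not the real data)
old <- c("london", "leeds", NA, "plymouth")
new <- c("york", "leeds", "london", NA)

# A comparison with NA on either side yields NA, not TRUE or FALSE
moved <- old != new
print(moved)

# NA | TRUE is TRUE, but NA | FALSE stays NA
keep_from <- moved | is.na(old)   # keeps rows with an unknown origin
keep_to   <- moved | is.na(new)   # keeps rows with an unknown destination
print(keep_from)
print(keep_to)

# Rows whose condition is still NA are not selected
print(which(keep_from))
print(which(keep_to))
```

One consequence, visible in the totals above: rows with an NA destination are still left out of moved.from, and rows with an NA origin are still left out of moved.to, because their condition evaluates to NA rather than TRUE.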

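As a side note, I would suggest converting both area columns to character class, so that you avoid calling as.character in every operation. The following should do it: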

migration[, names(migration)[-3L] := lapply(.SD, as.character), .SDcols = -"persons"] 

Now you can compare area.old and area.new without calling as.character.
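
As a rough sketch of what the two summaries could look like once the columns are character, here is a small self-contained example. The table dt and its values are made up for illustration and are not the data from the question. The final setorder() call is my own addition, aimed at the request to have the NA row at the bottom; it also sorts the remaining rows alphabetically, which may or may not be what you want.

```r
library(data.table)

# Made-up toy data with the area columns already stored as character
dt <- data.table(
  area.old = c("london", "london", "leeds", NA, "plymouth"),
  area.new = c("york", "london", NA, "leeds", "bath"),
  persons  = c(3L, 6L, 9L, 4L, 2L)
)

# No as.character() needed: character columns compare directly
moved.from <- dt[area.old != area.new | is.na(area.old),
                 .(persons = sum(persons)),
                 by = .(moved.from = area.old)]

moved.to <- dt[area.old != area.new | is.na(area.new),
               .(persons = sum(persons)),
               by = .(moved.to = area.new)]

# Optional: sort by area name and push the NA group to the last row
setorder(moved.from, moved.from, na.last = TRUE)

print(moved.from)
print(moved.to)
```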