Summarizing selected entries of a dataset with a query

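I'm still quite new to R and am trying to summarize data in a specific way. To illustrate, I'm using the weather data from the nasaweather package. For example, I want the mean temperature on one particular day of the month, shown for each of the 3 origins and each of the 12 months in this dataset.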

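I figured I could do it with the code below: I specify the day I'm interested in and create an empty data frame to fill. Then a for loop goes over the months, computes the mean temperature for each origin, binds those means to the month, and appends that row to the data frame. Finally, I fix the column names and print the result: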

library(nasaweather) 
library(magrittr) 
library(dplyr) 

query_day = 15 
data_output <- data.frame(month = numeric(), 
       EWR = numeric(), 
       JFK = numeric(), 
       LGA = numeric()) 

for (i in 1:12) { 
    data_subset <- weather %>% 
    filter(day == query_day, month == i) %>% 
    summarize(
     EWR = mean(temp[origin == "EWR"]), 
     JFK = mean(temp[origin == "JFK"]), 
     LGA = mean(temp[origin == "LGA"])) 
    data_output <- rbind(data_output, cbind(i, data_subset)) 
    rm(data_subset) 
} 

names(data_output) <- c("month", "EWR", "JFK", "LGA") 
print(data_output)

This gives me:

   month     EWR     JFK     LGA
1      1 39.3725 39.0875 38.9150
2      2 42.1625 39.3425 42.9050
3      3 37.4150 36.7775 37.3025
4      4 50.1275 48.1550 49.2050
5      5 58.8725 55.7150 59.1575
6      6 70.7825 70.2950 71.5700
7      7 86.9900 85.1225 87.2000
8      8 69.2075 69.0725 69.9425
9      9 60.6350 61.2125 61.7375
10    10 59.8850 58.3850 60.5150
11    11 45.7475 45.1700 49.0700
12    12 32.4950 38.0975 34.0325

This is exactly what I want. It's just that my code seems far too complicated, so I wanted to ask: is there a simpler way to do this?
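For comparison, here is a minimal base-R sketch of the loop approach described above, with one change: the result is pre-allocated with one row per month and filled in place instead of being grown with rbind(). It assumes a small hypothetical toy data frame `wx` standing in for the real `weather` table (only the month, day, hour, origin and temp columns); the numbers are made up for illustration.

```r
# Hypothetical toy stand-in for the weather table (made-up values):
# 3 origins x 2 hourly readings x 2 days x 12 months
wx <- expand.grid(origin = c("EWR", "JFK", "LGA"), hour = c(0, 12),
                  day = c(14, 15), month = 1:12, stringsAsFactors = FALSE)
offset <- c(EWR = 0, JFK = 1, LGA = -1)
wx$temp <- 30 + 3 * wx$month + unname(offset[wx$origin]) +
  4 * (wx$hour == 12) - 10 * (wx$day == 14)

query_day <- 15
origins <- c("EWR", "JFK", "LGA")

# Pre-allocate one row per month instead of growing the data frame with rbind()
data_output <- data.frame(month = 1:12,
                          EWR = NA_real_, JFK = NA_real_, LGA = NA_real_)

for (i in 1:12) {
  # Rows for the queried day in month i
  day_rows <- wx[wx$day == query_day & wx$month == i, ]
  for (o in origins) {
    # Mean temperature for this origin, written into row i, column o
    data_output[i, o] <- mean(day_rows$temp[day_rows$origin == o])
  }
}
print(data_output)
```

Filling a pre-sized data frame avoids copying the whole result on every iteration, although for 12 rows the difference is negligible.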

Source

2016-11-15 Moose on Mars

You can just use the aggregate function and then reshape: 'a <- aggregate(temp ~ month + origin, weather, mean); reshape(a, idvar = "month", ...)'

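Thanks @Dirk, but if I understand it correctly, this gives the mean temperature over the whole month rather than for a specific day. Is there a way to specify that within the aggregate function?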

Ah, missed that: 'a <- aggregate(temp ~ month + origin, weather[weather$day == query_day, ], mean); reshape(a, idvar = "month", ...)'
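A minimal base-R sketch of the aggregate()/reshape() approach from the comments above. The arguments elided in the comment are not known; here we assume timevar = "origin" and direction = "wide" to get one column per origin. It also assumes a small hypothetical toy data frame `wx` with made-up values in place of the real `weather` table.

```r
# Hypothetical toy stand-in for the weather table (made-up values)
wx <- expand.grid(origin = c("EWR", "JFK", "LGA"), hour = c(0, 12),
                  day = c(14, 15), month = 1:12, stringsAsFactors = FALSE)
offset <- c(EWR = 0, JFK = 1, LGA = -1)
wx$temp <- 30 + 3 * wx$month + unname(offset[wx$origin]) +
  4 * (wx$hour == 12) - 10 * (wx$day == 14)

query_day <- 15

# Long format: mean temp per (month, origin), using only the queried day
a <- aggregate(temp ~ month + origin, data = wx[wx$day == query_day, ], FUN = mean)

# Long -> wide: one row per month, one temp.<origin> column per origin
wide <- reshape(a, idvar = "month", timevar = "origin", direction = "wide")
names(wide) <- sub("^temp\\.", "", names(wide))  # temp.EWR -> EWR, etc.
print(wide)
```

The filtering happens before aggregate() sees the data, which is what restricts the means to a single day rather than the whole month.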

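There are various problems with your code... but the main one is that you didn't group_by first. Once you include that, this becomes easy peasy. Take a look at my comments on your code, and then at the final code at the bottom: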

library(nasaweather) ## Wrong package 
# library(magrittr) ## Not needed: dplyr re-exports the %>% pipe 
library(dplyr) 

query_day = 15 
# data_output <- data.frame(month = numeric(), ## We won't need to create this explicitly 
## (You're right that you should set this up before a for loop; go one step 
## further and tell the data.frame how many rows to expect. 
## Not needed here, though, because we won't use a for loop.) 
         # EWR = numeric(), 
         # JFK = numeric(), 
         # LGA = numeric()) 

for (i in 1:12) { ## You don't need a for loop... group_by() followed by summarize() does this for you. 
    data_subset <- weather %>% 
    filter(day == query_day, month == i) %>% 
    summarize(  ## Before summarize, you need a group_by to say what to summarize by 
     EWR = mean(temp[origin == "EWR"]), 
     JFK = mean(temp[origin == "JFK"]), 
     LGA = mean(temp[origin == "LGA"])) 
    data_output <- rbind(data_output, cbind(i, data_subset)) ## With group_by, this step isn't required. 
    # rm(data_subset) ## You don't have to remove temporary objects... 
## data_subset is simply overwritten on each iteration (R loops don't create their own scope, so the last copy stays in the workspace, which is harmless). 
} 

names(data_output) <- c("month", "EWR", "JFK", "LGA") 
print(data_output) 

################### Better code: 
library(nycflights13) ## The package you want is nycflights13... not nasaweather 
library(dplyr) 

query_day = 15 

weather %>% 
    filter(day == query_day) %>% 
    group_by(month) %>% 
    summarize(
     EWR = mean(temp[origin == "EWR"]), 
     JFK = mean(temp[origin == "JFK"]), 
     LGA = mean(temp[origin == "LGA"])) -> data_output 

data_output

Output:

> data_output 
# A tibble: 12 × 4 
   month     EWR     JFK     LGA
   <dbl>   <dbl>   <dbl>   <dbl>
1      1 39.3725 39.0875 38.9150
2      2 42.1625 39.3425 42.9050
3      3 37.4150 36.7775 37.3025
4      4 50.1275 48.1550 49.2050
5      5 58.8725 55.7150 59.1575
6      6 70.7825 70.2950 71.5700
7      7 86.9900 85.1225 87.2000
8      8 69.2075 69.0725 69.9425
9      9 60.6350 61.2125 61.7375
10    10 59.8850 58.3850 60.5150
11    11 45.7475 45.1700 49.0700
12    12 32.4950 38.0975 34.0325
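For readers without dplyr, the same filter-then-group-then-summarize idea can be sketched in base R with tapply(), which splits temp by month and origin, applies mean() to each cell, and returns a month-by-origin matrix. This is a swapped-in base-R technique, not the answer's dplyr code, and it assumes a small hypothetical toy data frame `wx` with made-up values in place of the real `weather` table.

```r
# Hypothetical toy stand-in for the weather table (made-up values)
wx <- expand.grid(origin = c("EWR", "JFK", "LGA"), hour = c(0, 12),
                  day = c(14, 15), month = 1:12, stringsAsFactors = FALSE)
offset <- c(EWR = 0, JFK = 1, LGA = -1)
wx$temp <- 30 + 3 * wx$month + unname(offset[wx$origin]) +
  4 * (wx$hour == 12) - 10 * (wx$day == 14)

query_day <- 15
day_rows <- wx[wx$day == query_day, ]  # "filter"

# "group_by + summarize": one mean per (month, origin) cell
m <- tapply(day_rows$temp,
            list(month = day_rows$month, origin = day_rows$origin),
            mean)

# Optional: turn the 12 x 3 matrix into a data frame with a month column
out <- data.frame(month = as.integer(rownames(m)), m, row.names = NULL)
print(out)
```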

Source

2016-11-15 12:49:16

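Thanks @Amit for all these helpful comments, much appreciated! I had tried group_by first but never got it to work; I guess I must have done something very wrong. However, when I run the 11 lines of the improved version (## Better code:), I don't get back the 12 × 4 tibble, only a single row: '1 54.47438 53.86937 55.12938', with the values under 'EWR JFK LGA', which I guess is the average over all 12 months. Any idea what I'm (once again) doing wrong?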

That sounds strange... Clear the console, or even restart RStudio, and try again? I just re-ran it and it works correctly.

Restarting RStudio did the trick; now it works for me too, thanks!