2017-10-07

Manipulating a data set to account for repeated measures

Consider:

df <- data.frame(
        CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers" 
          ,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","PelletCoffeeCo","PelletCoffeeCo"), 
        Email= c("[email protected]", "[email protected]","[email protected]","[email protected]", "[email protected]", 
          "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", 
          "[email protected]","[email protected]","[email protected]","[email protected]", 
         "[email protected]"), 
        Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","1","2"), 
       var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6,2,3)) 

I need to figure out how to get:

df2 <- data.frame(CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers" 
          ,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","Liquders","Liquders","Liquders", 
          "Liquders","Liquders","Liquders","Liquders", "PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo", 
          "PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo", 
          "PelletCoffeeCo","PelletCoffeeCo"), 
        Email= c("[email protected]", "[email protected]","[email protected]","[email protected]", "[email protected]", 
          "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", 
          "[email protected]","[email protected]","[email protected]","[email protected]","[email protected]", 
          "[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]", 
          "[email protected]","[email protected]","[email protected]","[email protected]", 
          "[email protected]","[email protected]","[email protected]","[email protected]", 
          "[email protected]"), 
        Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","4","5","6","7","8","9","10", 
         "1","2","3","4","5","6","7","8","9","10"), 
        var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6, NA,NA,NA,NA,NA,NA,NA, 2,3,NA,NA,NA,NA,NA,NA,NA,NA)) 

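Explanation: I have data from people I surveyed once a day over the course of 10 days. In a perfect world, I would receive 10 responses from each participant, recorded as day1:day10. Because of non-response, however, some participants gave 3 responses, others 6, others 10, and so on. I am setting the data up to run a growth model, so I need the "Day" column to always read Day 1 through Day 10, regardless of whether there is data for those responses. I have tried to illustrate this by adding NA to the rows of participants who do not have data for all 10 days.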

How should I do this?

Thanks in advance!

Answers


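Try this:

library(tidyr)

# Day is stored as text (or as a factor in R < 4.0); convert it to integer
# first so that min()/max() compare numbers rather than strings ("9" > "10")
df$Day <- as.integer(as.character(df$Day))

df %>%
    complete(nesting(CompanyID, Email), Day = seq(min(Day), max(Day), 1L)) %>%
    data.frame()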

Output:

        CompanyID              Email Day var1
1        Drinkers [email protected]   1    4
2        Drinkers [email protected]   2    5
3        Drinkers [email protected]   3    5
4        Drinkers [email protected]   4    5
5        Drinkers [email protected]   5    2
6        Drinkers [email protected]   6    3
7        Drinkers [email protected]   7    2
8        Drinkers [email protected]   8    7
9        Drinkers [email protected]   9    6
10       Drinkers [email protected]  10    5
11       Liquders [email protected]   1    7
12       Liquders [email protected]   2    6
13       Liquders [email protected]   3    6
14       Liquders [email protected]   4   NA
15       Liquders [email protected]   5   NA
16       Liquders [email protected]   6   NA
17       Liquders [email protected]   7   NA
18       Liquders [email protected]   8   NA
19       Liquders [email protected]   9   NA
20       Liquders [email protected]  10   NA
21 PelletCoffeeCo [email protected]   1    2
22 PelletCoffeeCo [email protected]   2    3
23 PelletCoffeeCo [email protected]   3   NA
24 PelletCoffeeCo [email protected]   4   NA
25 PelletCoffeeCo [email protected]   5   NA
26 PelletCoffeeCo [email protected]   6   NA
27 PelletCoffeeCo [email protected]   7   NA
28 PelletCoffeeCo [email protected]   8   NA
29 PelletCoffeeCo [email protected]   9   NA
30 PelletCoffeeCo [email protected]  10   NA

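Edit:

The code above fills in, for each group, a complete set of Day values running from the minimum to the maximum of the existing values in the Day column (i.e., 1 to 10). The groups whose Day values get filled in can be redefined as needed; here I chose to define them as Company + Email via the "nesting(CompanyID, Email)" term. nesting() restricts the expansion to the CompanyID/Email pairs that actually occur in the data, and complete() then adds a row for every missing Day within each pair, with NA in the remaining columns (such as var1). The data.frame() line is only there to convert the output into a data.frame rather than a tibble. If you don't need a data.frame, feel free to replace or remove that line.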
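To make the role of nesting() concrete without needing tidyr, here is a minimal base-R sketch of the same idea on a hypothetical toy data set (the companies "A"/"B" and the addresses "a@example.com"/"b@example.com" are made up). It contrasts crossing every company with every email, which invents pairings that never occurred, against keeping only the observed pairs, which is what nesting() does, and then pads the missing days with NA through an outer merge. It illustrates the concept only; it is not how tidyr implements complete() internally.

```r
# Hypothetical toy data: two participants, each with a single email address
toy <- data.frame(
  CompanyID = c("A", "A", "B"),
  Email     = c("a@example.com", "a@example.com", "b@example.com"),
  Day       = c(1L, 2L, 1L),
  var1      = c(4, 5, 7),
  stringsAsFactors = FALSE
)
days <- 1:3

# "Crossing": every company with every email, which invents pairs such as
# A / b@example.com that never occur in the data
crossed <- expand.grid(
  CompanyID = unique(toy$CompanyID),
  Email     = unique(toy$Email),
  Day       = days,
  stringsAsFactors = FALSE
)

# "Nesting": keep only the company/email pairs that occur, then cross them
# with the days (merge() with no shared columns returns every combination)
pairs  <- unique(toy[, c("CompanyID", "Email")])
nested <- merge(pairs, data.frame(Day = days))

# Outer join of the observed values onto the grid; missing days get var1 = NA
filled <- merge(nested, toy, by = c("CompanyID", "Email", "Day"), all.x = TRUE)
filled <- filled[order(filled$CompanyID, filled$Day), ]
rownames(filled) <- NULL

nrow(crossed)  # 12: 2 companies x 2 emails x 3 days
nrow(nested)   # 6: 2 observed pairs x 3 days
filled$var1    # 4 5 NA 7 NA NA
```

With real data, where each participant has one email, the nested grid has exactly one block of days per participant, which is why the answer pairs CompanyID and Email inside nesting().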


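Awesome! Thanks so much, it works like a charm. I have some other variables, x1:x10, that I'd like it to work the same way for. Could you explain the function? I can see that it works, but I don't understand how complete and nesting work together, and why does the data.frame argument need to be added at the end? – D500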


@D500 No problem; see the explanation added above. – www
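Because complete() fills every column that is not part of the expansion, additional measures such as x1:x10 need no extra handling. A minimal base-R sketch of the same padding, using three hypothetical measure columns (x1, x2, x3, standing in for x1:x10) and made-up companies "A" and "B":

```r
# Hypothetical data with several measures per day (stand-ins for x1:x10)
obs <- data.frame(
  CompanyID = c("A", "A", "B"),
  Day       = c(1L, 2L, 1L),
  x1        = c(4, 5, 7),
  x2        = c(1, 0, 1),
  x3        = c(10, 12, 9),
  stringsAsFactors = FALSE
)

# Full participant x day grid (merge() with no shared columns is a cross join)
full_grid <- merge(
  data.frame(CompanyID = unique(obs$CompanyID), stringsAsFactors = FALSE),
  data.frame(Day = 1:3)
)

# Outer join: every measure column that is not a key is padded with NA
padded <- merge(full_grid, obs, by = c("CompanyID", "Day"), all.x = TRUE)
padded <- padded[order(padded$CompanyID, padded$Day), ]
rownames(padded) <- NULL

colSums(is.na(padded[, c("x1", "x2", "x3")]))  # x1, x2, x3: 3 missing days each
```

Adding more measure columns to the data changes nothing in the join; only the key columns (here CompanyID and Day) decide which rows exist.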


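First, create a data frame of the unique participants (each CompanyID/Email combination). Next, create a data frame of the desired days.

Cross-join these two.

Then merge in your original data set to fill in the table.

# One row per participant (CompanyID + Email), so Email is carried into the new rows
comp <- unique(df[, c("CompanyID", "Email")])
Day <- data.frame(Day = as.character(1:10))

# merge() with no columns in common returns every combination (a cross join)
compDay <- merge(comp, Day, all = TRUE)

# Outer join with the original data; days without a response get var1 = NA
dfday <- merge(df, compDay, by = c("CompanyID", "Email", "Day"), all = TRUE)

# Day is text, so sort it numerically ("10" would otherwise follow "1")
dfday <- dfday[order(dfday$CompanyID, as.integer(as.character(dfday$Day))), ]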
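If the same padding is needed for several data sets, the merge steps can be wrapped in a small function. The sketch below is hypothetical (the name pad_days and its arguments are made up for illustration); it also converts Day to an integer, because when Day is stored as text the merged result sorts "10" directly after "1". It is shown on the question's data with the Email column left out.

```r
# Hypothetical helper (name and arguments are illustrative): give every
# participant one row per study day, padding unanswered days with NA
pad_days <- function(data, id_cols, days = 1:10) {
  ids  <- unique(data[, id_cols, drop = FALSE])
  grid <- merge(ids, data.frame(Day = days))   # cross join: ids x days
  # Compare days as numbers, not text ("10" would sort right after "1")
  data$Day <- as.integer(as.character(data$Day))
  out <- merge(grid, data, by = c(id_cols, "Day"), all.x = TRUE)
  # Sort by the id columns, then by day
  out <- out[do.call(order, unname(as.list(out[c(id_cols, "Day")]))), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# The question's data, with the Email column left out
df <- data.frame(
  CompanyID = rep(c("Drinkers", "Liquders", "PelletCoffeeCo"), times = c(10, 3, 2)),
  Day       = as.character(c(1:10, 1:3, 1:2)),
  var1      = c(4, 5, 5, 5, 2, 3, 2, 7, 6, 5, 7, 6, 6, 2, 3)
)

res <- pad_days(df, id_cols = "CompanyID")
table(res$CompanyID)  # 10 rows for each company
```

Passing id_cols = c("CompanyID", "Email") on the full data would key the padding on participants rather than companies, matching the nesting(CompanyID, Email) choice in the tidyr answer.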

Awesome! Thanks so much, it works like a charm. – D500