Create repeated rows in a data.frame by filling groups

Here is my example data.frame:

df = read.table(text = 'ID Day Count Count_group 
       18 1933 6 15 
       33 1933 6 15 
       37 1933 6 15 
       18 1933 6 15 
       16 1933 6 15 
       11 1933 6 15 
       111 1932 5 9 
       34 1932 5 9 
       60 1932 5 9 
       88 1932 5 9 
       18 1932 5 9 
       33 1931 3 4 
       13 1931 3 4 
       56 1931 3 4 
       23 1930 1 1 
       6 1800 6 12 
       37 1800 6 12 
       98 1800 6 12 
       52 1800 6 12 
       18 1800 6 12 
       76 1800 6 12 
       55 1799 4 6 
       6 1799 4 6 
       52 1799 4 6 
       133 1799 4 6 
       112 1798 2 2 
       677 1798 2 2 
       778 888  4 8 
       111 888  4 8 
       88 888  4 8 
       10 888  4 8 
       37 887  2 4 
       26 887  2 4 
       8 886  1 2 
       56 885  1 1 
       22 120  2 6 
       34 120  2 6 
       88 119  1 6 
       99 118  2 5 
       12 118  2 5 
       90 117  1 3 
       22 115  2 2 
       99 115  2 2', header = TRUE)

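The Count column shows the number of ID observations within a single Day; Count_group shows the number of ID observations within that Day and the 4 days before it.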
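To make the two definitions concrete, here is a minimal sketch that recomputes both columns from ID and Day alone. It uses only the Day 115-120 rows of the data as a toy subset; the names `obs` and `n` are introduced here for illustration and do not come from the original data.

```r
# Toy subset of the data above (Days 115-120 only), ID and Day columns
obs <- data.frame(
  ID  = c(22, 34, 88, 99, 12, 90, 22, 99),
  Day = c(120, 120, 119, 118, 118, 117, 115, 115)
)

n <- 5  # window length in days: the Day itself plus the previous n - 1 days

# Count: number of ID observations on the same Day
obs$Count <- as.integer(ave(obs$ID, obs$Day, FUN = length))

# Count_group: number of observations whose Day lies in [d - (n - 1), d]
obs$Count_group <- vapply(
  obs$Day,
  function(d) sum(obs$Day >= d - (n - 1) & obs$Day <= d),
  integer(1)
)

obs
```

On this subset the recomputed values agree with the Count and Count_group columns listed for the same rows in the data.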

I need to expand df so that every Count_group set contains all of the days that belong to it.

Expected output:

ID Day Count Count_group 
18 1933 6 15 
33 1933 6 15 
37 1933 6 15 
18 1933 6 15 
16 1933 6 15 
11 1933 6 15 
111 1932 5 15 
34 1932 5 15 
60 1932 5 15 
88 1932 5 15 
18 1932 5 15 
33 1931 3 15 
13 1931 3 15 
56 1931 3 15 
23 1930 1 15 
6 1800 6 12 
37 1800 6 12 
98 1800 6 12 
52 1800 6 12 
18 1800 6 12 
76 1800 6 12 
55 1799 4 12 
6 1799 4 12 
52 1799 4 12 
133 1799 4 12 
112 1798 2 12 
677 1798 2 12 
111 1932 5 9 
34 1932 5 9 
60 1932 5 9 
88 1932 5 9 
18 1932 5 9 
33 1931 3 9 
13 1931 3 9 
56 1931 3 9 
23 1930 1 9 
778 888 4 8 
111 888 4 8 
88 888 4 8 
10 888 4 8 
37 887 2 8 
26 887 2 8 
8 886 1 8 
56 885 1 8 
55 1799 4 6 
6 1799 4 6 
52 1799 4 6 
133 1799 4 6 
112 1798 2 6 
677 1798 2 6 
22 120 2 6 
34 120 2 6 
88 119 1 6 
88 119 1 6 
99 118 2 6 
12 118 2 6 
99 118 2 6 
12 118 2 6 
90 117 1 6 
90 117 1 6 
22 115 2 6 
99 115 2 6 
99 118 2 5 
12 118 2 5 
90 117 1 5 
22 115 2 5 
99 115 2 5 
33 1931 3 4 
13 1931 3 4 
56 1931 3 4 
23 1930 1 4 
37 887 2 4 
26 887 2 4 
8 886 1 4 
56 885 1 4 
90 117 1 3 
22 115 2 3 
99 115 2 3 
112 1798 2 2 
677 1798 2 2 
8 886 1 2 
56 885 1 2 
22 115 2 2 
99 115 2 2 
23 1930 1 1 
56 885 1 1

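Explanation of the output:

1) Day 1933 has 6 IDs on that exact day (Count column) and 15 IDs in total from Day 1933 back to Day 1929 (Count_group column). The value 15 comes from 6 (1933) + 5 (1932) + 3 (1931) + 1 (1930) + 0 (1929). So in the output I added all the remaining days that fall within the Count_group = 15 set.

2) The next day in descending order is 1932. On that exact day there are 5 IDs, and there are 9 IDs in total from Day 1932 back to Day 1928. The value 9 comes from 5 (1932) + 3 (1931) + 1 (1930) + 0 (1929) + 0 (1928). In the output (starting at row 28) you will see the completed 5-day episode for Day 1932, 9 rows in total.

3) The next day is 1931, and so on.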
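Step 1 can be sketched for a single day as follows. This is only an illustration under assumptions: the data frame is rebuilt from the Day 1930-1933 rows of the data, and `anchor`, `n`, `in_window` and `win` are hypothetical names introduced here.

```r
# Rows for Days 1930-1933, copied from the data above
df <- data.frame(
  ID          = c(18, 33, 37, 18, 16, 11, 111, 34, 60, 88, 18, 33, 13, 56, 23),
  Day         = rep(c(1933, 1932, 1931, 1930), times = c(6, 5, 3, 1)),
  Count       = rep(c(6, 5, 3, 1),             times = c(6, 5, 3, 1)),
  Count_group = rep(c(15, 9, 4, 1),            times = c(6, 5, 3, 1))
)

anchor <- 1933  # the Day whose window is being filled
n      <- 5     # window length in days

# Keep every row whose Day falls in [anchor - (n - 1), anchor]
in_window <- df$Day >= anchor - (n - 1) & df$Day <= anchor
win <- df[in_window, ]

# Every row in the window takes the anchor day's Count_group
win$Count_group <- df$Count_group[match(anchor, df$Day)]

table(win$Day)  # rows contributed by each Day in the window
nrow(win)       # total rows in the window
```

The window holds 6 + 5 + 3 + 1 = 15 rows, which is the Count_group value of Day 1933.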

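The output data.frame is ordered by Count_group and then Day, both with decreasing = TRUE.

I would like to write code that works not only for a 5-day window (as above) but for a time window of any n days.

Do you have any suggestions?

Thanks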

Source

2017-06-02 aaaaa

OK, could you give it a try? – aaaaa

I don't fully understand how you get from the data to the expected output, but you might be able to use [`tidyr::complete()`](http://tidyr.tidyverse.org/reference/complete.html). Maybe see this [question](https://stackoverflow.com/questions/44271398/for-loops-including-rows-in-a-dataframe-by-the-missing-values-of-factor-levels/44271839#44271839) or [this one](https://stackoverflow.com/questions/10438969/fastest-way-to-add-rows-for-missing-values-in-a-data-frame/44272077#44272077). – austensen

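I'm a bit confused. How are the new rows supposed to be created? What is the rule, in plain terms? Write out the process that you can't figure out how to code. How should the new values in each column be computed? Please post your reply as an edit to your question. –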

Try this and tell me if it is what you had in mind:

# First, split the data frame by day using split()
duplicates <- lapply(split(df, df$Day), function(x) {
  if (nrow(x) != x[1, "Count_group"]) { # check whether the row count differs from the target
    x[rep(seq_len(nrow(x)), length.out = x[1, "Count_group"]), ] # recycle the rows until the target is reached
  } else {
    x
  }
})

df2 <- do.call("rbind.data.frame", duplicates) # turn the list back into a data frame
df3 <- df2[order(df2[, "Count_group"], df2[, "Day"], decreasing = TRUE), ] # order by Count_group, then Day
rownames(df3) <- NULL # reset row names to 1:X instead of the generated ones
df3 # the result
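Note that the snippet above repeats each day's own rows, whereas the expected output in the question appears to pull in the rows of the preceding days. A minimal sketch of that window-based reading, for any window length n, might look like the following. It is an interpretation of the question, not a confirmed solution: `expand_windows` and `toy` are hypothetical names, the toy data is the Day 115-120 subset of the question's data, and Count_group is assumed to equal the number of rows in each window.

```r
# Hypothetical helper: for every distinct Day d, collect the rows whose Day
# lies in [d - (n - 1), d], then stack and sort the pieces.
expand_windows <- function(df, n = 5) {
  anchors <- sort(unique(df$Day), decreasing = TRUE)
  pieces <- lapply(anchors, function(d) {
    win <- df[df$Day >= d - (n - 1) & df$Day <= d, ]
    # Assumption: Count_group is the number of observations in the window
    win$Count_group <- nrow(win)
    win
  })
  out <- do.call(rbind, pieces)
  # Order by Count_group, then Day, both decreasing
  out <- out[order(out$Count_group, out$Day, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Toy subset of the question's data (Days 115-120 only)
toy <- data.frame(
  ID          = c(22, 34, 88, 99, 12, 90, 22, 99),
  Day         = c(120, 120, 119, 118, 118, 117, 115, 115),
  Count       = c(2, 2, 1, 2, 2, 1, 2, 2),
  Count_group = c(6, 6, 6, 5, 5, 3, 2, 2)
)

res <- expand_windows(toy, n = 5)
res
```

On the toy subset this yields 22 rows, and the Day and Count_group columns line up with the matching part of the expected output, including the interleaving of the two windows that both have Count_group = 6. Calling `expand_windows(df, n = 3)` would apply a 3-day window instead; because Count_group is recomputed from the window size, it stays consistent with whichever n is chosen.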

Source

2017-06-02 17:53:58
