Remove rows that do not have a full year of data

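I was given a large data set containing monthly returns for a given stock. I would like to remove the rows that belong to years without a full year of data. A subset of the data is shown below as an example: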

Date  Return Year   
9/1/2009 0.71447 2009 
10/1/2009 0.48417 2009 
11/1/2009 0.90753 2009 
12/1/2009 -0.7342 2009 
1/1/2010 0.83293 2010 
2/1/2010 0.18279 2010 
3/1/2010 0.19416 2010 
4/1/2010 0.38907 2010 
5/1/2010 0.37834 2010 
6/1/2010 0.6401 2010 
7/1/2010 0.62079 2010 
8/1/2010 0.42128 2010 
9/1/2010 0.43117 2010 
10/1/2010 0.42307 2010 
11/1/2010 -0.1994 2010 
12/1/2010 -0.2252 2010

Ideally, the code would remove the first four observations, because 2009 does not have a full year of observations.

Source

2017-08-14 Roger

Please provide the code you are currently using so that we can help. –

Try dplyr: 'df %>% group_by(Year) %>% dplyr::mutate(count = n()) %>% filter(count == 12)' – Wen

@C8H10N4O2 You are right~ – Wen

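The OP has asked to remove all rows from a large data set of monthly values that do not make up a full year. Although the solution suggested by Wen seems to be working for the OP, I would like to suggest a more robust approach.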

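Wen's solution counts the number of rows per year, assuming there is exactly one row per month. If the production data set contains duplicate entries, it is more robust to count the number of unique months per year. (In my experience, one cannot be careful enough when dealing with production data and checking all assumptions.)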
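To illustrate the difference, here is a minimal sketch. The sample data are rebuilt from the question; the object names `DT` and `DT_dup`, the duplicated January 2010 row, and the assumption that the dates are month/day/year strings are mine, not the OP's.

```r
library(data.table)

# Rebuild the sample data from the question (dates assumed to be month/day/year)
DT <- data.table(
  Date = as.Date(c(
    "9/1/2009", "10/1/2009", "11/1/2009", "12/1/2009",
    "1/1/2010", "2/1/2010", "3/1/2010", "4/1/2010",
    "5/1/2010", "6/1/2010", "7/1/2010", "8/1/2010",
    "9/1/2010", "10/1/2010", "11/1/2010", "12/1/2010"
  ), format = "%m/%d/%Y"),
  Return = c(
    0.71447, 0.48417, 0.90753, -0.7342,
    0.83293, 0.18279, 0.19416, 0.38907,
    0.37834, 0.6401, 0.62079, 0.42128,
    0.43117, 0.42307, -0.1994, -0.2252
  )
)
DT[, Year := year(Date)]

# Hypothetical production glitch: the January 2010 row appears twice
DT_dup <- rbind(DT, DT[Date == as.Date("2010-01-01")])

# Counting rows: 2010 now has 13 rows, so a test for exactly 12 rejects it
rows_per_year <- DT_dup[, .N, by = Year]

# Counting unique months: 2010 still has 12 distinct months
months_per_year <- DT_dup[, .(n_months = uniqueN(month(Date))), by = Year]
```

With the duplicated row, a filter on the row count would drop the complete year 2010, while the unique-month count still recognises it as complete. Duplicates can also mislead in the other direction: an incomplete year with enough repeated rows could reach 12 rows by accident.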

library(data.table) 
# count number of unique months per year, 
# keep only complete years, omit counts 
# result is a data.table with one column Year 
full_years <- DT[, uniqueN(month(Date)), by = Year][V1 == 12L, -"V1"] 
full_years

Year 
1: 2010

# right join with original table, only rows belonging to a full year will be returned 
DT[full_years, on = "Year"]

  Date Return Year 
1: 2010-01-01 0.83293 2010 
2: 2010-02-01 0.18279 2010 
3: 2010-03-01 0.19416 2010 
4: 2010-04-01 0.38907 2010 
5: 2010-05-01 0.37834 2010 
6: 2010-06-01 0.64010 2010 
7: 2010-07-01 0.62079 2010 
8: 2010-08-01 0.42128 2010 
9: 2010-09-01 0.43117 2010 
10: 2010-10-01 0.42307 2010 
11: 2010-11-01 -0.19940 2010 
12: 2010-12-01 -0.22520 2010

Note that this approach avoids adding a count column to every row of a potentially large data set.
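For readers who prefer dplyr, as suggested in the comments, the same idea can probably be expressed without a helper column as well. This is only a sketch under my own assumptions: the data frame name `df` is hypothetical, the `Return` column is omitted for brevity, and the month is extracted with base R's `format()`.

```r
library(dplyr)

# Hypothetical data frame with the same 16 monthly dates as in the question
df <- data.frame(
  Date = seq(as.Date("2009-09-01"), as.Date("2010-12-01"), by = "month")
)
df$Year <- as.integer(format(df$Date, "%Y"))

# Keep only the years that contain 12 distinct months;
# the per-group test is evaluated inside filter(), so no count column is stored
full <- df %>%
  group_by(Year) %>%
  filter(n_distinct(format(Date, "%m")) == 12) %>%
  ungroup()
```

Here `full` should contain only the twelve rows of 2010.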

The code can be written more concisely:

DT[DT[, uniqueN(month(Date)), by = Year][V1 == 12L, -"V1"], on = "Year"]

The data can also be checked for any duplicate months, e.g.,

stopifnot(all(DT[, .N, by = .(Year, month(Date))]$N == 1L))

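This code counts the number of occurrences for each year and month, and halts execution if any combination occurs more than once.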
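If the check fails, it may help to see which months are affected before deciding what to do. The following is a sketch only; the table `DT_check` with a repeated March 2010 entry is made up for illustration, and keeping the first row per date is just one possible clean-up rule.

```r
library(data.table)

# Hypothetical data: monthly rows for 2010 with March entered twice
DT_check <- data.table(
  Date = as.Date(c(sprintf("2010-%02d-01", 1:12), "2010-03-01"))
)
DT_check[, Year := year(Date)]

# Count rows per year and month, keep only combinations occurring more than once
dups <- DT_check[, .N, by = .(Year, Month = month(Date))][N > 1L]

# One possible clean-up: keep the first row for each date
DT_clean <- unique(DT_check, by = "Date")
```

Whether the first row, the last row, or some aggregate should be kept depends on how the duplicates arose, so this step should be checked against the actual data.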

Source

2017-08-28 15:09:59 Uwe
