How can I remove duplicate rows that have less data in R?

For example, I have the following data table (called data):
row,or,d,ddate,rdate,changes,class,price,fdate,company,number,minutes,added,source
1,VA1,VA2,2014-05-24,,0,0,2124,2014-05-22 15:50:16,,,,2014-05-22 12:20:03,tp
2,VA1,VA2,2014-05-26,,0,0,2124,2014-05-22 15:03:44,,,,2014-05-22 12:20:03,tp
3,VA1,VA2,2014-05-26,,0,0,2124,2014-05-22 15:03:44,A1,,,2014-05-22 12:20:03,tp
4,VA1,VA2,2014-06-05,,0,0,2124,2014-05-22 15:48:24,,,,2014-05-22 12:20:03,tp
5,VA1,VA2,2014-06-09,,0,0,2124,2014-05-22 15:37:35,,,,2014-05-22 12:20:03,tp
6,VA1,VA2,2014-06-16,,0,0,2124,2014-05-22 14:17:33,,,,2014-05-22 12:20:03,tp
7,VA1,VA2,2014-06-16,,0,0,2124,2014-05-22 14:17:33,,,,2014-05-22 12:20:03,tp
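I want to remove the duplicate rows. If I run data <- unique(data, by = NULL), only the last row (row 7) is removed, but I want row 2 removed as well. I can define a key with setkey():
setkey(data, row,or,d,ddate,rdate,changes,class,price,fdate,number,minutes,added,source)
Then either row 2 or row 3 is removed, but I want to drop the row that has less data and keep the row that has more. In the example above, row 2 should be removed and row 3 kept, because it has an additional value in the company column. How can I do this?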
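The problem I see with this question is that if a row has less data than another row, it _isn't_ a duplicate. At least not if you use every column as the uniqueness key. – 2015-03-30 19:59:46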
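One possible approach, offered only as a sketch: treat two rows as duplicates when they agree on every column except the ones that may be blank in one of them, count the non-empty fields in each row, and keep the fullest row of each group. The names key_cols, value_cols, filled, keep and deduped are hypothetical, and two things are assumed here that the question does not state: that row is only a row label (so it is left out of the comparison), and that company is the only column that may be filled in one duplicate and blank in the other.

```r
library(data.table)

# Sample data from the question, read as character so blank fields are kept
# as "" (or NA) instead of being guessed into mixed types.
data <- fread(text = "row,or,d,ddate,rdate,changes,class,price,fdate,company,number,minutes,added,source
1,VA1,VA2,2014-05-24,,0,0,2124,2014-05-22 15:50:16,,,,2014-05-22 12:20:03,tp
2,VA1,VA2,2014-05-26,,0,0,2124,2014-05-22 15:03:44,,,,2014-05-22 12:20:03,tp
3,VA1,VA2,2014-05-26,,0,0,2124,2014-05-22 15:03:44,A1,,,2014-05-22 12:20:03,tp
4,VA1,VA2,2014-06-05,,0,0,2124,2014-05-22 15:48:24,,,,2014-05-22 12:20:03,tp
5,VA1,VA2,2014-06-09,,0,0,2124,2014-05-22 15:37:35,,,,2014-05-22 12:20:03,tp
6,VA1,VA2,2014-06-16,,0,0,2124,2014-05-22 14:17:33,,,,2014-05-22 12:20:03,tp
7,VA1,VA2,2014-06-16,,0,0,2124,2014-05-22 14:17:33,,,,2014-05-22 12:20:03,tp
", colClasses = "character")

# Assumption: rows are duplicates when they match on everything except the
# row label and the column that may be blank (company).
key_cols   <- setdiff(names(data), c("row", "company"))
value_cols <- setdiff(names(data), "row")

# Count the non-empty, non-NA fields in each row.
m <- as.matrix(data[, value_cols, with = FALSE])
data[, filled := rowSums(!is.na(m) & m != "")]

# Within each group of duplicates, take the index of the fullest row.
# which.max() returns the first maximum, so ties keep the earlier row.
keep <- sort(data[, .I[which.max(filled)], by = key_cols]$V1)

deduped <- data[keep]
deduped[, filled := NULL]  # drop the helper column from the result
print(deduped)
```

With the sample data this keeps rows 1, 3, 4, 5 and 6: row 2 is dropped in favour of row 3, and row 7 is dropped as an exact repeat of row 6. Note that if two rows in the same group had different non-empty company values, this sketch would still keep only one of them, so key_cols would need adjusting for that case.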