R: remove duplicate elements within a character vector, not duplicate rows

I have a data frame that stores document IDs and dates, with the dates held in a character vector (Dates):

Doc  Dates 
1 12345 c("06/01/2000","08/09/2002") 
2 23456 c("07/01/2000", "09/08/2003", "07/01/2000") 
3 34567 c("09/06/2004", "09/06/2004", "12/30/2006") 
4 45678 c("06/01/2000","08/09/2002")

I am trying to remove the duplicated elements within Dates, to get a result like this:

Doc  Dates 
1 12345 c("06/01/2000","08/09/2002") 
2 23456 c("07/01/2000", "09/08/2003") 
3 34567 c("09/06/2004", "12/30/2006") 
4 45678 c("06/01/2000","08/09/2002")

I have tried:

R>unique(dates$dates)

but that removes the rows whose dates are duplicated:

Doc  Dates 
1 12345 c("06/01/2000","08/09/2002") 
2 23456 c("07/01/2000", "09/08/2003") 
3 34567 c("09/06/2004", "12/30/2006")

Any help on how to remove only the duplicated elements within Dates, rather than the rows with duplicated dates?

Update: data

# Match some text string (dates) from some text: 

df1$dates <- as.character(strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})| ([^/]\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))")) 

# Drop first 2 columns from dataframe 
df2<-df1[ -c(1,2)] 

# List data 
>df2 
872      7/23/2007 
873 c(" 11/4/2007", " 11/4/2007") 
874 c(" 4/2/2008", " 8/2/2007") 
880     11/14/2006 

> class(df2) 
[1] "data.frame" 

> class(df2$dates) 
[1] "character" 

> dput(df2) 
structure(list(dates = c("NULL", "NULL", " 7/23/2007", "c(\" 11/4/2007\", \" 11/4/2007\")", 
"c(\" 4/2/2008\", \" 8/2/2007\")", "NULL", "NULL", "NULL", "NULL", 
"NULL", " 11/14/2006")), .Names = "dates", class = "data.frame", row.names = 870:880)

So my question is: how do I get rid of the duplicated date in row 873?

Source

2013-07-03 user2547308

Please provide the output of 'dput(dates)'. That way it can simply be copied and pasted instead of having to recreate the data. – Arun

Try this:

within(dates, Dates <- lapply(Dates, unique))
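
A minimal sketch of the same per-row idea, assuming Dates is a list column holding one character vector per row (the question does not confirm this). The data frame is rebuilt by hand from the example, and the de-duplication is written as a plain assignment rather than with within():

```r
# Rebuild the example, assuming Dates is a list column of character vectors
dates <- data.frame(Doc = c(12345, 23456, 34567, 45678))
dates$Dates <- list(
  c("06/01/2000", "08/09/2002"),
  c("07/01/2000", "09/08/2003", "07/01/2000"),
  c("09/06/2004", "09/06/2004", "12/30/2006"),
  c("06/01/2000", "08/09/2002")
)

# unique() runs on each row's vector separately, so no row is dropped
deduped <- dates
deduped$Dates <- lapply(dates$Dates, unique)

nrow(deduped)        # still 4 rows
deduped$Dates[[2]]   # "07/01/2000" "09/08/2003"
```

Calling unique() on the whole column compares rows with each other, which is why the original attempt lost row 4; applying it inside lapply() only compares the dates within a single row.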

Source

2013-07-03 16:00:48

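Arun - I can't copy/paste from the system I'm working on (it requires a request, which is very difficult). I'll try within(), and if that doesn't work I'll create a data set that I can use outside the system and repost. Thanks. – user2547308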

FYI - I solved this problem by wrapping lapply(..., unique) around strapply(): df1$dates <- as.character(lapply((strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})|([^/]\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))")), unique)) – user2547308

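I would gsub out the c( and ) from the dates, and then for each row I would use strsplit on "," and call unique on the result.

UNTESTED, but maybe something like:

sapply(dates$dates, function(x) {
  new.x <- gsub("c\\(|\\)", "", x)       # strip the c( and ) wrapper; the parentheses must be escaped
  new.x <- strsplit(new.x, ",")[[1]]     # split the string into individual dates
  unique(trimws(new.x))                  # trim stray spaces so identical dates compare equal
})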
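
A related sketch for the case where the column really holds plain strings such as 'c(" 11/4/2007", " 11/4/2007")', as the dput() output in the question shows. Instead of stripping c( and ), it extracts date-like tokens with regmatches() and gregexpr(). The helper name dedupe_dates and the simplified m/d/y pattern are assumptions made for illustration, not part of the original post:

```r
# Data copied from the dput() output in the question
df2 <- data.frame(
  dates = c("NULL", "NULL", " 7/23/2007",
            "c(\" 11/4/2007\", \" 11/4/2007\")",
            "c(\" 4/2/2008\", \" 8/2/2007\")",
            "NULL", "NULL", "NULL", "NULL", "NULL", " 11/14/2006"),
  row.names = 870:880,
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull out every date-like token, then de-duplicate
dedupe_dates <- function(x) {
  found <- regmatches(x, gregexpr("[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}", x))[[1]]
  if (length(found) == 0) return(x)   # leave "NULL" entries untouched
  paste(unique(found), collapse = ", ")
}

df2$dates <- vapply(df2$dates, dedupe_dates, character(1), USE.NAMES = FALSE)
df2["873", "dates"]   # "11/4/2007"
```

The simplified pattern only covers numeric slash-separated dates, so month-name dates matched by the original regular expression would need a broader pattern.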

Source

2013-07-03 16:01:34

I think the 'Dates' column is actually a 'list', not a character string. –

I solved my problem of removing duplicate values from a character vector by wrapping lapply(..., unique) around strapply():

df1$date <- as.character(lapply((strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})|(\\s\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))")),unique))
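
Why the wrap helps can be sketched without the gsubfn package. The list below is a hand-built stand-in for what strapply() returns (one element per input row); its values are taken from the dput() output, but the list itself is an assumption:

```r
# Stand-in for the list returned by strapply(): one element per input row
matches <- list(NULL, " 7/23/2007", c(" 11/4/2007", " 11/4/2007"))

# as.character() on a list deparses each element, which produces
# the "NULL" and "c(...)" strings seen in df2
before <- as.character(matches)

# De-duplicating each element first leaves a single date in the third entry
after <- as.character(lapply(matches, unique))

before[3]   # "c(\" 11/4/2007\", \" 11/4/2007\")"
after[3]    # " 11/4/2007"
```

Once the matches have been flattened into strings, unique() can no longer see the individual dates, so the de-duplication has to happen before as.character().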

Thanks for your help.

Source

2013-07-10 14:59:42 user2547308

You may be looking for something like this.

df 

    Doc          Dates 
1 12345    c("06/01/2000","08/09/2002") 
2 23456 c("07/01/2000", "09/08/2003", "07/01/2000") 
3 34567 c("09/06/2004", "09/06/2004", "12/30/2006") 
4 45678    c("06/01/2000","08/09/2002") 

Eval and Parse 
x <- t(sapply(df[,"Dates"],function(x){unique(eval(parse(text = x)))})) 
df$Dates <- paste(x[,1],x[,2],sep=",") 

df 
     Doc     Dates 
    1 12345 06/01/2000,08/09/2002 
    2 23456 07/01/2000,09/08/2003 
    3 34567 09/06/2004,12/30/2006 
    4 45678 06/01/2000,08/09/2002 


Same can be achieved using Regex: 

paste(unique(unlist(strsplit(gsub("c\\(|\\)","",'c("24/07/2012","22/01/2012","24/07/2012")'),","))),sep = "") 

[1] "\"24/07/2012\"" "\"22/01/2012\"" 

I haven't tried it on the actual data, but it works on this example.
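
Note that the eval(parse()) version above pastes x[,1] and x[,2], so it relies on every row ending up with exactly two unique dates. A hedged variant that collapses however many dates remain, assuming Dates holds strings like 'c("...", "...")' as in this answer's example:

```r
df <- data.frame(
  Doc = c(12345, 23456, 34567, 45678),
  Dates = c('c("06/01/2000","08/09/2002")',
            'c("07/01/2000", "09/08/2003", "07/01/2000")',
            'c("09/06/2004", "09/06/2004", "12/30/2006")',
            'c("06/01/2000","08/09/2002")'),
  stringsAsFactors = FALSE
)

# Parse each string back into a character vector, de-duplicate it,
# and collapse whatever number of dates remain into one string.
# eval(parse()) runs arbitrary code, so only use it on trusted input.
df$Dates <- vapply(
  df$Dates,
  function(s) paste(unique(eval(parse(text = s))), collapse = ","),
  character(1),
  USE.NAMES = FALSE
)

df$Dates[2]   # "07/01/2000,09/08/2003"
```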

Source

2016-08-05 09:22:25 Sandesh
