2016-10-13 50 views
0

Removing rows that contain the same object from a data frame

I have a data frame with about 8 million rows that looks like this:

Trevor Brown Chris Coghlan Starlin Castro Kelby Tomlinson Brandon Crawford Brandon Crawford Kelby Tomlinson Brandon Crawford 

Buster Posey Chris Coghlan Starlin Castro Kelby Tomlinson Brandon Crawford Brandon Crawford Kelby Tomlinson Brandon Crawford 

. 
. 
. 
. 

Trevor Brown Brandon Crawford Starlin Castro Kelby Tomlinson Brandon Crawford Brandon Crawford Kelby Tomlinson Brandon Crawford 

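Many rows contain repeated names, and I would like to remove them. Vectorizing each row and then checking it for duplicates is not practical: with 8 million rows it takes a very long time. Is there a faster way to do this?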

Is each row a single string? – akrun

Each row has 16 strings. It is an 8 x 8 million data frame, with eight full names per row. – James

You could try `apply` and `unique`. – parksw3
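A minimal sketch of what the `apply`/`unique` suggestion could look like, assuming the data frame stores one full name per column. The name `players`, the column names `V1` to `V4`, and the toy rows are made up for illustration, and only four columns are used instead of eight.

```r
# Toy stand-in for the real data frame (hypothetical names and values)
players <- data.frame(
  V1 = c("Trevor Brown", "Buster Posey", "Trevor Brown"),
  V2 = c("Chris Coghlan", "Chris Coghlan", "Brandon Crawford"),
  V3 = c("Starlin Castro", "Brandon Crawford", "Starlin Castro"),
  V4 = c("Kelby Tomlinson", "Brandon Crawford", "Kelby Tomlinson"),
  stringsAsFactors = FALSE
)

# A row has a repeat when it holds fewer distinct names than columns
has_dup <- apply(players, 1, function(r) length(unique(r)) < length(r))

# Keep only the rows in which every name is distinct
clean <- players[!has_dup, , drop = FALSE]
```

Since `apply` calls the function once per row, this is easy to read but likely to be slow on 8 million rows.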

Answers

0

Based on what I could gather from the question and the comments, I came up with this solution.

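library(gtools)  # provides permutations()
a <- LETTERS[1:8]
# All orderings of the 8 letters: 8! = 40320 rows, none with a repeated letter
data <- permutations(n = 8, r = 8, v = a)
tail(data)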

#   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] 
# [40315,] "H" "G" "F" "E" "D" "A" "B" "C" 
# [40316,] "H" "G" "F" "E" "D" "A" "C" "B" 
# [40317,] "H" "G" "F" "E" "D" "B" "A" "C" 
# [40318,] "H" "G" "F" "E" "D" "B" "C" "A" 
# [40319,] "H" "G" "F" "E" "D" "C" "A" "B" 
# [40320,] "H" "G" "F" "E" "D" "C" "B" "A" 

Does this solve the problem? (It generates all 8! = 40320 permutations, so no row contains the same letter twice.)
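If the rows already exist and only the ones without repeats should be kept, one possible approach is to compare columns pairwise, since each comparison is vectorized over all rows. The sketch below is an assumption about how that could be done, not part of the answer above; it uses base R only, and the small `grid` matrix is a made-up stand-in for the real data.

```r
# All 3^3 = 27 ordered triples of A, B, C, including rows with repeats
grid <- as.matrix(expand.grid(x = LETTERS[1:3], y = LETTERS[1:3],
                              z = LETTERS[1:3], stringsAsFactors = FALSE))

# Every pair of column indices: (1,2), (1,3), (2,3)
col_pairs <- combn(ncol(grid), 2)

# Flag a row as soon as any two of its columns hold the same value
has_dup <- rep(FALSE, nrow(grid))
for (k in seq_len(ncol(col_pairs))) {
  has_dup <- has_dup | (grid[, col_pairs[1, k]] == grid[, col_pairs[2, k]])
}

# Rows with all-distinct entries, i.e. the 3! = 6 permutations
perms <- grid[!has_dup, , drop = FALSE]
```

With 8 columns this needs 28 column comparisons, regardless of the number of rows.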

0
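df$unique_names <- character(nrow(df))  # preallocate the result column

for (i in seq_len(nrow(df))) {
  # Split the row into words, drop repeated words, and paste the rest back together.
  # Note: this de-duplicates individual words, not full names.
  df$unique_names[i] <- paste0(unique(unlist(strsplit(df$names[i], " "))), collapse = " ")
}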

df$unique_names 
[1] "Trevor Brown Chris Coghlan Starlin Castro Kelby Tomlinson Brandon Crawford" 
[2] "Buster Posey Chris Coghlan Starlin Castro Kelby Tomlinson Brandon Crawford" 
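Because `unique` is applied to single words here, two different players who share a first or last name would lose a word. A possible variant, assuming every name consists of exactly two words, pairs the words into full names first and calls `strsplit` once on the whole column instead of once per row. The helper `dedupe_names` and the input strings are hypothetical.

```r
# Hypothetical helper: remove repeated full names from each string in x.
# Assumes every name is exactly "First Last", so each string has an even word count.
dedupe_names <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)  # one character vector per row
  vapply(words, function(w) {
    # Odd positions are first names, even positions are last names
    full <- paste(w[c(TRUE, FALSE)], w[c(FALSE, TRUE)])
    paste(unique(full), collapse = " ")
  }, character(1))
}

# Made-up input: two players share the first name "Brandon"
x <- c("Trevor Brown Chris Coghlan Trevor Brown",
       "Brandon Crawford Brandon Belt Brandon Crawford")
res <- dedupe_names(x)
res
# [1] "Trevor Brown Chris Coghlan"    "Brandon Crawford Brandon Belt"
```

It could be applied to the data below as `df$unique_names <- dedupe_names(df$names)`.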

Data

df <- data.frame(
  names = c(
    "Trevor Brown Chris Coghlan Starlin Castro Kelby Tomlinson Brandon Crawford Brandon Crawford Kelby Tomlinson Brandon Crawford",
    "Buster Posey Chris Coghlan Starlin Castro Kelby Tomlinson Brandon Crawford Brandon Crawford Kelby Tomlinson Brandon Crawford"
  ),
  stringsAsFactors = FALSE
)