2016-07-05

How can I make combinations based on strings? I have a data frame like the one below:

Column1   Column2   Column3 
Q9Y6Y8    P28074   Q9Y6A4 
Q9Y6W5    P28066   Q9Y623 
Q9Y6H1    P27695   Q9Y5W9 
Q5T1J5    P25786;Q9Y623 
Q9Y6A4 
Q9Y623;P27695;Q9Y623 
Q9Y5W9 
Q9Y6Y8 
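For a reproducible example, the table above could be entered in R roughly as follows. This is a sketch under two assumptions that the flattened layout does not settle: blank cells are empty strings (""), and "P25786;Q9Y623" is a single cell in Column2.

```r
# Assumed reconstruction of the question's data frame df1.
# Assumptions: blank cells are "", and "P25786;Q9Y623" is one Column2 cell.
df1 <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5",
              "Q9Y6A4", "Q9Y623;P27695;Q9Y623", "Q9Y5W9", "Q9Y6Y8"),
  Column2 = c("P28074", "P28066", "P27695", "P25786;Q9Y623",
              "", "", "", ""),
  Column3 = c("Q9Y6A4", "Q9Y623", "Q9Y5W9", "", "", "", "", ""),
  stringsAsFactors = FALSE  # keep cells as character, not factor
)
str(df1)
```

Whichever column "Q9Y623" actually belongs to in that row, the set of unique strings is the same.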

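So I want to combine the columns of the data frame: first put all the values together (splitting the entries separated by ;), and then get the unique values, like below: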

Q9Y6Y8
Q9Y6W5
Q9Y6H1
Q5T1J5
Q9Y6A4
Q9Y623
P27695
Q9Y5W9
P25786
P28074
P28066

Then I want all the pairwise combinations of the strings, as shown below:

Q9Y6Y8 Q9Y6W5 
Q9Y6Y8 Q9Y6H1      
Q9Y6Y8 Q5T1J5
Q9Y6Y8 Q9Y6A4
Q9Y6Y8 Q9Y623
Q9Y6Y8 P27695
    . 
    . 
    . 
Q9Y6W5 Q9Y6H1 
Q9Y6W5 Q9Y6A4 
Q9Y6W5 Q5T1J5 
    . 
    . 
    . 

until every string has been paired with every other string exactly once.

Answer

3

We can do this by unlisting the data.frame (a data.frame is a list) into a vector, splitting it by ;, then unlisting the list output (from strsplit), and getting the unique elements as a vector:

Un1 <- unique(unlist(strsplit(unlist(df1), ";"))) 
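To make each step of that one-liner visible, including why unlist() appears twice, here it is split into intermediate variables. This is a sketch that rebuilds df1 under assumptions (blank cells as "", columns of class character); the names v, pieces and ids are illustrative only.

```r
# Assumed reconstruction of df1 (blank cells as "", all columns character).
df1 <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5",
              "Q9Y6A4", "Q9Y623;P27695;Q9Y623", "Q9Y5W9", "Q9Y6Y8"),
  Column2 = c("P28074", "P28066", "P27695", "P25786;Q9Y623",
              "", "", "", ""),
  Column3 = c("Q9Y6A4", "Q9Y623", "Q9Y5W9", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# 1st unlist(): a data.frame is a list of columns, so this flattens
# all 24 cells into one character vector, column by column.
v <- unlist(df1)

# strsplit(): splits every cell on ";" and returns a list with one
# element per cell; an empty cell becomes character(0).
pieces <- strsplit(v, ";")

# 2nd unlist(): flattens that list of pieces into a single vector.
ids <- unlist(pieces)

# unique(): drops repeats, keeping the first occurrence of each ID.
Un1 <- unique(ids)
print(Un1)
```

The first unlist() turns the table into a vector that strsplit() can accept; the second is needed because strsplit() always returns a list.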

From here, we can use expand.grid to get all the combinations:

expand.grid(Un1, Un1) 
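A sketch of what expand.grid returns here and how it relates to the unique pairs. The column names a and b, and the match()-based filter, are illustrative additions rather than part of the original answer; the filtered rows come out in a different order than combn produces.

```r
# The 11 unique IDs from the step above, in the same order.
Un1 <- c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5", "Q9Y6A4", "Q9Y623",
         "P27695", "Q9Y5W9", "P28074", "P28066", "P25786")

# All ordered pairs, self-pairs included: 11 * 11 = 121 rows.
# expand.grid() converts character input to factors by default,
# so stringsAsFactors = FALSE keeps the columns as character.
g <- expand.grid(a = Un1, b = Un1, stringsAsFactors = FALSE)
nrow(g)

# Keep a pair only when 'a' comes before 'b' in Un1: this drops
# self-pairs and reversed duplicates, leaving choose(11, 2) = 55 rows.
keep <- match(g$a, Un1) < match(g$b, Un1)
pairs <- g[keep, ]
nrow(pairs)
```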

expand.grid gives every ordered combination; if we only need each pair once, we can use combn instead:

t(combn(Un1, 2)) 
#  [,1]  [,2]  
# [1,] "Q9Y6Y8" "Q9Y6W5" 
# [2,] "Q9Y6Y8" "Q9Y6H1" 
# [3,] "Q9Y6Y8" "Q5T1J5" 
# [4,] "Q9Y6Y8" "Q9Y6A4" 
# [5,] "Q9Y6Y8" "Q9Y623" 
# [6,] "Q9Y6Y8" "P27695" 
# [7,] "Q9Y6Y8" "Q9Y5W9" 
# [8,] "Q9Y6Y8" "P28074" 
# [9,] "Q9Y6Y8" "P28066" 
#[10,] "Q9Y6Y8" "P25786" 
#[11,] "Q9Y6W5" "Q9Y6H1" 
#[12,] "Q9Y6W5" "Q5T1J5" 
#[13,] "Q9Y6W5" "Q9Y6A4" 
#[14,] "Q9Y6W5" "Q9Y623" 
#[15,] "Q9Y6W5" "P27695" 
#[16,] "Q9Y6W5" "Q9Y5W9" 
#[17,] "Q9Y6W5" "P28074" 
#[18,] "Q9Y6W5" "P28066" 
#[19,] "Q9Y6W5" "P25786" 
#[20,] "Q9Y6H1" "Q5T1J5" 
#[21,] "Q9Y6H1" "Q9Y6A4" 
#[22,] "Q9Y6H1" "Q9Y623" 
#[23,] "Q9Y6H1" "P27695" 
#[24,] "Q9Y6H1" "Q9Y5W9" 
#[25,] "Q9Y6H1" "P28074" 
#[26,] "Q9Y6H1" "P28066" 
#[27,] "Q9Y6H1" "P25786" 
#[28,] "Q5T1J5" "Q9Y6A4" 
#[29,] "Q5T1J5" "Q9Y623" 
#[30,] "Q5T1J5" "P27695" 
#[31,] "Q5T1J5" "Q9Y5W9" 
#[32,] "Q5T1J5" "P28074" 
#[33,] "Q5T1J5" "P28066" 
#[34,] "Q5T1J5" "P25786" 
#[35,] "Q9Y6A4" "Q9Y623" 
#[36,] "Q9Y6A4" "P27695" 
#[37,] "Q9Y6A4" "Q9Y5W9" 
#[38,] "Q9Y6A4" "P28074" 
#[39,] "Q9Y6A4" "P28066" 
#[40,] "Q9Y6A4" "P25786" 
#[41,] "Q9Y623" "P27695" 
#[42,] "Q9Y623" "Q9Y5W9" 
#[43,] "Q9Y623" "P28074" 
#[44,] "Q9Y623" "P28066" 
#[45,] "Q9Y623" "P25786" 
#[46,] "P27695" "Q9Y5W9" 
#[47,] "P27695" "P28074" 
#[48,] "P27695" "P28066" 
#[49,] "P27695" "P25786" 
#[50,] "Q9Y5W9" "P28074" 
#[51,] "Q9Y5W9" "P28066" 
#[52,] "Q9Y5W9" "P25786" 
#[53,] "P28074" "P28066" 
#[54,] "P28074" "P25786" 
#[55,] "P28066" "P25786" 
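If the pairs are wanted as space-separated strings, in the same format as the question's expected output, combn can apply a function to each combination. A minimal sketch, with Un1 hard-coded from the result above and pair_strings as an illustrative name:

```r
Un1 <- c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5", "Q9Y6A4", "Q9Y623",
         "P27695", "Q9Y5W9", "P28074", "P28066", "P25786")

# combn(x, m, FUN, ...) calls FUN on each 2-element combination; the
# extra argument collapse = " " is passed through to paste().
pair_strings <- combn(Un1, 2, FUN = paste, collapse = " ")

length(pair_strings)  # 55, i.e. choose(11, 2)
writeLines(head(pair_strings, 3))
```

The pairs come out in the same order as the rows of t(combn(Un1, 2)).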

Note: here I assume that the columns are all of character class.
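If the columns are factors instead (the default for data.frame() and read.table() before R 4.0.0), strsplit() refuses the input. A sketch with a small hypothetical data frame df_f, showing the error and the as.character() conversion that fixes it:

```r
# Hypothetical data frame whose columns are factors.
df_f <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y623;P27695", "Q5T1J5"),
  Column2 = c("P28074", "P25786;Q9Y623", ""),
  stringsAsFactors = TRUE
)

# unlist() on factor columns returns a factor, and strsplit()
# only accepts character input, so this raises an error.
err <- tryCatch(strsplit(unlist(df_f), ";"), error = function(e) e)
print(conditionMessage(err))

# Converting to character first gives the intended result.
Un_f <- unique(unlist(strsplit(as.character(unlist(df_f)), ";")))
print(Un_f)
```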

@nik Your columns are factors, so use 'strsplit(as.character(unlist(df1)), ";")' – akrun

I like your answer, but I have to wait 2 minutes before I can accept it – nik

Could you please add some description? Why do you use unlist twice? – nik