Reconstruct symmetric matrix from values in long-form

I have a TSV that looks like this (long-form):

    one two value
    a   b   30
    a   c   40
    a   d   20
    b   c   10
    b   d   05
    c   d   30

I am trying to reshape this into a data frame in R (or pandas) like this:

   a  b  c  d
a 00 30 40 20
b 30 00 10 05
c 40 10 00 30
d 20 05 30 00

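The problem is that my TSV only defines a, b and not b, a, so I end up with a lot of NAs in my data frame.

The end goal is to get a distance matrix to use for clustering. Any help would be appreciated.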

Source

2015-06-28 jwillis0720

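You may find the function 'daisy' in the 'cluster' package useful if you need to cluster categorical variables, e.g. 'cutree(hclust(daisy(data)), k = 2)' returns the vector '1 1 1 2 2 1'.

Please post a reproducible code example using 'dput()'. – smci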

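Make sure your data is sorted, tsv=tsv[with(tsv,order(one,two)),], and try this approach:

n=4
B <- matrix(rep(0,n*n), n)
dimnames(B) <- list(letters[1:n],letters[1:n])
# lower.tri() is filled column by column, which matches the (one, two) sort order
B[lower.tri(B)] <- tsv$value
# mirror into the upper triangle; the diagonal is 0, so adding the transpose is safe
# (assigning tsv$value to upper.tri(B) directly would misplace values, because
# the upper triangle is also filled column by column)
B <- B + t(B)
B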
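Since the question also mentions pandas, here is a rough NumPy sketch of the same triangle-filling idea. It assumes, as above, that the values are sorted by (one, two) and that every pair is present; the names `values`, `labels` and `mat` are illustrative only. `np.triu_indices` walks the upper triangle row by row, which happens to match that sort order.

```python
import numpy as np

# Values sorted by (one, two): ab, ac, ad, bc, bd, cd (assumed complete).
values = [30, 40, 20, 10, 5, 30]
labels = ["a", "b", "c", "d"]
n = len(labels)

mat = np.zeros((n, n))
# k=1 skips the diagonal; indices come out in row-major order.
mat[np.triu_indices(n, k=1)] = values
# Mirror the upper triangle into the lower one (the diagonal stays 0).
mat = mat + mat.T
print(mat)
```

If some pairs are missing or the rows are unsorted, this positional fill silently puts values in the wrong cells, so a label-based approach is safer in that case.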

Source

2015-06-28 02:10:23 Robert

You can try

un1 <- unique(unlist(df1[1:2])) 
df1[1:2] <- lapply(df1[1:2], factor, levels=un1) 
m1 <- xtabs(value~one+two, df1) 
m1+t(m1) 
#    two
# one  a  b  c  d
#   a  0 30 40 20
#   b 30  0 10  5
#   c 40 10  0 30
#   d 20  5 30  0

Or use row/column indices:

m1 <- matrix(0, nrow=length(un1), ncol=length(un1),
             dimnames=list(un1, un1))
m1[cbind(match(df1$one, rownames(m1)),
         match(df1$two, colnames(m1)))] <- df1$value
m1+t(m1)
#    a  b  c  d
# a  0 30 40 20
# b 30  0 10  5
# c 40 10  0 30
# d 20  5 30  0
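For the pandas half of the question, the same row/column index idea might look roughly like the sketch below. The names `long_df`, `labels`, `pos` and `dist` are illustrative, and the data is copied from the question; treat it as one possible translation rather than a canonical recipe.

```python
import numpy as np
import pandas as pd

# Long-form data from the question (each unordered pair appears once).
long_df = pd.DataFrame({
    "one": ["a", "a", "a", "b", "b", "c"],
    "two": ["b", "c", "d", "c", "d", "d"],
    "value": [30, 40, 20, 10, 5, 30],
})

# All labels that occur in either column, mapped to matrix positions.
labels = sorted(set(long_df["one"]) | set(long_df["two"]))
pos = {label: i for i, label in enumerate(labels)}

# Start from zeros and write each value at its (row, col) position.
mat = np.zeros((len(labels), len(labels)))
rows = long_df["one"].map(pos).to_numpy()
cols = long_df["two"].map(pos).to_numpy()
mat[rows, cols] = long_df["value"].to_numpy()

# Only one of (a, b) / (b, a) was given, so adding the transpose fills the rest.
mat = mat + mat.T

dist = pd.DataFrame(mat, index=labels, columns=labels)
print(dist)
```

Because cells are addressed by label, this does not depend on the rows being sorted, unlike a positional triangle fill.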

Source

2015-06-28 03:49:11 akrun

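An igraph solution: read the values into a data frame and treat the value column as an edge weight. You can then convert the graph to an adjacency matrix.

dat <- read.table(header=TRUE, text=" one two value
    a  b  30
    a  c  40
    a  d  20
    b  c  10
    b  d  05
    c  d  30")

library(igraph)

# Make undirected so that graph matrix will be symmetric
# (graph_from_data_frame was formerly graph.data.frame)
g <- graph_from_data_frame(dat, directed=FALSE)

# use the value edge attribute as the matrix entries
# (as_adjacency_matrix was formerly get.adjacency)
as_adjacency_matrix(g, attr="value", sparse=FALSE)
#    a  b  c  d
# a  0 30 40 20
# b 30  0 10  5
# c 40 10  0 30
# d 20  5 30  0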

Source

2015-06-28 04:43:53 user20650

Another approach is reshape::cast

library(reshape)

df.long = data.frame(one=c('a','a','a','b','b','c'),
        two=c('b','c','d','c','d','d'),
        value=c(30,40,20,10,5,30))

# cast will recover the upper triangle (row 'd' and column 'a' are missing)...
df <- as.matrix(cast(df.long, one ~ two, fill=0))
#    b  c  d
# a 30 40 20
# b  0 10  5
# c  0  0 30

So we build the matrix over the full set of indices and insert the block by name:

indices <- sort(unique(unlist(lapply(df.long[1:2], as.character))))
full <- matrix(0, nrow=length(indices), ncol=length(indices), dimnames=list(indices, indices))
# the diagonal and all unobserved pairs stay 0
full[rownames(df), colnames(df)] <- df
# the inserted block is upper-triangular, so adding the transpose fills the lower triangle
full <- full + t(full)

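UPDATE: the original sketch included these manual steps.

In the same way, add the missing row 'd' and column 'a' by hand, then fill in the lower triangle by adding the transpose t(df):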

df <- cbind(a=rep(0,4), rbind(df, d=rep(0,3)))
#   a  b  c  d
# a 0 30 40 20
# b 0  0 10  5
# c 0  0  0 30
# d 0  0  0  0

df + t(df)
#    a  b  c  d
# a  0 30 40 20
# b 30  0 10  5
# c 40 10  0 30
# d 20  5 30  0
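A pandas analogue of this cast-then-transpose idea could be sketched as follows, with `pivot` standing in for `reshape::cast` and `reindex` standing in for the manual `cbind`/`rbind` step. The names `long_df`, `labels`, `upper` and `sym` are illustrative, not taken from any answer above.

```python
import pandas as pd

# Long-form data from the question (each unordered pair appears once).
long_df = pd.DataFrame({
    "one": ["a", "a", "a", "b", "b", "c"],
    "two": ["b", "c", "d", "c", "d", "d"],
    "value": [30, 40, 20, 10, 5, 30],
})
labels = sorted(set(long_df["one"]) | set(long_df["two"]))

# pivot yields only the observed block: rows a-c, columns b-d, NaN elsewhere.
upper = long_df.pivot(index="one", columns="two", values="value")

# reindex adds the missing row 'd' and column 'a'; every gap becomes 0.
upper = (
    upper.reindex(index=labels, columns=labels)
    .fillna(0)
    .rename_axis(index=None, columns=None)
)

# The block is upper-triangular, so adding the transpose fills the lower triangle.
sym = upper + upper.T
print(sym)
```

Note that this relies on each pair appearing in only one orientation; if both (a, b) and (b, a) were present, adding the transpose would double those entries.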

Source

2015-06-28 05:05:45 smci

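I know, this was only a sketch; @Robert's answer is fine. It's easy to get the indices from 'levels(unlist(df.long[, 1:2]))' – smci

The indices are done; I've already added that. Filling in the gaps isn't hard. I was pointing out 'reshape::cast'. – smci

Done. I realize yours was more general before I started on mine. For the community's sake, I was just pointing out the usefulness of 'reshape::cast'. – smci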
