Merging similar rows between two data frames

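Still getting the hang of R. I have two data frames whose rows are named by coordinates (e.g. x_1013y_41403; see below). The coordinates come in sets of five; plotted on a grid, each set forms a cross. The center coordinates are in one data frame and the four peripheral coordinates are in the other.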

Center     A  B  C  D  E  F 
x_723y_6363.txt  554  NA  604  NA  645  NA 
x_749y_41403.txt  14  NA  6  NA  13  NA 

Peripheral    A  B  C  D  E  F 
x_1013y_41403.txt  NA  1  NA  0  NA  0 
x_459y_6363.txt  NA  2  NA  1  NA  4 
x_485y_41403.txt  NA  0  NA  0  NA  0 
x_723y_6100.txt  NA  1  NA  0  NA  3 
x_723y_6627.txt  NA  1  NA  0  NA  1 
x_749y_41139.txt  NA  1  NA  0  NA  0 
x_749y_41667.txt  NA  2  NA  0  NA  0 
x_987y_6363.txt  NA  1  NA  0  NA  0

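To form a set, each peripheral coordinate shares either the x or the y position of the center coordinate. For example, the center coordinate x_723y_6363 is associated with x_723y_6100 and x_723y_6627 (same x position) and with x_459y_6363 and x_987y_6363 (same y position).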
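The grouping rule can be sketched directly on the row names. This is only an illustration, not code from the thread: `parse_xy` is a hypothetical helper, and it assumes every name has the shape `x_<digits>y_<digits>.txt`.

```r
# Hypothetical helper: pull the numeric x and y parts out of names
# shaped like "x_723y_6363.txt" (assumed format).
parse_xy <- function(nm) {
  data.frame(
    x = as.integer(sub("^x_([0-9]+)y_.*$", "\\1", nm)),
    y = as.integer(sub("^x_[0-9]+y_([0-9]+)\\.txt$", "\\1", nm)),
    row.names = nm
  )
}

center_xy <- parse_xy("x_723y_6363.txt")
periph_names <- c("x_459y_6363.txt", "x_723y_6100.txt",
                  "x_723y_6627.txt", "x_749y_41139.txt",
                  "x_987y_6363.txt")
periph_xy <- parse_xy(periph_names)

# A peripheral belongs to the set if it shares x or y with the center.
in_set <- periph_xy$x == center_xy$x | periph_xy$y == center_xy$y
periph_names[in_set]
```

Here x_749y_41139.txt is left out because it shares neither position with the center.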

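I want to merge the coordinates into their respective sets and name each set after its center coordinate. For the case above, I would end up with two rows, each being the sum of one set.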

     A  B  C  D  E  F 
x_723y_6363.txt  554  5  604  1  645  8 
x_749y_41403.txt  14  4  6  0  13  0

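I'm not sure how to do this. I thought about writing regular expressions to pick out the x and y coordinates separately and then comparing them between the two data frames. Any help would be much appreciated!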

Source

2013-07-05 user2554798

Can you edit your question to include the output of `dput(head(Center))` and `dput(head(Peripheral))`? – Thomas

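I hope someone comes up with a better answer, because this is ugly. I first split the .txt names into x and y values, then loop over each variable that is NA in center and sum all peripheral values that share an x or y value with that center. Edit: changed the sapply to make it slightly better.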

# Sample data. The header has one field fewer than the data lines,
# so read.table() uses the first column as row names.
center <- read.table(textConnection("
A B C D E F
x_723y_6363.txt  554  NA  604  NA  645  NA
x_749y_41403.txt  14  NA  6  NA  13  NA"),
        header = TRUE)

peripheral <- read.table(textConnection("
A  B  C  D  E  F
x_1013y_41403.txt  NA  1  NA  0  NA  0
x_459y_6363.txt  NA  2  NA  1  NA  4
x_485y_41403.txt  NA  0  NA  0  NA  0
x_723y_6100.txt  NA  1  NA  0  NA  3
x_723y_6627.txt  NA  1  NA  0  NA  1
x_749y_41139.txt  NA  1  NA  0  NA  0
x_749y_41667.txt  NA  2  NA  0  NA  0
x_987y_6363.txt  NA  1  NA  0  NA  0"),
         header = TRUE)

# Split each row name into its x part ("x_723") and y part ("y_6363").
xpat <- "^([^y]+).*"
ypat <- ".*(y_[0-9]+)\\.txt"
center$x <- gsub(xpat, "\\1", rownames(center))
center$y <- gsub(ypat, "\\1", rownames(center))
peripheral$x <- gsub(xpat, "\\1", rownames(peripheral))
peripheral$y <- gsub(ypat, "\\1", rownames(peripheral))

# Columns that are NA in center and filled in peripheral.
vars <- c("B", "D", "F")

# For each such column and each center row, sum the peripheral values
# whose x or y part matches that center.
center[vars] <- sapply(peripheral[vars], function(col)
    apply(center, 1, function(row) sum(col[peripheral$x %in% row["x"] | peripheral$y %in% row["y"]]))
)

R> center
                   A B   C D   E F     x       y
x_723y_6363.txt  554 5 604 1 645 8 x_723  y_6363
x_749y_41403.txt  14 4   6 0  13 0 x_749 y_41403
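As a possible alternative to the nested `sapply`/`apply`, the same totals can be sketched with `match()` and `rowsum()`. This is not the answerer's method. `get_x`, `get_y`, `owner` and `totals` are names made up for the illustration, and the sketch assumes each peripheral row belongs to exactly one center (a match on x wins over a match on y).

```r
center <- read.table(text = "
A B C D E F
x_723y_6363.txt 554 NA 604 NA 645 NA
x_749y_41403.txt 14 NA 6 NA 13 NA", header = TRUE)

peripheral <- read.table(text = "
A B C D E F
x_1013y_41403.txt NA 1 NA 0 NA 0
x_459y_6363.txt NA 2 NA 1 NA 4
x_485y_41403.txt NA 0 NA 0 NA 0
x_723y_6100.txt NA 1 NA 0 NA 3
x_723y_6627.txt NA 1 NA 0 NA 1
x_749y_41139.txt NA 1 NA 0 NA 0
x_749y_41667.txt NA 2 NA 0 NA 0
x_987y_6363.txt NA 1 NA 0 NA 0", header = TRUE)

# Hypothetical helpers: the x and y parts of a name, as strings.
get_x <- function(nm) sub("^x_([0-9]+)y_.*$", "\\1", nm)
get_y <- function(nm) sub("^.*y_([0-9]+)\\.txt$", "\\1", nm)

# For each peripheral row: index of the center sharing its x,
# falling back to the center sharing its y.
by_x <- match(get_x(rownames(peripheral)), get_x(rownames(center)))
by_y <- match(get_y(rownames(peripheral)), get_y(rownames(center)))
owner <- ifelse(is.na(by_x), by_y, by_x)

# Stack both tables and sum within each center's group, ignoring NAs.
stacked <- as.matrix(rbind(center, peripheral))
group <- c(seq_len(nrow(center)), owner)
totals <- rowsum(stacked, group, na.rm = TRUE)
rownames(totals) <- rownames(center)[as.integer(rownames(totals))]
totals
```

Because the sums are grouped by center index, the result keeps the center names as row names without extra x and y columns.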

Source

2013-07-05 20:21:55

+1, works fine! –

Another option:

# function to split coordinates x and y: 

f <- function(DF) structure(
    t(sapply(strsplit(row.names(DF), "[_y.]"), `[`, c(2,4))), 
    dimnames=list(NULL, c("x", "y"))) 

# get x and y for peripheral data: 

P <- cbind(Peripheral, f(Peripheral)) 

# get x and y for centers, and mark ids: 

C <- cbind(Center, f(Center), id=1:nrow(Center)) 

# matching: 

Q <- merge(merge(P, C[,c("x","id")], all=TRUE), C[,c("y","id")], by="y", all=TRUE) 

# prepare for union: 

R <- within(Q, {id <- ifelse(is.na(id.y), id.x, id.y); id.x <- NULL; id.y <- NULL}) 

# join everything and aggregate: 

S <- rbind(R, C) 

aggregate(S[,3:8], by=list(id=S$id), FUN=sum, na.rm=TRUE)

Result:

  id   A B   C D   E F
1  1 554 5 604 1 645 8
2  2  14 4   6 0  13 0
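The "matching" and "prepare for union" steps rely on how `merge()` renames a non-key column that appears in both inputs. A toy sketch with made-up data (`left`, `right`, `key` are illustrative names, not from the answer) shows the `.x`/`.y` suffixes and the same `ifelse()` pattern for collapsing them:

```r
# Two small tables that both carry an "id" column besides the key.
left  <- data.frame(key = c("a", "b"), id = 1:2)
right <- data.frame(key = c("b", "c"), id = 3:4)

# all = TRUE keeps unmatched rows from both sides; the clashing
# "id" columns come back as id.x (from left) and id.y (from right).
m <- merge(left, right, by = "key", all = TRUE)

# Collapse to a single id: take id.y when present, else id.x.
m$id <- ifelse(is.na(m$id.y), m$id.x, m$id.y)
m
```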

Source

2013-07-05 20:45:23
