2016-02-15 100 views

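Merge files rowwise

I want to merge files by column. The files do not have the same number of rows. The output should contain all rows, and where a row is missing from a file, its count should be 0.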

I tried something like:

> file_list <- list.files(pattern = "*.mature")
> dataset_tumor <- do.call("cbind", lapply(file_list,
+     FUN = function(files) { read.table(files,
+     header = TRUE, sep = "") }))
Error in data.frame(..., check.names = FALSE) : 
    arguments imply differing number of rows: 497, 642, 692, 694, 699, 515, 707, 740, 605, 568, 602, 512, 624, 634, 551, 662, 750, 442, 615, 557, 466, 638, 560, 576, 851, 705, 614, 547, 670, 752, 586, 671, 754, 603, 666, 587, 601, 572, 550, 573, 621, 650, 701, 622, 735, 434, 742, 737, 809, 661, 540, 645, 722, 594, 681, 659, 781, 613, 641, 756, 595, 966, 658, 539, 520, 619, 564, 732, 679, 596, 536, 518, 631, 691, 708, 625, 630, 589, 639, 538 


> head(a.mature) 
       X4 
hsa-let-7a-5p 12342 
hsa-let-7b-3p 27 
hsa-let-7b-5p 47413 
hsa-let-7c-5p 2825 
hsa-let-7d-3p 1162 
hsa-let-7d-5p 219 
> head(b.mature) 
       X15 
hsa-let-7a-5p 28868 
hsa-let-7b-3p 41 
hsa-let-7b-5p 62259 
hsa-let-7c-5p 4468 
hsa-let-7k-3p 2027 
hsa-let-7f-5p 938 

Desired output:

                 X4    X15
hsa-let-7a-5p 12342  28868
hsa-let-7b-3p    27     41
hsa-let-7b-5p 47413  62259
hsa-let-7c-5p  2825   4468
hsa-let-7d-3p  1162      0
hsa-let-7d-5p   219      0
hsa-let-7k-3p     0   2027
hsa-let-7f-5p     0    938

Have you looked at '?merge'?


Yes, but I did not find a solution there. – user2300940

Answer


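Just as with a primary key and a foreign key in a database, you need a common column between the two datasets in order to combine them. From the examples in the documentation of the merge function: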

authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))

books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

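Here we have two data frames, and the surname column of authors holds the same values as the name column of books. We can therefore use these columns as the key to merge the datasets: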

m1 <- merge(authors, books, by.x = "surname", by.y = "name") 

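If you want to keep all the books in the merged data frame, you can use the all.y or all.x argument of merge, depending on which data frame you pass first.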

m1 <- merge(authors, books, by.x = "surname", by.y = "name", all.y =TRUE) 

OR

m1 <- merge(books, authors, by.x = "name", by.y = "surname", all.x =TRUE) 
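Applied to the question, the same idea can key on the row names and keep unmatched rows from both sides with all = TRUE. The following is only a minimal sketch: the data frames a and b are rebuilt by hand from the head() output in the question, and replacing NA with 0 assumes that a missing row really means a count of zero.

```r
# Sample data copied from the head() output in the question.
a <- data.frame(X4 = c(12342, 27, 47413, 2825, 1162, 219),
                row.names = c("hsa-let-7a-5p", "hsa-let-7b-3p", "hsa-let-7b-5p",
                              "hsa-let-7c-5p", "hsa-let-7d-3p", "hsa-let-7d-5p"))
b <- data.frame(X15 = c(28868, 41, 62259, 4468, 2027, 938),
                row.names = c("hsa-let-7a-5p", "hsa-let-7b-3p", "hsa-let-7b-5p",
                              "hsa-let-7c-5p", "hsa-let-7k-3p", "hsa-let-7f-5p"))

# Full outer join on the row names; rows missing on one side become NA.
merged <- merge(a, b, by = "row.names", all = TRUE)

# Assumption: a missing row means a count of zero.
merged[is.na(merged)] <- 0

# merge() stores the key in a "Row.names" column; move it back to row names.
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL

print(merged)
```

Using all = TRUE is equivalent to setting both all.x and all.y, so no row from either file is dropped.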

Similarly, you can use the join_all function from the plyr package, which can merge more than two data frames.
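For many files, a full outer join can also be built in base R by folding merge over a list with Reduce. This is a swapped-in stand-in for join_all rather than the method the answer names (join_all takes a type argument, and type = "full" should give the same kind of join). The helper merge_counts, the key column id, and the third table with column X20 are hypothetical names and made-up sample data, not part of the original post.

```r
# Hypothetical helper (not from the original post): full outer join of a list
# of count tables keyed by row name, with missing counts set to 0.
merge_counts <- function(tables) {
  # Copy the row names into an explicit key column so merge() can join on it.
  keyed <- lapply(tables, function(df) {
    data.frame(id = rownames(df), df, stringsAsFactors = FALSE)
  })
  # Fold the list pairwise; all = TRUE keeps rows missing from either side.
  merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE), keyed)
  # Assumption: a row absent from a file means a count of zero.
  merged[is.na(merged)] <- 0
  merged
}

tables <- list(
  data.frame(X4 = c(12342, 1162),
             row.names = c("hsa-let-7a-5p", "hsa-let-7d-3p")),
  data.frame(X15 = c(28868, 2027),
             row.names = c("hsa-let-7a-5p", "hsa-let-7k-3p")),
  data.frame(X20 = 5, row.names = "hsa-let-7k-3p")  # made-up third sample
)

result <- merge_counts(tables)
print(result)
```

With real input, the list could be produced the way the question does it, for example with lapply(file_list, read.table, header = TRUE), provided each file has unique row names and a distinct column name.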


join_all seems to work, but how do I include rows that are not found in all files? I need to include them as NA. – user2300940


@user2300940: See this answer http://stackoverflow.com/a/21438584/476907 ... it has details on how to do this. – discipulus