2012-11-28 140 views
1

,如果我有一個表,看起來像這樣(函數read.table與閱讀):結合具有相同名稱的項目到數據幀

http://i.stack.imgur.com/gaef6.png

(中數字0是佔位符,所以我有一個一致的數行)

有沒有一種方法,我可以添加具有相同的相應的名稱的值(這兩個值爲1,所有4個值爲l)並將其輸出爲數據幀?此外,行並不一致(即某些列有4分一個的,一些具有2)

結果應該是這樣的:

http://i.stack.imgur.com/HrCt5.png

我可以組裝與A,B等中的數據幀作爲行名稱和列一旦值的總和,但我不知道如何根據他們的相應名稱進行總結。

這是我目前如何處理這個:

  • read.table()我的表

  • 定義空data.frame

  • 使用循環提取並通過相應的名稱添加列值? <與我剛生成

  • cbind()數據幀和列 - 需要幫助通過環的末端繼續

次使用row.names()和colnames()來改變排和列名

到目前爲止我的代碼:

setwd(wd) 
read.table(dat.txt,sep="\t")->x 
read.table(total.txt, sep="\t")->total 
met<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r") 
y<-data.frame(), 
total[2,]->name 
for g in 1:ncol(x){ 
    #column will be the column with the combined values according to name 
    cbind(y,column) 
} 
row.names(y)<-met 
colnames(y)<-name 
write.table(y,file="data.txt",sep="\t") 

任何幫助將不勝感激。

+0

還有的兩個值,B等在thing1,thing2,等你想怎麼說在最後的df中代表? thing1的第二個值是否成爲thing3? –

+2

您應該編輯您的問題以包含dput(dfrm)的結果,其中dfrm是由read.table()操作創建的對象的名稱。 –

+0

我應該澄清:我的數據有2個以上的元素,大約有120對這樣的列,所以事情3只是爲了說明我會繼續添加更多的列 我已經更新了問題以包含我的R代碼到目前爲止。 – user1537589

回答

0

您可以打破這兩個部分:

(1)彙總各列 (2)組合成一個漂亮的表

它可以通過*申請被很好地完成。我在for循環,使其更易於閱讀打破下來

## sample df 
    df <- structure(list(thing1.met = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "f", "f", "g", "g", "h", "h", "i", "i", "j", "j", "k", "k", "l", "l", "m", "n", "o", "p", "q", "r"), thing1 = c(-0.57, 0.42, -1.12, -0.5, -0.94, -1.87, -2.22, -0.33, 1.45, 0.46, -1.96, -0.35, -1.01, 0.72, 0.04, -0.21, 0.81, 1.28, -0.52, -1.19, 0.03, -1.71, 0.53, -1.96, 1.58, -0.1, -0.88, 0.92, 0.02, -0.91), thing2.met = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "e", "e", "f", "f", "g", "g", "h", "h", "i", "i", "j", "j", "k", "l", "l", "m", "n", "o", "p", "q", "r"), thing2 = c(-0.06, -0.7, 0.16, 1.96, 0.78, -0.65, -0.17, 0.89, 0.68, -0.93, -1.44, -0.16, -0.52, -0.19, 1.15, -0.77, 0.69, -0.48, 1.75, 1.62, -0.68, -1.06, -1.2, 1.42, -0.2, 1.33, 2.24, 0.35, 2, -1.21), thing3.met = c("a", "a", "a", "b", "c", "c", "d", "d", "e", "e", "f", "f", "g", "g", "g", "g", "g", "i", "k", "l", "l", "l", "m", "o", "o", "o", "p", "p", "p", "q"),  thing3 = c(1.27, 4.45, -2.42, -9.53, 3.33, 5.58, -2.94, 2.54,  12.44, 12.41, 7.6, 0.63, 5.67, -3.79, 12.28, 1.77, -0.4,  -0.04, 0.95, 4.93, 1.77, 0.37, -2.79, 2.36, 12.76, -5.4,  -4.73, -1.8, 0.52, -4.97)), .Names = c("thing1.met", "thing1", "thing2.met", "thing2", "thing3.met", "thing3"), row.names = c(NA, -30L), class = "data.frame") 

    # create blank data frame 
    results <- data.frame(row.names=met) 

    # grab every other column 
    cols <- seq(2, ncol(df), 2) 

    # aggregate over all m, over every pair of columns 
    for (m in met) { 
    for (c in cols) { 
     results[m, c/2] <- sum(df[,c][df[,c-1]==m]) 
    } 
    } 

    # clean up column names 
    names(results) <- names(df)[cols] 

    # final output 
    results 

例如:

> results 
     thing1 thing2 thing3 
    a -0.15 -0.60 3.30 
    b -1.62 2.74 -9.53 
    c -2.81 0.07 8.91 
    d -2.55 0.68 -0.40 
    e 1.91 -2.37 24.85 
    f -2.31 -0.68 8.23 
    g -0.29 0.96 15.53 
    h -0.17 -0.08 0.00 
    i 2.09 1.27 -0.04 
    j -1.71 0.94 0.00 
    k -1.68 -1.06 0.95 
    l -1.43 0.22 7.07 
    m 1.58 -0.20 -2.79 
    n -0.10 1.33 0.00 
    o -0.88 2.24 9.72 
    p 0.92 0.35 -6.01 
    q 0.02 2.00 -4.97 
    r -0.91 -1.21 0.00 
相關問題