2012-06-30 82 views

I am aggregating a dataset in country-year format with reshape, and I need to aggregate/concatenate strings:

library(reshape)

melted <- melt(data, id = c("ccode.a", "year"))

# collapse (not sep) joins the elements of one vector into a single string
data.fix <- function(x) c(max = max(x), sum = sum(x), min = min(x),
                          newcol = paste(x, collapse = ","))
casted <- cast(melted, ccode.a + year ~ ..., data.fix)

I want to concatenate conflictID.a, so that whenever I aggregate multiple rows into a single row I get all of the conflictID.a values that were aggregated.
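
For the concatenation itself, the relevant distinction in paste is between sep and collapse. A minimal sketch, using two made-up IDs in a hypothetical vector named ids:

```r
ids <- c(173L, 172L)  # hypothetical conflict IDs from one country-year

# sep only separates the *arguments* of paste(); with a single vector
# argument the result still has one element per input value.
by_sep <- paste(ids, sep = ",")

# collapse joins the elements of the result into one string.
by_collapse <- paste(ids, collapse = ",")

print(by_sep)       # character vector of length 2
print(by_collapse)  # one string: "173,172"
```

So an aggregation function that should return one value per group probably wants collapse rather than sep.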

Here is some sample data:

dput(tail(subset(data, select=c(ccode.a,year,onset,conflictID.a)), 100))

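I also modified the data by hand so that it reproduces the problem. There are two cases where two or more rows have the same year and ccode.a values but different conflictID.a values, and I want those values concatenated in the aggregate for each ccode.a, year pair.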

structure(list(ccode.a = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 41L, 41L, 
41L, 52L, 52L, 70L, 70L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 
90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 
92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 
93L, 93L, 93L, 93L, 93L, 93L, 93L, 93L, 93L, 93L, 93L, 93L, 95L, 
95L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 101L, 101L, 115L, 130L), year = c(2001, 2001, 2001, 
2005, 2006, 2007, 2008, 1989, 1991, 2004, 1990, 1990, 1994, 1996, 
1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 
1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1979, 
1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 
1991, 1977, 1978, 1979, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 
1989, 1990, 1989, 1989, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 
1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 
1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 
2004, 2005, 2006, 2007, 2008, 1982, 1982, 1982, 1995), onset = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), conflictID.a = c(224L, 
224L, 224L, 224L, 224L, 224L, 224L, 186L, 186L, 186L, 183L, 183L, 
205L, 205L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 120L, 
120L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 
120L, 140L, 140L, 140L, 140L, 140L, 140L, 140L, 140L, 140L, 140L, 
140L, 140L, 173L, 172L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 
92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 
92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 
80L, 80L, 162L, 208L)), .Names = c("ccode.a", "year", "onset", 
"conflictID.a"), row.names = c(127L, 128L, 130L, 131L, 132L, 
133L, 134L, 277L, 279L, 292L, 395L, 396L, 452L, 454L, 494L, 495L, 
496L, 497L, 498L, 499L, 500L, 501L, 502L, 503L, 504L, 505L, 506L, 
507L, 508L, 509L, 510L, 511L, 512L, 513L, 514L, 566L, 567L, 568L, 
569L, 570L, 571L, 572L, 573L, 574L, 575L, 576L, 577L, 578L, 598L, 
599L, 600L, 603L, 604L, 605L, 606L, 607L, 608L, 609L, 610L, 611L, 
678L, 679L, 699L, 700L, 701L, 702L, 703L, 704L, 705L, 706L, 707L, 
708L, 709L, 710L, 711L, 712L, 713L, 714L, 715L, 716L, 717L, 718L, 
719L, 720L, 721L, 722L, 723L, 724L, 725L, 726L, 727L, 728L, 729L, 
730L, 731L, 732L, 740L, 750L, 812L, 854L), class = "data.frame") 
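
To see which country-years actually mix several conflict IDs, something like the following could be used. This is only a sketch: toy is a hypothetical five-row data frame with the same column names as the data above, and n_ids and mixed are illustrative names.

```r
# Toy data with the same column names as the question
toy <- data.frame(
  ccode.a      = c(52L, 52L, 95L, 95L, 100L),
  year         = c(1990, 1990, 1989, 1989, 1975),
  conflictID.a = c(183L, 183L, 173L, 172L, 92L)
)

# Number of distinct conflict IDs per country-year
n_ids <- aggregate(conflictID.a ~ ccode.a + year, data = toy,
                   FUN = function(x) length(unique(x)))

# Keep only the country-years that contain more than one distinct ID
mixed <- n_ids[n_ids$conflictID.a > 1, ]
print(mixed)
```

In the toy data only ccode.a 95 in 1989 qualifies, because its two rows carry different IDs.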

Answers


You don't need reshape for this; just use plain aggregate:

# All aggregated values 
aggregate(data$conflictID.a,by=list(data$ccode.a,data$year),c) 
# Just unique values 
aggregate(data$conflictID.a,by=list(data$ccode.a,data$year),unique) 
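
If a single comma-separated string per group is wanted instead of a vector, one possible variation (a sketch, not tested on the full dataset) is to pass a paste-based function to aggregate. The data frame toy and the name ids_by_group are made up for illustration; only the column names come from the question.

```r
# Toy data with the same column names as the question
toy <- data.frame(
  ccode.a      = c(52L, 52L, 95L, 95L, 100L),
  year         = c(1990, 1990, 1989, 1989, 1975),
  conflictID.a = c(183L, 183L, 173L, 172L, 92L)
)

# One row per country-year; distinct IDs joined into one string per group
ids_by_group <- aggregate(
  conflictID.a ~ ccode.a + year, data = toy,
  FUN = function(x) paste(unique(x), collapse = ",")
)
print(ids_by_group)
```

Dropping unique() would keep repeated IDs in the string, which mirrors the first aggregate call above.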

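I had to adapt it a little because I am aggregating all of the data, but this works really well. Thanks! Is there a way to do this with reshape, though? Just curious. – Zach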


I thought `cast(melted, ccode.a + year ~ variable, fun.aggregate = list)` would work, but it didn't. I can't figure out why. – nograpes