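Aggregate and include the remaining unique elements in the output

I want to aggregate by two columns and have every unique element of either column appear in the output. For example, in the dataset below (blank cells are NAs), I want to sum the trips by o and d, which I have already done. However, element A has no matching pair across the o and d columns, so it does not appear in the output. How can I include A in both dimensions, with its trips set to 0 throughout? My desired output matrix is attached as well. Thanks in advance!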
CODE
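library(reshape2)  # provides dcast()

df <- read.csv("smallexample.csv", header = TRUE)
# treat missing trip counts as zero
df[["trips"]][is.na(df[["trips"]])] <- 0
# aggregate the trips by origin and destination
result1 <- aggregate(trips ~ o + d, data = df, sum)
# change from long to wide format
result2 <- dcast(result1, o ~ d, value.var = "trips")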
DATA
structure(list(o = structure(c(2L, 1L, 4L, 2L, 1L, 5L, 1L, 6L,
2L, 1L, 4L, 5L, 2L, 4L, 6L, 3L), .Label = c("", "A", "B", "C",
"D", "E"), class = "factor"), d = structure(c(1L, 2L, 3L, 1L,
2L, 1L, 2L, 3L, 1L, 2L, 5L, 1L, 1L, 3L, 3L, 4L), .Label = c("",
"A", "B", "C", "E"), class = "factor"), trips = c(2, 3, 4, 5,
1.5, NA, NA, 1, 4, NA, 6, NA, 0.5, 6, 2, 1)), .Names = c("o",
"d", "trips"), class = "data.frame", row.names = c(NA, -16L))
OUTPUT:
structure(list(X = structure(1:5, .Label = c("A", "B", "C", "D",
"E"), class = "factor"), A = c(0L, 0L, 0L, 0L, 0L), B = c(0L,
0L, 10L, 0L, 3L), C = c(0L, 1L, 0L, 0L, 0L), D = c(0L, 0L, 0L,
0L, 0L), E = c(0L, 0L, 6L, 0L, 0L)), .Names = c("X", "A", "B",
"C", "D", "E"), class = "data.frame", row.names = c(NA, -5L))
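
One possible way to get a square matrix like the one above is sketched below. It is not necessarily the approach the thread settled on. It assumes that blank labels should be dropped, and that giving o and d the same set of factor levels is acceptable. The data frame is retyped from the DATA block as plain character vectors, and the names lv, result and wide are only illustrative.

```r
# Data retyped from the DATA block above (blank strings are the empty cells)
df <- data.frame(
  o = c("A", "", "C", "A", "", "D", "", "E", "A", "", "C", "D", "A", "C", "E", "B"),
  d = c("", "A", "B", "", "A", "", "A", "B", "", "A", "E", "", "", "B", "B", "C"),
  trips = c(2, 3, 4, 5, 1.5, NA, NA, 1, 4, NA, 6, NA, 0.5, 6, 2, 1),
  stringsAsFactors = FALSE
)

# Treat missing trip counts as zero, as in the question
df$trips[is.na(df$trips)] <- 0

# Assumption: every non-blank label seen in either column becomes a level
lv <- sort(setdiff(unique(c(df$o, df$d)), ""))

# Blank labels are not in lv, so they turn into NA here
df$o <- factor(df$o, levels = lv)
df$d <- factor(df$d, levels = lv)

# Drop rows without a complete o/d pair, then sum trips per pair.
# xtabs keeps unused levels by default, so A and D still get a row and column.
complete <- df[!is.na(df$o) & !is.na(df$d), ]
result <- xtabs(trips ~ o + d, data = complete)

# Optional: a plain data frame instead of a contingency table
wide <- as.data.frame.matrix(result)
print(wide)
```

Because both factors carry the same levels, pairs that never occur are filled with 0 rather than left out, which is what puts A into both the rows and the columns.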
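The alternative option works great for this small example! I tried to scale it up to a larger problem, which has US states instead of letters, and trying to save them in separate files did not work. – santosh
Can you provide an example to illustrate? – JasonWang
Got it! I should have used factor(dataset$variable, levels = sort(unique(dataset$variable)), ordered = TRUE). Thanks a lot anyway! – santosh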
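
For readers wondering what that factor() call changes, here is a small sketch. The toy dataset and the state codes are made up for illustration, and the guess that the trouble came from splitting the data into subsets is an assumption, since the thread does not say what went wrong.

```r
# Hypothetical stand-in for the larger dataset with US states
dataset <- data.frame(
  variable = c("TX", "CA", "NY", "CA"),
  stringsAsFactors = FALSE
)

# Fix the full, sorted level set once, before any splitting
dataset$variable <- factor(
  dataset$variable,
  levels = sort(unique(dataset$variable)),
  ordered = TRUE
)

# A subset still carries every level, so tabulations keep the zero entries
sub <- dataset[dataset$variable == "CA", ]
print(levels(sub$variable))
print(table(sub$variable))
```

Fixing the levels up front means each piece of the data reports the same categories in the same order, even when some of them do not occur in that piece.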