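Hi, I have the following data.frame (appended below). I would like to add an extra column of normalised counts, N.norm = N/sum(N).
I previously had a data.frame without the Date column and was able to normalise the data using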
oo[, N.norm := N/sum(N), by=Operator]
I tried to add the date to the grouping with
oo[, N.norm := N/sum(N), by=Operator,Date]
but received an error message:
Error in `[.data.frame`(oo, , `:=`(N.norm, N/sum(N)), by = Operator, Date) :
unused argument(s) (by = Operator)
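For example, for Operator "A" in the month "Jan 2013", I have a count N for each
score in c("Good", "OK", "Poor", "Crap"). I want to sum N over that combination (A and Jan 2013) and
divide each count N by sum(N).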
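The calculation described above can be sketched as follows. This is only a minimal sketch, assuming the data.table package is installed and that the object being modified really is a data.table; the small table `toy` is made-up illustration data, not the data from the question. Both grouping columns are passed to `by` as a single argument (here a list), so that `Date` is not treated as a separate argument to `[`.

```r
library(data.table)

# Hypothetical toy data: two operators, two months, two scores each
toy <- data.table(
  Operator  = c("A", "A", "A", "A", "D", "D", "D", "D"),
  ROI_Score = c("Good", "Poor", "Good", "Poor", "Good", "Poor", "Good", "Poor"),
  Date      = c("Jan 2013", "Jan 2013", "Feb 2013", "Feb 2013",
                "Jan 2013", "Jan 2013", "Feb 2013", "Feb 2013"),
  N         = c(15, 5, 2, 2, 16, 31, 17, 27)
)

# Both grouping columns go into one `by` argument
toy[, N.norm := N / sum(N), by = list(Operator, Date)]

# Sanity check: within each (Operator, Date) group the shares add up to 1
toy[, list(total = sum(N.norm)), by = list(Operator, Date)]
```

If every count in a group is zero, N/sum(N) evaluates to NaN for that group, so such groups may need separate handling.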
On a side note, can anyone point me to a decent introduction to manipulating data.frames in R?
structure(list(Operator = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("A",
"D", "J", "L", "M"), class = "factor"), ROI_Score = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L), .Label = c("Crap", "Good", "OK", "Poor"), class = "factor"),
Date = c("Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013", "Apr 2013", "Feb 2013", "Jan 2013", "Mar 2013",
"May 2013"), N = c(0, 0, 0, 0, 0, 1, 2, 15, 1, 5, 3, 2, 3,
1, 0, 3, 0, 5, 5, 1, 0, 0, 0, 1, 0, 14, 17, 16, 8, 7, 5,
10, 6, 1, 5, 24, 27, 31, 16, 15, 0, 0, 0, 0, 0, 26, 24, 20,
11, 18, 3, 4, 17, 3, 2, 20, 36, 12, 21, 9, 0, 0, 0, 0, 0,
3, 12, 5, 12, 4, 0, 0, 3, 4, 0, 29, 37, 41, 25, 10, 0, 0,
0, 0, 0, 9, 9, 15, 17, 3, 6, 4, 5, 4, 1, 14, 13, 9, 15, 9
)), .Names = c("Operator", "ROI_Score", "Date", "N"), row.names = c(NA,
100L), class = "data.frame")
I'm not sure whether the data is in data.frame or data.table format. Here is my code, adapted from the solution given by Arun (reshape/remould data frame to create normalized bar chart and pie chart):
library(data.table)
df <- read.csv("/misc/jaguar_data/report/system/db_fs/roi_scores.csv")
# Get date into nice structure for faceting, e.g. "Jan 2013"
df$Date <- strftime(strptime(df$Date, format = "%d/%m/%Y"), "%b %Y")
dt <- data.table(df)
# Distinct values of each grouping variable
ops <- as.character(unique(dt$Operator))
scr <- as.character(unique(dt$ROI_Score))
dts <- unique(dt$Date)
# Count rows per combination, then set N to 0 for combinations that never occur
oo <- setkey(dt[, .N, by="Operator,ROI_Score,Date"], Operator,
ROI_Score,Date)[CJ(ops, scr,dts)][is.na(N), N:= 0L]
# Normalise the counts within each operator
oo[, N.norm := N/sum(N), by=Operator]
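About this additional column: should N.norm in row i be N[i]/sum(N[1...i]), but aggregated by Operator and Date? Do you really mean 'data.table' rather than 'data.frame'? The ':=' operator is only available for 'data.table'. Please clarify which structure you are using: you have given us a data frame. –
@BryanHanson - I'm not sure. I have updated my question to explain how I arrived at the data structure oo. It was originally a data.frame, but I think it is now a data.table – moadeep
You are definitely using a 'data.table'; look at your own code, which makes that clear (you start with a 'data.frame' but then convert it to a 'data.table'). These are typically used when the data set is very large and speed is critical. Otherwise, a 'data.frame' is usually fine. What are you trying to compute? –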
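For readers who are unsure which structure they hold, or who would rather stay with a plain data.frame, the following is a minimal sketch in base R. It assumes the same column names as in the question; `toy_df` is made-up illustration data, and using `ave()` is one possible approach rather than the method used in the thread.

```r
# Hypothetical toy data frame (not the data from the question)
toy_df <- data.frame(
  Operator = c("A", "A", "D", "D"),
  Date     = c("Jan 2013", "Jan 2013", "Jan 2013", "Jan 2013"),
  N        = c(15, 5, 16, 31),
  stringsAsFactors = FALSE
)

# Check which structure you actually have; a data.table would report
# both "data.table" and "data.frame" here
class(toy_df)

# ave() returns the group sum aligned with each row, so the division
# is done within each (Operator, Date) combination
toy_df$N.norm <- toy_df$N / ave(toy_df$N, toy_df$Operator, toy_df$Date, FUN = sum)
toy_df
```

This needs no extra packages, which may be convenient when the data set is small and speed is not a concern.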