2012-12-18

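Get row counts by criteria: a specific year of a date field, and grouped by a text field

I have an Excel report, published daily, that I need to summarize and use for trend analysis. The report is a list of work items, each with a created date and a work item type. How can I count the work items created in 2011 and in 2012? Also, how can I get counts by work item type? So far I have been able to load the Excel data and get the number of rows with the following: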

library(gdata) 
wi20121812 = read.xls("WorkItemReport20121812.xls") 
nrow(wi20121812) 

Sample data

> dput(head(workItemReport2)) 
structure(list(DocType = structure(c(6L, 7L, 6L, 6L, 8L, 6L), .Label = c("TYPE10WI", 
"TYPE11WI", "TYPE12WI", "TYPE13WI", "TYPE14WI", "TYPE1WI", "TYPE2WI", 
"TYPE3WI", "TYPE4WI", "TYPE5WI", "TYPE6WI", "TYPE7WI", "TYPE8WI", 
"TYPE9WI"), class = "factor"), CreatedDate = structure(c(7L, 
22L, 146L, 181L, 153L, 191L), .Label = c("1/10/12 15:43 AM/PM ", 
"1/10/12 16:06 AM/PM ", "1/10/12 5:28 AM/PM ", "1/10/12 5:56 AM/PM ", 
"1/11/12 19:51 AM/PM ", "1/11/12 5:26 AM/PM ", "1/12/11 21:58 AM/PM ", 
"1/12/12 11:08 AM/PM ", "1/12/12 5:41 AM/PM ", "1/12/12 9:56 AM/PM ", 
"1/13/12 14:01 AM/PM ", "1/13/12 15:08 AM/PM ", "1/13/12 15:11 AM/PM ", 
"1/13/12 8:51 AM/PM ", "1/16/12 10:27 AM/PM ", "1/16/12 10:28 AM/PM ", 
"1/16/12 16:37 AM/PM ", "1/16/12 7:52 AM/PM ", "1/18/12 15:02 AM/PM ", 
"1/18/12 16:03 AM/PM ", "1/18/12 16:13 AM/PM ", "1/19/11 19:23 AM/PM ", 
"1/20/12 10:48 AM/PM ", "1/20/12 12:23 AM/PM ", "1/20/12 8:38 AM/PM ", 
"1/23/12 5:53 AM/PM ", "1/24/12 15:18 AM/PM ", "1/24/12 8:23 AM/PM ", 
"1/24/12 8:58 AM/PM ", "1/25/12 11:38 AM/PM ", "1/25/12 5:28 AM/PM ", 
"1/26/12 13:48 AM/PM ", "1/26/12 15:53 AM/PM ", "1/26/12 15:58 AM/PM ", 
"1/26/12 16:13 AM/PM ", "1/26/12 16:18 AM/PM ", "1/26/12 7:33 AM/PM ", 
"1/27/12 7:48 AM/PM ", "1/3/12 17:48 AM/PM ", "1/3/12 18:33 AM/PM ", 
"1/3/12 9:07 AM/PM ", "1/30/12 11:22 AM/PM ", "1/30/12 22:52 AM/PM ", 
"1/30/12 23:10 AM/PM ", "1/31/12 19:54 AM/PM ", "1/31/12 20:39 AM/PM ", 
"1/31/12 5:42 AM/PM ", "1/31/12 9:42 AM/PM ", "1/4/12 14:02 AM/PM ", 
"1/4/12 9:52 AM/PM ", "1/5/12 13:42 AM/PM ", "1/5/12 17:42 AM/PM ", 
.... 
.... 
"9/6/12 9:02 AM/PM ", "9/7/12 11:48 AM/PM ", "9/7/12 12:58 AM/PM ", 
"9/7/12 13:52 AM/PM ", "9/7/12 15:07 AM/PM ", "9/7/12 15:12 AM/PM ", 
"9/7/12 15:22 AM/PM ", "9/7/12 15:47 AM/PM ", "9/7/12 15:52 AM/PM ", 
"9/7/12 8:42 AM/PM ", "9/7/12 9:32 AM/PM ", "9/8/11 23:43 AM/PM " 
), class = "factor")), .Names = c("DocType", "CreatedDate"), row.names = c(NA, 
6L), class = "data.frame") 
> 
+4

Please provide reproducible data, for example 'head(wi20121812)'. –

+1

Even better: 'dput(head(wi20121812))' –

+0

Added sample data to the question –

Answers

1

The part of your question that is still unanswered, "how to get counts by work item type", is quite simple.

res <- table(wi20121812[, "WorkItemType"]) 

This gives you a simple table showing how often each WorkItemType occurs. If you need proportions rather than absolute counts, run prop.table() on the result:

prop.table(res) 

Or do both in one step:

res <- prop.table(table(wi20121812[, "WorkItemType"])) 
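
As a minimal sketch of the same idea, the snippet below builds a small stand-in vector (the six DocType values from the sample data in the question; the name `doc_type` is only illustrative) and computes counts, proportions, and percentages from it. It assumes the type column is called DocType, as in the posted sample, rather than WorkItemType.

```r
# Illustrative stand-in for workItemReport2$DocType (first six rows of the sample)
doc_type <- c("TYPE1WI", "TYPE2WI", "TYPE1WI", "TYPE1WI", "TYPE3WI", "TYPE1WI")

# Absolute count per work item type
counts <- table(doc_type)

# Share of each type; the shares sum to 1
shares <- prop.table(counts)

# The same shares expressed as percentages, rounded to one decimal place
percent <- round(100 * shares, 1)

print(counts)
print(percent)
```

On the real data, the same calls should apply with `workItemReport2$DocType` in place of `doc_type`.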
+0

Doing this, I get:

> res <- table(workItemReport2["DocType"])
> prop.table(res)
   TYPE10WI    TYPE11WI    TYPE12WI    TYPE13WI    TYPE14WI     TYPE1WI     TYPE2WI     TYPE3WI     TYPE4WI     TYPE5WI
0.005835544 0.010079576 0.030238727 0.001061008 0.001591512 0.303978780 0.013262599 0.036074271 0.384084881 0.107692308
    TYPE6WI     TYPE7WI     TYPE8WI     TYPE9WI
0.041909814 0.005835544 0.013262599 0.045092838
>
–

+0

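Exactly. Since these numbers are proportions, you only need to multiply them by 100 to get percentages. So Type10WI accounts for 0.6% of all work items, Type11WI for 1%, and so on. – tophcito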

+0

Thanks! This works. –

0

You can use ddply from the plyr package:

res = ddply(df, "year", summarise, amount = length(year)) 

Or use count from the same package (which is easier):

res = count(df, "year") 

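where df is the data.frame containing your data, and year is the name of the column holding the categorical variable that records the year in which each row was created.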
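
Both calls presuppose that a year column already exists. A rough sketch of how one might derive it in base R from CreatedDate strings of the form shown in the question ("month/day/two-digit-year time") is given below. The data frame `wi` is a hypothetical stand-in assembled from values visible in the posted sample, and the month/day/year order is an assumption inferred from labels such as "1/30/12".

```r
# Hypothetical stand-in for the report, built from values visible in the sample
wi <- data.frame(
  DocType = c("TYPE1WI", "TYPE2WI", "TYPE1WI", "TYPE1WI", "TYPE3WI", "TYPE1WI"),
  CreatedDate = c("1/12/11 21:58 AM/PM ", "1/19/11 19:23 AM/PM ",
                  "1/10/12 15:43 AM/PM ", "9/7/12 11:48 AM/PM ",
                  "9/8/11 23:43 AM/PM ", "1/3/12 9:07 AM/PM "),
  stringsAsFactors = TRUE
)

# Convert the factor to character first, then parse only the date part;
# as.Date() ignores the trailing time and "AM/PM" text after the format match
created <- as.Date(as.character(wi$CreatedDate), format = "%m/%d/%y")

# Four-digit year as a categorical column
wi$year <- format(created, "%Y")

# Work items created per year
year_counts <- table(wi$year)

# Work item types cross-tabulated against year
type_by_year <- table(wi$DocType, wi$year)

print(year_counts)
print(type_by_year)
```

With the year column in place, `count(wi, "year")` from plyr would be expected to give the same per-year totals as `table(wi$year)`.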

+0

Doing count(workItemReport2, "CreatedDate") I get:

> res
  CreatedDate freq
1     00:05.4    1
2     00:05.6    1
3     00:19.7    1
4     00:36.8    1
5     00:37.0    1
6     00:42.7    1
7     00:42.8    1

I would like to know how many work items were created in 2011 and how many were created in 2012. –
