Try this, assuming your data frame is df:
## in case you have different column names
colnames(df) <- c("Name", "Date")
## you might also have Date as factors when reading in data
## the following ensures it is character string
df$Date <- as.character(df$Date)
## convert to Date object
## see ?strptime for the available format codes
## see ?as.Date for Date object
df$Date <- as.Date(df$Date, format = "%m-%d-%Y %H:%M:%S")
## reorder so that dates are ascending within each person (see Jane)
## this is necessary, otherwise negative numbers occur after differencing
## see ?order on ordering
df <- df[order(df$Name, df$Date), ]
## take day lags per person
## see ?diff for taking difference
## see ?tapply for applying FUN on grouped data
## as.integer() makes output clean
## if unsure, compare with: lags <- with(df, tapply(Date, Name, FUN = diff))
lags <- with(df, tapply(Date, Name, FUN = function (x) as.integer(diff(x))))
For your truncated data (5 rows), I get:
> lags
$Jane
[1] 1
$Mary
[1] 0 1
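For anyone who wants to run the steps end to end, here is a minimal sketch on made-up data. The data frame toy and its five rows are hypothetical (the original data is not shown here); they are only chosen so that the result has the same shape as the output above.

```r
## hypothetical 5-row data set, deliberately unsorted
toy <- data.frame(
  Name = c("Mary", "Jane", "Mary", "Jane", "Mary"),
  Date = c("01-02-2016 09:00:00", "01-05-2016 10:30:00",
           "01-01-2016 08:00:00", "01-04-2016 12:00:00",
           "01-01-2016 17:45:00"),
  stringsAsFactors = FALSE
)

## parse to Date; the time-of-day part is dropped
toy$Date <- as.Date(toy$Date, format = "%m-%d-%Y %H:%M:%S")

## sort by person, then by date, so that differences are non-negative
toy <- toy[order(toy$Name, toy$Date), ]

## day lags per person
toy_lags <- with(toy, tapply(Date, Name,
                             FUN = function (x) as.integer(diff(x))))
print(toy_lags)
```

Skipping the order() step on this toy data would give Mary a negative first lag, which is the problem the reordering comment warns about.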
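lags is a list. If you want Jane's information, do lags$Jane. To get a histogram, do hist(lags$Jane). Furthermore, if you simply want a histogram over all clients, ignoring individual differences, use hist(unlist(lags)). unlist() collapses a list into a single vector.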
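A small sketch of the pooling step, in case it helps. The list pooled_demo below is typed in by hand to mirror the printed output, and plot = FALSE is only used so the bin counts can be inspected without opening a graphics device; drop it to actually draw the histogram.

```r
## hand-typed copy of the list shown above (illustration only)
pooled_demo <- list(Jane = 1L, Mary = c(0L, 1L))

## collapse all per-person lags into one vector, dropping the names
all_lags <- unlist(pooled_demo, use.names = FALSE)

## compute the histogram without plotting; h$counts holds the bin counts
h <- hist(all_lags, plot = FALSE)
print(all_lags)
print(h$counts)
```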
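Remarks:
- For good references on R, see CRAN: R intro and Advanced R;
- Using tapply with multiple indices? Maybe you can try my trick of using paste to construct an auxiliary index first;
- Er, it looks like I quickly made things much more complicated by bringing in density, the central limit theorem and so on for visualization, so I deleted my other answer.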
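The paste trick for grouping by more than one variable could look roughly like this. The Store column, the key column and the data are all hypothetical, made up for illustration. Note that tapply also accepts a list of factors as its index, so the auxiliary key is a convenience rather than a necessity.

```r
## hypothetical data with a second grouping variable, Store
multi <- data.frame(
  Name  = c("Jane", "Jane", "Jane", "Jane", "Jane"),
  Store = c("A", "B", "A", "B", "A"),
  Date  = as.Date(c("2016-01-03", "2016-01-07", "2016-01-01",
                    "2016-01-02", "2016-01-04")),
  stringsAsFactors = FALSE
)

## sort within each (Name, Store) combination
multi <- multi[order(multi$Name, multi$Store, multi$Date), ]

## auxiliary index: one string per (Name, Store) combination
multi$key <- paste(multi$Name, multi$Store, sep = "_")

## day lags per combination
multi_lags <- with(multi, tapply(Date, key,
                                 FUN = function (x) as.integer(diff(x))))
print(multi_lags)
```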
Can you show your expected output (at least for the data given), and what ordering you used in the data given above?