Calculating age


df <- data.frame(
    Birth_Date = c("1952-03-21", "1963-12-20", "1956-02-25", "1974-08-04", "1963-06-13", "1956-11-20", "1974-03-07", "1963-10-23", "1952-11-24", "1974-12-16"), 
    Items_Amount = c(68,189,69,19,299,79,149,149,29,189) 
) 
df

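I am trying to analyze a data set that spans 90 years, with a column Items_Amount (in $) and the customers' birth dates. The goal is to compare the percentage of sales across appropriate age groups.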

The main data frame contains a "Birth_Date" column of class Date (not character strings), with dates ranging from "1902-02-13" to "1991-12-11".

'data.frame': 350241 obs. of 1 variable: 
$ BirthDate: Date, format: "1964-06-08" "1964-06-08" "1964-06-08" "1964-06-08" ... 


> min(Trans_Cust$Birth_Date) 
[1] "1902-02-13" 

> difftime(max(Trans_Cust$Birth_Date),min(Trans_Cust$Birth_Date),units = "auto") 
Time difference of 32808 days 

> max(Trans_Cust$Birth_Date) 
[1] "1991-12-11"

How can I compute the present age from the "Birth_Date" column, store it in a new column "Present_ages", and then calculate sum(Items_Amount) grouped by Present_ages?

Source

2017-03-06 Ashish Sahu

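Make sure you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) when asking for help. What exactly is the desired output here? What do you want to do with these decades? – MrFlick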

@MrFlick Details added. –

Your related question refers to agegroup01 - agegroup09, but your description of the data suggests that it spans 10 decades. – G5W

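I am assuming that your birth dates are just strings, so you need to convert them to some form of date. I am using POSIXct. After conversion, you can set up decade boundaries and use cut to group the dates into decades.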

BirthDate = c("1964-06-08", "1964-06-08", "1964-06-08", "1964-06-08", 
     "1902-02-13", "1991-12-11", "1944-06-06", "1929-10-24") 
StartDecade = seq(as.POSIXct("1900-01-01"), as.POSIXct("2000-01-01"), by="10 years") 
cut(as.POSIXct(BirthDate), breaks=StartDecade) 
[1] 1960-01-01 1960-01-01 1960-01-01 1960-01-01 1900-01-01 1990-01-01 1940-01-01 1920-01-01

It might be prettier to simplify the names:

as.numeric(cut(as.POSIXct(BirthDate), breaks=StartDecade)) - 1 
[1] 6 6 6 6 0 9 4 2
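
To connect this back to the question's data frame, here is a rough sketch (not part of the original answer) that derives a Present_ages column and sums Items_Amount per ten-year age band. The helper name age_in_years, the fixed reference date, the Age_Group column and the band edges are all assumptions made for illustration.

```r
# Sample data from the question, with Birth_Date converted to class Date
df <- data.frame(
  Birth_Date = as.Date(c("1952-03-21", "1963-12-20", "1956-02-25", "1974-08-04",
                         "1963-06-13", "1956-11-20", "1974-03-07", "1963-10-23",
                         "1952-11-24", "1974-12-16")),
  Items_Amount = c(68, 189, 69, 19, 299, 79, 149, 149, 29, 189)
)

# Hypothetical helper: completed years between a birth date and a reference date
age_in_years <- function(birth, ref = Sys.Date()) {
  b <- as.POSIXlt(birth)
  r <- as.POSIXlt(ref)
  # TRUE when the birthday has already occurred in the reference year
  had_birthday <- (r$mon > b$mon) | (r$mon == b$mon & r$mday >= b$mday)
  (r$year - b$year) - ifelse(had_birthday, 0L, 1L)
}

# Fixed reference date (an assumption) so the result is reproducible
ref_date <- as.Date("2017-03-06")
df$Present_ages <- age_in_years(df$Birth_Date, ref_date)

# Ten-year age bands, closed on the left: [40,50), [50,60), ...
df$Age_Group <- cut(df$Present_ages, breaks = seq(0, 120, by = 10), right = FALSE)

# Total and percentage of sales per age band
sales <- aggregate(Items_Amount ~ Age_Group, data = df, FUN = sum)
sales$Percent <- round(100 * sales$Items_Amount / sum(sales$Items_Amount), 1)
sales
```

Replacing ref_date with Sys.Date() gives ages as of today. Grouping on df$Present_ages directly instead of Age_Group would give one sum per individual age.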

Source

2017-03-06 21:26:35 G5W

Thanks for your input. I have edited the question with more details; could you take another look and help me? –

This one numerically "rounds" back to the decade:

BirthDate = as.Date(c("1964-06-08", "1964-06-08", "1964-06-08", "1964-06-08", "1902-02-13", "1991-12-11", "1944-06-06", "1929-10-24")) 

BDdecade <- round(as.numeric(format(BirthDate, "%Y"))-5, -1) 
BDdecade 
#[1] 1960 1960 1960 1960 1900 1990 1940 1920

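The year needs to be extracted, converted to numeric, and reduced by 5, because the floor function does not have the same ability as round to round to tens and hundreds.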
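
As an alternative sketch, the same decade can be obtained by flooring with integer division rather than with round. One reason to consider this: R's ?round documentation describes "go to the even digit" handling of exact halves, so a year ending in 0 (for example 1970, which becomes 1965 after subtracting 5) may not reliably land in its own decade. The variable name BDdecade_floor and the extra "1970-01-01" value are added here for illustration only.

```r
BirthDate <- as.Date(c("1964-06-08", "1902-02-13", "1991-12-11",
                       "1944-06-06", "1929-10-24", "1970-01-01"))

# Extract the calendar year as a number
year <- as.numeric(format(BirthDate, "%Y"))

# Integer-divide by 10 to drop the last digit, then scale back up
BDdecade_floor <- (year %/% 10) * 10
BDdecade_floor
```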

It is not clear what the desired starting point for a "decade" should be; this version splits on the basis of the minimum date:

> BDdecade2 <- cut(BirthDate, breaks= seq(min(BirthDate), max(BirthDate), by= "10 years")) 
> BDdecade2 
[1] 1962-02-13 1962-02-13 1962-02-13 1962-02-13 1902-02-13 <NA>  1942-02-13 
[8] 1922-02-13 
8 Levels: 1902-02-13 1912-02-13 1922-02-13 1932-02-13 1942-02-13 ... 1972-02-13

The NA suggests you might need to add 365 (or even more) to the maximum date.
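
One way to avoid the NA, sketched here under the assumption that the breaks should still start at the minimum date, is to append one more 10-year step after the last break so that the maximum date falls inside an interval. The names lower, upper and BDdecade3 are illustrative.

```r
BirthDate <- as.Date(c("1964-06-08", "1964-06-08", "1964-06-08", "1964-06-08",
                       "1902-02-13", "1991-12-11", "1944-06-06", "1929-10-24"))

# Breaks every 10 years from the minimum date, as in the snippet above
lower <- seq(min(BirthDate), max(BirthDate), by = "10 years")

# One additional break, 10 years after the last one, to cover the maximum date
upper <- seq(lower[length(lower)], by = "10 years", length.out = 2)[2]

BDdecade3 <- cut(BirthDate, breaks = c(lower, upper))
BDdecade3
```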

Source

2017-03-06 21:46:47
