2015-06-17

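Fitting a polynomial curve to time series data

I have a time series plot with the monthly article frequency on the y axis. The data looks like this: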

 Count.V  Date  Month  Week  Year 
2637  6 2006-01-02 2006-01-01 2006-01-02 2006-01-01 
406  4 2006-01-03 2006-01-01 2006-01-02 2006-01-01 
543  4 2006-01-04 2006-01-01 2006-01-02 2006-01-01 
998  3 2006-01-05 2006-01-01 2006-01-02 2006-01-01 
1400  4 2006-01-06 2006-01-01 2006-01-02 2006-01-01 
2218  4 2006-02-01 2006-02-01 2006-01-30 2006-01-01 
2792  6 2006-02-02 2006-02-01 2006-01-30 2006-01-01 
2488  10 2006-02-03 2006-02-01 2006-01-30 2006-01-01 
954  8 2006-02-04 2006-02-01 2006-01-30 2006-01-01 
2622  3 2006-02-06 2006-02-01 2006-02-06 2006-01-01 
2321  11 2006-02-07 2006-02-01 2006-02-06 2006-01-01 
2452  10 2006-03-21 2006-03-01 2006-03-20 2006-01-01 
2267  5 2006-03-22 2006-03-01 2006-03-20 2006-01-01 
1408  3 2006-03-23 2006-03-01 2006-03-20 2006-01-01 
2602  3 2006-03-24 2006-03-01 2006-03-20 2006-01-01 
2489  5 2006-03-25 2006-03-01 2006-03-20 2006-01-01 
2771  1 2006-03-27 2006-03-01 2006-03-27 2006-01-01 

I plotted it with ggplot2:

library(ggplot2)
library(scales)  # provides date_format()

MyPlot <- ggplot(data = df, aes(x = Month, y = Count.V)) +
  stat_summary(fun.y = sum, geom = "line") +
  scale_x_date(labels = date_format("%m-%y"), breaks = "3 months")

[Figure: time series plot]
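For reference, stat_summary(fun.y = sum) collapses all rows that share an x value into a single total per month. Below is a minimal base-R sketch of the same aggregation. It uses only the 17 rows printed above, so the totals cover those rows and not full months. The names df and monthly_sum are illustrative.

```r
# Illustrative subset: the 17 rows printed in the question, not whole months.
df <- data.frame(
  Count.V = c(6, 4, 4, 3, 4,           # January rows shown
              4, 6, 10, 8, 3, 11,      # February rows shown
              10, 5, 3, 3, 5, 1),      # March rows shown
  Month = as.Date(rep(c("2006-01-01", "2006-02-01", "2006-03-01"),
                      times = c(5, 6, 6)))
)

# Same idea as stat_summary(fun.y = sum): one total per distinct Month.
monthly_sum <- aggregate(Count.V ~ Month, data = df, FUN = sum)
print(monthly_sum)
```

For these rows the totals are 21, 42 and 27, while the individual Count.V values range from 1 to 11. The summed line therefore sits on a very different scale from the raw rows.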

However, when I try to fit a polynomial curve to the data, for example,

MyPlot + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) 

something is off:

[Figure: time series graph with a polynomial curve (failed)]

What am I doing wrong?
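One way to see the mismatch is to look at what stat_smooth() actually fits. It does not fit the summed line. Instead, it fits the raw rows, with the Date axis converted to a number of days. The base-R sketch below reproduces that fit on the 17 printed rows. Because this excerpt has only three distinct months, it uses poly(x, 2) instead of poly(x, 3), since poly() needs the degree to be smaller than the number of unique x values. The names df, fit_raw and comparison are illustrative.

```r
# Illustrative subset: the 17 rows printed in the question.
df <- data.frame(
  Count.V = c(6, 4, 4, 3, 4, 4, 6, 10, 8, 3, 11, 10, 5, 3, 3, 5, 1),
  Month = as.Date(rep(c("2006-01-01", "2006-02-01", "2006-03-01"),
                      times = c(5, 6, 6)))
)
df$x <- as.numeric(df$Month)  # days since 1970-01-01, as on a date axis

# What stat_smooth(method = "lm", ...) fits: every raw row, not the sums.
# Degree 2 only because this excerpt has just three distinct months.
fit_raw <- lm(Count.V ~ poly(x, 2), data = df)

months <- data.frame(x = sort(unique(df$x)))
comparison <- data.frame(
  Month        = as.Date(months$x, origin = "1970-01-01"),
  smooth_curve = as.vector(predict(fit_raw, newdata = months)),
  summed_line  = as.vector(tapply(df$Count.V, df$x, sum))
)
print(comparison)
```

With three months and degree 2, the curve passes exactly through the monthly means (4.2, 7 and 4.5). That is roughly five to six times lower than the summed line (21, 42 and 27), and this gap is what the failed plot shows.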

Edit: added a portion of the data frame spanning several months:

> dput(df)
structure(list(Count.V = c(6L, 4L, 4L, 3L, 4L, 5L, 2L, 8L, 6L, 5L, 12L, 1L, 2L, 3L, 4L, 2L, 2L, 4L, 4L, 4L, 6L, 6L, 2L, 4L, 4L, 6L, 10L, 8L, 3L, 11L, 8L, 13L, 3L, 9L, 7L, 4L, 7L, 9L, 5L, 4L, 5L, 6L, 5L, 9L, 5L, 11L, 4L, 6L, 2L, 8L, 3L, 5L, 4L, 3L, 5L, 4L, 2L, 3L, 3L, 3L, 8L, 6L, 1L, 3L, 10L, 5L, 3L, 3L, 5L, 1L, 8L, 4L, 3L, 2L, 1L, 4L, 4L, 4L, 5L, 7L, 8L, 3L, 4L, 7L, 5L, 3L, 3L, 4L, 6L, 3L, 2L, 3L, 2L, 5L, 6L, 4L, 5L, 8L, 3L, 4L), Date = structure(c(13150, 13151, 13152, 13153, 13154, 13155, 13157, 13158, 13159, 13161, 13162, 13164, 13165, 13166, 13168, 13169, 13171, 13172, 13173, 13174, 13175, 13176, 13178, 13179, 13180, 13181, 13182, 13183, 13185, 13186, 13187, 13188, 13189, 13190, 13192, 13193, 13194, 13195, 13196, 13197, 13199, 13200, 13201, 13202, 13203, 13204, 13206, 13207, 13208, 13209, 13210, 13211, 13214, 13215, 13216, 13217, 13218, 13220, 13221, 13222, 13223, 13224, 13225, 13227, 13228, 13229, 13230, 13231, 13232, 13234, 13235, 13236, 13237, 13238, 13239, 13241, 13242, 13243, 13244, 13245, 13246, 13248, 13249, 13250, 13251, 13252, 13253, 13256, 13257, 13258, 13259, 13260, 13262, 13263, 13264, 13265, 13266, 13267, 13270, 13271), class = "Date"), Month = structure(c(13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13269, 13269), class = "Date"), Week = structure(c(13150, 13150, 13150, 13150, 13150, 13150, 13157, 13157, 13157, 
13157, 13157, 13164, 13164, 13164, 13164, 13164, 13171, 13171, 13171, 13171, 13171, 13171, 13178, 13178, 13178, 13178, 13178, 13178, 13185, 13185, 13185, 13185, 13185, 13185, 13192, 13192, 13192, 13192, 13192, 13192, 13199, 13199, 13199, 13199, 13199, 13199, 13206, 13206, 13206, 13206, 13206, 13206, 13213, 13213, 13213, 13213, 13213, 13220, 13220, 13220, 13220, 13220, 13220, 13227, 13227, 13227, 13227, 13227, 13227, 13234, 13234, 13234, 13234, 13234, 13234, 13241, 13241, 13241, 13241, 13241, 13241, 13248, 13248, 13248, 13248, 13248, 13248, 13255, 13255, 13255, 13255, 13255, 13262, 13262, 13262, 13262, 13262, 13262, 13269, 13269), class = "Date"), Year = structure(c(13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149), class = "Date")), .Names = c("Count.V", "Date", "Month", "Week", "Year"), row.names = c(2637L, 406L, 543L, 998L, 1400L, 2667L, 1211L, 140L, 737L, 545L, 2573L, 978L, 2119L, 842L, 1866L, 1002L, 1956L, 1229L, 2278L, 1889L, 1285L, 1020L, 964L, 1584L, 2218L, 2792L, 2488L, 954L, 2622L, 2321L, 796L, 501L, 294L, 2476L, 2541L, 642L, 177L, 1222L, 1249L, 990L, 2776L, 580L, 1181L, 1792L, 431L, 224L, 214L, 679L, 1601L, 1655L, 645L, 2785L, 1507L, 1580L, 1274L, 2083L, 157L, 2491L, 2733L, 1533L, 2332L, 328L, 1995L, 1598L, 2452L, 2267L, 1408L, 2602L, 2489L, 2771L, 2323L, 1714L, 907L, 1522L, 882L, 2727L, 844L, 2105L, 253L, 1160L, 
2075L, 1435L, 821L, 1284L, 2406L, 2357L, 1499L, 2145L, 1539L, 1890L, 1856L, 27L, 887L, 1500L, 812L, 1677L, 1965L, 2580L, 823L, 1482L), class = "data.frame")

Answer


Try using mean instead of sum, like this:

library(ggplot2)
library(scales)  # provides date_format()

ggplot(data = df, aes(x = Month, y = Count.V)) + 
    stat_summary(fun.y = mean, geom = "line") +  # per-month mean, same scale as the lm fit
    stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) + 
    geom_point() + 
    scale_x_date(labels = date_format("%m-%y"), breaks = "3 months") 
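Here is why the mean version lines up. Least squares on the raw rows, which is what stat_smooth() does, gives the same coefficients as a weighted least-squares fit to the monthly means, with each month weighted by its number of rows. The fitted curve therefore lives on the mean scale. The base-R check below confirms this identity on the 17 printed rows. It uses a straight line (degree 1) so that the fit does not simply interpolate the three means, although the identity holds for any polynomial degree. The names are illustrative.

```r
# Illustrative subset: the 17 rows printed in the question.
df <- data.frame(
  Count.V = c(6, 4, 4, 3, 4, 4, 6, 10, 8, 3, 11, 10, 5, 3, 3, 5, 1),
  Month = as.Date(rep(c("2006-01-01", "2006-02-01", "2006-03-01"),
                      times = c(5, 6, 6)))
)
# Days since the first month (0, 31, 59); keeps the numbers small.
df$x <- as.numeric(df$Month - as.Date("2006-01-01"))

# Ordinary least squares on every raw row (what stat_smooth() fits).
fit_raw <- lm(Count.V ~ x, data = df)

# Weighted least squares on the monthly means, weighted by rows per month.
means <- aggregate(Count.V ~ x, data = df, FUN = mean)
sizes <- aggregate(Count.V ~ x, data = df, FUN = length)$Count.V
fit_means <- lm(Count.V ~ x, data = means, weights = sizes)

print(rbind(raw_rows = coef(fit_raw), monthly_means = coef(fit_means)))
```

The two rows of coefficients agree up to floating-point error. The within-month scatter only adds a constant to the raw-row residual sum of squares, so it does not move the fit.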

Thanks, but when I do the conversion 'Volkskrant.df$Date <- ymd(Volkskrant.df$Date); Volkskrant.df$Month <- ymd(Volkskrant.df$Month)' and then try to plot the graph with 'Volkskrant.plot <- ggplot(data = Volkskrant.df, aes(x = Month, y = Count.V)) + stat_summary(fun.y = sum, geom = "line") + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) + scale_x_date(labels = date_format("%1"), breaks = "3 months")', I get this error: 'Error: Invalid input: date_trans works with objects of class Date only' – Zlo


Try converting your date variable first: 'Volkskrant.df$Date <- as.Date(Volkskrant.df$Date, "%Y-%m-%d")'? –
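The error reported in the comments comes from scale_x_date(), which only accepts columns of class Date. The sketch below shows a defensive conversion, assuming the column may arrive as character text or as a date-time (POSIXct) value. The helper to_date() is hypothetical and not part of any package.

```r
# Hypothetical helper: return a vector of class "Date" for scale_x_date().
to_date <- function(v) {
  if (inherits(v, "Date")) {
    return(v)                                     # already a Date: unchanged
  }
  if (inherits(v, "POSIXt")) {
    # Keep the calendar day as displayed in the value's own time zone.
    return(as.Date(format(v, "%Y-%m-%d")))
  }
  as.Date(as.character(v), format = "%Y-%m-%d")   # text such as "2006-01-02"
}

chr <- c("2006-01-02", "2006-02-01")
pos <- as.POSIXct(chr, tz = "UTC")

d_chr <- to_date(chr)
d_pos <- to_date(pos)
print(class(d_pos))             # "Date"
print(all.equal(d_chr, d_pos))
```

Applied to the question's data, this would be 'Volkskrant.df$Month <- to_date(Volkskrant.df$Month)' before building the plot. Checking class(Volkskrant.df$Month) first shows whether any conversion is needed at all.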


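@MamounBenghezal Yes, I'm a bit slow today. However, the dates in my data frame are already of class Date. I think the actual problem with the curve is that 'stat_summary(fun.y = sum, geom = "line")' sums the Count.V values for each month, which 'stat_smooth' cannot handle. For example, if I plot the graph without 'stat_summary', I get the correct curve. – Zlo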
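Following that diagnosis, one way to keep the monthly totals and still get a matching curve is to aggregate first and then fit the polynomial to the totals. That way the line and the smoother see the same data. The sketch below does this on the 17 printed rows, using degree 2 again because the excerpt covers only three months; with the full data, poly(x, 3) would apply. The ggplot2 part only runs if ggplot2 is installed, and the names monthly, fit_sum and p are illustrative.

```r
# Illustrative subset: the 17 rows printed in the question.
df <- data.frame(
  Count.V = c(6, 4, 4, 3, 4, 4, 6, 10, 8, 3, 11, 10, 5, 3, 3, 5, 1),
  Month = as.Date(rep(c("2006-01-01", "2006-02-01", "2006-03-01"),
                      times = c(5, 6, 6)))
)

# One row per month holding the total, i.e. what stat_summary(fun.y = sum) drew.
monthly <- aggregate(Count.V ~ Month, data = df, FUN = sum)

# Polynomial fitted to the totals themselves (degree 2: only three months here).
fit_sum <- lm(Count.V ~ poly(as.numeric(Month), 2), data = monthly)
print(fitted(fit_sum))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Line and smoother now read the same pre-aggregated data frame.
  p <- ggplot(monthly, aes(x = Month, y = Count.V)) +
    geom_line() +
    stat_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)
}
```

Printing p draws the summed line with the fitted curve on the same scale. The trade-off is that the fit now treats each month as a single observation, regardless of how many rows went into its total.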