2017-08-15

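Plotting a subset of sequential (time series) data in R

I have a data frame called df, similar to the one below. I want to plot a subset of this data, from May 2012 to June 2014. I have been using plot() to plot the entire data frame, but when I split it into subsets for plotting, I get the same plot no matter which part of the data I select.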

           Jan       Feb       Mar       Apr       May       Jun       Jul       Aug       Sep       Oct       Nov       Dec
2011        NA        NA        NA        NA        NA        NA 13.724575 13.670017 13.782099 13.675788 13.442914 13.279969
2012 13.114463 13.021268 12.999545 12.990171 12.898018 12.611495 12.311641 12.126345 11.974871 12.019042 12.163618 12.304660
2013 12.374017 12.365795 12.323280 12.377747 12.462203 12.630298 12.780495 12.848942 12.806210 12.860463 12.838953 12.608965
2014 12.616257 12.720611 12.841626 12.897939 13.008535 13.136377 13.159393 13.290928 13.495218 13.636360 13.797778 13.827273
2015 13.662063 13.527596 13.568430 13.782818 13.889276 13.971303 14.181846 14.329937 14.385533 14.289386 14.222535 14.384618
2016 14.516674 14.759385 14.951384 14.763781 14.536779 13.978265 12.888989 11.612033 10.362592  9.205528  8.649027  8.662219
2017  8.614229  8.446361  8.239606  8.286693  8.498938  8.972903        NA        NA        NA        NA        NA        NA

My current code looks like the following (stripped of non-essentials such as the x and y labels).

dates <- seq(as.Date("2011-01-01"), by = "months", length = 84) 
plot(dates, df, type = "l") 

How can I modify it to plot only certain parts?

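I would also be OK with switching to ggplot2, but it seemed to give me weird results, so I stuck with plot() (maybe because this is a time series data frame? I'm not sure).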

Answer


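You need to take a subset of the data. I would also convert the data frame to long format, which is easier to work with, especially for plotting and visualization.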

# Long format: many rows instead of many columns
library(reshape2)
long <- melt(df, id.vars = c("Year"))

# Add a column of actual dates (mid-month) instead of separate year and month columns.
# Note: "%b" matches abbreviated month names in the current locale (English here).
long$date <- as.Date(with(long, paste(variable, "15", Year, sep = "-")), "%b-%d-%Y")

# Take the subset of the data from May 2012 to June 2014
Date1 <- as.Date("2012-05-01", "%Y-%m-%d")
Date2 <- as.Date("2014-06-30", "%Y-%m-%d")
subdf <- long[long$date < Date2 & long$date > Date1, ]

# Use ggplot2, which gives nice date labels and a theme for time series by default
library(ggplot2)
ggplot(data = subdf, aes(date, value)) + geom_line()

[Figure: line plot of value against date for the May 2012 to June 2014 subset]
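Since the question mentions a preference for plot(), the same kind of subset can probably be drawn in base R as well. The following is only a minimal sketch: `toy` is a hypothetical stand-in for the long-format data with placeholder values (not the real numbers), so that the snippet runs on its own. With the real data you would use the long-format data frame in its place.

```r
# Stand-in for the long-format data: one row per month, placeholder values
toy <- data.frame(
  date  = seq(as.Date("2011-01-15"), by = "month", length.out = 84),
  value = seq_len(84)
)
# Reorder rows by month to mimic melt(), which does not return rows in date order
toy <- toy[order(format(toy$date, "%m")), ]

# Subset: May 2012 through June 2014 (both ends inclusive)
sub <- toy[toy$date >= as.Date("2012-05-01") & toy$date <= as.Date("2014-06-30"), ]

# Base plot() joins points in row order, so sort by date before drawing the line
sub <- sub[order(sub$date), ]
plot(sub$date, sub$value, type = "l", xlab = "date", ylab = "value")
```

The sort step matters only for base plot(); geom_line() orders the points along the x axis by itself.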

Data:

df <- structure(list(Year = 2011:2017, Jan = c(NA, 13.114463, 12.374017, 
     12.616257, 13.662063, 14.516674, 8.614229), Feb = c(NA, 13.021268, 
     12.365795, 12.720611, 13.527596, 14.759385, 8.446361), Mar = c(NA, 
     12.999545, 12.32328, 12.841626, 13.56843, 14.951384, 8.239606 
    ), Apr = c(NA, 12.990171, 12.377747, 12.897939, 13.782818, 14.763781, 
     8.286693), May = c(NA, 12.898018, 12.462203, 13.008535, 13.889276, 
     14.536779, 8.498938), Jun = c(NA, 12.611495, 12.630298, 13.136377, 
     13.971303, 13.978265, 8.972903), Jul = c(13.724575, 12.311641, 
     12.780495, 13.159393, 14.181846, 12.888989, NA), Aug = c(13.670017, 
     12.126345, 12.848942, 13.290928, 14.329937, 11.612033, NA), Sep = c(13.782099, 
     11.974871, 12.80621, 13.495218, 14.385533, 10.362592, NA), Oct = c(13.675788, 
     12.019042, 12.860463, 13.63636, 14.289386, 9.205528, NA), Nov = c(13.442914, 
     12.163618, 12.838953, 13.797778, 14.222535, 8.649027, NA), Dec = c(13.279969, 
     12.30466, 12.608965, 13.827273, 14.384618, 8.662219, NA)), .Names = c("Year", 
     "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", 
     "Oct", "Nov", "Dec"), class = "data.frame", row.names = c(NA, -7L)) 

The data reshaped to long format:

head(long) 

#   Year variable    value       date
# 1 2011      Jan       NA 2011-01-15
# 2 2012      Jan 13.11446 2012-01-15
# 3 2013      Jan 12.37402 2013-01-15
# 4 2014      Jan 12.61626 2014-01-15
# 5 2015      Jan 13.66206 2015-01-15
# 6 2016      Jan 14.51667 2016-01-15
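As an alternative that stays with plot(): the Jan to Dec by year layout printed in the question looks like a monthly ts object, although that is an assumption. If it is one, window() may be the shortest route. This sketch uses a placeholder series `x` (values 1 to 84, not the real data) so that it runs without anything else.

```r
# Placeholder monthly series covering Jan 2011 to Dec 2017 (not the real values).
# With the data frame from this answer, one could build the series as:
#   x <- ts(as.vector(t(as.matrix(df[, -1]))), start = c(2011, 1), frequency = 12)
x <- ts(seq_len(84), start = c(2011, 1), frequency = 12)

# window() cuts a ts by c(year, month) pairs; both ends are inclusive
sub_ts <- window(x, start = c(2012, 5), end = c(2014, 6))

# plot() dispatches to plot.ts, so the time axis comes from the series itself
plot(sub_ts, ylab = "value")
```

Because the time index travels with the ts object, there is no separate dates vector to keep in step with the subset.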

Thanks, this looks good. One follow-up question: in the line where you create the new date column, 'long$date <- as.Date(with(long, paste(variable ...', what is variable? – dward4


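@dward4 Look at the data in long format (shown at the end of my answer), where all the months sit in a single column. 'melt' creates a new column named 'variable' that holds the original column names (here, the months). Check the output of 'melt(df, id.vars = c("Year"))'. – Masoud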
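To make the point about the 'variable' column concrete, here is a small sketch with a hypothetical two-month table, `wide`, holding placeholder values. It assumes the reshape2 package is installed.

```r
library(reshape2)

# Tiny wide table in the same shape as df (placeholder values)
wide <- data.frame(Year = 2011:2012, Jan = c(1, 2), Feb = c(3, 4))

# Year is kept as an identifier; every other column is stacked into rows
long_toy <- melt(wide, id.vars = "Year")

names(long_toy)            # the id column plus "variable" and "value"
levels(long_toy$variable)  # the former column names, in their original order
```

The 'variable' column is a factor whose levels are the month names, which is why paste() can combine it with the year to build a date string.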