2017-12-02 654 views
2

enter image description here季節性箱形圖在R或MATLAB

DATE  obs1  obs2 obs3  
1981-01-01 2032.409 3142.46 1741.143 
1981-01-02 2023.687 3870.04 1735.256 
1981-01-03 2014.274 4126.25 1728.556 
1981-01-04 2005.795 2615.91 1722.985 
1981-01-05 2000.674 2940.83 1722.317 
1981-01-06 1998.477 3258.69 1723.937 
1981-01-07 1997.014 3371.6 1724.104 
1981-01-08 1995.576 3184.13 1722.624 
1981-01-09 1993.706 3540.76 1719.592 
1981-01-10 1991.286 3312.43 1715.156 
1981-01-11 1988.633 3028.65 1710.141 
1981-01-12 1986.147 3212.79 1705.183 
1981-01-13 1984.229 3193.23 1700.789 
1981-01-14 1982.756 3294.52 1697.785 
1981-01-15 1981.548 3553.78 1696.068 
1981-01-16 1980.561 3492.28 1694.544 
1981-01-17 1979.792 2452.09 1692.36 
1981-01-18 1979.224 1873.82 1689.525 
1981-01-19 1978.845 3218.28 1686.452 

我需要繪製季節(冬季,春季,夏季和秋季)箱形圖中的R爲每日數據如上所示。對於不同的電臺,我有10年以上的格式數據。情節應該在一個數字中,每個季節都有多個盒子情節。

回答

2

使用tidyverselubridate的解決方案。 tidyverse包括dplyrtidyr來執行數據處理,並且ggplot2來創建該圖。 lubridate是處理數據幀中的日期。

由於您提供的數據集不是很有用,因爲它僅包含一月份的一些記錄,因此無法創建顯示季節差異的箱線圖,因此我決定創建一個新的示例數據框。我的示例數據框架的結構與您的數據集相似,應該爲您提供一些提示,作爲您現實世界問題的出發點。

# Set the seed for reproducibility 
set.seed(123) 

library(tidyverse) 
library(lubridate) 

# Create example data frame 
dt <- data_frame(DATE = seq(ymd("1980-01-01"), ymd("1989-12-31"), by = 1)) %>% 
    mutate(obs1 = rnorm(nrow(.), mean = 0, sd = 1), 
     obs2 = rnorm(nrow(.), mean = 1, sd = 2), 
     obs3 = rnorm(nrow(.), mean = 2, sd = 3)) 

head(dt) 
# # A tibble: 6 x 4 
#   DATE  obs1  obs2  obs3 
#  <date>  <dbl>  <dbl>  <dbl> 
# 1 1980-01-01 -0.56047565 0.7874145 2.7827006 
# 2 1980-01-02 -0.23017749 0.1517417 8.5720252 
# 3 1980-01-03 1.55870831 0.7193725 1.3293478 
# 4 1980-01-04 0.07050839 0.5454177 0.3253155 
# 5 1980-01-05 0.12928774 1.41.1245771 
# 6 1980-01-06 1.71506499 -0.6491910 4.5034395 

tail(dt) 
# # A tibble: 6 x 4 
#   DATE  obs1  obs2  obs3 
#  <date>  <dbl>  <dbl>  <dbl> 
# 1 1989-12-26 -0.3629796 0.6750946 0.8586325 
# 2 1989-12-27 0.1102218 2.8572337 9.8541328 
# 3 1989-12-28 -0.2700741 1.7614026 1.9109596 
# 4 1989-12-29 0.6920973 0.5275611 -0.4756240 
# 5 1989-12-30 0.9282803 1.3811225 1.5222535 
# 6 1989-12-31 0.5931301 -1.6638739 4.1157087 

示例數據框包含10年3個觀察組的記錄。每列的值都是正態分佈,具有不同的平均值和標準偏差。

第一步是通過將數據集從寬格式轉換爲長格式並添加顯示季節信息的列來處理數據幀。

dt2 <- dt %>% 
    # Convert data frame from lwide format to long format 
    gather(Observation, Value, -DATE) %>% 
    # Remove "obs" in the Observation column 
    mutate(Observation = str_replace(Observation, "obs", "")) %>% 
    # Convert the DATE column to date class 
    mutate(DATE = ymd(DATE)) %>% 
    # Create Month column 
    mutate(Month = month(DATE)) %>% 
    # Create Season column 
    mutate(Season = case_when(
    Month %in% c(12, 1, 2)  ~ "winter", 
    Month %in% c(3, 4, 5)  ~ "spring", 
    Month %in% c(6, 7, 8)  ~ "summer", 
    Month %in% c(9, 10, 11)  ~ "fall", 
    TRUE      ~ NA_character_ 
)) 

在那之後,我們就可以使用ggplot2創建箱線圖。請注意,我使用stat_summary爲每個組添加紅線來表示平均值。

# Create a boxplot using ggplot2 
# Specify the aesthetics 
ggplot(dt2, aes(x = Season, y = Value, fill = Observation)) + 
    # Specify the geom to be boxplot 
    geom_boxplot() +     
    # Add a red line to the mean 
    stat_summary(aes(ymax = ..y.., ymin = ..y..), 
       fun.y = "mean", 
       geom = "errorbar",    # Use geom_errorbar to add line as mean 
       color = "red", 
       width = 0.7, 
       position = position_dodge(width = 0.75), # Add the line to each group 
       show.legend = FALSE) 

enter image description here

2

好吧...第一步是建立可以檢測在一個給定的日期落在本賽季的功能。幸運的是,我很早以前就已經開發出了能夠在南半球處理季節(這是逆轉的)。

該函數沒有執行任何完整性檢查,因爲我已經將它與已經過清理的數據集一起使用,但最終實現一些應該不難(除非您決定在使用它之前對數據集進行清理)。它以矢量化的方式工作,以最大化Matlab中計算的性能。

這裏是:

function season = GetSeason(date,southern_hemisphere) 

    if (nargin == 1) 
     southern_hemisphere = false; 
    end 

    [~,month,day] = datevec(date); 
    offset = month + (day/100); 

    winter = (offset < 3.21) | (offset >= 12.22); 
    spring = ~winter & (offset < 6.21); 
    summer = ~winter & ~spring & (offset < 9.23); 
    autumn = ~winter & ~spring & ~summer; 

    offset(spring) = 0; 
    offset(summer) = 1; 
    offset(autumn) = 2; 
    offset(winter) = 3; 

    if (southern_hemisphere) 
     offset = offset + 2; 
    end 

    season = mod(offset,4) + 1; 
end 

現在,第一步,你的腳本中,是從一個數據集文件中提取你的觀察。爲了爲您創建完整的演示,我創建了一個Excel數據集。但你也可以使用一個CSV數據集在用Matlab處理的代碼或其它格式的文件幾乎沒有變化:

% detect the dataset columns format 
opts = detectImportOptions('data.xlsx'); 

% impose a specific format for the dataset columns 
opts = setvartype(opts,{'datetime' 'double' 'double' 'double'}); 

% extract data in a table variable 
data = readtable('data.xlsx',opts); 

% sanitize the table variable removing the rows with missing or invalid values 
data = rmmissing(data); 

% sort the table variable rows by date (default first rows, default ascending) 
data = sortrows(data); 

第二次測試包括在獲得相應的季節爲觀察日期:

seasons = GetSeason(data.Date); 

第三步,假設我們只對觀測的第一列稱爲Obs1執行所有這一過程:

spring_1 = data.Obs1(seasons == 1); 
summer_1 = data.Obs1(seasons == 2); 
autumn_1 = data.Obs1(seasons == 3); 
winter_1 = data.Obs1(seasons == 4); 

第四步也是最後一步包括在一個圖表中繪製每個季節的一個箱形圖(必須將變量groups作爲參數傳遞給boxplot函數,以便知道後者需要繪製多少個箱子並使用哪些值):

groups = [ 
    ones(size(spring_1)); 
    2 * ones(size(summer_1)); 
    3 * ones(size(autumn_1)); 
    4 * ones(size(winter_1)); 
]; 

figure(); 
boxplot([spring_1; summer_1; autumn_1; winter_1],groups); 
set(gca,'XTickLabel',{'Spring' 'Summer' 'Autumn' 'Winter'}); 

這裏是結果:

Result

以飽滿的工作代碼更新所有意見

opts = detectImportOptions('data.xlsx'); 
opts = setvartype(opts,{'datetime' 'double' 'double' 'double'}); 

data = readtable('data.xlsx',opts); 
data = rmmissing(data); 
data = sortrows(data); 

seasons = GetSeason(data.Date); 

spring_1 = data.Obs1(seasons == 1); 
summer_1 = data.Obs1(seasons == 2); 
autumn_1 = data.Obs1(seasons == 3); 
winter_1 = data.Obs1(seasons == 4); 
spring_2 = data.Obs2(seasons == 1); 
summer_2 = data.Obs2(seasons == 2); 
autumn_2 = data.Obs2(seasons == 3); 
winter_2 = data.Obs2(seasons == 4); 
spring_3 = data.Obs3(seasons == 1); 
summer_3 = data.Obs3(seasons == 2); 
autumn_3 = data.Obs3(seasons == 3); 
winter_3 = data.Obs3(seasons == 4); 

plot_data = [ 
    spring_1; 
    summer_1; 
    autumn_1; 
    winter_1; 
    spring_2; 
    summer_2; 
    autumn_2; 
    winter_2; 
    spring_3; 
    summer_3; 
    autumn_3; 
    winter_3 
]; 

plot_groups = [ 
    (1 * ones(size(spring_1))) (1 * ones(size(spring_1))); 
    (1 * ones(size(summer_1))) (2 * ones(size(summer_1))); 
    (1 * ones(size(autumn_1))) (3 * ones(size(autumn_1))); 
    (1 * ones(size(winter_1))) (4 * ones(size(winter_1))); 
    (2 * ones(size(spring_2))) (5 * ones(size(spring_2))); 
    (2 * ones(size(summer_2))) (6 * ones(size(summer_2))); 
    (2 * ones(size(autumn_2))) (7 * ones(size(autumn_2))); 
    (2 * ones(size(winter_2))) (8 * ones(size(winter_2))); 
    (3 * ones(size(spring_3))) (9 * ones(size(spring_3))); 
    (3 * ones(size(summer_3))) (10 * ones(size(summer_3))); 
    (3 * ones(size(autumn_3))) (11 * ones(size(autumn_3))); 
    (3 * ones(size(winter_3))) (12 * ones(size(winter_3))) 
]; 

labels_obs = {'' '' '' '' '' '' '' '' '' '' '' ''}; 
labels_season = repmat({'Spring' 'Summer' 'Autumn' 'Winter'},1,3); 

figure('Units','normalized','Position',[0.05 0.1 0.9 0.8]); 
boxplot(plot_data,plot_groups, ... 
    'BoxStyle','outline', ... 
    'FactorGap',[5 1], ... 
    'Labels',{labels_obs; labels_season}, ... 
    'Notch','on'); 

colors = repmat('wcyg',1,3); 
h = findobj(gca,'Tag','Box'); 

for i = 1:numel(h) 
    patch(get(h(i),'XData'),get(h(i),'YData'),colors(i),'FaceAlpha',0.5); 
end 

h = findall(allchild(findall(gca,'Type','hggroup')),'Type','text','String',''); 
positions = cell2mat(get(h,'pos')); 
positions_new = num2cell([mean(reshape(positions(:,1),4,[]))' positions(1:4:end,2:end)],2); 
set(h(1:4:end),{'Position'},positions_new,{'String'},{'Observations 3'; 'Observations 2'; 'Observations 1'}) 

h = findall(allchild(findall(gca,'Type','hggroup')),'Type','text','String',''); 
delete(h); 

結果:

Result Update