2016-12-28

Fill rows for missing timestamps with NaN - MATLAB

I have a dataset in which some timestamps are missing. This is the code I have written so far:

x = table2dataset(Testing_data); 
T1 = x(:,1);    
C1 =dataset2cell(T1); 
formatIn = 'yyyy-mm-dd HH:MM:SS'; 
t1= datenum(C1,formatIn); 

% Creating 10 minutes of time interval; 
avg = 10/60/24;   
tnew = [t1(1):avg:t1(end)]'; 
indx = round((t1-t1(1))/avg) + 1; 
ynew = NaN(length(tnew),1); 
ynew(indx)=t1; 

% replacing missing time with NaN 
t = datetime(ynew,'ConvertFrom','datenum');     
formatIn = 'yyyy-mm-dd HH:MM:SS'; 
DateVector = datevec(ynew,formatIn); 
dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS'); 
ds = string(dt); 

The test data, with three parameters, is shown here:

 Time      x   y 
2009-04-10 02:00:00.000   1   0.1 
2009-04-10 02:10:00.000   2   0.2 
2009-04-10 02:30:00.000   3   0.3 
2009-04-10 02:50:00.000   4   0.4 

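As you can see, for a 10-minute interval there are missing timestamps (2:20 and 2:40), so I want to add those timestamps, and I want the corresponding x and y values to be NaN. My output would then look like this: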

 Time      x   y 
2009-04-10 02:00:00.000   1   0.1 
2009-04-10 02:10:00.000   2   0.2 
2009-04-10 02:20:00.000   NaN  NaN 
2009-04-10 02:30:00.000   3   0.3  
2009-04-10 02:40:00.000   NaN  NaN 
2009-04-10 02:50:00.000   4   0.4 

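As you can see from my code, I am only able to add NaN alongside the timestamps; now I also want to carry over the corresponding x and y values.

Please note that I have more than 3000 rows of data in the above format, and I want to perform the same operation on all of my values.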

Answers

0

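There seems to be a contradiction in your question: you say you are able to insert NaN in place of the missing time strings, but in your example of the expected output you wrote the time strings.

Also, you refer to a missing timestamp (2:20), but with a time step of 10 minutes there is another missing timestamp in your example data (2:40).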

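Assuming:

  • you really do want to insert the missing time strings
  • you want to handle all of the missing timestamps

you can modify your code as follows:

  • the ynew time is not needed
  • the tnew time should be used in place of ynew
  • to insert the NaN values in the x and y columns, you have to:
    • extract them from the dataset
    • create two new arrays and initialize them to NaN
    • insert the original x and y data at the locations identified by indx

Below you can find an updated version of your code.

  • the original x and y data are stored in the x_data and y_data arrays
  • the new x and y data are stored, at the end of the script, in the x_data_new and y_data_new arrays

At the end, two tables are generated: the first one holds the time as a string, the second one as a cellarray.

The comments in the code should identify the modifications.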

x = table2dataset(Testing_data); 
T1 = x(:,1); 
% Get X data from the table 
x_data=x(:,2) 
% Get Y data from the table 
y_data=x(:,3) 

C1 =dataset2cell(T1); 

formatIn = 'yyyy-mm-dd HH:MM:SS'; 
t1= datenum(C1(2:end),formatIn) 

avg = 10/60/24;  % Creating 10 minutes of time interval; 
tnew = [t1(1):avg:t1(end)]' 
indx = round((t1-t1(1))/avg) + 1 
% 
% Not Needed 
% 
% ynew = NaN(length(tnew),1); 
% ynew(indx)=t1; 
% 
% Create the new X and Y data 
% (NaN everywhere, then the original samples at the rows given by indx) 
% 
x_data_new=nan(length(tnew),1) 
x_data_new(indx)=x_data 
y_data_new=nan(length(tnew),1) 
y_data_new(indx)=y_data 

% t = datetime(ynew,'ConvertFrom','datenum') % replacing missing time with NAN 
% 
% Use tnew instead of ynew 
% 
t = datetime(tnew,'ConvertFrom','datenum') % replacing missing time with NAN 
formatIn = 'yyyy-mm-dd HH:MM:SS' 
% DateVector = datevec(y_data_new,formatIn) 
% dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS') 
% 
% Use tnew instead of ynew 
% 
dt = datestr(tnew,'yyyy-mm-dd HH:MM:SS') 
% ds = char(dt) 

new_table=table(dt,x_data_new,y_data_new) 
new_table_1=table(cellstr(dt),x_data_new,y_data_new) 

The output is

new_table = 

        dt         x_data_new    y_data_new
    ___________    __________    __________

    [1x19 char]      1           0.1
    [1x19 char]      2           0.2
    [1x19 char]    NaN           NaN
    [1x19 char]      3           0.3
    [1x19 char]    NaN           NaN
    [1x19 char]      4           0.4


new_table_1 = 

            Var1             x_data_new    y_data_new
    _____________________    __________    __________

    '2009-04-10 02:00:00'      1           0.1
    '2009-04-10 02:10:00'      2           0.2
    '2009-04-10 02:20:00'    NaN           NaN
    '2009-04-10 02:30:00'      3           0.3
    '2009-04-10 02:40:00'    NaN           NaN
    '2009-04-10 02:50:00'      4           0.4

Hope this helps.

Qapla'

+0

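Thank you. I only gave an example. It works, but as I said before, my dataset consists of 6 parameters (e.g. x, y, z, a, b, c). Is there a simpler way to do for all 6 parameters what was done for x and y? That is, whenever a timestamp is missing, NaN is added for that time as well as for its corresponding x, y, z, a, b, c ...? –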

+0

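I do not understand the question: do you want to set 'NaN' only for the additional parameters ('z, a, b, c') or for the timestamp as well? That is: 'NaN-NaN-NaN ... NaN NaN NaN' or '2009-04-10 02:40:00 NaN NaN NaN NaN NaN NaN'? –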

+0

Only for the parameters. The same as in the code you just wrote: I want NaN for the parameters, not for the timestamps ... –
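For the six-parameter case raised in the comments above, one possible sketch is to keep all parameters in a single numeric matrix and fill every column with one indexed assignment, reusing the same indx idea as the answer. The data values, the column names z, a, b, c, and the variable names time_str, params, and params_new are made up for illustration; with real data the matrix would have to be extracted from the table first.

```matlab
% Hypothetical example data: timestamps plus six numeric parameter columns
time_str = {'2009-04-10 02:00:00'; '2009-04-10 02:10:00'; ...
            '2009-04-10 02:30:00'; '2009-04-10 02:50:00'};
params = [1 0.1 10 100 7 70; ...
          2 0.2 20 200 8 80; ...
          3 0.3 30 300 9 90; ...
          4 0.4 40 400 5 50];        % columns: x y z a b c (made-up values)

% Convert the time strings to serial date numbers
t1 = datenum(time_str, 'yyyy-mm-dd HH:MM:SS');

% Build a regular 10-minute grid from the first to the last timestamp
step    = 10/60/24;                            % 10 minutes, in days
n_steps = round((t1(end) - t1(1)) / step);
tnew    = t1(1) + (0:n_steps)' * step;

% Row of the regular grid that each original sample belongs to
indx = round((t1 - t1(1)) / step) + 1;

% Start from all-NaN and copy every parameter column in one assignment
params_new = NaN(numel(tnew), size(params, 2));
params_new(indx, :) = params;

% Assemble the result: time strings followed by the six parameters
time_col  = cellstr(datestr(tnew, 'yyyy-mm-dd HH:MM:SS'));
new_table = [table(time_col, 'VariableNames', {'Time'}), ...
             array2table(params_new, ...
                         'VariableNames', {'x','y','z','a','b','c'})];
```

Because the assignment params_new(indx, :) = params works on whole rows, adding or removing parameter columns only changes the width of params and the list of variable names.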

0

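This example is not very different from the accepted answer, but IMHO it is a bit easier on the eyes. It also supports gaps larger than 1 step, and it is a bit more generic because it makes fewer assumptions.

It works with plain cell arrays instead of the original table data, so the conversion is up to you (I am on R2010a, so I cannot test that part).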

% Example data with intentional gaps of varying size 
old_data = {'2009-04-10 02:00:00.000' 1 0.1 
      '2009-04-10 02:10:00.000' 2 0.2 
      '2009-04-10 02:30:00.000' 3 0.3 
      '2009-04-10 02:50:00.000' 4 0.4 
      '2009-04-10 03:10:00.000' 5 0.5 
      '2009-04-10 03:20:00.000' 6 0.6 
      '2009-04-10 03:50:00.000' 7 0.7} 


% Convert textual dates to numbers we can work with more easily 
old_dates = datenum(old_data(:,1)); 

% Nominal step size is the minimum of all differences 
deltas = diff(old_dates); 
nominal_step = min(deltas); 

% Generate new date numbers with constant step 
new_dates = old_dates(1) : nominal_step : old_dates(end); 

% Determine where the gaps in the data are, and how big they are, 
% taking into account rounding error 
step_gaps = abs(deltas - nominal_step) > 10*eps; 
gap_sizes = round(deltas(step_gaps)/nominal_step - 1); 

% Create new data structure with constant-step time stamps, 
% initially with the data of interest all-NAN 
new_size = size(old_data,1) + sum(gap_sizes); 
new_data = [cellstr(datestr(new_dates, 'yyyy-mm-dd HH:MM:SS')),... 
      repmat({NaN}, new_size, 2)]; 

% Compute proper locations of the old data in the new data structure, 
% again, taking into account rounding error 
day = 86400; % (seconds in a day) 
new_datapoint = ismember(round(new_dates * day), ... 
         round(old_dates * day)); 

% Insert the old data at the right locations 
new_data(new_datapoint, 2:3) = old_data(:, 2:3) 

The output is:

old_data = 
    '2009-04-10 02:00:00.000' [1] [0.100000000000000] 
    '2009-04-10 02:10:00.000' [2] [0.200000000000000] 
    '2009-04-10 02:30:00.000' [3] [0.300000000000000] 
    '2009-04-10 02:50:00.000' [4] [0.400000000000000] 
    '2009-04-10 03:10:00.000' [5] [0.500000000000000] 
    '2009-04-10 03:20:00.000' [6] [0.600000000000000] 
    '2009-04-10 03:50:00.000' [7] [0.700000000000000] 

new_data = 
    '2009-04-10 02:00:00' [ 1] [0.100000000000000] 
    '2009-04-10 02:10:00' [ 2] [0.200000000000000] 
    '2009-04-10 02:20:00' [NaN] [    NaN] 
    '2009-04-10 02:30:00' [ 3] [0.300000000000000] 
    '2009-04-10 02:40:00' [NaN] [    NaN] 
    '2009-04-10 02:50:00' [ 4] [0.400000000000000] 
    '2009-04-10 03:00:00' [NaN] [    NaN] 
    '2009-04-10 03:10:00' [ 5] [0.500000000000000] 
    '2009-04-10 03:20:00' [ 6] [0.600000000000000] 
    '2009-04-10 03:30:00' [NaN] [    NaN] 
    '2009-04-10 03:40:00' [NaN] [    NaN] 
    '2009-04-10 03:50:00' [ 7] [0.700000000000000]
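
As a different technique from the two answers above, MATLAB releases that provide timetable and retime (R2016b or later, which is an assumption about the reader's installation) can fill the gaps without manual index arithmetic. The sketch below rebuilds the question's four sample rows by hand; the variable names TT, newTimes, and TT2 are illustrative only.

```matlab
% Sample rows from the question, rebuilt by hand for this sketch
Time = datetime({'2009-04-10 02:00:00'; '2009-04-10 02:10:00'; ...
                 '2009-04-10 02:30:00'; '2009-04-10 02:50:00'}, ...
                'InputFormat', 'yyyy-MM-dd HH:mm:ss');
x = [1; 2; 3; 4];
y = [0.1; 0.2; 0.3; 0.4];

% A timetable keeps the row times and the data columns together
TT = timetable(Time, x, y);

% Regular 10-minute grid spanning the first to the last row time
newTimes = (TT.Time(1) : minutes(10) : TT.Time(end))';

% Rows with no original sample get the missing-value indicator,
% which is NaN for numeric variables
TT2 = retime(TT, newTimes, 'fillwithmissing');
```

All numeric variables in the timetable are handled in the same call, so this also covers the case with more than two parameter columns.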