2016-01-03

matplotlib: plotting a time series while skipping periods without data

tl;dr: How can I skip time periods that have no data when plotting a time series?


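I am running a long computation and want to monitor its progress. Sometimes I interrupt the computation. The log is stored in a huge CSV file that looks like this: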

2016-01-03T01:36:30.958199,0,0,0,startup 
2016-01-03T01:36:32.363749,10000,0,0,regular 
... 
2016-01-03T11:12:21.082301,51020000,13402105,5749367,regular 
2016-01-03T11:12:29.065687,51030000,13404142,5749367,regular 
2016-01-03T11:12:37.657022,51040000,13408882,5749367,regular 
2016-01-03T11:12:54.236950,51050000,13412824,5749375,shutdown 
2016-01-03T19:02:38.293681,51050000,13412824,5749375,startup 
2016-01-03T19:02:49.296161,51060000,13419181,5749377,regular 
2016-01-03T19:03:00.547644,51070000,13423127,5749433,regular 
2016-01-03T19:03:05.599515,51080000,13427189,5750183,regular 
... 

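In reality there are 41 columns. Each column is one indicator of progress. The second column always increases in steps of 10000. The last column is self-explanatory.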
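The loading code is elided in the question. For reference, here is a minimal sketch of one way to parse such a log with pandas. `total` and `kind` match the names used in the code further down; `timestamp`, `00` and `01` are assumed labels for this 5-column excerpt only.

```python
import io

import pandas as pd

# Excerpt of the log shown above (the real file has many more columns).
SAMPLE = """\
2016-01-03T11:12:37.657022,51040000,13408882,5749367,regular
2016-01-03T11:12:54.236950,51050000,13412824,5749375,shutdown
2016-01-03T19:02:38.293681,51050000,13412824,5749375,startup
2016-01-03T19:02:49.296161,51060000,13419181,5749377,regular
"""

# Assumed column names for this 5-column excerpt.
COLUMNS = ["timestamp", "total", "00", "01", "kind"]


def read_log(source):
    """Parse the progress log into a DataFrame indexed by timestamp."""
    df = pd.read_csv(source, header=None, names=COLUMNS, parse_dates=["timestamp"])
    # Guard against stray whitespace around the event label (e.g. "startup ").
    df["kind"] = df["kind"].str.strip()
    return df.set_index("timestamp")


df = read_log(io.StringIO(SAMPLE))
print(df)
```

With a real file, `source` would be a path instead of the `StringIO` wrapper.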

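I want to plot all the columns on the same figure while skipping the periods between "shutdown" and "startup". Ideally, I would also like to draw a vertical line at each skip.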
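The periods to skip can be found by pairing each "shutdown" row with the next "startup" row. A minimal sketch, assuming the rows are sorted by time and the event label lives in a column called `kind`; `find_breaks` is a hypothetical helper. It does not assume the log starts with a startup row, and a startup without a preceding shutdown (e.g. after a crash) is simply ignored.

```python
import pandas as pd


def find_breaks(df, kind_col="kind"):
    """Return (shutdown_time, next_startup_time) pairs for every gap.

    Assumes the DataFrame is indexed by timestamp and sorted by it.
    """
    breaks = []
    pending = None  # timestamp of the last shutdown not yet matched
    for ts, kind in df[kind_col].items():
        if kind == "shutdown":
            pending = ts
        elif kind == "startup" and pending is not None:
            breaks.append((pending, ts))
            pending = None
    return breaks


index = pd.to_datetime([
    "2016-01-03T01:36:30.958199",
    "2016-01-03T11:12:54.236950",
    "2016-01-03T19:02:38.293681",
    "2016-01-03T19:02:49.296161",
])
log = pd.DataFrame({"kind": ["startup", "shutdown", "startup", "regular"]}, index=index)
print(find_breaks(log))
```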


Here is what I have so far:

import matplotlib.pyplot as plt 
import pandas as pd 

# < ... reading my CSV in a Pandas dataframe `df` ... > 

fig, ax = plt.subplots() 

for col in ['total'] + ['%02d' % i for i in range(40)]: 
    ax.plot_date(df.index.values, df[col].values, '-') 

fig.autofmt_xdate() 
plt.show() 

[Image: the plot produced by the code above]

I want to get rid of that long flat stretch and just draw a vertical line in its place.

I know about df.plot(), but in my experience it breaks things (among other issues, pandas converts datetime objects into its own format instead of using date2num and num2date).
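Since the approach relies on `date2num`/`num2date`, it helps to know what units a break has in that representation: `date2num` returns floating-point days, so the width of a gap is just the difference of two floats. A minimal sketch using the shutdown/startup timestamps from the log excerpt above:

```python
from datetime import datetime

import matplotlib.dates as mdates

shutdown = datetime(2016, 1, 3, 11, 12, 54, 236950)
startup = datetime(2016, 1, 3, 19, 2, 38, 293681)

# date2num returns floats measured in days (from Matplotlib's epoch),
# so a gap's width in these units is simply a difference in days.
left, right = mdates.date2num(shutdown), mdates.date2num(startup)
gap_hours = (right - left) * 24
print(f"gap: {gap_hours:.4f} h")

# num2date converts back (to a timezone-aware datetime, UTC by default).
print(mdates.num2date(left))
```

A transform working in these units would subtract such widths from every x value that lies after a gap.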


One possible solution seems to be writing a custom scale, but that looks quite complicated.

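As far as I understand, writing a custom Locator would only change the tick positions (the small vertical marks and their labels), not where the plotted data itself ends up. Is that correct?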
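That understanding matches how Matplotlib works: a Locator only chooses where ticks are drawn, while the screen position of the data comes from the axis transform (scale plus view limits). The sketch below checks this on a plain numeric axis with `FixedLocator`; the data values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no window needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FixedLocator

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2, 10], [0, 1, 2, 3])  # made-up data with a "gap"
fig.canvas.draw()
before = ax.transData.transform((10, 3))  # display position of the last point

# Only allow ticks at 0, 1 and 2, i.e. none inside the empty stretch.
ax.xaxis.set_major_locator(FixedLocator([0, 1, 2]))
fig.canvas.draw()
after = ax.transData.transform((10, 3))

print("ticks:", ax.get_xticks())
print("same position:", np.allclose(before, after))
```

The ticks change but the point at x=10 stays where it was, which is why removing the gap itself needs a different transform (a custom scale) rather than a Locator.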

UPD: A simple solution would be to change the timestamps (say, recompute them as "time since startup"), but I would rather keep them.
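For comparison, a variant of that simple solution can be sketched as follows: move every timestamp back by the total length of the gaps that precede it. `remove_gaps` is a hypothetical helper, and it assumes the gaps are given as sorted, non-overlapping (shutdown, startup) pairs. The drawback is the one mentioned above: tick labels then show shifted times rather than the original wall-clock times.

```python
import numpy as np
import pandas as pd


def remove_gaps(index, breaks):
    """Shift timestamps back by the total length of all gaps that end at or
    before them. `breaks` holds sorted, non-overlapping (start, end) pairs."""
    index = pd.DatetimeIndex(index)
    shift = np.zeros(len(index), dtype="timedelta64[ns]")
    for start, end in breaks:
        # Every point recorded at or after the restart moves back by the gap.
        shift[index >= end] += (end - start).to_timedelta64()
    return index - pd.TimedeltaIndex(shift)


idx = pd.to_datetime([
    "2016-01-03 11:12:37",
    "2016-01-03 11:12:54",  # shutdown
    "2016-01-03 19:02:38",  # startup
    "2016-01-03 19:02:49",
])
squeezed = remove_gaps(idx, [(idx[1], idx[2])])
print(squeezed)
```

The startup row lands exactly on the shutdown time, so the plotted line becomes continuous.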

UPD: The answer at https://stackoverflow.com/a/5657491/1214547 works for me with some modifications. I will post my solution as soon as I can.


Do you want your x-axis to be discontinuous, or do you want to adjust the timestamps of your data? – karlson


@karlson: The former. The latter is easy, and I will use it as a last resort, but I would prefer to keep the original timestamps. – Pastafarianist


Maybe you can build on this example: http://matplotlib.org/examples/pylab_examples/broken_axis.html – karlson
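The linked example uses separate subplots that share one axis and hides the spines between them. A rough sketch of the same idea for an x-axis gap, with made-up numbers standing in for hours. It needs one subplot per uninterrupted run, so it gets unwieldy when there are many interruptions.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Two runs of made-up data (x in hours) separated by a long idle gap.
x1 = np.linspace(1.5, 11.2, 50)
x2 = np.linspace(19.0, 23.0, 20)
y1, y2 = x1 * 10, 112 + (x2 - 19.0) * 10

fig, (ax_left, ax_right) = plt.subplots(1, 2, sharey=True)
fig.subplots_adjust(wspace=0.05)
for ax in (ax_left, ax_right):
    # Plot everything in both panels; the x-limits decide what is visible.
    ax.plot(np.concatenate([x1, x2]), np.concatenate([y1, y2]))

# Each panel shows only one run; the gap falls between the panels.
ax_left.set_xlim(x1.min(), x1.max())
ax_right.set_xlim(x2.min(), x2.max())

# Hide the facing spines so the two panels read as one broken axis.
ax_left.spines["right"].set_visible(False)
ax_right.spines["left"].set_visible(False)
ax_right.tick_params(left=False)
```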

Answers


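Here is a solution that works for me. It does not handle breaks that are close together very well (the labels can get too crowded), but that does not matter in my case.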

import bisect 
import numpy as np 
import matplotlib.pyplot as plt 
import matplotlib.scale as mscale 
import matplotlib.transforms as mtransforms 
import matplotlib.dates as mdates 
import pandas as pd 

# heavily borrows from http://stackoverflow.com/a/5657491/1214547 

def CustomScaleFactory(breaks): 
    class CustomScale(mscale.ScaleBase): 
     name = 'custom' 

     def __init__(self, axis, **kwargs): 
      mscale.ScaleBase.__init__(self) 

     def get_transform(self): 
      return self.CustomTransform() 

     def set_default_locators_and_formatters(self, axis): 
      class HourSkippingLocator(mdates.HourLocator): 
       _breaks = breaks 
       def __init__(self, *args, **kwargs): 
        super(HourSkippingLocator, self).__init__(*args, **kwargs) 

       def _tick_allowed(self, tick): 
        for left, right in self._breaks: 
         if left <= tick <= right: 
          return False 
        return True 

       def __call__(self): 
        ticks = super(HourSkippingLocator, self).__call__() 
        ticks = [tick for tick in ticks if self._tick_allowed(tick)] 
        ticks.extend(right for (left, right) in self._breaks) 
        return ticks 

      axis.set_major_locator(HourSkippingLocator(interval=3)) 
      axis.set_major_formatter(mdates.DateFormatter("%b %d, %H:%M"))

     class CustomTransform(mtransforms.Transform): 
      input_dims = 1 
      output_dims = 1 
      is_separable = True 
      has_inverse = True 
      _breaks = breaks 

      def __init__(self): 
       mtransforms.Transform.__init__(self) 

      def transform_non_affine(self, a): 
       # I have tried to write something smart using np.cumsum(), 
       # but failed, since it was too complicated to handle the 
       # transformation for points within breaks. 
       # On the other hand, these loops are very easily translated 
       # in plain C. 

       result = np.empty_like(a) 

       a_idx = 0 
       csum = 0 
       for left, right in self._breaks: 
        while a_idx < len(a) and a[a_idx] < left: 
         result[a_idx] = a[a_idx] - csum 
         a_idx += 1 
        while a_idx < len(a) and a[a_idx] <= right: 
         result[a_idx] = left - csum 
         a_idx += 1 
        csum += right - left 

       while a_idx < len(a): 
        result[a_idx] = a[a_idx] - csum 
        a_idx += 1 

       return result 

      def inverted(self): 
       return CustomScale.InvertedCustomTransform() 

     class InvertedCustomTransform(mtransforms.Transform): 
      input_dims = 1 
      output_dims = 1 
      is_separable = True 
      has_inverse = True 
      _breaks = breaks 

      def __init__(self): 
       mtransforms.Transform.__init__(self) 

      def transform_non_affine(self, a): 
       # Actually, this transformation isn't exactly invertible. 
       # It may glue together some points, and there is no way 
       # to separate them back. This implementation maps both 
       # points to the *left* side of the break. 

       diff = np.zeros(len(a)) 

       total_shift = 0 

       for left, right in self._breaks: 
        pos = bisect.bisect_right(a, left - total_shift) 
        if pos >= len(diff): 
         break 
        diff[pos] = right - left 
        total_shift += right - left 

       return a + diff.cumsum() 

      def inverted(self): 
       return CustomScale.CustomTransform() 

    return CustomScale 


# < ... reading my CSV in a Pandas dataframe `df` ... > 

startups = np.where(df['kind'] == 'startup')[0] 
shutdowns = np.where(df['kind'] == 'shutdown')[0] 

breaks_idx = list(zip(shutdowns, startups[1:])) 
breaks_dates = [(df.index[l], df.index[r]) for (l, r) in breaks_idx] 
breaks = [(mdates.date2num(l), mdates.date2num(r)) for (l, r) in breaks_dates] 

fig, ax = plt.subplots() 

for col in ['total'] + ['%02d' % i for i in range(40)]: 
    ax.plot_date(df.index.values, df[col].values, '-') 

# shame on matplotlib: there is no way to unregister a scale 
mscale.register_scale(CustomScaleFactory(breaks)) 
ax.set_xscale('custom') 

vlines_x = [r for (l, r) in breaks] 
vlines_ymin = np.zeros(len(vlines_x)) 
vlines_ymax = [df.iloc[r]['total'] for (l, r) in breaks_idx] 
plt.vlines(vlines_x, vlines_ymin, vlines_ymax, color='darkgrey') 

fig.autofmt_xdate() 
plt.ticklabel_format(axis='y', style='plain') 

plt.show() 

[Image: the resulting plot]


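@Pastafarianist provided a nice solution. However, when plotting with multiple breaks, I found a bug in InvertedCustomTransform. For example, in the code below, the crosshair cursor does not follow the mouse in the regions after the second and third breaks.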

import bisect 
import numpy as np 
import matplotlib.pyplot as plt 
import matplotlib.scale as mscale 
import matplotlib.transforms as mtransforms 
import matplotlib.dates as mdates 
import pandas as pd 
from matplotlib.widgets import Cursor 


def CustomScaleFactory(breaks): 
    class CustomScale(mscale.ScaleBase): 
     name = 'custom' 

     def __init__(self, axis, **kwargs): 
      mscale.ScaleBase.__init__(self) 

     def get_transform(self): 
      return self.CustomTransform() 

     def set_default_locators_and_formatters(self, axis): 
      class HourSkippingLocator(mdates.HourLocator): 
       _breaks = breaks 

       def __init__(self, *args, **kwargs): 
        super(HourSkippingLocator, self).__init__(*args, **kwargs) 

       def _tick_allowed(self, tick): 
        for left, right in self._breaks: 
         if left <= tick <= right: 
          return False 
        return True 

       def __call__(self): 
        ticks = super(HourSkippingLocator, self).__call__() 
        ticks = [tick for tick in ticks if self._tick_allowed(tick)]
        ticks.extend(right for (left, right) in self._breaks) 
        return ticks 

      axis.set_major_locator(HourSkippingLocator(interval=3)) 
      axis.set_major_formatter(mdates.DateFormatter("%b %d, %H:%M"))

     class CustomTransform(mtransforms.Transform): 
      input_dims = 1 
      output_dims = 1 
      is_separable = True 
      has_inverse = True 
      _breaks = breaks 

      def __init__(self): 
       mtransforms.Transform.__init__(self) 

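      def transform_non_affine(self, a):
       # I have tried to write something smart using np.cumsum(),
       # but failed, since it was too complicated to handle the
       # transformation for points within breaks.
       # On the other hand, these loops are very easily translated
       # in plain C.

       result = np.empty_like(a)

       a_idx = 0
       csum = 0
       for left, right in self._breaks:
        while a_idx < len(a) and a[a_idx] < left:
         result[a_idx] = a[a_idx] - csum
         a_idx += 1
        while a_idx < len(a) and a[a_idx] <= right:
         result[a_idx] = left - csum
         a_idx += 1
        csum += right - left

       while a_idx < len(a):
        result[a_idx] = a[a_idx] - csum
        a_idx += 1

       return result

      def inverted(self):
       return CustomScale.InvertedCustomTransform()

     class InvertedCustomTransform(mtransforms.Transform):
      input_dims = 1
      output_dims = 1
      is_separable = True
      has_inverse = True
      _breaks = breaks

      def __init__(self):
       mtransforms.Transform.__init__(self)

      def transform_non_affine(self, a):
       # Actually, this transformation isn't exactly invertible.
       # It may glue together some points, and there is no way
       # to separate them back. This implementation maps both
       # points to the *left* side of the break.

       diff = np.zeros(len(a))

       total_shift = 0

       for left, right in self._breaks:
        pos = bisect.bisect_right(a, left - total_shift)
        if pos >= len(diff):
         break
        diff[pos] = right - left
        total_shift += right - left

       return a + diff.cumsum()

      def inverted(self):
       return CustomScale.CustomTransform()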

    return CustomScale 

# simulating data
index1 = pd.date_range(start='2016-01-08 9:30', periods=10, freq='30s') 
index2 = pd.date_range(end='2016-01-08 15:00', periods=10, freq='30s') 
index = index1.union(index2) 
data1 = pd.Series(range(20), index=index.values) 
index3 = pd.date_range(start='2016-01-09 9:30', periods=10, freq='30s') 
index4 = pd.date_range(end='2016-01-09 15:00', periods=10, freq='30s') 
index = index3.union(index4) 
data2 = pd.Series(range(20), index=index.values) 
data = pd.concat([data1, data2]) 
breaks_dates = [ 
    pd.Timestamp('2016-01-08 09:35:00'),
    pd.Timestamp('2016-01-08 14:55:00'),
    pd.Timestamp('2016-01-08 15:00:00'),
    pd.Timestamp('2016-01-09 09:30:00'),
    pd.Timestamp('2016-01-09 09:35:00'),
    pd.Timestamp('2016-01-09 14:55:00')
] 
breaks_dates = [mdates.date2num(point_i) for point_i in breaks_dates] 
breaks = [(breaks_dates[i], breaks_dates[i + 1]) for i in [0, 2, 4]] 
fig, ax = plt.subplots() 
ax.plot(data.index.values, data.values) 
mscale.register_scale(CustomScaleFactory(breaks)) 
ax.set_xscale('custom') 
cursor = Cursor(ax, useblit=True, color='r', linewidth=2) 
plt.show() 
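The symptom can be reproduced without a GUI. The sketch below copies the inverse transform from the first answer into a plain function, uses three made-up breaks of widths 1, 2 and 3, and compares a whole sorted array with a single point (the case the explanation at the end of this answer suspects the cursor triggers). `forward` is an illustrative scalar re-implementation of `CustomTransform`, not code from either answer.

```python
import bisect

import numpy as np

# Three made-up breaks of width 1, 2 and 3, as (left, right) pairs.
BREAKS = [(10.0, 11.0), (20.0, 22.0), (30.0, 33.0)]


def forward(x):
    """Scalar version of CustomTransform: squeeze the breaks out of x."""
    shift = 0.0
    for left, right in BREAKS:
        if x < left:
            break
        if x <= right:
            return left - shift  # points inside a break glue to its left edge
        shift += right - left
    return x - shift


def original_inverse(a):
    """InvertedCustomTransform.transform_non_affine from the first answer."""
    a = np.asarray(a, dtype=float)
    diff = np.zeros(len(a))
    total_shift = 0
    for left, right in BREAKS:
        pos = bisect.bisect_right(a, left - total_shift)
        if pos >= len(diff):
            break
        diff[pos] = right - left  # overwrites instead of accumulating
        total_shift += right - left
    return a + diff.cumsum()


print(original_inverse([forward(x) for x in (5.0, 15.0, 25.0, 35.0)]))  # whole array: correct
print(original_inverse([forward(35.0)]))  # -> [32.] instead of [35.]
```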

It works well if the transform_non_affine method of the InvertedCustomTransform class is changed as follows:

def transform_non_affine(self, a): 
    # Actually, this transformation isn't exactly invertible. 
    # It may glue together some points, and there is no way 
    # to separate them back. This implementation maps both 
    # points to the *left* side of the break. 

    diff = np.zeros(len(a)) 

    total_shift = 0 

    for left, right in self._breaks: 
     pos = bisect.bisect_right(a, left - total_shift) 
     if pos >= len(diff): 
      break 
     diff[pos] = right - left + total_shift # changed point 
     total_shift += right - left 
    return a + diff # changed point 

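The reason is probably that the input `a` to the transform method is not the whole axis but a numpy.array of length 1. With a single point, the original `diff[pos] = right - left` overwrites `diff[0]` for every break the point lies beyond instead of accumulating, so the point is shifted only by the width of the last such break. That is why the first break works and the later ones do not.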
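Note that the modified version drops `cumsum()`, so in a longer sorted array only the first point after each break gets shifted: it fixes the single-point case but not arrays with several points between breaks. Both loops also assume sorted input. A vectorized alternative, sketched here with `np.searchsorted`, handles any length and order. The helper `make_break_transforms` is hypothetical and has only been checked standalone, not inside a registered scale.

```python
import numpy as np


def make_break_transforms(breaks):
    """Build vectorized forward/inverse functions for sorted, non-overlapping
    (left, right) breaks given in data units (e.g. date2num days)."""
    lefts = np.array([l for l, _ in breaks], dtype=float)
    rights = np.array([r for _, r in breaks], dtype=float)
    # cum[k] = total width of the first k breaks (cum[0] = 0).
    cum = np.concatenate([[0.0], np.cumsum(rights - lefts)])
    # Where each break lands after squeezing: [left_k, right_k] -> glued_k.
    glued = lefts - cum[:-1]
    lefts_padded = np.append(lefts, np.inf)

    def forward(a):
        a = np.asarray(a, dtype=float)
        # k = number of breaks that end strictly before each point.
        k = np.searchsorted(rights, a, side="left")
        # Points inside break k are clamped to its left edge.
        return np.minimum(a, lefts_padded[k]) - cum[k]

    def inverse(b):
        b = np.asarray(b, dtype=float)
        # k = number of glued points strictly below b; a glued point maps
        # back to the *left* edge of its break, as in the original answer.
        k = np.searchsorted(glued, b, side="left")
        return b + cum[k]

    return forward, inverse


fwd, inv = make_break_transforms([(10.0, 11.0), (20.0, 22.0), (30.0, 33.0)])
xs = np.array([35.0, 5.0, 25.0, 15.0, 16.0])  # deliberately unsorted
ys = fwd(xs)
print(ys, inv(ys))
```

Inside the scale, one would presumably build the pair once from `breaks` and return `forward(a)` from `CustomTransform.transform_non_affine` and `inverse(a)` from `InvertedCustomTransform.transform_non_affine`.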