2014-01-27 140 views
2

Pandas: plotting lines from different columns, ignoring values

If I have two columns like these in a pandas DataFrame:

df.high 
Out[11]: 
date 
2004-01-14  NaN 
2004-01-15 1.2675 
2004-01-16 1.2609 
2004-01-19 1.2426 
2004-01-20  NaN 
2004-01-21  NaN 
2004-01-22  NaN 
2004-01-23 1.2778 
2004-01-26 1.2616 

df.low 
Out[12]: 
date 
2004-01-14  NaN 
2004-01-15 1.2558 
2004-01-16 1.2349 
2004-01-19 1.2334 
2004-01-20  NaN 
2004-01-21  NaN 
2004-01-22  NaN 
2004-01-23 1.2564 
2004-01-26 1.2457 

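How can I draw one straight line per group, going from the first value of the group in df.high to the last value of the group in df.low and ignoring the values in between?

In this example, the first line must go from df.high at 2004-01-15 to df.low at 2004-01-19, and the second from df.high at 01-23 to df.low at 01-26.

FYI, my real DataFrame is much larger than this example, with groups of values alternating with groups of NaN, and I need to keep the datetime index in the same order.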

Answers

2

First, you can write a function that splits a DataFrame at its NaN rows:

import numpy as np

def mysplit(df):
    # row positions of the NaN entries; each one starts a new chunk
    nan_pos = np.where(df.value.isna())[0]
    bounds = [0, *nan_pos, len(df)]
    parts = [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    # removing NaN entries
    parts = [part[part.value.notna()] for part in parts]
    # removing empty DataFrames
    parts = [part for part in parts if not part.empty]
    return parts

Then you can run this function on each of your DataFrames:

parts1 = mysplit(df1) 
#[     date value 
#1 2004-01-15 00:00:00 1.2675 
#2 2004-01-16 00:00:00 1.2609 
#3 2004-01-19 00:00:00 1.2426, 
#     date value 
#7 2004-01-23 00:00:00 1.2778 
#8 2004-01-26 00:00:00 1.2616] 

parts2 = mysplit(df2) 
#[     date value 
#1 2004-01-15 00:00:00 1.2558 
#2 2004-01-16 00:00:00 1.2349 
#3 2004-01-19 00:00:00 1.2334, 
#     date value 
#7 2004-01-23 00:00:00 1.2564 
#8 2004-01-26 00:00:00 1.2457] 
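The answer does not show how df1 and df2 were built. Judging from the printed output, each one has a default integer index plus a date column and a value column. A minimal sketch of one way to get that shape from the question's data follows; the helper name to_value_frame is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data copied from the question
dates = pd.to_datetime([
    "2004-01-14", "2004-01-15", "2004-01-16", "2004-01-19", "2004-01-20",
    "2004-01-21", "2004-01-22", "2004-01-23", "2004-01-26",
])
df = pd.DataFrame(
    {
        "high": [np.nan, 1.2675, 1.2609, 1.2426, np.nan, np.nan, np.nan, 1.2778, 1.2616],
        "low": [np.nan, 1.2558, 1.2349, 1.2334, np.nan, np.nan, np.nan, 1.2564, 1.2457],
    },
    index=pd.Index(dates, name="date"),
)

def to_value_frame(series):
    """Turn a date-indexed Series into a frame with 'date' and 'value' columns.

    Hypothetical helper: it only reproduces the layout that mysplit expects.
    """
    return series.rename("value").reset_index()

df1 = to_value_frame(df["high"])
df2 = to_value_frame(df["low"])
```

With this layout the integer index still records each row's original position, which is what keeps the groups in their original order later on.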

From there, plotting is easy:

import matplotlib.pyplot as plt
# first "high" value (from parts1) and last "low" value (from parts2) of each group
values = [[i.values[0,1], j.values[-1,1]] for i,j in zip(parts1, parts2)]
for value in values:
    plt.plot([0,1], value)
plt.show()



Edit: to implement the suggestion from your comment, you can change the last part slightly:

for i,j in zip(parts1, parts2): 
    plt.plot([i.index[0], j.index[-1]], [i.values[0,1], j.values[-1,1]]) 
plt.show() 

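For the datetime index the question asks about, the same idea can be sketched directly on the original DataFrame, without the intermediate df1 and df2. This is an alternative sketch, not the answer's method. It assumes that high and low are NaN on the same rows, and the function names high_low_segments and plot_segments are hypothetical.

```python
import numpy as np
import pandas as pd

def high_low_segments(df):
    """Return one (start_date, high, end_date, low) tuple per run of valid rows."""
    valid = df["high"].notna() & df["low"].notna()
    # The counter increases at every invalid row, so all rows of one
    # unbroken valid run share the same label.
    run_id = (~valid).cumsum()[valid]
    segments = []
    for _, run in df[valid].groupby(run_id):
        segments.append((run.index[0], run["high"].iloc[0],
                         run.index[-1], run["low"].iloc[-1]))
    return segments

def plot_segments(segments, ax=None):
    """Draw one straight line per segment on a date axis."""
    # Imported here so the segment computation works without matplotlib.
    import matplotlib.pyplot as plt
    if ax is None:
        ax = plt.gca()
    for start, high, end, low in segments:
        ax.plot([start, end], [high, low])
    ax.figure.autofmt_xdate()
    return ax

# Data copied from the question
dates = pd.to_datetime([
    "2004-01-14", "2004-01-15", "2004-01-16", "2004-01-19", "2004-01-20",
    "2004-01-21", "2004-01-22", "2004-01-23", "2004-01-26",
])
df = pd.DataFrame(
    {
        "high": [np.nan, 1.2675, 1.2609, 1.2426, np.nan, np.nan, np.nan, 1.2778, 1.2616],
        "low": [np.nan, 1.2558, 1.2349, 1.2334, np.nan, np.nan, np.nan, 1.2564, 1.2457],
    },
    index=dates,
)
segments = high_low_segments(df)
```

Calling plot_segments(segments) followed by plt.show() should then draw the two lines at their real dates, so the x axis keeps the original datetime order.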

+1

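OK, that looks good. The only problem is that I want to keep the date index in its original order when plotting. In your example the lines overlap and the index is changed. Can you help me? – pietrovismara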

+0

@Cuz, I updated the answer with this new suggestion... –