2016-09-07

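Plotting a 2D array with Pandas, Matplotlib and Numpy

I analyzed the output of a simulation with Pandas groupby(). I am having some trouble plotting the data the way I want. Below is the Pandas output I want to plot (abridged for simplicity):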

                 Avg-del   Min-del    Max-del   Avg-retx  Min-retx    Max-retx
Prob Producers
0.3  1          8.060291  0.587227  26.709371  42.931779  5.130041  136.216642
     5          8.330889  0.371387  54.468836  43.166326  3.340193  275.932170
     10         1.012147  0.161975   4.320447   6.336965  2.026241   19.177802
0.5  1          8.039639  0.776463  26.053635  43.160880  5.798276  133.090358
     5          4.729875  0.289472  26.717824  25.732373  2.909811  135.289244
     10         1.043738  0.160671   4.353993   6.461914  2.015735   19.595393

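My y-axis is the delay and my x-axis is the number of producers. I want one set of error bars for probability p=0.3 and another for p=0.5. My Python script is as follows: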

import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.expand_frame_repr', False)

outputFile = 'averages.txt'

# Average every column for each (Prob, Producers) pair
data = pd.read_csv(sys.argv[1], delimiter=",")
result = data.groupby(["Prob", "Producers"]).mean()

print("Writing to output file: " + outputFile)
with open(outputFile, 'w') as f_out:
    f_out.write(str(result))

# *** Update from James ***
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)]
    ax = plt.gca()
    ax.legend()
    ax.set_xticks(r.index)
    ax.set_ylabel('Latency (s)')
    ax.set_xlabel('Number of producer nodes')

plt.show() 

Now I have 4 arrays, one for each probability. How can I slice them again by delay (del) and retx, and plot error bars based on avg, min and max?

Answer

OK, there is a lot going on here. First, it is plotting 6 lines. When your code calls

plt.plot(np.transpose(np.array(result)[0:3, 0:3]), label = 'p=0.3') 
plt.plot(np.transpose(np.array(result)[3:6, 0:3]), label = 'p=0.5') 

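it is calling plt.plot on a 3x3 array of data. plt.plot does not interpret this input as an x and a y; it treats each column of the array as a separate series of y-values (3 series of 3 points each). For the x-values, it imputes 0, 1, 2. Because of the transpose, each column holds the avg, min and max for one producer count. In other words, for the first plot call it is plotting the data:

x = [0, 1, 2]; y = [8.060291, 0.587227, 26.709371]
x = [0, 1, 2]; y = [8.330889, 0.371387, 54.468836]
x = [0, 1, 2]; y = [1.012147, 0.161975, 4.320447]

Based on your x-label, I think you want the x-values to be [1, 5, 10]. Try this and see if it gets you the plot you want.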

# iterate over the first dataframe index 
for prob_index in result.index.levels[0]: 
    r = result.loc[prob_index] 
    labels = [col for col in r] 
    lines = plt.plot(r) 
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)] 
    ax = plt.gca() 
    ax.legend() 
    ax.set_xticks(r.index) 
    ax.set_ylabel('Latency (s)') 
    ax.set_xlabel('Number of producer nodes') 
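
As a quick check of the column-wise behaviour described above, the sketch below plots the first 3x3 block of the table, here without the transpose, so each column is one metric. The `Agg` backend and the names `block` and `producers` are assumptions made for illustration; they are not part of the original script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Rows are Producers = 1, 5, 10; columns are Avg-del, Min-del, Max-del (p=0.3).
block = np.array([
    [8.060291, 0.587227, 26.709371],
    [8.330889, 0.371387, 54.468836],
    [1.012147, 0.161975, 4.320447],
])
producers = [1, 5, 10]

fig, (ax_default, ax_explicit) = plt.subplots(1, 2)

# A 2D array passed on its own: one line per column, x imputed as 0, 1, 2.
default_lines = ax_default.plot(block)

# Passing x explicitly places the points at the real producer counts.
explicit_lines = ax_explicit.plot(producers, block)
for line, name in zip(explicit_lines, ["Avg-del", "Min-del", "Max-del"]):
    line.set_label(name)
ax_explicit.set_xticks(producers)
ax_explicit.legend()
```

Each call returns three `Line2D` objects, which is why two calls on 3x3 blocks produce six lines in total.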

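Hi James, thanks for the reply. I noticed that 'r' takes 'result' and indexes it by 'Prob'. Good. One problem remains, though: since my dataset has more columns, how do I slice 'r'? I will update the question based on your code. Thanks – Thiago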
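
One possible way to slice `r` by column group and draw error bars is sketched below. It rebuilds the table from the question by hand (an assumption, so the example runs without the CSV file), selects the columns by name suffix (`del` or `retx`), and passes asymmetric errors (`avg - min` below, `max - avg` above) to `errorbar`. The helper name `plot_errorbars` and the `Agg` backend are hypothetical choices for illustration, not part of the original script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the grouped result shown in the question (values copied from the table).
index = pd.MultiIndex.from_product([[0.3, 0.5], [1, 5, 10]],
                                   names=["Prob", "Producers"])
columns = ["Avg-del", "Min-del", "Max-del", "Avg-retx", "Min-retx", "Max-retx"]
values = [
    [8.060291, 0.587227, 26.709371, 42.931779, 5.130041, 136.216642],
    [8.330889, 0.371387, 54.468836, 43.166326, 3.340193, 275.932170],
    [1.012147, 0.161975, 4.320447, 6.336965, 2.026241, 19.177802],
    [8.039639, 0.776463, 26.053635, 43.160880, 5.798276, 133.090358],
    [4.729875, 0.289472, 26.717824, 25.732373, 2.909811, 135.289244],
    [1.043738, 0.160671, 4.353993, 6.461914, 2.015735, 19.595393],
]
result = pd.DataFrame(values, index=index, columns=columns)


def plot_errorbars(result, suffix, ax):
    """Hypothetical helper: one error-bar series per Prob for a column suffix."""
    containers = []
    for prob in result.index.levels[0]:
        r = result.loc[prob]
        x = r.index.to_numpy()
        avg = r["Avg-" + suffix].to_numpy()
        low = r["Min-" + suffix].to_numpy()
        high = r["Max-" + suffix].to_numpy()
        # Asymmetric errors: distance from the mean down to min and up to max.
        yerr = np.vstack([avg - low, high - avg])
        containers.append(
            ax.errorbar(x, avg, yerr=yerr, marker="o", capsize=3,
                        label="p=%s" % prob))
    ax.set_xticks(x)
    ax.set_xlabel("Number of producer nodes")
    ax.legend()
    return containers


fig, (ax_del, ax_retx) = plt.subplots(1, 2, figsize=(10, 4))
del_bars = plot_errorbars(result, "del", ax_del)
ax_del.set_ylabel("Latency (s)")
retx_bars = plot_errorbars(result, "retx", ax_retx)
ax_retx.set_ylabel("retx")
```

Selecting columns by suffix keeps the helper independent of how many column groups the dataset has; with the real data, `result` would come from the `groupby(...).mean()` call in the question instead of the hand-built frame.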