Plotting a 2D array with Pandas, Matplotlib and Numpy

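As the result of a simulation, I analyzed the output using Pandas groupby(). I am having some difficulty plotting the data the way I want. Below is the Pandas output file (truncated for simplicity) that I want to plot: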

                   Avg-del    Min-del    Max-del   Avg-retx   Min-retx   Max-retx
Prob Producers
0.3  1            8.060291   0.587227  26.709371  42.931779   5.130041 136.216642
     5            8.330889   0.371387  54.468836  43.166326   3.340193 275.932170
     10           1.012147   0.161975   4.320447   6.336965   2.026241  19.177802
0.5  1            8.039639   0.776463  26.053635  43.160880   5.798276 133.090358
     5            4.729875   0.289472  26.717824  25.732373   2.909811 135.289244
     10           1.043738   0.160671   4.353993   6.461914   2.015735  19.595393

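My y-axis is the delay and my x-axis is the number of producers. I want one set of error bars for probability p=0.3 and another for p=0.5. My Python script is as follows: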

import sys
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.expand_frame_repr', False)

outputFile = 'averages.txt'

# Read the raw simulation output passed as the first command-line argument
data = pd.read_csv(sys.argv[1], delimiter=",")
# Average every column for each (Prob, Producers) pair
result = data.groupby(["Prob", "Producers"]).mean()

print("Writing to output file: " + outputFile)
# The with-block closes the file even if the write fails
with open(outputFile, 'w') as f_out:
    f_out.write(str(result))

# *** Update from James ***
# iterate over the first dataframe index (one sub-frame per probability)
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)]
    ax = plt.gca()
    ax.legend()
    ax.set_xticks(r.index)
    ax.set_ylabel('Latency (s)')
    ax.set_xlabel('Number of producer nodes')

plt.show()

Now I have 4 arrays, one for each probability. How can I slice them again based on delay (del) and retx, and plot error bars based on avg, min and max?

Source

2016-09-07 Thiago

OK, there is a lot going on here. First, it is plotting 6 lines. When your code calls

plt.plot(np.transpose(np.array(result)[0:3, 0:3]), label = 'p=0.3') 
plt.plot(np.transpose(np.array(result)[3:6, 0:3]), label = 'p=0.5')

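it is calling plt.plot on a 3x3 array of data. plt.plot interprets this input not as an x and a y, but as 3 separate series of y-values (3 points each), one series per column of the array. For the x values, it fills in 0, 1, 2. In other words, because the array is transposed, the first plot call draws one line per producer count:

x = [0,1,2]; y = [8.060291, 0.587227, 26.709371]
x = [0,1,2]; y = [8.330889, 0.371387, 54.468836]
x = [0,1,2]; y = [1.012147, 0.161975, 4.320447]

Based on your x-label, I think you want the x values to be [1, 5, 10]. Try this and see if it gets the plot you want.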

# iterate over the first dataframe index 
for prob_index in result.index.levels[0]: 
    r = result.loc[prob_index] 
    labels = [col for col in r] 
    lines = plt.plot(r) 
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)] 
    ax = plt.gca() 
    ax.legend() 
    ax.set_xticks(r.index) 
    ax.set_ylabel('Latency (s)') 
    ax.set_xlabel('Number of producer nodes')
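
To check how plt.plot treats a 2D input, here is a minimal sketch using the p=0.3 delay values from the question. The headless `Agg` backend and the names `delays`, `array_lines` and `df_lines` are assumptions made for illustration, not part of the original script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# p=0.3 delay columns copied from the question (rows: 1, 5 and 10 producers)
delays = pd.DataFrame(
    {"Avg-del": [8.060291, 8.330889, 1.012147],
     "Min-del": [0.587227, 0.371387, 0.161975],
     "Max-del": [26.709371, 54.468836, 4.320447]},
    index=pd.Index([1, 5, 10], name="Producers"),
)

fig, (ax_arr, ax_df) = plt.subplots(1, 2)

# A bare 2D array: one line per column, and x defaults to 0, 1, 2
array_lines = ax_arr.plot(delays.to_numpy())

# Explicit x taken from the index: still one line per column, but x = 1, 5, 10
df_lines = ax_df.plot(delays.index.to_numpy(), delays.to_numpy())
for line, col in zip(df_lines, delays.columns):
    line.set_label(col)
ax_df.set_xticks(list(delays.index))
ax_df.legend()

array_x = list(array_lines[0].get_xdata())
df_x = list(df_lines[0].get_xdata())
```

Comparing `array_x` with `df_x` shows the difference: the bare array loses the producer counts, while passing the index keeps them on the x-axis.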

Source

2016-09-08 01:37:08 James

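Hi James, thanks for your reply. I noticed that 'r' takes 'result' and indexes it by 'Prob'. Nice. One problem remains, though: since my dataset has more columns, how can I slice 'r'? I will update the question based on your code. Thanks – Thiago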
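
One possible way to slice 'r' by metric and draw min/max error bars is sketched below. It is not from the thread: it assumes the column names follow the `Avg-<metric>`, `Min-<metric>`, `Max-<metric>` pattern shown in the question, rebuilds `result` from the posted values so it runs on its own, and uses a hypothetical helper named `plot_errorbars`. The `Agg` backend is also an assumption, chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the grouped result from the values posted in the question
index = pd.MultiIndex.from_product([[0.3, 0.5], [1, 5, 10]],
                                   names=["Prob", "Producers"])
result = pd.DataFrame(
    [[8.060291, 0.587227, 26.709371, 42.931779, 5.130041, 136.216642],
     [8.330889, 0.371387, 54.468836, 43.166326, 3.340193, 275.932170],
     [1.012147, 0.161975, 4.320447, 6.336965, 2.026241, 19.177802],
     [8.039639, 0.776463, 26.053635, 43.160880, 5.798276, 133.090358],
     [4.729875, 0.289472, 26.717824, 25.732373, 2.909811, 135.289244],
     [1.043738, 0.160671, 4.353993, 6.461914, 2.015735, 19.595393]],
    index=index,
    columns=["Avg-del", "Min-del", "Max-del",
             "Avg-retx", "Min-retx", "Max-retx"],
)

def plot_errorbars(result, metric, ax, ylabel):
    """Hypothetical helper: one error-bar series per probability for `metric`."""
    containers = []
    for prob in result.index.levels[0]:
        r = result.loc[prob]                      # rows for this probability
        avg = r["Avg-" + metric].to_numpy()       # slice columns by name
        low = avg - r["Min-" + metric].to_numpy()     # distance down to the minimum
        high = r["Max-" + metric].to_numpy() - avg    # distance up to the maximum
        containers.append(
            ax.errorbar(r.index.to_numpy(), avg, yerr=np.array([low, high]),
                        marker="o", capsize=3, label="p=%s" % prob))
    ax.set_xticks(list(result.index.levels[1]))
    ax.set_xlabel("Number of producer nodes")
    ax.set_ylabel(ylabel)
    ax.legend()
    return containers

fig, (ax_del, ax_retx) = plt.subplots(1, 2, figsize=(10, 4))
del_bars = plot_errorbars(result, "del", ax_del, "Latency (s)")
retx_bars = plot_errorbars(result, "retx", ax_retx, "retx")  # y-label is a placeholder
```

Calling `plt.show()` (with an interactive backend) or `fig.savefig(...)` afterwards displays or saves the two panels. Passing `yerr` as a 2xN array gives asymmetric bars, so each bar runs from the minimum to the maximum around the average.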
