Python: Matplotlib - probability plot for several data sets

I have several data sets (distributions) as follows:

set1 = [1,2,3,4,5] 
set2 = [3,4,5,6,7] 
set3 = [1,3,4,5,8]

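How can I plot the above as a scatter plot where the y-axis is the probability (i.e., the percentile of the distribution within each set: 0%-100%) and the x-axis is the dataset name? In JMP, this is called a "quantile plot".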

Something like the attached image: [Image: example quantile plot]
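A minimal sketch of the computation being asked for, assuming the i-th smallest of n points gets the percentile 100 * i / (n - 1). Under that assumption each set spans exactly 0% to 100%, which matches numpy.percentile's default linear interpolation; JMP may use a different plotting-position convention. The helper names `percentile_positions` and `plot_quantiles` are hypothetical, and here all three sets share one axes with a legend rather than one column per set name:

```python
import numpy as np
import matplotlib.pyplot as plt


def percentile_positions(values):
    """Return the sorted values and a percentile (0 to 100) for each one.

    Assumed convention: the i-th smallest of n points gets 100 * i / (n - 1),
    so the smallest point is 0% and the largest is 100%.
    """
    data = np.sort(np.asarray(values, dtype=float))
    return data, np.linspace(0.0, 100.0, data.size)


def plot_quantiles(datasets, names, ax):
    """Scatter value vs. percentile for each dataset on a single axes."""
    markers = ["o", "^", "s", "D", "v"]
    for i, (values, name) in enumerate(zip(datasets, names)):
        x, y = percentile_positions(values)
        ax.plot(x, y, marker=markers[i % len(markers)], linestyle="none",
                label=name)
    ax.set_xlabel("Value")
    ax.set_ylabel("Percentile (%)")
    ax.legend()


if __name__ == "__main__":
    set1 = [1, 2, 3, 4, 5]
    set2 = [3, 4, 5, 6, 7]
    set3 = [1, 3, 4, 5, 8]
    fig, ax = plt.subplots()
    plot_quantiles([set1, set2, set3], ["set1", "set2", "set3"], ax)
    plt.show()
```

Tied values get distinct percentiles here because they are simply ranked in sorted order; averaging the ranks of ties would be a reasonable refinement.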

Please educate me. Thanks.

[EDIT]

My data is in a CSV like this:

[Image: screenshot of the CSV data]

Using the JMP analysis tool, I was able to plot the probability distribution (a QQ plot / normal quantile plot, as shown below):

[Image: JMP normal quantile plot]
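For the normal quantile (QQ) plot itself, here is a sketch assuming SciPy is available. `scipy.stats.probplot` sorts each sample, pairs it with theoretical normal quantiles, draws both on the given axes, and returns a least-squares line fit. It puts the theoretical quantiles on the x-axis and the ordered data on the y-axis, and its plotting positions (Filliben's estimate) may not match JMP's exactly. `normal_quantile_plots` is a hypothetical helper name:

```python
import matplotlib.pyplot as plt
from scipy import stats


def normal_quantile_plots(datasets, names):
    """One normal probability plot per dataset; returns the figure and fit stats."""
    fig, axes = plt.subplots(nrows=1, ncols=len(datasets), squeeze=False,
                             figsize=(4 * len(datasets), 4))
    fits = {}
    for ax, values, name in zip(axes[0], datasets, names):
        # probplot orders the data, pairs it with normal quantiles, draws the
        # points and a fitted line on `ax`, and returns (slope, intercept, r)
        (osm, osr), (slope, intercept, r) = stats.probplot(
            values, dist="norm", plot=ax)
        ax.set_title(name)
        fits[name] = (slope, intercept, r)
    return fig, fits


if __name__ == "__main__":
    fig, fits = normal_quantile_plots(
        [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [1, 3, 4, 5, 8]],
        ["set1", "set2", "set3"])
    plt.show()
```

The correlation coefficient `r` is a quick numeric check of how close each set is to a straight line on the plot, i.e. how normal it looks.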

I believe Joe Kington has almost solved my problem; however, I'd like to know how to process the raw CSV data into arrays of probabilities or percentiles.

I'm doing this to automate some statistical analysis in Python rather than relying on JMP for plotting.
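A sketch of the CSV-to-percentiles step. Since the CSV screenshot is not available, it assumes a hypothetical layout: the first row holds dataset names, each column holds one dataset, and shorter columns are padded with blank cells. `read_columns` and `to_percentiles` are hypothetical names, and the percentiles use the same 0% to 100% convention as numpy.percentile's default:

```python
import csv
import io

import numpy as np


def read_columns(f):
    """Read a CSV with one dataset per column into {name: float array}.

    Assumed layout (hypothetical): the header row holds dataset names and
    blank cells pad columns that are shorter than the others.
    """
    reader = csv.DictReader(f)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, cell in row.items():
            # Skip overflow cells (key None) and blank or missing cells
            if name is None or cell is None or not cell.strip():
                continue
            columns[name].append(float(cell))
    return {name: np.array(vals) for name, vals in columns.items()}


def to_percentiles(columns):
    """Map each column name to (sorted values, percentiles from 0 to 100)."""
    result = {}
    for name, values in columns.items():
        data = np.sort(values)
        result[name] = (data, np.linspace(0.0, 100.0, data.size))
    return result


if __name__ == "__main__":
    # In-memory stand-in for a real file, holding the question's three sets
    sample = io.StringIO("set1,set2,set3\n1,3,1\n2,4,3\n3,5,4\n4,6,5\n5,7,8\n")
    for name, (values, pct) in to_percentiles(read_columns(sample)).items():
        print(name, values, pct)
```

With a real file, open it with `open("data.csv", newline="")` (file name hypothetical) and pass the file object to `read_columns`. If the CSV is instead in "long" form (one column of names, one of values), the grouping step would need to change.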


2011-06-13 siva

… it would be easier to help you do that. – 2011-06-13 16:23:45

I'm not entirely clear on what you want, so I'm going to guess here...

Do you want the "probability/percentile" values to be a cumulative histogram?

So for a single plot, you'd have something like this? (Plotting it with markers, as you've shown above, rather than as the more traditional step plot...)

import scipy.stats 
import numpy as np 
import matplotlib.pyplot as plt 

# 100 values from a normal distribution with a std of 3 and a mean of 0.5 
data = 3.0 * np.random.randn(100) + 0.5 

counts, start, dx, _ = scipy.stats.cumfreq(data, numbins=20) 
x = np.arange(counts.size) * dx + start 

plt.plot(x, counts, 'ro') 
plt.xlabel('Value') 
plt.ylabel('Cumulative Frequency') 

plt.show()

[Image: cumulative frequency vs. value, plotted with red circle markers]
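`cumfreq` bins the data first. The more traditional step plot mentioned above can also be drawn without any binning: sort the data and count, which gives the empirical cumulative distribution. A minimal sketch, with the hypothetical helper name `ecdf`:

```python
import numpy as np
import matplotlib.pyplot as plt


def ecdf(data):
    """Sorted values and the cumulative count at each value (no binning)."""
    x = np.sort(np.asarray(data, dtype=float))
    counts = np.arange(1, x.size + 1)
    return x, counts


if __name__ == "__main__":
    # Same kind of synthetic data as the example above
    data = 3.0 * np.random.randn(100) + 0.5
    x, counts = ecdf(data)
    # where="post" holds each count until the next data value
    plt.step(x, counts, where="post")
    plt.xlabel("Value")
    plt.ylabel("Cumulative Frequency")
    plt.show()
```

Dividing `counts` by `x.size` and multiplying by 100 turns the y-axis into a percentage.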

If that's roughly what you want for one plot, there are many ways to put multiple plots on one figure. The simplest is to use subplots.

Here, we'll generate some datasets and plot them on separate subplots with different symbols...

import itertools 
import scipy.stats 
import numpy as np 
import matplotlib.pyplot as plt 

# Generate some data... (Using a list to hold it so that the datasets don't 
# have to be the same length...) 
numdatasets = 4 
stds = np.random.randint(1, 10, size=numdatasets) 
means = np.random.randint(-5, 5, size=numdatasets) 
values = [std * np.random.randn(100) + mean for std, mean in zip(stds, means)] 

# Set up several subplots 
fig, axes = plt.subplots(nrows=1, ncols=numdatasets, figsize=(12,6)) 

# Set up some colors and markers to cycle through... 
colors = itertools.cycle(['b', 'g', 'r', 'c', 'm', 'y', 'k']) 
markers = itertools.cycle(['o', '^', 's', r'$\Phi$', 'h']) 

# Now let's actually plot our data... 
for ax, data, color, marker in zip(axes, values, colors, markers): 
    counts, start, dx, _ = scipy.stats.cumfreq(data, numbins=20) 
    x = np.arange(counts.size) * dx + start 
    ax.plot(x, counts, color=color, marker=marker, 
      markersize=10, linestyle='none') 

# Next we'll set the various labels... 
axes[0].set_ylabel('Cumulative Frequency') 
labels = ['This', 'That', 'The Other', 'And Another'] 
for ax, label in zip(axes, labels): 
    ax.set_xlabel(label) 

plt.show()

[Image: four side-by-side subplots labeled "This", "That", "The Other", and "And Another"]

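If we want this to look like one continuous plot, we can just squeeze the subplots together and turn off some of the boundaries. Just add the following before the call to plt.show():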

# Because we want this to look like a continuous plot, we need to hide the 
# boundaries (a.k.a. "spines") and yticks on most of the subplots 
for ax in axes[1:]: 
    ax.spines['left'].set_color('none') 
    ax.spines['right'].set_color('none') 
    ax.yaxis.set_ticks([]) 
axes[0].spines['right'].set_color('none') 

# To reduce clutter, let's leave off the first and last x-ticks. 
for ax in axes: 
    xticks = ax.get_xticks() 
    ax.set_xticks(xticks[1:-1]) 

# Now, we'll "scrunch" all of the subplots together, so that they look like one 
fig.subplots_adjust(wspace=0)

[Image: the same subplots squeezed together to look like one continuous plot]
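One caveat with the approach above: each subplot autoscales its own y-axis, so the panels can end up on slightly different vertical scales even though they look like one plot. A sketch that avoids this by passing `sharey=True` to `plt.subplots` (which also keeps y tick labels only on the first column), plotting percentiles with `np.percentile` as in the edit below; `side_by_side_percentiles` is a hypothetical name:

```python
import numpy as np
import matplotlib.pyplot as plt


def side_by_side_percentiles(datasets, names):
    """One panel per dataset, all sharing a single percentile y-axis."""
    fig, axes = plt.subplots(nrows=1, ncols=len(datasets), sharey=True,
                             squeeze=False, figsize=(3 * len(datasets), 5))
    axes = axes[0]
    percentiles = np.arange(0, 101, 10)
    for ax, data, name in zip(axes, datasets, names):
        ax.plot(np.percentile(data, percentiles), percentiles,
                marker="o", linestyle="none")
        ax.set_xlabel(name)
    axes[0].set_ylabel("Percentile (%)")
    # Hide the inner boundaries so the panels read as one plot
    for ax in axes[1:]:
        ax.spines["left"].set_visible(False)
        ax.tick_params(axis="y", left=False)
    for ax in axes[:-1]:
        ax.spines["right"].set_visible(False)
    fig.subplots_adjust(wspace=0)
    return fig, axes


if __name__ == "__main__":
    side_by_side_percentiles(
        [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [1, 3, 4, 5, 8]],
        ["set1", "set2", "set3"])
    plt.show()
```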

Hope that helps a bit, anyway!

Edit: If you want percentile values instead of a cumulative histogram (I really shouldn't have used 100 as the sample size!), that's easy to do.

Just do something like this (using numpy.percentile instead of normalizing things manually):

# Replacing the for loop from before... 
plot_percentiles = range(0, 110, 10) 
for ax, data, color, marker in zip(axes, values, colors, markers): 
    x = np.percentile(data, plot_percentiles) 
    ax.plot(x, plot_percentiles, color=color, marker=marker, 
      markersize=10, linestyle='none')

[Image: percentile vs. value for each dataset]
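To see what `np.percentile` returns, here it is on two of the question's small sets. With the default linear interpolation, the 0th, 25th, 50th, 75th, and 100th percentiles of five points land exactly on the sorted values, so for such a set the plot is just the sorted data against 0% to 100%:

```python
import numpy as np

set1 = [1, 2, 3, 4, 5]
set3 = [1, 3, 4, 5, 8]
quartiles = [0, 25, 50, 75, 100]

q1 = np.percentile(set1, quartiles)
q3 = np.percentile(set3, quartiles)
print(q1)  # [1. 2. 3. 4. 5.]
print(q3)  # [1. 3. 4. 5. 8.]

# Percentiles that fall between data points are linearly interpolated
p10 = np.percentile(set1, 10)
print(p10)
```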

If you describe exactly how the datasets should be transformed into what you want plotted …


2011-06-14 05:06:38

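Nice! By the way, have you considered submitting these to the matplotlib gallery? Half the time, the fastest way I find to figure out how to do something in matplotlib is to go look at how it's done there. – DSM 2011-06-14 06:08:01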

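@Joe: Is cumulative frequency the same as percentile? I need to check. You've almost solved my problem; I'm tweaking things here and there to process the data table. – siva 2011-06-15 02:51:23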

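@siva - No, they're not. I shouldn't have used 100 as the sample size; it made this very misleading! (Sorry!) However, expressing the cumulative frequency values as percentages is very simple: you just normalize by the number of samples in the dataset. – 2011-06-15 03:16:06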
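A sketch of that normalization, reusing `scipy.stats.cumfreq` from the answer: dividing the cumulative counts by the number of samples turns them into percentages. Unlike the answer's `x`, which uses each bin's lower edge, this pairs each count with its bin's upper edge, since the count includes every value below that edge. `cumulative_percent` is a hypothetical name:

```python
import numpy as np
import scipy.stats


def cumulative_percent(data, numbins=20):
    """Cumulative histogram of `data`, expressed as a percentage of the samples."""
    data = np.asarray(data, dtype=float)
    res = scipy.stats.cumfreq(data, numbins=numbins)
    # Upper edge of each bin: cumcount[i] counts the values below edges[i]
    # (the last bin also includes its upper edge)
    edges = res.lowerlimit + res.binsize * np.arange(1, res.cumcount.size + 1)
    # Normalize by the number of samples to get percentages
    percent = 100.0 * res.cumcount / data.size
    return edges, percent


if __name__ == "__main__":
    data = 3.0 * np.random.randn(100) + 0.5
    edges, percent = cumulative_percent(data)
    print(percent[-1])  # 100.0
```

Note that this is still a binned cumulative frequency, just rescaled; it only coincides with the true percentile of each value when the bins are fine enough.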
