Problem with pandas and semilog for boxplot

I have a pandas DataFrame with the columns 'video' and 'link', indexed by datetime values. For some reason, when I use semilogy and boxplot with the video series, I get the error

ValueError: Data has no positive values, and therefore can not be log-scaled.

But when I do the same on the 'link' series, I can draw the boxplot correctly.

I have verified that both the 'video' and 'link' series contain NaN values as well as positive values.

Any ideas as to why this is happening? Below is what I did to verify the situation.

Sample code:

import matplotlib.pyplot as plt

# get all the non-null values of video to show that there are positive values
temp = a.types_pivot[a.types_pivot['video'].notnull()]
print(temp)

# count the NaN values to show that both 'video' and 'link' have NaN
# (is_integer() returns False for NaN, and for any other non-integer float)
count = 0
for item in a.types_pivot['video']:
    if not item.is_integer():
        count += 1
print("there is %s nan values in video" % count)

# try to draw the plots
fig = plt.figure(figsize=(6, 6), dpi=50)
ax = fig.add_subplot(111)
ax.semilogy()
plt.boxplot(a.types_pivot['video'].values)

Here is the relevant output of the code for the video series:

type                       link  video
created_time
2011-02-10 15:00:51+00:00   NaN      5
2011-02-17 17:50:38+00:00   NaN      5
2011-03-22 14:04:56+00:00   NaN      5

there is 5463 nan values in video

I ran exactly the same code, except that I used

a.types_pivot['link']

and I was able to draw the boxplot.

Here is the relevant output for the link series:

 

Index: 5269 entries, 2011-01-24 20:03:58+00:00 to 2012-06-22 16:56:30+00:00 
Data columns: 
link  5269 non-null values 
photo  0 non-null values 
question 0 non-null values 
status  0 non-null values 
swf   0 non-null values 
video  0 non-null values 
dtypes: float64(6) 

there is 216 nan values in link

Using the describe function 

a.types_pivot['video'].describe() 

<pre> 
count 22.000000 
mean  16.227273 
std  15.275040 
min  1.000000 
25%  5.250000 
50%  9.500000 
75%  23.000000 
max  58.000000 
</pre>

Source

2012-06-26 Bonnie Yu MSFT

Did you try removing the NaNs from a.types_pivot['video'].values?

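Great point, Zhenya! Yes, I did try that. plt.boxplot(temp['video']) does work when I use my temp variable, which holds only the non-null values. I don't understand why it doesn't work when I call it directly on the column, since that works for the 'link' series. If it worked, I could easily use the pandas .boxplot and .hist functions with semilog to compare the data.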
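The workaround described in the comments above (dropping the NaNs before plotting) can be sketched as follows. This is only a minimal illustration: the `video` Series is made-up stand-in data, not the asker's `a.types_pivot['video']`, and the `Agg` backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for a.types_pivot['video']: mostly NaN, a few positive counts.
video = pd.Series([np.nan, 5.0, np.nan, 1.0, 58.0, np.nan, 9.0, 23.0])

# Drop the NaNs first so the box statistics are computed on real numbers only.
clean = video.dropna().values

fig = plt.figure(figsize=(6, 6), dpi=50)
ax = fig.add_subplot(111)
ax.boxplot(clean)
ax.set_yscale("log")  # log-scale the y axis once the cleaned data is drawn
```

Setting the scale after drawing is a design choice in this sketch; it keeps the y axis in the original units of the data.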

A note regarding the output: I can't upload images because of some problem with imgur. I'll try again later.

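Make use of the pandas matplotlib helpers/wrappers by calling pd.DataFrame.boxplot(). I believe this will take care of your NaN values. It will also put both series in the same plot so that you can easily compare the data.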

Example: create a DataFrame with some NaN values and some negative values

In [7]: df = pd.DataFrame(np.random.rand(10, 5))
In [8]: df.loc[2:4, 3] = np.nan
In [9]: df.loc[2:3, 4] = -0.45
In [10]: df
Out[10]:
          0         1         2         3         4
0  0.391882  0.776331  0.875009  0.350585  0.154517
1  0.772635  0.657556  0.745614  0.725191  0.483967
2  0.057269  0.417439  0.861274       NaN -0.450000
3  0.997749  0.736229  0.084077       NaN -0.450000
4  0.886303  0.596473  0.943397       NaN  0.816650
5  0.018724  0.459743  0.472822  0.598056  0.273341
6  0.894243  0.097513  0.691781  0.802758  0.785258
7  0.222901  0.292646  0.558909  0.220400  0.622068
8  0.458428  0.039280  0.670378  0.457238  0.912308
9  0.516554  0.445004  0.356060  0.861035  0.433503

Note that with a DataFrame, I can count the number of NaN values like so:

In [14]: df[3].isnull().sum() # Count NaNs in the 4th column 
Out[14]: 3
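The same idea extends to checking, column by column, both things the error message is about: how many values are missing and how many are strictly positive. A small sketch, using made-up data under the hypothetical column names 'link' and 'video':

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names mirror the question, the values are invented.
df = pd.DataFrame({
    "link": [3.0, np.nan, 7.0, 12.0],
    "video": [np.nan, 5.0, np.nan, np.nan],
})

# NaN count per column.
nan_counts = df.isnull().sum()

# Count of strictly positive values per column (NaN > 0 evaluates to False).
positive_counts = (df > 0).sum()

print(nan_counts)
print(positive_counts)
```

Unlike a loop over `is_integer()`, this counts only true missing values and does not miscount non-integer floats.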

The boxplot is then simply:

In [16]: df.boxplot()

You can create a semilog boxplot by, for example:

In [23]: np.log(df).boxplot()

Or, more generally, modify/transform the data to your heart's content, then boxplot.

In [24]: df_mod = np.log(df).dropna()  
In [25]: df_mod.boxplot()
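Taking the log of the data changes the units shown on the axis. An alternative, sketched here as one possible variation rather than as part of the original answer, is to mask out the non-positive values and log-scale the axis instead. The seeded random data and the `Agg` backend are assumptions made so the snippet is reproducible and runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import numpy as np
import pandas as pd

# Rebuild a frame like the one above, with a fixed seed for reproducibility.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(10, 5))
df.loc[2:4, 3] = np.nan
df.loc[2:3, 4] = -0.45

# Replace non-positive entries with NaN; existing NaNs stay NaN.
positive = df.where(df > 0)

# DataFrame.boxplot returns the matplotlib Axes by default.
ax = positive.boxplot()
ax.set_yscale("log")  # log axis, data kept in its original units
```

Here `where` is applied element by element, so each column keeps all of its positive values, whereas `dropna()` on the whole frame discards every row that contains a NaN in any column.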

Source

2012-10-31 01:44:35 Aman
