2017-02-25 36 views

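Can someone please explain why, when I create a simple heterogeneous DataFrame with pandas, the data types change when I access each row individually? Why does pandas change the dtypes of my columns?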

For example:

import numpy as np
import pandas as pd

scene_df = pd.DataFrame({
    'magnitude': np.random.uniform(0.1, 0.3, (10,)), 
    'x-center': np.random.uniform(-1, 1, (10,)), 
    'y-center': np.random.uniform(-1, 1, (10,)), 
    'label': np.random.randint(2, size=(10,), dtype='u1')}) 

scene_df.dtypes 

prints:

label   uint8 
magnitude float64 
x-center  float64 
y-center  float64 
dtype: object 

But when I iterate over the rows:

[r['label'].dtype for i, r in scene_df.iterrows()] 

I get float64 for the label:

[dtype('float64'), 
dtype('float64'), 
dtype('float64'), 
dtype('float64'), 
dtype('float64'), 
... 

Edit:

To answer the question of what I plan to do with this:

import matplotlib.pyplot as plt

def square(mag, x, y):
    wh = np.array([mag, mag]) 
    pos = np.array((x, y)) - wh/2 
    return plt.Rectangle(pos, *wh) 

def circle(mag, x, y): 
    return plt.Circle((x, y), mag) 

shape_fn_lookup = [square, circle] 

so I ended up with this ugly code:

[shape_fn_lookup[int(s['label'])](
     *s[['magnitude', 'x-center', 'y-center']]) 
for i, s in scene_df.iterrows()] 

which gives a bunch of circles and squares that I can then plot:

[<matplotlib.patches.Circle at 0x7fcf3ea00d30>, 
<matplotlib.patches.Circle at 0x7fcf3ea00f60>, 
<matplotlib.patches.Rectangle at 0x7fcf3eb4da90>, 
<matplotlib.patches.Circle at 0x7fcf3eb4d908>, 
... 
] 

Even DataFrame.to_dict('records') performs this dtype conversion:

type(scene_df.to_dict('records')[0]['label']) 

Answers

1

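I suggest using itertuples instead of iterrows, because iterrows returns a Series for each row, and a Series does not preserve dtypes across the row (in a DataFrame, dtypes are preserved per column).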

[type(r.label) for r in scene_df.itertuples()] 

Output:

[numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8] 
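
As a minimal sketch of the difference, the following compares the two iteration methods on a small made-up frame (`df` and its values are illustrative, not the asker's data). Depending on the pandas version, itertuples may hand back either a NumPy integer or a plain Python int for the label; in both cases it stays an integer and can index a list directly.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame mixing an integer column with a float column.
df = pd.DataFrame({
    'label': np.array([0, 1], dtype='u1'),
    'magnitude': [0.25, 0.5],
})

# iterrows builds one Series per row, so every value in it shares one dtype.
iterrows_labels = [row['label'] for _, row in df.iterrows()]

# itertuples reads each column separately, so the label stays an integer.
# index=False drops the index; name=None yields plain tuples.
itertuples_labels = [
    label for label, _mag in df.itertuples(index=False, name=None)
]

# An integer label can be used as a list index without an int() call.
names = ['square', 'circle']
picked = [names[label] for label in itertuples_labels]

print([type(v).__name__ for v in iterrows_labels])
print([type(v).__name__ for v in itertuples_labels])
print(picked)
```

Unpacking plain tuples also sidesteps the fact that column names such as 'x-center' are not valid Python identifiers and so cannot be read as attributes of the named tuples.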

Yes, this works better for my use case: '[shape_fn_lookup[s](*rest) for i, s, *rest in scene_df.itertuples()]' –

1

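Because iterrows() returns a Series for each row, whose index consists of the column names.

A pandas.Series has only one dtype, so the values of a row are upcast to float64, the common dtype that can hold all of them: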

In [163]: first_row = list(scene_df.iterrows())[0][1] 

In [164]: first_row 
Out[164]: 
label  0.000000 
magnitude 0.293681 
x-center -0.628142 
y-center -0.218315 
Name: 0, dtype: float64 # <--------- NOTE 

In [165]: type(first_row) 
Out[165]: pandas.core.series.Series 

In [158]: [(type(r), r.dtype) for i, r in scene_df.iterrows()] 
Out[158]: 
[(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64'))] 
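
The promotion to float64 follows NumPy's type-promotion rules, which can be checked directly. The sketch below uses a made-up one-row frame (`df` is illustrative, not the asker's data) and also shows one possible way to keep the integer dtype when only the integer columns are needed from a row. It is an assumption that this fits the use case; it does not help when the float columns are needed in the same row object.

```python
import numpy as np
import pandas as pd

# The common dtype able to hold both uint8 and float64 values is float64.
common = np.result_type(np.uint8, np.float64)
print(common)

# Hypothetical frame with one integer column and one float column.
df = pd.DataFrame({
    'label': np.array([1], dtype='u1'),
    'magnitude': [0.25],
})

# Column dtypes are kept per column...
print(df.dtypes)

# ...but a single row has to fit into one Series, hence one dtype.
first_row = df.iloc[0]
print(first_row.dtype)

# Selecting only the integer column first leaves nothing to promote.
int_only_row = df[['label']].iloc[0]
print(int_only_row.dtype)
```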

So, is there a reasonable way to avoid this conversion? –


You can represent an 'int' as a 'float' without losing any information, but not vice versa. As I said above, a pandas.Series must have a single 'dtype'. – MaxU


@FrankWilson, it depends on what you want to do in this loop... – MaxU