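Pandas: why did my column data types change?

Can someone please explain why, when I create a simple heterogeneous DataFrame with pandas, the data types change when I access each row individually?

For example: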

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

scene_df = pd.DataFrame({
    'magnitude': np.random.uniform(0.1, 0.3, (10,)),
    'x-center': np.random.uniform(-1, 1, (10,)),
    'y-center': np.random.uniform(-1, 1, (10,)),
    'label': np.random.randint(2, size=(10,), dtype='u1')})

scene_df.dtypes

prints:

label   uint8 
magnitude float64 
x-center  float64 
y-center  float64 
dtype: object

But when I iterate over the rows:

[r['label'].dtype for i, r in scene_df.iterrows()]

I get float64 for the label:

[dtype('float64'), 
dtype('float64'), 
dtype('float64'), 
dtype('float64'), 
dtype('float64'), 
...

Edit:

To answer what I intended to do with this:

def square(mag, x, y): 
    wh = np.array([mag, mag]) 
    pos = np.array((x, y)) - wh/2 
    return plt.Rectangle(pos, *wh) 

def circle(mag, x, y): 
    return plt.Circle((x, y), mag) 

shape_fn_lookup = [square, circle]

which ended up as this ugly code:

[shape_fn_lookup[int(s['label'])](
     *s[['magnitude', 'x-center', 'y-center']]) 
for i, s in scene_df.iterrows()]

which gives a bunch of circles and squares that I can then plot:

[<matplotlib.patches.Circle at 0x7fcf3ea00d30>, 
<matplotlib.patches.Circle at 0x7fcf3ea00f60>, 
<matplotlib.patches.Rectangle at 0x7fcf3eb4da90>, 
<matplotlib.patches.Circle at 0x7fcf3eb4d908>, 
... 
]

Even DataFrame.to_dict('records') performs this data type conversion:

type(scene_df.to_dict('records')[0]['label'])

2017-02-25 Frank Wilson

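I suggest using itertuples instead of iterrows, because iterrows returns a Series for each row, and a Series does not preserve dtypes across the row (a DataFrame preserves dtypes per column).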

[type(r.label) for r in scene_df.itertuples()]

Output:

[numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8, 
numpy.uint8]
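
As a rough sketch of what itertuples yields, the snippet below uses a small hypothetical frame (not the question's random data). Two caveats apply. Column names that are not valid Python identifiers, such as 'x-center', are replaced by positional field names. And depending on the pandas version, the integer may come back as numpy.uint8 or as a plain Python int. Either way it is not converted to float.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, for illustration only.
df = pd.DataFrame({
    'label': np.array([0, 1, 1], dtype='u1'),
    'magnitude': [0.1, 0.2, 0.3],
    'x-center': [-0.5, 0.0, 0.5],
})

first = next(df.itertuples())

# 'x-center' is not a valid identifier, so it is renamed by position.
print(first._fields)  # ('Index', 'label', 'magnitude', '_3')

# The label stays an integer (numpy.uint8 or int, depending on the version).
print(type(first.label))
print(isinstance(first.label, (int, np.integer)))  # True
```

Because of the renaming, attribute access such as r.label works for 'label' and 'magnitude' here, while the hyphenated column has to be read by position (first[3]) or through its generated name (first._3).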

2017-02-25 12:57:43 Rene

Yes, this works better for my use case: '[shape_fn_lookup[s](*rest) for i, s, *rest in scene_df.itertuples()]'
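
The unpacking in that comment relies on 'label' being the first column, which depends on how the DataFrame orders its columns. A hedged variant is to select the columns explicitly so the order is fixed. In this sketch, square and circle are hypothetical stand-ins that return tuples instead of matplotlib patches, and the three-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's square/circle functions,
# returning plain tuples so the sketch runs without matplotlib.
def square(mag, x, y):
    return ('square', mag, x, y)

def circle(mag, x, y):
    return ('circle', mag, x, y)

shape_fn_lookup = [square, circle]

# Made-up data with the same columns as in the question.
scene_df = pd.DataFrame({
    'magnitude': [0.1, 0.2, 0.3],
    'x-center': [-0.5, 0.0, 0.5],
    'y-center': [0.5, 0.0, -0.5],
    'label': np.array([0, 1, 1], dtype='u1'),
})

# Selecting the columns by name fixes the unpacking order;
# index=False drops the index, name=None yields plain tuples.
cols = ['label', 'magnitude', 'x-center', 'y-center']
shapes = [
    shape_fn_lookup[label](mag, x, y)
    for label, mag, x, y in scene_df[cols].itertuples(index=False, name=None)
]
```

With name=None the rows are ordinary tuples, so the hyphenated column names never need to become attribute names.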

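Because iterrows() returns a Series for each row, whose index consists of the column names.

A pandas.Series has only one dtype, so the row's values are upcast to a common dtype, here float64: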

In [163]: first_row = list(scene_df.iterrows())[0][1] 

In [164]: first_row 
Out[164]: 
label  0.000000 
magnitude 0.293681 
x-center -0.628142 
y-center -0.218315 
Name: 0, dtype: float64 # <--------- NOTE 

In [165]: type(first_row) 
Out[165]: pandas.core.series.Series 

In [158]: [(type(r), r.dtype) for i, r in scene_df.iterrows()] 
Out[158]: 
[(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64')), 
(pandas.core.series.Series, dtype('float64'))]
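
The common dtype can also be checked directly. The sketch below uses two small hypothetical frames: one mixing uint8 and float64 columns, and one where every column is uint8. Only the mixed frame forces a conversion when a row is extracted as a Series.

```python
import numpy as np
import pandas as pd

# NumPy's promotion rule for the two dtypes involved in the question.
print(np.result_type(np.uint8, np.float64))  # float64

# Hypothetical mixed frame: a row must hold both columns in one dtype.
mixed = pd.DataFrame({
    'label': np.array([0, 1], dtype='u1'),
    'magnitude': [0.1, 0.2],
})
mixed_row = mixed.iloc[0]
print(mixed_row.dtype)  # float64

# Hypothetical all-uint8 frame: no conversion is needed for a row.
ints_only = pd.DataFrame({
    'a': np.array([0, 1], dtype='u1'),
    'b': np.array([2, 3], dtype='u1'),
})
ints_row = ints_only.iloc[0]
print(ints_row.dtype)  # uint8
```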

2017-02-25 12:26:03 MaxU

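So, is there a reasonable way to avoid this type conversion?

You can represent an 'int' as a 'float' without losing any information, but not vice versa. As I said above, a pandas.Series must have a single 'dtype' – MaxU

@FrankWilson, it depends on what you want to do in this loop... – MaxU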
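
On the point that an int can be stored as a float without loss: for the uint8 labels in the question this holds for every possible value, so casting back with int(...) is safe. A minimal check is sketched below, together with the caveat that the claim does not extend to arbitrarily large 64-bit integers. The specific values are chosen for illustration.

```python
import numpy as np

# Every uint8 value survives a round trip through float64.
labels = np.arange(256, dtype=np.uint8)
round_trip = labels.astype(np.float64).astype(np.uint8)
print(np.array_equal(labels, round_trip))  # True

# Caveat: float64 has a 53-bit significand, so large int64 values can change.
big = 2**53 + 1
as_float = np.float64(big)
print(int(as_float) == big)  # False
```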
