When to use apply(pd.to_numeric) and when to use astype(np.float64) in Python?

I have a pandas DataFrame object named xiv, which has a column of int64 Volume measurements.

In[]: xiv['Volume'].head(5) 
Out[]: 

0 252000 
1 484000 
2  62000 
3 168000 
4 232000 
Name: Volume, dtype: int64

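I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it does not appear to change the dtype of the underlying data: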

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume']) 

In[]: xiv['Volume'].dtypes 
Out[]: 
dtype('int64')

Or...

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume']) 
Out[]: ###omitted for brevity### 

In[]: xiv['Volume'].dtypes 
Out[]: 
dtype('int64') 

In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric) 

In[]: xiv['Volume'].dtypes 
Out[]: 
dtype('int64')

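I have also tried making a separate pandas Series, applying the methods listed above to that Series, and reassigning it to the xiv['Volume'] object, which is a pandas.core.series.Series object.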

I have, however, found a solution to this problem using the numpy package's float64 type. This works, but I don't know why it behaves differently.

In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64) 

In[]: xiv['Volume'].dtypes 
Out[]: 
dtype('float64')

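Can someone explain how to accomplish with the pandas library what the numpy library seems to achieve easily with its float64 class, namely converting the column in the xiv DataFrame to float64?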

Asked 2016-10-17 by d8aninja

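int64 is already a "numeric" dtype. to_numeric() is meant to help convert strings to numeric dtypes... – MaxU

The posts I referenced show that the dtype returned by calling to_numeric would be float64... – d8aninja

Check this: pd.to_numeric(pd.Series(['1', '2', '3'])).dtype. It will be float64 only when necessary: 1. there are NaNs or non-convertible values in the series; 2. there are floats in the series – MaxU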
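
To illustrate the comment above, here is a minimal sketch of how pd.to_numeric appears to choose its result dtype. The sample values are made up for illustration and are not from the original thread.

```python
import pandas as pd

# Strings that all parse as whole numbers come back as int64.
ints = pd.to_numeric(pd.Series(['1', '2', '3']))

# A single float-like string forces float64 for the whole Series.
floats = pd.to_numeric(pd.Series(['1', '2.5', '3']))

# An unparseable value coerced to NaN also forces float64,
# because NaN cannot be stored in a plain int64 column.
with_nan = pd.to_numeric(pd.Series(['1', 'x', '3']), errors='coerce')

print(ints.dtype, floats.dtype, with_nan.dtype)
```

Note that without errors='coerce' the third call would raise instead of producing NaN.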

If you already have numeric dtypes (int8|16|32|64, float64, boolean), you can convert them to another "numeric" dtype using the pandas .astype() method.
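
Applied to the question, this can be sketched as follows. The DataFrame below is a hypothetical stand-in for xiv, rebuilt from the five Volume values shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the xiv DataFrame from the question.
xiv = pd.DataFrame({'Volume': [252000, 484000, 62000, 168000, 232000]},
                   dtype=np.int64)

# to_numeric has nothing to parse here, so the int64 dtype is kept.
unchanged = pd.to_numeric(xiv['Volume'])

# astype performs an explicit cast to the requested dtype.
xiv['Volume'] = xiv['Volume'].astype(np.float64)

print(unchanged.dtype, xiv['Volume'].dtype)
```

In short, to_numeric infers a dtype from the data, while astype imposes the dtype you ask for.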

Demo:

In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

In [91]: df
Out[91]:
         a        b        c
0  9059440  9590567  2076918
1  5861102  4566089  1947323
2  6636568   162770  2487991
3  6794572  5236903  5628779
4   470121  4044395  4546794

In [92]: df.dtypes
Out[92]:
a    int64
b    int64
c    int64
dtype: object

In [93]: df['a'] = df['a'].astype(float)

In [94]: df.dtypes
Out[94]:
a    float64
b      int64
c      int64
dtype: object

It won't work for object (string) dtypes that can't be converted to numbers:

In [95]: df.loc[1, 'b'] = 'XXXXXX'

In [96]: df
Out[96]:
           a        b        c
0  9059440.0  9590567  2076918
1  5861102.0   XXXXXX  1947323
2  6636568.0   162770  2487991
3  6794572.0  5236903  5628779
4   470121.0  4044395  4546794

In [97]: df.dtypes
Out[97]:
a    float64
b     object
c      int64
dtype: object

In [98]: df['b'].astype(float)
...
skipped
...
ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() method:

In [99]: df.b = pd.to_numeric(df['b'], errors='coerce')

In [100]: df 
Out[100]: 
           a          b        c
0  9059440.0  9590567.0  2076918
1  5861102.0        NaN  1947323
2  6636568.0   162770.0  2487991
3  6794572.0  5236903.0  5628779
4   470121.0  4044395.0  4546794

In [101]: df.dtypes
Out[101]:
a    float64
b    float64
c      int64
dtype: object
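
The two methods can also be combined. The following is a minimal sketch, not part of the original answer: to_float64 is a hypothetical helper name, and the sample Series are made up for illustration.

```python
import numpy as np
import pandas as pd

def to_float64(series):
    """Hypothetical helper: parse if needed, then cast to float64."""
    # errors='coerce' turns unparseable entries into NaN instead of raising.
    parsed = pd.to_numeric(series, errors='coerce')
    # astype guarantees float64 even when every value parsed as an integer.
    return parsed.astype(np.float64)

mixed = pd.Series(['9590567', 'XXXXXX', '162770'])          # object/string data
clean = pd.Series([252000, 484000, 62000], dtype=np.int64)  # already numeric

print(to_float64(mixed))
print(to_float64(clean).dtype)
```

The coerce step silently replaces bad values with NaN, so it is worth checking isna() afterwards if such values are unexpected.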

Answered 2016-10-17 21:31:19 by MaxU
