2017-02-12 24 views
4

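Convert Pandas DataFrame to float with commas and negative numbers

I have read several posts about converting Pandas columns to float using pd.to_numeric as well as applymap(locale.atof).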

I'm running into problems where neither approach works.

Note the original DataFrame, which is dtype: object

df.append(df_income_master[", Net"]) 
Out[76]: 
Date 
2016-09-30  24.73 
2016-06-30  18.73 
2016-03-31  17.56 
2015-12-31  29.14 
2015-09-30  22.67 
2015-12-31  95.85 
2014-12-31  84.58 
2013-12-31  58.33 
2012-12-31  29.63 
2016-09-30  243.91 
2016-06-30  230.77 
2016-03-31  216.58 
2015-12-31  206.23 
2015-09-30  192.82 
2015-12-31  741.15 
2014-12-31  556.28 
2013-12-31  414.51 
2012-12-31  308.82 
2016-10-31 2,144.78 
2016-07-31 2,036.62 
2016-04-30 1,916.60 
2016-01-31 1,809.40 
2015-10-31 1,711.97 
2016-01-31 6,667.22 
2015-01-31 5,373.59 
2014-01-31 4,071.00 
2013-01-31 3,050.20 
2016-09-30  -0.06 
2016-06-30  -1.88 
2016-03-31    
2015-12-31  -0.13 
2015-09-30    
2015-12-31  -0.14 
2014-12-31  0.07 
2013-12-31   0 
2012-12-31   0 
2016-09-30  -0.8 
2016-06-30  -1.12 
2016-03-31  1.32 
2015-12-31  -0.05 
2015-09-30  -0.34 
2015-12-31  -1.37 
2014-12-31  -1.9 
2013-12-31  -1.48 
2012-12-31   0.1 
2016-10-31  41.98 
2016-07-31   35 
2016-04-30  -11.66 
2016-01-31  27.09 
2015-10-31  -3.44 
2016-01-31  14.13 
2015-01-31  -18.69 
2014-01-31  -4.87 
2013-01-31  -5.7 
dtype: object 

pd.to_numeric(df, errors='coerce') 
    Out[77]: 
    Date 
    2016-09-30  24.73 
    2016-06-30  18.73 
    2016-03-31  17.56 
    2015-12-31  29.14 
    2015-09-30  22.67 
    2015-12-31  95.85 
    2014-12-31  84.58 
    2013-12-31  58.33 
    2012-12-31  29.63 
    2016-09-30 243.91 
    2016-06-30 230.77 
    2016-03-31 216.58 
    2015-12-31 206.23 
    2015-09-30 192.82 
    2015-12-31 741.15 
    2014-12-31 556.28 
    2013-12-31 414.51 
    2012-12-31 308.82 
    2016-10-31  NaN 
    2016-07-31  NaN 
    2016-04-30  NaN 
    2016-01-31  NaN 
    2015-10-31  NaN 
    2016-01-31  NaN 
    2015-01-31  NaN 
    2014-01-31  NaN 
    2013-01-31  NaN 
    Name: Revenue, dtype: float64 

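Note that when I perform the conversion with to_numeric, it turns the strings with commas (thousands separators) into NaN, as well as the negative numbers. Can you help me find a way to convert these?

EDIT:

Continuing to try to reproduce this, I added two columns to a single DataFrame, both of which contain problematic text. Ultimately I'm trying to convert these columns to float, but I get various errors: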

df 
Out[168]: 
      Revenue Other, Net 
Date       
2016-09-30  24.73  -0.06 
2016-06-30  18.73  -1.88 
2016-03-31  17.56   
2015-12-31  29.14  -0.13 
2015-09-30  22.67   
2015-12-31  95.85  -0.14 
2014-12-31  84.58  0.07 
2013-12-31  58.33   0 
2012-12-31  29.63   0 
2016-09-30 243.91  -0.8 
2016-06-30 230.77  -1.12 
2016-03-31 216.58  1.32 
2015-12-31 206.23  -0.05 
2015-09-30 192.82  -0.34 
2015-12-31 741.15  -1.37 
2014-12-31 556.28  -1.9 
2013-12-31 414.51  -1.48 
2012-12-31 308.82  0.1 
2016-10-31 2,144.78  41.98 
2016-07-31 2,036.62   35 
2016-04-30 1,916.60  -11.66 
2016-01-31 1,809.40  27.09 
2015-10-31 1,711.97  -3.44 
2016-01-31 6,667.22  14.13 
2015-01-31 5,373.59  -18.69 
2014-01-31 4,071.00  -4.87 
2013-01-31 3,050.20  -5.7 

Here is the result of using the solution below:

print (pd.to_numeric(df.astype(str).str.replace(',',''), errors='coerce')) 
Traceback (most recent call last): 

    File "<ipython-input-169-d003943c86d2>", line 1, in <module> 
    print (pd.to_numeric(df.astype(str).str.replace(',',''), errors='coerce')) 

    File "/Users/Lee/anaconda/lib/python3.5/site-packages/pandas/core/generic.py", line 2744, in __getattr__ 
    return object.__getattribute__(self, name) 

AttributeError: 'DataFrame' object has no attribute 'str' 
+0

Thanks for the fix, John Galt – leeprevost

Answers

6

It seems you need to replace , with empty strings:

print (df) 
2016-10-31 2,144.78 
2016-07-31 2,036.62 
2016-04-30 1,916.60 
2016-01-31 1,809.40 
2015-10-31 1,711.97 
2016-01-31 6,667.22 
2015-01-31 5,373.59 
2014-01-31 4,071.00 
2013-01-31 3,050.20 
2016-09-30  -0.06 
2016-06-30  -1.88 
2016-03-31    
2015-12-31  -0.13 
2015-09-30    
2015-12-31  -0.14 
2014-12-31  0.07 
2013-12-31   0 
2012-12-31   0 
Name: val, dtype: object 
print (pd.to_numeric(df.str.replace(',',''), errors='coerce')) 
2016-10-31 2144.78 
2016-07-31 2036.62 
2016-04-30 1916.60 
2016-01-31 1809.40 
2015-10-31 1711.97 
2016-01-31 6667.22 
2015-01-31 5373.59 
2014-01-31 4071.00 
2013-01-31 3050.20 
2016-09-30  -0.06 
2016-06-30  -1.88 
2016-03-31  NaN 
2015-12-31  -0.13 
2015-09-30  NaN 
2015-12-31  -0.14 
2014-12-31  0.07 
2013-12-31  0.00 
2012-12-31  0.00 
Name: val, dtype: float64 
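
To make the difference concrete, here is a minimal self-contained sketch. The sample values are hypothetical and only mirror the kind of data shown above, and passing regex=False is my own choice so that the comma is treated as a literal character:

```python
import pandas as pd

# Hypothetical sample values modelled on the data in the question
s = pd.Series(['2,144.78', '-0.06', '', '0'], name='val')

# Without stripping the separator, '2,144.78' cannot be parsed and becomes NaN
raw = pd.to_numeric(s, errors='coerce')

# Strip the thousands separator first, then convert
cleaned = pd.to_numeric(s.str.replace(',', '', regex=False), errors='coerce')

print(raw)
print(cleaned)
```

Negative numbers written with a plain minus sign parse in both cases. Only the comma-separated value and the empty string end up as NaN in the first conversion.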

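EDIT:

If you use append, the first df may have dtype float and the second dtype object. The result then holds mixed types (for example, the first rows are of type float and the last rows are strings), so you need to cast to str first: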

print (pd.to_numeric(df.astype(str).str.replace(',',''), errors='coerce')) 

You can also check the types with:

print (df.apply(type)) 
2016-09-30 <class 'float'> 
2016-06-30 <class 'float'> 
2015-12-31 <class 'float'> 
2014-12-31 <class 'float'> 
2014-01-31  <class 'str'> 
2013-01-31  <class 'str'> 
2016-09-30  <class 'str'> 
2016-06-30  <class 'str'> 
2016-03-31  <class 'str'> 
2015-12-31  <class 'str'> 
2015-09-30  <class 'str'> 
2015-12-31  <class 'str'> 
2014-12-31  <class 'str'> 
2013-12-31  <class 'str'> 
2012-12-31  <class 'str'> 
Name: val, dtype: object 
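
A small sketch of why the cast to str matters, using a hypothetical mixed Series (real floats followed by strings). On an object column the .str methods return NaN for elements that are not strings, so the genuine floats would be lost without astype(str):

```python
import pandas as pd

# Hypothetical mixed Series: real floats followed by strings
s = pd.Series([24.73, 18.73, '2,144.78', '-0.06'], name='val')

# .str.replace yields NaN for the non-string elements, so the floats are lost
lost = pd.to_numeric(s.str.replace(',', '', regex=False), errors='coerce')

# Casting every element to str first keeps all values
kept = pd.to_numeric(s.astype(str).str.replace(',', '', regex=False),
                     errors='coerce')

print(lost)
print(kept)
```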

EDIT1:

If you need to apply the solution to all columns of a DataFrame, use apply:

df1 = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce')) 
print (df1) 
      Revenue Other, Net 
Date       
2016-09-30 24.73  -0.06 
2016-06-30 18.73  -1.88 
2016-03-31 17.56   NaN 
2015-12-31 29.14  -0.13 
2015-09-30 22.67   NaN 
2015-12-31 95.85  -0.14 
2014-12-31 84.58  0.07 
2013-12-31 58.33  0.00 
2012-12-31 29.63  0.00 
2016-09-30 243.91  -0.80 
2016-06-30 230.77  -1.12 
2016-03-31 216.58  1.32 
2015-12-31 206.23  -0.05 
2015-09-30 192.82  -0.34 
2015-12-31 741.15  -1.37 
2014-12-31 556.28  -1.90 
2013-12-31 414.51  -1.48 
2012-12-31 308.82  0.10 
2016-10-31 2144.78  41.98 
2016-07-31 2036.62  35.00 
2016-04-30 1916.60  -11.66 
2016-01-31 1809.40  27.09 
2015-10-31 1711.97  -3.44 
2016-01-31 6667.22  14.13 
2015-01-31 5373.59  -18.69 
2014-01-31 4071.00  -4.87 
2013-01-31 3050.20  -5.70 

print(df1.dtypes) 
Revenue  float64 
Other, Net float64 
dtype: object 
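
The AttributeError quoted in the question comes from the fact that .str is an accessor on Series, not on DataFrame, which is why the conversion has to run column by column. As an alternative sketch (not the approach used above), DataFrame.replace with regex=True can strip the commas across the whole frame before to_numeric is applied to each column. The two-column frame here is hypothetical:

```python
import pandas as pd

# Hypothetical two-column frame modelled on the question's data
df = pd.DataFrame({'Revenue': ['24.73', '2,144.78'],
                   'Other, Net': ['-0.06', '']})

# A DataFrame has no .str accessor, which explains the AttributeError
has_str = hasattr(df, 'str')
print(has_str)

# Strip commas in every string cell, then convert each column
df1 = df.replace(',', '', regex=True).apply(pd.to_numeric, errors='coerce')
print(df1)
print(df1.dtypes)
```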

But if you need to convert only some columns of the DataFrame, select a subset and use apply:

cols = ['Revenue', ...] 
df[cols] = df[cols].apply(lambda x: pd.to_numeric(x.astype(str) 
                .str.replace(',',''), errors='coerce')) 
print (df) 
      Revenue Other, Net 
Date       
2016-09-30 24.73  -0.06 
2016-06-30 18.73  -1.88 
2016-03-31 17.56   
2015-12-31 29.14  -0.13 
2015-09-30 22.67   
2015-12-31 95.85  -0.14 
2014-12-31 84.58  0.07 
2013-12-31 58.33   0 
2012-12-31 29.63   0 
2016-09-30 243.91  -0.8 
2016-06-30 230.77  -1.12 
2016-03-31 216.58  1.32 
2015-12-31 206.23  -0.05 
2015-09-30 192.82  -0.34 
2015-12-31 741.15  -1.37 
2014-12-31 556.28  -1.9 
2013-12-31 414.51  -1.48 
2012-12-31 308.82  0.1 
2016-10-31 2144.78  41.98 
2016-07-31 2036.62   35 
2016-04-30 1916.60  -11.66 
2016-01-31 1809.40  27.09 
2015-10-31 1711.97  -3.44 
2016-01-31 6667.22  14.13 
2015-01-31 5373.59  -18.69 
2014-01-31 4071.00  -4.87 
2013-01-31 3050.20  -5.7 

print(df.dtypes) 
Revenue  float64 
Other, Net  object 
dtype: object 

EDIT2:

Solution for your bonus question:

df = pd.DataFrame({'A':['q','e','r'], 
        'B':['4','5','q'], 
        'C':[7,8,9.0], 
        'D':['1,000','3','50,000'], 
        'E':['5','3','6'], 
        'F':['w','e','r']}) 

print (df) 
   A  B    C       D  E  F
0  q  4  7.0   1,000  5  w
1  e  5  8.0       3  3  e
2  r  q  9.0  50,000  6  r
#first apply original solution 
df1 = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce')) 
print (df1) 
     A    B    C      D  E    F
0  NaN  4.0  7.0   1000  5  NaN
1  NaN  5.0  8.0      3  3  NaN
2  NaN  NaN  9.0  50000  6  NaN

#mask where all columns are NaN - string columns 
mask = df1.isnull().all() 
print (mask) 
A  True 
B False 
C False 
D False 
E False 
F  True 
dtype: bool 
#replace NaN to string columns 
df1.loc[:, mask] = df1.loc[:, mask].combine_first(df) 
print (df1) 
   A    B    C      D  E  F
0  q  4.0  7.0   1000  5  w
1  e  5.0  8.0      3  3  e
2  r  NaN  9.0  50000  6  r
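
Judging from the code above, the goal seems to be to convert columns that contain numbers while leaving purely textual columns untouched. The same idea can be sketched per column, without the mask and combine_first step. The helper name to_numeric_or_keep is hypothetical, and this is a variant rather than the method shown above:

```python
import pandas as pd

df = pd.DataFrame({'A': ['q', 'e', 'r'],
                   'B': ['4', '5', 'q'],
                   'D': ['1,000', '3', '50,000']})

def to_numeric_or_keep(col):
    """Hypothetical helper: convert a column, or return it unchanged
    when none of its values can be parsed as a number."""
    converted = pd.to_numeric(
        col.astype(str).str.replace(',', '', regex=False),
        errors='coerce')
    return col if converted.isna().all() else converted

# Build the result column by column so each column keeps its own dtype
df1 = pd.DataFrame({name: to_numeric_or_keep(df[name]) for name in df.columns})
print(df1)
print(df1.dtypes)
```

Column A stays as text, B becomes float with NaN for the unparseable 'q', and D becomes integer.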
+0

Thanks - I tried that, but unfortunately they were originally dtype object, not strings, so I get an error when I try. – leeprevost

+0

Thanks. That is what I had done - I had mixed DataFrames. – leeprevost

+0

Darn - I'm still getting an error: print(pd.to_numeric(df.astype(str).str.replace(',',''), errors='coerce')) Traceback (most recent call last): File "", line 1, in print(pd.to_numeric(df.astype(str).str.replace(',',''), errors='coerce')) File "/Users/Lee/anaconda/lib/python3.5/site-packages/pandas/core/generic.py", line 2744, in __getattr__ return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'str' – leeprevost
