Pandas error when trying to convert strings to integers

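One particular column in a DataFrame is of "mixed" type. It can hold values like "123456" or "ABC12345".

This DataFrame is being written to Excel using xlsxwriter.

For values like "123456", pandas converts them somewhere down the line to 123456.0 (making them look like floats).

We need such a value to appear in the XLSX as 123456 (i.e. as a positive integer) whenever the value is fully numeric.

Effort:

The code snippet is shown below.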

import pandas as pd 
import numpy as np 
import xlsxwriter 
import os 
import datetime 
import sys 
excel_name = str(input("Please Enter Spreadsheet Name :\n").strip()) 

print("excel entered : " , excel_name) 
df_header = ['DisplayName','StoreLanguage','Territory','WorkType','EntryType','TitleInternalAlias', 
     'TitleDisplayUnlimited','LocalizationType','LicenseType','LicenseRightsDescription', 
     'FormatProfile','Start','End','PriceType','PriceValue','SRP','Description', 
     'OtherTerms','OtherInstructions','ContentID','ProductID','EncodeID','AvailID', 
     'Metadata', 'AltID', 'SuppressionLiftDate','SpecialPreOrderFulfillDate','ReleaseYear','ReleaseHistoryOriginal','ReleaseHistoryPhysicalHV', 
      'ExceptionFlag','RatingSystem','RatingValue','RatingReason','RentalDuration','WatchDuration','CaptionIncluded','CaptionExemption','Any','ContractID', 
      'ServiceProvider','TotalRunTime','HoldbackLanguage','HoldbackExclusionLanguage'] 
first_pass_drop_duplicate = df_m_d.drop_duplicates(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType', 
            'LicenseRightsDescription','FormatProfile','Start','End','PriceType','PriceValue','ContentID','ProductID', 
            'AltID','ReleaseHistoryPhysicalHV','RatingSystem','RatingValue','CaptionIncluded'], keep=False) 
# We need to keep integer AltID as is 

first_pass_drop_duplicate.loc[first_pass_drop_duplicate['AltID']] = first_pass_drop_duplicate['AltID'].apply(lambda x : str(int(x)) if str(x).isdigit() == True else x)

I have tried:

1. using `dataframe.astype(int).astype(str)` # works as long as the value is not alphanumeric
2. importing re and using pure Python `re.compile()` and `replace()` -- does not work
3. reading the DataFrame row by row in a for loop! This kills the machine, as the DataFrame can have 300k+ records

Every time, the error I get is:

raise KeyError('%s not in index' % objarr[mask])
KeyError: '[ 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 2124. 2124. 2124. 2124. 2124. 2124.\n 2124. 2124. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.] not in index'

I am new to Python/pandas; any help with a solution is greatly appreciated.

Source

2016-09-21 SanBan

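So you only need to convert the numeric values to `float`, and not the non-numeric ones? – jezrael

I need to make sure that a positive integer is treated as TEXT/STRING and that no .0 (decimal point) is appended at the end when it is actually displayed in Excel. – SanBan

So you need to convert all values to type `string`? Is the problem that Excel parses `int` values converted to `string` as `float`? – jezrael

I think you need `to_numeric`: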

df = pd.DataFrame({'AltID': ['123456', 'ABC12345', '123456'],
                   'B': [4, 5, 6]})

print(df)
      AltID  B
0    123456  4
1  ABC12345  5
2    123456  6

df.ix[df.AltID.str.isdigit(), 'AltID'] = pd.to_numeric(df.AltID, errors='coerce')

print(df)
      AltID  B
0    123456  4
1  ABC12345  5
2    123456  6

print(df['AltID'].apply(type))
0    <class 'float'>
1      <class 'str'>
2    <class 'float'>
Name: AltID, dtype: object
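
A note for readers on current pandas: `.ix` was removed in pandas 1.0, so the assignment has to be written with `.loc`. The form `.loc[row_mask, column_label]` is also what the question's code was missing: passing the column's values themselves to `.loc[...]` makes pandas look them up as row labels, which is what produces the `KeyError ... not in index`. The following is only a sketch under assumptions: it uses a toy frame, the names `as_num` and `mask` are illustrative, and it converts whole numbers to strings (what the asker ultimately wanted) rather than to floats.

```python
import pandas as pd

# Toy data: a digit string, an alphanumeric string, and a float that
# pandas has already turned into 123456.0.
df = pd.DataFrame({'AltID': ['123456', 'ABC12345', 123456.0],
                   'B': [4, 5, 6]})

# Numeric view of the column; anything that does not parse becomes NaN.
as_num = pd.to_numeric(df['AltID'], errors='coerce')

# True only for rows whose AltID is a whole number.
mask = as_num.notna() & (as_num % 1 == 0)

# .loc takes (row mask, column label); the right-hand side is aligned on the index.
df.loc[mask, 'AltID'] = as_num[mask].astype('int64').astype(str)

print(df['AltID'].tolist())
```

Because the whole column is processed at once, this avoids the row-by-row loop that the asker found too slow.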

Source

2016-09-21 05:45:34 jezrael

Great! It didn't work on my series because the 4th element was already an `int`. `pd.Series([1], dtype=object).str.isdigit()` returns `NaN`. I had to do this: `s.ix[s.str.isdigit().fillna(False)] = pd.to_numeric(s, errors='coerce')` and it worked perfectly. – piRSquared

And! This is almost certainly faster. – piRSquared

@piRSquared - Thanks. Another solution is `df.ix[df.AltID.astype(str).str.isdigit(), 'AltID'] = pd.to_numeric(df.AltID, errors='coerce')` – jezrael

Use `apply` and `pd.to_numeric` with the parameter `errors='ignore'`.

Consider the `pd.Series` `s`:

s = pd.Series(['12345', 'abc12', '456', '65hg', 54, '12-31-2001'])

s.apply(pd.to_numeric, errors='ignore')

0         12345
1         abc12
2           456
3          65hg
4            54
5    12-31-2001
dtype: object

Notice the types:

s.apply(pd.to_numeric, errors='ignore').apply(type)

0    <type 'numpy.int64'>
1            <type 'str'>
2    <type 'numpy.int64'>
3            <type 'str'>
4            <type 'int'>
5            <type 'str'>
dtype: object
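
One caveat for newer pandas: `errors='ignore'` was deprecated in pandas 2.2, whose documentation recommends catching exceptions explicitly instead. A minimal sketch of the same element-wise idea follows; the helper name `to_number_or_keep` is hypothetical and not part of pandas.

```python
import pandas as pd


def to_number_or_keep(value):
    """Return `value` parsed as a number if possible, otherwise return it unchanged."""
    try:
        return pd.to_numeric(value)
    except (ValueError, TypeError):
        # Not parseable as a number: keep the original value.
        return value


s = pd.Series(['12345', 'abc12', '456', '65hg', 54, '12-31-2001'], dtype=object)

converted = s.apply(to_number_or_keep)

# Inspect the per-element types, as in the answer.
print(converted.apply(type))
```

The digit strings become integers, the existing `int` stays as it is, and everything else remains a string.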

Source

2016-09-21 05:45:41 piRSquared

Finally, it worked by using the `converters` option of pandas' `read_excel`, like this:

df_w02 = pd.read_excel(excel_name, names = df_header,converters = {'AltID':str,'RatingReason' : str}).fillna("")

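A converter can "cast" a column to the type defined by the function or value I supply, so the integer stays stored as a string without a decimal point being added.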
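
A converter does not have to be the built-in `str`; any one-argument callable works. The sketch below is an assumption-laden variation, not what was actually used here: `alt_id_to_text` is a hypothetical helper that also strips a trailing `.0` in case the reader hands the converter a float, and the file name in the commented usage line is made up.

```python
import math


def alt_id_to_text(value):
    """Render a cell value as text, without a trailing '.0' on whole numbers."""
    # Assumption: empty cells should become an empty string.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ''
    # Whole-number floats such as 123456.0 are written as '123456'.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


# Hypothetical usage with read_excel (file name is illustrative only):
# df = pd.read_excel('spreadsheet.xlsx', converters={'AltID': alt_id_to_text})

samples = [123456.0, 123456, 'ABC12345', '123456', float('nan')]
cleaned = [alt_id_to_text(v) for v in samples]
print(cleaned)
```

Keeping the logic in a plain function makes it easy to check on a handful of values before running it against a 300k-row sheet.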

Source

2016-09-21 21:16:33 SanBan
