2015-09-29 81 views
0

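Error: cannot convert argument to integer in Python

I am working with a dataset from Kaggle and want to extract the titles from a Pandas column of names. I use the code below: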

import re

def extract_patt(patt, linea):
    # Return the first captured group in lowercase, or "" if there is no match.
    matchObj = re.match(patt, linea)
    if matchObj:
        return matchObj.group(1).lower()
    else:
        return ""

def extract_title(linea):
    # Capture the text between ", " and a following ".", e.g. "Braund, Mr. Owen" -> "mr".
    return extract_patt(r'^.+,\s(.+)\..+', linea)

titles = dataframe1["Name"].apply(extract_title)

title_mapping = {"": 0, "mr": 1, "miss": 2, "mrs": 3, "master": 4, "dr": 5, "rev": 6, "major": 7, "col": 7, "mlle": 8, "mme": 8, "don": 9, "lady": 10, "countess": 10, "jonkheer": 10, "sir": 9, "capt": 7, "ms": 2}

# Replace each title string with its integer code.
for k in title_mapping:
    titles[titles == k] = title_mapping[k]

dataframe1["Title"] = titles

However, when I run this code as a Python script on the Azure Machine Learning platform, I get the following error:

Error 0085: The following error occurred during script evaluation, please view the output log for more information: 
---------- Start of error message from Python interpreter ---------- 
data:text/plain,Caught exception while executing function: Traceback (most recent call last): 
    File "C:\server\invokepy.py", line 176, in batch 
    rutils.RUtils.DataFrameToRFile(outlist[i], outfiles[i]) 
    File "C:\server\RReader\rutils.py", line 28, in DataFrameToRFile 
    rwriter.write_attribute_list(attributes) 
    File "C:\server\RReader\rwriter.py", line 59, in write_attribute_list 
    self.write_object(value); 
    File "C:\server\RReader\rwriter.py", line 121, in write_object 
    write_function(flags, value.values()) 
    File "C:\server\RReader\rwriter.py", line 104, in write_objects 
    self.write_object(value) 
    File "C:\server\RReader\rwriter.py", line 121, in write_object 
    write_function(flags, value.values()) 
    File "C:\server\RReader\rwriter.py", line 71, in write_integers 
    self.write_integer(value) 
    File "C:\server\RReader\rwriter.py", line 147, in write_integer 
    self.writer.WriteInt32(value) 
    File "C:\server\RReader\BinaryIO\binarywriter.py", line 23, in WriteInt32 
    self.WriteData(self.Int32Format, data) 
    File "C:\server\RReader\BinaryIO\binarywriter.py", line 14, in WriteData 
    self.stream.write(pack(format, data)) 
error: cannot convert argument to integer 

---------- End of error message from Python interpreter ---------- 
Start time: UTC 09/29/2015 07:47:02 
End time: UTC 09/29/2015 07:47:13 

The problem is probably in the mapping code, because if I remove it, I get a column of title strings rather than integers.

Edit: I also tried the following instead of the for-loop mapping, but I get the same error:

dataframe1["Title"].replace(title_mapping, inplace=True) 

Answers

0

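I ran into the same problem, also with the Titanic dataset. I first used Azure's built-in "Project Columns" module to remove the Ticket and Cabin columns, then fed the data into the Python script, and now it works.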

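I don't know what in those columns causes the trouble. Someone posted elsewhere that null values in the first row might be the problem, and MS said a bug fix is coming.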

+0

Indeed, the first row is empty (NaN), and when I manually removed it before uploading the dataset, the error went away. – Tasos
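For anyone who wants to check this in pandas rather than in the Azure designer, here is a minimal sketch. It assumes a small illustrative frame shaped like the Titanic data (the variable names `cols_with_nan`, `first_row_nan` and `cleaned` are made up for this example). It only shows how to spot NaN values and drop the two columns; whether doing this inside the script, instead of with "Project Columns" beforehand, avoids the Azure error is not confirmed in this thread.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the Titanic data (assumption: only three columns shown).
dataframe1 = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley"],
    "Ticket": ["A/5 21171", "PC 17599"],
    "Cabin": [np.nan, "C85"],
})

# Names of the columns that contain at least one missing value.
cols_with_nan = dataframe1.columns[dataframe1.isnull().any()].tolist()

# Per-column flags for missing values in the first row only.
first_row_nan = dataframe1.iloc[0].isnull()

# Drop the two columns that the answer removed with "Project Columns".
cleaned = dataframe1.drop(columns=["Ticket", "Cabin"])

print(cols_with_nan)
print(first_row_nan)
print(cleaned.columns.tolist())
```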

0

In my experience, the problem lies in the titles == k part of titles[titles == k] = title_mapping[k]. The expression titles == k evaluates to Boolean values.

In Python, the Boolean type is a subtype of the integer type. False equals 0, True equals 1, and every nonzero integer is treated as true.
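A quick way to see this, as a small sketch: the Series `titles` below is a made-up three-element example, not the real data. It shows that `bool` is a subclass of `int` in Python and that comparing a Series with a string gives a Boolean mask.

```python
import pandas as pd

# bool is a subclass of int, so True and False behave like 1 and 0.
bool_is_int = issubclass(bool, int)
true_plus_true = True + True

# Hypothetical titles column; comparing it with a string yields a Boolean mask.
titles = pd.Series(["mr", "miss", "mr"])
mask = titles == "mr"

print(bool_is_int, true_plus_true)
print(mask.dtype, mask.tolist())
```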

But the keys of the 'titles' mapping should be of string type, which is why the error message is "cannot convert argument to integer".
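One thing that may be worth trying, as a sketch only: the loop in the question writes integers into a Series that started out holding strings, so pandas keeps it as an object column. The version below uses `Series.map` and an explicit cast so the result has an integer dtype. The sample names, the shortened mapping and the name `title_codes` are assumptions for illustration, and it is not confirmed in this thread that an integer dtype avoids the Azure error.

```python
import re
import pandas as pd

def extract_title(name):
    # Same pattern as in the question: text between ", " and a following ".".
    match = re.match(r'^.+,\s(.+)\..+', name)
    return match.group(1).lower() if match else ""

# Hypothetical sample names; the last one has no title on purpose.
names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "No title here"])

# Shortened version of the mapping from the question.
title_mapping = {"": 0, "mr": 1, "miss": 2, "mrs": 3}

titles = names.apply(extract_title)

# Map strings to codes, send unmapped titles to 0, and force an integer dtype.
title_codes = titles.map(title_mapping).fillna(0).astype("int64")

print(title_codes.tolist(), title_codes.dtype)
```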

Best regards.
