我正在寫一個python腳本以在AzureML中使用。我的數據集相當大。我有一個名爲ID(int)和DataType(文本)的數據集。我想將這些值連接在一起,只有一列文本的ID和數據類型都用逗號分開。OverflowError:大小不適合int
如何避免在執行此操作時發生錯誤。我的代碼中有任何錯誤嗎?
當我運行這段代碼,我得到了以下錯誤:
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,Caught exception while executing function: Traceback (most recent call last):
File "C:\server\invokepy.py", line 167, in batch
idfs.append(rutils.RUtils.RFileToDataFrame(infile))
File "C:\server\RReader\rutils.py", line 15, in RFileToDataFrame
rreader = RReaderFactory.construct_from_file(filename, compressed)
File "C:\server\RReader\rreaderfactory.py", line 25, in construct_from_file
return _RReaderFactory.construct_from_stream(stream)
File "C:\server\RReader\rreaderfactory.py", line 46, in construct_from_stream
return RReader(BinaryReader(RFactoryConstants.big_endian, stream.read()))
File "C:\pyhome\lib\gzip.py", line 254, in read
self._read(readsize)
File "C:\pyhome\lib\gzip.py", line 313, in _read
self._add_read_data(uncompress)
File "C:\pyhome\lib\gzip.py", line 329, in _add_read_data
self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
OverflowError: size does not fit in an int
我的代碼如下:
# The script MUST contain a function named azureml_main
# which is the entry point for this module.
#
# The entry point function can contain up to two input arguments:
# Param<dataframe1>: a pandas.DataFrame
# Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1):
import pandas as pd
dataframe1['SignalID,DataType'] = dataframe1['ID'] + " , " + dataframe1['DataType']
dataframe1 = dataframe1.drop('DataType')
dataframe1 = dataframe1.drop('ID')
# Return value must be of a sequence of pandas.DataFrame
return dataframe1
我得到同樣的錯誤,當我運行AzureML默認的Python代碼。所以我很確定我的數據不適合數據框。
默認腳本如下:
# The script MUST contain a function named azureml_main
# which is the entry point for this module.
#
# The entry point function can contain up to two input arguments:
# Param<dataframe1>: a pandas.DataFrame
# Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):
# Execution logic goes here
print('Input pandas.DataFrame #1:\r\n\r\n{0}'.format(dataframe1))
# If a zip file is connected to the third input port is connected,
# it is unzipped under ".\Script Bundle". This directory is added
# to sys.path. Therefore, if your zip file contains a Python file
# mymodule.py you can import it using:
# import mymodule
# Return value must be of a sequence of pandas.DataFrame
return dataframe1,
我想你需要'dataframe1 [ 'SignalID,字段類型字段'] = dataframe1 [ 'ID'] astype(STR)+ 「」 + dataframe1 [ 'DataType']'用於將列ID轉換爲'字符串'和'dataframe1 = dataframe1.drop(['DataType','ID'],axis = 1)'用於丟棄列'DataType'和'ID' – jezrael