2017-02-28

Python call to boto3.client.create_data_source_from_s3

I want to use the AWS Machine Learning batch process from a Python project. I am using boto3. I get this failure message in the response:

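There was an error trying to parse the schema: \"Can not deserialize instance of boolean out of START_ARRAY token\n at [Source: [email protected]; line: 1, column: 2] (through reference chain: com.amazon.eml.dp.recordset.SchemaPojo["dataFileContainsHeader"])\"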

The .csv file I am using is valid. I know this because the same file works when I go through the console process.

Here is my code; it is a function in a Django model that holds the URL of the file to be processed (input_file):

def create_data_source_from_s3(self):
    attributes = []
    attribute = { "fieldName": "Var1", "fieldType": "CATEGORICAL" }
    attributes.append(attribute)
    attribute = { "fieldName": "Var2", "fieldType": "CATEGORICAL" }
    attributes.append(attribute)
    attribute = { "fieldName": "Var3", "fieldType": "NUMERIC" }
    attributes.append(attribute)
    attribute = { "fieldName": "Var4", "fieldType": "CATEGORICAL" }
    attributes.append(attribute)
    attribute = { "fieldName": "Var5", "fieldType": "CATEGORICAL" }
    attributes.append(attribute)
    attribute = { "fieldName": "Var6", "fieldType": "CATEGORICAL" }
    attributes.append(attribute)

    dataSchema = {}
    dataSchema['version'] = '1.0'
    dataSchema['dataFormat'] = 'CSV'
    dataSchema['attributes'] = attributes
    dataSchema["targetFieldName"] = "Var6"
    dataSchema["dataFileContainsHeader"] = True,
    json_data = json.dumps(dataSchema)

    client = boto3.client(
        'machinelearning',
        region_name=settings.region,
        aws_access_key_id=settings.aws_access_key_id,
        aws_secret_access_key=settings.aws_secret_access_key,
    )
    # create a datasource
    return client.create_data_source_from_s3(
        DataSourceId=self.input_file.name,
        DataSourceName=self.input_file.name,
        DataSpec={
            'DataLocationS3': 's3://' + settings.AWS_S3_BUCKET_NAME + '/' + self.input_file.name,
            'DataSchema': json_data,
        },
        ComputeStatistics=True
    )

Any idea what I am doing wrong?

Answer

Remove the trailing comma from this line:

dataSchema["dataFileContainsHeader"] = True, 

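The trailing comma makes Python treat the right-hand side as a one-element tuple, so your dataSchema actually contains (True,) rather than True.

json.dumps serializes that tuple as a JSON array, so your output looks like this: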

{"dataFileContainsHeader": [true], "attributes": [{"fieldName": "Var1", "fieldType": "CATEGORICAL"}, {"fieldName": "Var2", "fieldType": "CATEGORICAL"}, {"fieldName": "Var3", "fieldType": "NUMERIC"}, {"fieldName": "Var4", "fieldType": "CATEGORICAL"}, {"fieldName": "Var5", "fieldType": "CATEGORICAL"}, {"fieldName": "Var6", "fieldType": "CATEGORICAL"}], "version": "1.0", "dataFormat": "CSV", "targetFieldName": "Var6"} 

AWS is instead expecting something like this:

"dataFileContainsHeader": true
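
If you want to see the difference without calling AWS at all, a minimal sketch like the following reproduces it with only the standard json module. The variable names broken and fixed are just for illustration and are not part of the original code:

```python
import json

# Trailing comma: the right-hand side is the one-element tuple (True,).
broken = {}
broken["dataFileContainsHeader"] = True,

# No trailing comma: the value is the plain boolean True.
fixed = {}
fixed["dataFileContainsHeader"] = True

# json.dumps turns a tuple into a JSON array and a bool into true/false.
broken_json = json.dumps(broken)
fixed_json = json.dumps(fixed)

print(type(broken["dataFileContainsHeader"]).__name__)  # tuple
print(broken_json)  # {"dataFileContainsHeader": [true]}
print(fixed_json)   # {"dataFileContainsHeader": true}
```

Printing json_data just before the create_data_source_from_s3 call is a quick way to catch this kind of mistake, since the array shows up immediately as [true].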