2017-10-21 123 views
3

我正嘗試使用HTMLQuestion數據結構和boto3的create_hit函數來構建XML以提交給Amazon的Mechanical Turks服務。根據文檔,XML應格式化爲like this爲boto3構建HTMLQuestion XML MTurk

我創建了一個TurkTaskAssembler類,該類具有用於生成XML並通過API將此XML傳遞到Mechanical Turks平臺的方法。我使用boto3庫來處理與亞馬遜的溝通。

似乎是我生成的格式不正確,因爲當我試圖通過API通過這個XML,我得到驗證錯誤,像這樣的XML:

>>> tta = TurkTaskAssembler("What color is the sky?") 
>>> response = tta.create_hit_task() 
>>> ParamValidationError: Parameter validation failed: Invalid type for parameter Question, value: <Element HTMLQuestion at 0x1135f68c0>, type: <type 'lxml.etree._Element'>, valid types: <type 'basestring'> 

我再修改create_question_xml方法將XML轉換信封使用tostring方法的字符串,但這種生成一個不同的錯誤:

>>> tta = TurkTaskAssembler("What color is the sky?") 
>>> tta.create_hit_task() 
>>> ClientError: An error occurred (ParameterValidationError) when calling the CreateHIT operation: There was an error parsing the XML question or answer data in your request. Please make sure the data is well-formed and validates against the appropriate schema. Details: cvc-elt.1.a: Cannot find the declaration of element 'HTMLQuestion'. (1508611228659 s) 

我真的不知道我在做什麼錯的,很少有XML的經驗。

這裏是所有相關代碼:

import os 
import boto3 
from lxml.etree import Element, SubElement, CDATA, tostring 
from .settings import mturk_access_key_id, mturk_access_secret_key 

xml_schema_url = 'http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd' 


class TurkTaskAssembler(object): 

    def __init__(self, question): 
     self.client = boto3.client(
      service_name='mturk', 
      region_name='us-east-1', 
      endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com', 
      aws_access_key_id=mturk_access_key_id, 
      aws_secret_access_key=mturk_access_secret_key 
     ) 
     self.question = question 

    def create_question_xml(self): 
     # questionFile = open(os.path.join(__location__, "question.xml"), "r") 
     # question = questionFile.read() 
     # return question 
     XHTML_NAMESPACE = xml_schema_url 
     XHTML = "{%s}" % XHTML_NAMESPACE 
     NSMAP = { 
      None : XHTML_NAMESPACE, 
      'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 
      '' 
      } 
     envelope = Element("HTMLQuestion", nsmap=NSMAP) 

     html = """ 
      <!DOCTYPE html> 
      <html> 
      <head> 
       <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/> 
       <script type='text/javascript' src='https://s3.amazonaws.com/mturk-public/externalHIT_v1.js'></script> 
      </head> 
      <body> 
       <form name='mturk_form' method='post' id='mturk_form' action='https://www.mturk.com/mturk/externalSubmit'> 
       <input type='hidden' value='' name='assignmentId' id='assignmentId'/> 
       <h1>Answer this question</h1> 
       <p>{question}</p> 
       <p><textarea name='comment' cols='80' rows='3'></textarea></p> 
       <p><input type='submit' id='submitButton' value='Submit' /></p></form> 
       <script language='Javascript'>turkSetAssignmentID();</script> 
      </body> 
      </html> 
      """.format(question=self.question) 

     html_content = SubElement(envelope, 'HTMLContent') 
     html_content.text = CDATA(html) 
     xml_meta = """<?xml version="1.1" encoding="utf-8"?>""" 
     return xml_meta + tostring(envelope, encoding='utf-8') 

    def create_hit_task(self): 
     response = self.client.create_hit(
      MaxAssignments=1, 
      AutoApprovalDelayInSeconds=10800, 
      LifetimeInSeconds=10800, 
      AssignmentDurationInSeconds=300, 
      Reward='0.05', 
      Title='a title', 
      Keywords='some keywords', 
      Description='a description', 
      Question=self.create_question_xml(), 
     ) 
     return response 

回答

2

爲什麼不能簡單地把一個單獨的XML文件中的XML數據(像你一樣,但註釋掉)?這將阻止你必須包含幾個模塊和很多代碼。在create_question_xml()功能

<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd"> 
    <HTMLContent><![CDATA[ 
<!DOCTYPE html> 
<html> 
<head> 
    <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/> 
    <script type='text/javascript' src='https://s3.amazonaws.com/mturk-public/externalHIT_v1.js'></script> 
</head> 
<body> 
    <form name='mturk_form' method='post' id='mturk_form' action='https://www.mturk.com/mturk/externalSubmit'> 
    <input type='hidden' value='' name='assignmentId' id='assignmentId'/> 
    <h1>Answer this question</h1> 
    <p>{question}</p> 
    <p><textarea name='comment' cols='80' rows='3'></textarea></p> 
    <p><input type='submit' id='submitButton' value='Submit' /></p></form> 
    <script language='Javascript'>turkSetAssignmentID();</script> 
</body> 
</html> 
]]> 
    </HTMLContent> 
    <FrameHeight>450</FrameHeight> 
</HTMLQuestion> 

然後:

使用您所描述here模板,創建question.xml

def create_question_xml(self): 
    question_file = open("question.xml", "r").read() 
    xml = question_file.format(question=self.question) 
    return xml 

這應該是你所需要的。

+0

這是一個很棒的建議,謝謝!完美的作品。 –

0

我想你有點和亞馬遜建議你使用的3種格式混淆。 從我看到你去了HTMLQuestion。 (其他兩個是:ExternalQuestionQuestionFormData)。

要保留HTMLQuestion格式的問題,只需使用文檔中提供的簡單示例,無需將其封裝在XML中。這裏是一個固定功能:

def create_question_html(self): 
    # you can extract template into a file, 
    # as @Mangohero1 suggested which would simplify code a bit. 

    return """ 
    <HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd"> 
     <HTMLContent><![CDATA[ 
    <!DOCTYPE html> 
    <html> 
    <head> 
     <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/> 
     <script type='text/javascript' src='https://s3.amazonaws.com/mturk-public/externalHIT_v1.js'></script> 
    </head> 
    <body> 
     <form name='mturk_form' method='post' id='mturk_form' action='https://www.mturk.com/mturk/externalSubmit'> 
     <input type='hidden' value='' name='assignmentId' id='assignmentId'/> 
     <h1>{question}</h1> 
     <p><textarea name='comment' cols='80' rows='3'></textarea></p> 
     <p><input type='submit' id='submitButton' value='Submit' /></p></form> 
     <script language='Javascript'>turkSetAssignmentID();</script> 
    </body> 
    </html> 
    ]]> 
     </HTMLContent> 
     <FrameHeight>450</FrameHeight> 
    </HTMLQuestion> 
    """.format(question=self.question)