
I'm trying to build a sentiment analysis model on Azure that calls the Microsoft Cognitive Services Text Analytics API, but the request fails with HTTPError: HTTP Error 400: Bad Request.

This is the code I use to run sentiment analysis on a csv file with the API:

for j in range(0,num_of_batches): # this loop will add num_of_batches strings to input_texts
    input_texts.set_value(j,"") # initialize input_texts string j
    for i in range(j*l//num_of_batches,(j+1)*l//num_of_batches): # loop through a window of rows from the dataset
        comment = str(mydata["tweet"][i])    # grab the comment from the current row
        comment = comment.replace("\"", "'") # replace double quotes with single quotes (why? I don't remember. #honestblogger)

        # add the current comment to the end of the string we're building in input_texts string j
        input_texts.set_value(j, input_texts[j] + '{"language":"' + "pt"',"id":"' + str(i) + '","text":"'+ comment + '"},')

    # after we've looped through this window of the input dataset to build this series, add the request head and tail
    input_texts.set_value(j, '{"documents":[' + input_texts[j] + ']}')

headers = {'Content-Type':'application/json', 'Ocp-Apim-Subscription-Key':account_key} 

Sentiment = pd.Series() 
batch_sentiment_url = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment" 

Everything works fine up to this point, but I get an error in the last part, where I try to fetch the results from the API:

for j in range(0,num_of_batches):
    # detect sentiment for each batch
    req = urllib2.Request(batch_sentiment_url, input_texts[j], headers)
    response = urllib2.urlopen(req)
    result = response.read()
    obj = json.loads(result.decode('utf-8'))

    # loop through each result string, extracting the sentiment associated with each id
    for sentiment_analysis in obj['documents']:
        Sentiment.set_value(sentiment_analysis['id'], sentiment_analysis['score'])

#tack our new sentiment series onto our original dataframe 

mydata.insert(len(mydata.columns),'Sentiment',Sentiment.values) 

This is the error I get:

HTTPError: HTTP Error 400: Bad Request 

Answers

You're getting a 400 error because your JSON is malformed (mismatched quotes around "pt"). I don't think you're doing yourself any favors by routing the outgoing request through the pandas module, or by trying to hand-craft the JSON. In particular, it makes you prone to quoting and escaping mistakes.
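
To see the problem concretely, here is a minimal standalone sketch (i and comment are illustrative stand-ins) that evaluates the string expression from the question. Python fuses the adjacent literals "pt"',"id":"' into pt,"id":", so the closing quote after pt never reaches the payload:

import json

i, comment = 0, "great product"

# the concatenation from the question, copied verbatim
fragment = '{"language":"' + "pt"',"id":"' + str(i) + '","text":"' + comment + '"},'
print(fragment)
# {"language":"pt,"id":"0","text":"great product"},
# note: no closing quote after pt, so the document is invalid JSON

try:
    json.loads('{"documents":[' + fragment.rstrip(',') + ']}')
except ValueError as e:  # json.JSONDecodeError subclasses ValueError in Python 3
    print(e)             # Expecting ',' delimiter ...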

Here's what you might do instead:

input_texts = []
for j in range(0,num_of_batches): # this loop will add num_of_batches dicts to input_texts
    documents = []
    for i in range(j*l//num_of_batches,(j+1)*l//num_of_batches): # loop through a window of rows from the dataset
        documents.append({
            'language': 'pt',
            'id': str(i),
            'text': str(mydata["tweet"][i])})
    input_texts.append({'documents': documents})

... 
req = urllib2.Request(batch_sentiment_url, json.dumps(input_texts[j]), headers) 

It worked very well and returned results, but my data has length 1544 and the returned sentiment has length 1543. How can I find the missing record, or drop it? Thanks a lot –


By "length of the data", do you mean the number of documents? You can use the 'id' field to correlate input and output. – cthrash


For future reference: I sent a tweetid field as the id instead, then built a new dataframe from the sentiment results and tweetid and joined it to the original dataframe, which drops the missing record. (By data length I meant the number of records.) Thanks for your help, you saved my day :) –
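
A minimal sketch of that correlate-and-join approach, assuming mydata carries a string-valued tweetid column that was sent as each document's id, and that all_documents is the concatenation of the documents arrays from every batch response (both names are illustrative):

import pandas as pd

# one row per document the API actually scored
sentiment_df = pd.DataFrame(
    [{'tweetid': d['id'], 'Sentiment': d['score']} for d in all_documents])

# an inner join keeps only the scored tweets, dropping the missing
# record instead of misaligning the two columns
mydata = mydata.merge(sentiment_df, on='tweetid', how='inner')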


Always verify the API call with curl first; only then move it into code. This curl line works for me:

curl -k -X POST -H "Ocp-Apim-Subscription-Key: <your ocp-apim-subscription-key>" -H "Content-Type: application/json" --data "{ 'documents': [ { 'id': '12345', 'text': 'now is the time for all good men to come to the aid of their party.' } ] }" "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment" 
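
If the call succeeds, the response has the shape the Python code above consumes: a documents array of id/score pairs plus an errors array. A rough equivalent of that curl line in the question's own setup, reusing batch_sentiment_url and headers from above (the score in the comment is illustrative):

import json
import urllib2

body = json.dumps({'documents': [
    {'id': '12345',
     'text': 'now is the time for all good men to come to the aid of their party.'}]})
req = urllib2.Request(batch_sentiment_url, body, headers)
print(urllib2.urlopen(req).read())
# expected shape (score value illustrative):
# {"documents":[{"score":0.93,"id":"12345"}],"errors":[]}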