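Can't loop through API responses with Python

So, I'm scratching my head over this one. Using HubSpot's API, I need to get a list of all the companies in my client's "portal" (account). Sadly, the standard API call only returns 100 companies at a time. The response it returns includes two parameters that can be used to page through the results.

One of them is "has-more": True (this tells you whether you can expect more pages) and the other is "offset": 12345678 (a timestamp to offset the request by).

You pass these two parameters back into the next API call to get the next page. So, for example, the initial API call might look like: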

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)

Whereas subsequent calls look like this:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)

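The two URLs above differ only in the optional offset query parameter, so they can be built by one helper. This is a minimal sketch, assuming Python 3 and the standard library's `urllib.parse.urlencode`. The helper name `companies_url` is hypothetical, and it uses only the `hapikey` and `offset` parameters shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.hubapi.com/companies/v2/companies/"


def companies_url(hapikey, offset=None):
    """Build the companies URL (hypothetical helper).

    The offset is added only for follow-up pages.
    """
    params = {"hapikey": hapikey}
    if offset is not None:
        params["offset"] = offset
    return BASE_URL + "?" + urlencode(params)


print(companies_url("my-key"))
# https://api.hubapi.com/companies/v2/companies/?hapikey=my-key
print(companies_url("my-key", offset=12345678))
# https://api.hubapi.com/companies/v2/companies/?hapikey=my-key&offset=12345678
```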
So this is what I've tried so far:

#!/usr/bin/python 
# -*- coding: utf-8 -*- 

import sys 
import os.path 
import requests 
import json 
import csv 
import glob2 
import shutil 
import time 
import time as howLong 
from time import sleep 
from time import gmtime, strftime 

HubSpot_Customer_Portal_ID = "XXXXXX" 

wta_hubspot_api_key = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" 

findCSV = glob2.glob('*contact*.csv') 

theDate = time=strftime("%Y-%m-%d", gmtime()) 
theTime = time=strftime("%H:%M:%S", gmtime()) 

try: 
    testData = findCSV[0] 
except IndexError: 
    print ("\nSyncronisation attempted on {date} at {time}: There are no \"contact\" CSVs, please upload one and try again.\n").format(date=theDate, time=theTime) 
    print("====================================================================================================================\n") 
    sys.exit() 

for theCSV in findCSV:

    def get_companies():
        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
        headers = {'content-type': 'application/json'}
        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
        if create_get_recent_companies_response.status_code == 200:

            offset = create_get_recent_companies_response.json()[u'offset']
            hasMore = create_get_recent_companies_response.json()[u'has-more']

            while hasMore == True:
                for i in create_get_recent_companies_response.json()[u'companies']:
                    get_more_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                    get_more_companies_call_response = requests.get(get_more_companies_call, headers=headers)
                    companyName = i[u'properties'][u'name'][u'value']
                    print("{companyName}".format(companyName=companyName))

        else:
            print("Something went wrong, check the supplied field values.\n")
            print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

    if __name__ == "__main__":
        get_companies()
        sys.exit()

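The problem is that it just keeps returning the same initial 100 results. This is because the parameter "has-more": True is true on the initial call, so it keeps returning the same ones...

Ideally, I'd like to parse all the companies across roughly 120 response pages (there are around 12,000 companies). As I go through each page, I'd like to append its JSON content to a list. That way I end up with a list containing the JSON responses of all 120 pages, which I can then parse for use in different functions.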
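The page-collecting loop described here can be sketched without any HTTP. The sketch assumes each page is a dict with `companies`, `has-more` and `offset` keys, as in the responses described above. It uses a hypothetical `fetch_page(offset)` callable that returns one parsed page, with `offset=None` for the first request. The essential detail is that each new offset is read from the page that was just fetched.

```python
def collect_pages(fetch_page):
    """Return a list of every parsed page, following has-more/offset."""
    pages = []
    offset = None  # no offset on the first request
    while True:
        page = fetch_page(offset)
        pages.append(page)
        if not page.get("has-more"):
            break
        offset = page["offset"]  # offset taken from the page just fetched
    return pages


# Fake three-page "API" standing in for HubSpot, for illustration only.
FAKE = {
    None: {"companies": ["A", "B"], "has-more": True, "offset": 1},
    1: {"companies": ["C"], "has-more": True, "offset": 2},
    2: {"companies": ["D"], "has-more": False, "offset": 3},
}

pages = collect_pages(FAKE.get)
print(len(pages))  # 3
print([c for p in pages for c in p["companies"]])  # ['A', 'B', 'C', 'D']
```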

I'm in desperate need of a solution :(

This is the function I'm replacing in my main script:

def get_companies():

    create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/recent/modified?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
    headers = {'content-type': 'application/json'}
    create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
    if create_get_recent_companies_response.status_code == 200:

        for i in create_get_recent_companies_response.json()[u'results']:
            company_name = i[u'properties'][u'name'][u'value']
            #print(company_name)
            if row[0].lower() == str(company_name).lower():
                contact_company_id = i[u'companyId']
                #print(contact_company_id)
                return contact_company_id
    else:
        print("Something went wrong, check the supplied field values.\n")
        #print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))
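As written, this function only inspects the results of a single request, so if the endpoint returns more than one page, a company on a later page cannot match. The sketch below runs the same matching step over a flat list of company dicts gathered from every page. The `properties`/`name`/`value` and `companyId` fields come from the function above. `find_company_id` is a hypothetical name, and `row[0]` becomes a plain argument.

```python
def find_company_id(companies, wanted_name):
    """Return the companyId of the first company whose name matches
    wanted_name case-insensitively, or None if nothing matches."""
    wanted = wanted_name.lower()
    for company in companies:
        # Skip any company that has no name property.
        name = company.get("properties", {}).get("name", {}).get("value")
        if name is not None and str(name).lower() == wanted:
            return company["companyId"]
    return None


companies = [
    {"companyId": 101, "properties": {"name": {"value": "Acme Ltd"}}},
    {"companyId": 102, "properties": {}},
    {"companyId": 103, "properties": {"name": {"value": "Globex"}}},
]
print(find_company_id(companies, "acme ltd"))  # 101
print(find_company_id(companies, "Initech"))   # None
```

If this runs once per CSV row, building a {name.lower(): companyId} dict once from all companies would avoid rescanning around 12,000 entries for every row.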

2016-03-22 Marko

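The problem seems to be:

You get the offset in the first call, but you don't do anything with the actual company data returned by that call.
You then use that same offset in the while loop; you never use the new offset from subsequent calls. That's why you get the same companies every time.

I think this version of get_companies() should work for you. Obviously I can't test it, but hopefully it's OK: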

def get_companies():
    create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
    headers = {'content-type': 'application/json'}
    create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
    if create_get_recent_companies_response.status_code == 200:

        while True:
            for i in create_get_recent_companies_response.json()[u'companies']:
                companyName = i[u'properties'][u'name'][u'value']
                print("{companyName}".format(companyName=companyName))
            offset = create_get_recent_companies_response.json()[u'offset']
            hasMore = create_get_recent_companies_response.json()[u'has-more']
            if not hasMore:
                break
            else:
                create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)

    else:
        print("Something went wrong, check the supplied field values.\n")
        print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

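Strictly speaking, the else after the break isn't needed, but it keeps with the Zen of Python's "explicit is better than implicit".

Note that you only check for a 200 response code once, so if something goes wrong inside the loop, you'll miss it. You should put all the calls inside the loop and check for a proper response each time.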
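Here is a minimal sketch of that advice, assuming Python 3 and the third-party requests package. Every request, including the follow-ups, goes through the same loop and has its status code checked, and the offset always comes from the latest response. The function name `get_all_companies` and the injectable `get` parameter are hypothetical additions that let the loop be demonstrated offline with a fake response object. Raising `RuntimeError` is just one possible way to handle a bad page.

```python
def get_all_companies(hapikey, get=None):
    """Fetch every page of companies, checking the status of each response.

    `get` defaults to requests.get; it is a parameter only so that the loop
    can be exercised offline with a fake (hypothetical design choice).
    """
    if get is None:
        import requests  # third-party package, needed only for real calls
        get = requests.get

    url = "https://api.hubapi.com/companies/v2/companies/"
    params = {"hapikey": hapikey}
    companies = []
    while True:
        response = get(url, params=dict(params))
        # Check every page, not just the first one.
        if response.status_code != 200:
            raise RuntimeError("HubSpot returned %d: %s" % (response.status_code, response.text))
        data = response.json()
        companies.extend(data["companies"])
        if not data["has-more"]:
            return companies
        params["offset"] = data["offset"]  # offset from the latest page


# Offline demonstration with a fake response object (not part of requests).
class FakeResponse:
    def __init__(self, status_code, payload):
        self.status_code = status_code
        self.text = repr(payload)
        self._payload = payload

    def json(self):
        return self._payload


FAKE_PAGES = {
    None: {"companies": [{"companyId": 1}], "has-more": True, "offset": 42},
    42: {"companies": [{"companyId": 2}], "has-more": False, "offset": 43},
}


def fake_get(url, params=None):
    return FakeResponse(200, FAKE_PAGES[params.get("offset")])


print(get_all_companies("my-key", get=fake_get))
# [{'companyId': 1}, {'companyId': 2}]
```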

2016-03-22 07:44:39 SiHa

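Hi @SiHa, thanks for the reply. Unfortunately it also returns the same results, although it returns the first 100 straight away rather than one by one (which is an improvement!) – Marko

@Marko Sorry, I missed the fact that you use different names inside and outside the while loop ('create_get_recent_companies...' and 'get_more_companies_call'). That meant that in my first draft, although more results were being fetched inside the loop, the *first* response was iterated over every time. I've now changed the names so that they're the same. Hopefully it works now. – SiHa

@SiHa you're an absolute legend, it works perfectly. I have one more question. The script above is a "test script" in which I tried to narrow the function down from my main script. Back in the main script, the function I need to replace is the one I've now added above... What do you think is the best way to collect each page of results? Should I try appending them to a list, or do you think I can "return" it as I did originally? – Marko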
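One possible approach, sketched below and not taken from the thread, is a generator that yields companies as each page arrives. The caller can then collect everything with list(...) or stop at the first match without requesting the remaining pages. `iter_companies` and `fetch_page` are hypothetical names. The fake company dicts are simplified, with a top-level `name` instead of `properties.name.value`.

```python
def iter_companies(fetch_page):
    """Yield companies one at a time, requesting the next page only when needed."""
    offset = None  # no offset on the first request
    while True:
        page = fetch_page(offset)
        for company in page["companies"]:
            yield company
        if not page["has-more"]:
            return
        offset = page["offset"]


# Fake two-page "API", for illustration only.
FAKE = {
    None: {"companies": [{"companyId": 1, "name": "Acme"}], "has-more": True, "offset": 7},
    7: {"companies": [{"companyId": 2, "name": "Globex"}], "has-more": False, "offset": 8},
}

requested = []


def fake_fetch(offset):
    requested.append(offset)  # record which pages were actually requested
    return FAKE[offset]


# Take everything: both pages are fetched.
all_companies = list(iter_companies(fake_fetch))
print(len(all_companies), requested)  # 2 [None, 7]

# Stop early: the generator never asks for the second page.
requested.clear()
acme = next(c for c in iter_companies(fake_fetch) if c["name"] == "Acme")
print(acme["companyId"], requested)  # 1 [None]
```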
