BeautifulSoup: output starts changing after a certain number of loop iterations

I got an answer to a question I asked on SO here, and the code seems to work when I run it by itself.

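However, when I run it inside a loop, the results start changing after the third iteration. This is just an example that requests the same URL every time.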

from bs4 import BeautifulSoup
import requests
import re

for x in range(5):
    url = 'https://www.adviserinfo.sec.gov/IAPD/content/viewform/adv/Sections/iapd_AdvPrivateFundReportingSection.aspx?ORG_PK=161227&FLNG_PK=05C43A1A0008018C026407B10062D49D056C8CC0'
    html = requests.get(url, headers={'Cookie': 'PHPSESSID=notimportant'})
    soup = BeautifulSoup(html.text, "lxml")

    tags = list(soup.find_all('span', {'class':'PrintHistRed'}))
    tags.extend(list(soup.find_all('img', alt=re.compile('Radio|Checkbox')))[2:])  # 2: skip "are you an adviser" at the top
    tags.extend([t.parent for t in soup.find_all(text="No Information Filed")])

    output = []

    for entry in sorted(tags):
        if entry.name == 'img':
            alt = entry['alt']
            if 'Radio' in alt:
                output.append('NO' if 'not selected' in alt else 'YES')
            else:
                output.append('O' if 'not checked' in alt else 'X')
        else:
            output.append(entry.text)

    print(output[:9])

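I tried putting time.sleep() in various places in the code, thinking it had something to do with timing, but no luck. I also wondered whether it had to do with the Cookie, but no luck there either, really...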

Any help is greatly appreciated!


2017-08-07 measure_theory

What if you put the imports inside the loop?

That actually makes the output start changing after the second iteration, which is interesting. Not sure whether that gives anyone a clue...

OK, that's strange.

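You get the weird behavior because your code sorts objects (of type bs4.element.Tag, see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#tag) rather than strings. Tag does not define an ordering, so under Python 2 sorted() falls back to comparing the objects by memory address, and the resulting order can change from one pass of the loop to the next.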
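A stdlib-only sketch of the problem, using a hypothetical `Node` class as a stand-in for `bs4.element.Tag` (it is not the bs4 class, and it assumes, like `Tag`, no ordering methods). Python 3 refuses to sort such objects at all, while sorting with an explicit key gives the same order every time:

```python
class Node:
    """Hypothetical stand-in for bs4.element.Tag (assumed to define no ordering methods)."""

    def __init__(self, label, position):
        self.label = label
        self.position = position  # index of the element in document order


nodes = [Node("fund name", 0), Node("radio button", 1), Node("checkbox", 2)]

# Python 3 refuses to order objects that do not define __lt__.
try:
    sorted(nodes)
    unkeyed_sort_failed = False
except TypeError:
    unkeyed_sort_failed = True

# An explicit key gives the same order on every run, whatever the input order.
shuffled = [nodes[2], nodes[0], nodes[1]]
ordered = sorted(shuffled, key=lambda node: node.position)
print(unkeyed_sort_failed)               # True
print([node.label for node in ordered])  # ['fund name', 'radio button', 'checkbox']
```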

Change:

for entry in sorted(tags):

to:

for entry in tags:

and the output becomes:

[u'APEX INVESTMENT FUND V, L.P.', u'805-2054766781', u'Delaware', u'United States', u'$\xa07,402,178', u'$\xa05,000,000', u'47', u'4', u'28'] 
[u'APEX INVESTMENT FUND V, L.P.', u'805-2054766781', u'Delaware', u'United States', u'$\xa07,402,178', u'$\xa05,000,000', u'47', u'4', u'28'] 
[u'APEX INVESTMENT FUND V, L.P.', u'805-2054766781', u'Delaware', u'United States', u'$\xa07,402,178', u'$\xa05,000,000', u'47', u'4', u'28'] 
[u'APEX INVESTMENT FUND V, L.P.', u'805-2054766781', u'Delaware', u'United States', u'$\xa07,402,178', u'$\xa05,000,000', u'47', u'4', u'28'] 
[u'APEX INVESTMENT FUND V, L.P.', u'805-2054766781', u'Delaware', u'United States', u'$\xa07,402,178', u'$\xa05,000,000', u'47', u'4', u'28']

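To follow up on your comment: if you need to preserve the order, try something like this (the code can be condensed further if you like; it doesn't need two separate statements):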

from bs4 import BeautifulSoup
import requests
import re

for x in range(5):
    url = 'https://www.adviserinfo.sec.gov/IAPD/content/viewform/adv/Sections/iapd_AdvPrivateFundReportingSection.aspx?ORG_PK=161227&FLNG_PK=05C43A1A0008018C026407B10062D49D056C8CC0'
    html = requests.get(url, headers={'Cookie': 'PHPSESSID=notimportant'})
    soup = BeautifulSoup(html.text, "lxml")

    regexp = re.compile(r'Radio|Checkbox')
    mytags = []
    # A single find_all over both tag names returns the matches in document order
    tags = soup.find_all(['span', 'img'])
    for tag in tags:
        if (tag.has_attr('class') and 'PrintHistRed' in tag['class']) or (tag.has_attr('alt') and regexp.search(tag['alt'])):
            mytags.append(tag)
        elif tag.text == "No Information Filed":
            mytags.append(tag.parent)

    output = []

    for entry in mytags:
        if entry.name == 'img':
            alt = entry['alt']
            if 'Radio' in alt:
                output.append('NO' if 'not selected' in alt else 'YES')
            else:
                output.append('O' if 'not checked' in alt else 'X')
        else:
            output.append(entry.text)

    print(output)
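If you would rather keep the three separate queries from the question, another option is to sort them explicitly by document position instead of relying on object comparison. This is a minimal sketch, assuming a small hypothetical HTML fragment in place of the real SEC page and Python's built-in "html.parser" instead of lxml; the names `HTML` and `position` are illustrative only. It uses `string=`, the newer spelling of the `text=` argument to `find_all`.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page (its markup is not reproduced here).
HTML = """
<div>
  <span class="PrintHistRed">APEX INVESTMENT FUND V, L.P.</span>
  <img alt="Radio button selected">
  <p>No Information Filed</p>
  <img alt="Checkbox not checked">
  <span class="PrintHistRed">Delaware</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Same three queries as in the question; concatenating them groups results by query.
tags = list(soup.find_all("span", {"class": "PrintHistRed"}))
tags.extend(soup.find_all("img", alt=re.compile("Radio|Checkbox")))
tags.extend(s.parent for s in soup.find_all(string="No Information Filed"))

# Map each element to its index in document order. id() is used as the key because
# it identifies the object itself rather than its content.
position = {id(el): i for i, el in enumerate(soup.find_all(True))}
tags.sort(key=lambda t: position[id(t)])

output = []
for entry in tags:
    if entry.name == "img":
        alt = entry["alt"]
        if "Radio" in alt:
            output.append("NO" if "not selected" in alt else "YES")
        else:
            output.append("O" if "not checked" in alt else "X")
    else:
        output.append(entry.get_text(strip=True))

print(output)
```

`find_all(True)` walks every tag in the order it appears in the document, so sorting by that index reproduces the page order regardless of which query found each element.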


2017-08-07 22:55:22

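I need the output ordered the way it appears on the site. This doesn't give the output in the right order, which sorted(tags) (sometimes) does. I should have expanded the example output to make clear what I mean.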
