Scraping 'N' pages with BeautifulSoup and Requests (how to get the real page number)

I want to get all the titles listed on this website:

http://www.shyan.gov.cn/zwhd/web/webindex.action

Right now my code only scrapes a single page successfully. However, the site above has many more pages, and I want to scrape all of them.

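For example, with the URL above, when I click the link to "page 2" the address does not change at all. I looked at the page source and saw that JavaScript is used to move to the next page, such as javascript:gotopage(2) or javascript:void(0). Here is my code (it gets page 1):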

from bs4 import BeautifulSoup
import requests

url = 'http://www.shyan.gov.cn/zwhd/web/webindex.action'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
# Each title is a link directly inside a td.tit3 cell
titles = soup.select('td.tit3 > a')
for title in titles:
    print(title.get_text())

How can I change my code to scrape the titles from all of the available pages? Thank you very much!


2016-04-18 champion Ch

Thank you very much! But I can't get the next page. My code is below. Please help me modify it.

Try using the following URL format:

http://www.shiyan.gov.cn/zwhd/web/webindex.action?keyWord=&searchType=3&page.currentpage=2&page.pagesize=15&page.pagecount=2357&docStatus=&sendOrg=
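
To step through pages, the same query string can be generated with a different page.currentpage value each time. Below is a minimal sketch using only the standard library. The helper name page_url is hypothetical, and the defaults 15 and 2357 are simply copied from the URL above, so they may no longer match the live site. Whether the server actually needs page.pagecount on every request is not confirmed here.

```python
from urllib.parse import urlencode

# Base address copied from the URL format above.
BASE_URL = "http://www.shiyan.gov.cn/zwhd/web/webindex.action"


def page_url(page, page_size=15, page_count=2357):
    """Build the listing URL for one page (hypothetical helper).

    The parameter names and their order mirror the query string above;
    the default values are assumptions taken from that same URL.
    """
    params = [
        ("keyWord", ""),
        ("searchType", 3),
        ("page.currentpage", page),
        ("page.pagesize", page_size),
        ("page.pagecount", page_count),
        ("docStatus", ""),
        ("sendOrg", ""),
    ]
    return BASE_URL + "?" + urlencode(params)


# Print the URLs for the first three pages.
for page in range(1, 4):
    print(page_url(page))
```

Each generated URL can then be passed to requests.get and parsed with the same 'td.tit3 > a' selector used in the question.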

The site uses JavaScript to pass hidden page information to the server when it requests the next page. When you view the source, you will find:

<form action="/zwhd/web/webindex.action" id="searchForm" name="searchForm" method="post"> 
<div class="item"> 
    <div class="titlel"> 
     <span>留言查詢</span> 
    <label class="dow"></label> 
    </div> 
    <input type="text" name="keyWord" id="keyword" value="" class="text"/> 
    <div class="key"> 
     <ul> 
      <li><span><input type="radio" checked="checked" value="3" name="searchType"/></span><p>編號</p></li> 
      <li><span><input type="radio" value="2" name="searchType"/></span><p>關鍵字</p></li> 
     </ul>  
    </div> 
    <input type="button" class="btn1" onclick="search();" value="查詢"/> 
    </div> 
    <input type="hidden" id="pageIndex" name="page.currentpage" value="2"/> 
    <input type="hidden" id="pageSize" name="page.pagesize" value="15"/> 
    <input type="hidden" id="pageCount" name="page.pagecount" value="2357"/> 
    <input type="hidden" id="docStatus" name="docStatus" value=""/> 
    <input type="hidden" id="sendorg" name="sendOrg" value=""/> 
    </form>
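
Rather than hard-coding those values, the hidden fields can be read from the first page with BeautifulSoup. The sketch below runs offline against a trimmed copy of the form above. The helper name read_hidden_fields and the SAMPLE_FORM constant are hypothetical. Note that page.pagecount looks like the total number of pages, but it could also be a record count; that has not been verified against the site.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the form shown above, so the sketch runs without network access.
SAMPLE_FORM = """
<form action="/zwhd/web/webindex.action" id="searchForm" name="searchForm" method="post">
  <input type="text" name="keyWord" id="keyword" value="" class="text"/>
  <input type="radio" checked="checked" value="3" name="searchType"/>
  <input type="radio" value="2" name="searchType"/>
  <input type="hidden" id="pageIndex" name="page.currentpage" value="2"/>
  <input type="hidden" id="pageSize" name="page.pagesize" value="15"/>
  <input type="hidden" id="pageCount" name="page.pagecount" value="2357"/>
  <input type="hidden" id="docStatus" name="docStatus" value=""/>
  <input type="hidden" id="sendorg" name="sendOrg" value=""/>
</form>
"""


def read_hidden_fields(html):
    """Return {name: value} for every hidden input inside #searchForm."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form", id="searchForm")
    if form is None:
        return {}
    return {
        field["name"]: field.get("value", "")
        for field in form.find_all("input", attrs={"type": "hidden"})
        if field.has_attr("name")
    }


fields = read_hidden_fields(SAMPLE_FORM)
# Assumption: this value is the upper bound for the page loop.
page_count = int(fields["page.pagecount"])
print(fields)
print(page_count)
```

On the live site, the html argument would be r.content from the first request, and page_count would bound a loop over page numbers.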


2016-04-18 17:01:15 vassilo

Thanks, this is a good option. It is easier to understand than Selenium.

@vassilo How did you come up with this URL (turning the hidden elements into a URL)? – Phillip

When I clicked the next-page link, I used Google Chrome's DevTools to inspect the requests the page made. Identify the right request and you are good to go. – vassilo
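
Once the request has been identified in DevTools, it can be rebuilt with Requests and inspected before anything is sent. The sketch below assumes the page link submits the search form as a POST, because the form in the answer declares method="post"; this has not been checked against the live site. The helper name build_page_request is hypothetical, and the field values are copied from the answer.

```python
import requests

# Address copied from the URL in the answer.
URL = "http://www.shiyan.gov.cn/zwhd/web/webindex.action"


def build_page_request(page, page_size="15", page_count="2357"):
    """Prepare (but do not send) a form POST for one listing page."""
    payload = {
        "keyWord": "",
        "searchType": "3",
        "page.currentpage": str(page),
        "page.pagesize": page_size,
        "page.pagecount": page_count,
        "docStatus": "",
        "sendOrg": "",
    }
    return requests.Request("POST", URL, data=payload).prepare()


prepared = build_page_request(2)
# Compare these with the request DevTools shows for the page-2 link.
print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(prepared.body)
```

If the prepared request matches what DevTools shows, it can be sent with requests.Session().send(prepared) and the response parsed the same way as page 1.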
