我已經在vba中編寫了一些代碼,以獲得從網頁導向下一頁的所有鏈接。下一頁鏈接的最大數量是255.運行我的腳本,我得到了6906鏈接中的所有鏈接。這意味着循環一次又一次地運行,我覆蓋了一些東西。篩選出重複的鏈接我可以看到有254個獨特的鏈接。我的目標不是將最高頁碼硬編碼到迭代鏈接。以下是我與努力:如何獲取通往下一頁的所有鏈接?
Sub YifyLink()
Const link = "https://www.yify-torrent.org/search/1080p/"
Dim http As New XMLHTTP60, html As New HTMLDocument, htm As New HTMLDocument
Dim x As Long, y As Long, item_link as String
With http
.Open "GET", link, False
.send
html.body.innerHTML = .responseText
End With
For Each post In html.getElementsByClassName("pager")(0).getElementsByTagName("a")
If InStr(post.innerText, "Last") Then
x = Split(Split(post.href, "-")(1), "/")(0)
End If
Next post
For y = 0 To x
item_link = link & "t-" & y & "/"
With http
.Open "GET", item_link, False
.send
htm.body.innerHTML = .responseText
End With
For Each posts In htm.getElementsByClassName("pager")(0).getElementsByTagName("a")
I = I + 1: Cells(I, 1) = posts.href
Next posts
Next y
End Sub
元素在其中鏈接:
<div class="pager"><a href="/search/1080p/" class="current">1</a> <a href="/search/1080p/t-2/">2</a> <a href="/search/1080p/t-3/">3</a> <a href="/search/1080p/t-4/">4</a> <a href="/search/1080p/t-5/">5</a> <a href="/search/1080p/t-6/">6</a> <a href="/search/1080p/t-7/">7</a> <a href="/search/1080p/t-8/">8</a> <a href="/search/1080p/t-9/">9</a> <a href="/search/1080p/t-10/">10</a> <a href="/search/1080p/t-11/">11</a> <a href="/search/1080p/t-12/">12</a> <a href="/search/1080p/t-13/">13</a> <a href="/search/1080p/t-14/">14</a> <a href="/search/1080p/t-15/">15</a> <a href="/search/1080p/t-16/">16</a> <a href="/search/1080p/t-17/">17</a> <a href="/search/1080p/t-18/">18</a> <a href="/search/1080p/t-19/">19</a> <a href="/search/1080p/t-20/">20</a> <a href="/search/1080p/t-21/">21</a> <a href="/search/1080p/t-22/">22</a> <a href="/search/1080p/t-23/">23</a> <a href="/search/1080p/t-2/">Next</a> <a href="/search/1080p/t-255/">Last</a> </div>
我得到的結果(部分部分):
about:/search/1080p/t-20/
about:/search/1080p/t-21/
about:/search/1080p/t-22/
about:/search/1080p/t-23/
about:/search/1080p/t-255/
爲什麼從頁面刮(特別是因爲他們不在頁面上開始)?爲什麼不自己生成它們? – YowE3K
因爲我嘗試使用的頁面可能不在第一頁。我怎麼能硬編碼最後一個數字然後迭代? – SIM
您目前正在抓取的頁面似乎告訴您存在的最高頁碼(255),因此您不能只抓取一個數字,然後從1循環到該數字以生成全部255個鏈接? – YowE3K