Web scraping with Beautiful Soup gives empty results

I am experimenting with Beautiful Soup and trying to extract information from an HTML document that contains segments of the following type:

<div class="entity-body"> 
<h3 class="entity-name with-profile"> 
<a href="https://www.linkedin.com/profile/view?id=AA4AAAAC9qXUBMuA3-txf-cKOPsYZZ0TbWJkhgfxfpY&amp;trk=manage_invitations_profile" 
data-li-url="/profile/mini-profile-with-connections?_ed=0_3fIDL9gCh6b5R-c9s4-e_B&amp;trk=manage_invitations_miniprofile" 
class="miniprofile" 
aria-label="View profile for Ivan Grigorov"> 
<span>Ivan Grigorov</span> 
</a> 
</h3> 
<p class="entity-subheader"> 
Teacher 
</p> 
</div>

I use the following commands:

with open(r"C:\Users\pv\MyFiles\HTML\Invites.html", "r") as Invites:
    soup = bs(Invites, 'lxml')
soup.title 
out: <title>Sent Invites\n| LinkedIn\n</title> 
invites = soup.find_all("div", class_ = "entity-body") 
type(invites) 
out: bs4.element.ResultSet 
len(invites) 
out: 0

Why does find_all return an empty ResultSet object?

Any suggestions would be appreciated.

Source

2017-01-10 gk7

Try viewing the page as you fetch it. If you don't see this 'div' tag there, that part is generated by 'JS', so you can't scrape it this way (you'd have to use 'selenium'). – Fejs
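One way to act on the comment above is to check whether the markup is present in the saved file at all, before any parser is involved. The following is a minimal sketch using only the standard library; the helper name `count_marker`, the default marker string, and the UTF-8 encoding are assumptions made for illustration, not something taken from the question.

```python
from pathlib import Path


def count_marker(path, marker='class="entity-body"', encoding="utf-8"):
    """Return how many times `marker` occurs in the raw text of the file at `path`."""
    # errors="replace" keeps the check from failing on stray bytes in a saved page
    text = Path(path).read_text(encoding=encoding, errors="replace")
    return text.count(marker)
```

If this returns 0 for the saved page, the div is not in the file and no parser setting will find it. If it returns a positive number while find_all still returns nothing, the parsing step is the place to look.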

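The problem is that the document is never read; what you pass is just a TextIOWrapper (Python 3) or File (Python 2) object. Read the document and pass the markup, essentially a string, to BeautifulSoup.

The correct code would be: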

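from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r"C:\Users\pv\MyFiles\HTML\Invites.html", "r") as Invites:
    # Read the file contents and pass the markup string to BeautifulSoup
    soup = BeautifulSoup(Invites.read(), "html.parser")
    soup.title
    invites = soup.find_all("div", class_="entity-body")
    len(invites)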

Source

2017-01-10 16:37:53 dasdachs

I changed the code as you suggested, but I still get len(invites) as 0. – gk7

I get 1. Maybe add a 'print' statement: 'print(len(invites))' (Python 3) or 'print len(invites)' (Python 2). – dasdachs

import bs4 

html = '''<div class="entity-body"> 
<h3 class="entity-name with-profile"> 
<a href="https://www.linkedin.com/profile/view?id=AA4AAAAC9qXUBMuA3-txf-cKOPsYZZ0TbWJkhgfxfpY&amp;trk=manage_invitations_profile" 
data-li-url="/profile/mini-profile-with-connections?_ed=0_3fIDL9gCh6b5R-c9s4-e_B&amp;trk=manage_invitations_miniprofile" 
class="miniprofile" 
aria-label="View profile for Ivan Grigorov"> 
<span>Ivan Grigorov</span> 
</a> 
</h3> 
<p class="entity-subheader"> 
Teacher 
</p> 
</div>''' 

soup = bs4.BeautifulSoup(html, 'lxml') 
invites = soup.find_all("div", class_ = "entity-body") 
len(invites)

Out:

This code works fine.
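Once find_all does return the divs, pulling the name and the subheader out of each one could look roughly like the sketch below. The helper name `extract_invites` is hypothetical, the sample markup is a shortened copy of the snippet from the question (the long href and data attributes are dropped), and "html.parser" is used in place of 'lxml' only so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's snippet; the href is a placeholder
HTML = '''<div class="entity-body">
<h3 class="entity-name with-profile">
<a href="#" class="miniprofile" aria-label="View profile for Ivan Grigorov">
<span>Ivan Grigorov</span>
</a>
</h3>
<p class="entity-subheader">
Teacher
</p>
</div>'''


def extract_invites(markup):
    """Return a list of (name, subheader) tuples, one per div with class entity-body."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for div in soup.find_all("div", class_="entity-body"):
        # class_ matches when the tag has this class among its classes
        name_tag = div.find("h3", class_="entity-name")
        sub_tag = div.find("p", class_="entity-subheader")
        # get_text(strip=True) drops the surrounding newlines and spaces
        name = name_tag.get_text(strip=True) if name_tag else None
        subheader = sub_tag.get_text(strip=True) if sub_tag else None
        results.append((name, subheader))
    return results


print(extract_invites(HTML))
```

For the sample above this prints [('Ivan Grigorov', 'Teacher')]. Markup without any matching div gives an empty list, which is the same symptom as in the question.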

Source

2017-01-10 09:01:35

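Then the problem lies in the statement that reads the HTML page and converts it into a soup object. That is odd, because I copied this syntax from a book and have already tested it with another HTML page. The HTML page was produced by Chrome's "Save as..." command, reached by right-clicking the web page open in the browser. What went wrong? – gk7

@gk7 Could you provide the full HTML code of the page, or its URL? –

Thanks for your reply. The web page is: https://www.linkedin.com/people/invite – gk7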
