How to extract a specific paragraph tag

I want to extract the content of this paragraph:

<div class="bio-container"> 
    <p class="bio profile" > 
     Chinedu is a good boy 
    </p> 
</div>

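Please assume there are also other paragraph tags with different class attributes, but I only want to extract the one with the class attribute "bio profile".

I just want to extract the text "Chinedu is a good boy".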

I tried desc = bs.find('p', {'class': 'bio profile'})

but it doesn't work.
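For reference, here is a minimal sketch (assuming bs4 with the built-in html.parser; the extra "location" paragraph is hypothetical) showing that the call from the question does select the right tag on the sample HTML, even when another <p> is present. In bs4, a class filter containing a space is matched against the full class string "bio profile".

```python
from bs4 import BeautifulSoup

# Sample markup from the question plus a hypothetical extra paragraph
html = """
<div class="bio-container">
    <p class="location">Lagos</p>
    <p class="bio profile" >
     Chinedu is a good boy
    </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The filter "bio profile" is compared with the whole class attribute value
desc = soup.find("p", {"class": "bio profile"})
text = desc.get_text().strip()
print(text)  # Chinedu is a good boy
```

So if the same call fails on a real page, the downloaded HTML most likely does not contain a <p> with that exact class string.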

This is the exact code I am trying to apply the answer below to:

import urllib 
from bs4 import BeautifulSoup as bsoup 
import string 


httpResponse = urllib.urlopen("https://twitter.com/drericcole") 
html = httpResponse.read() 
bs = bsoup(html) 
desc = bs.find("p", class_="bio profile-field") 
print desc.get_text().strip()

but I get an error on this statement:

print desc.get_text().strip() 
AttributeError: 'NoneType' object has no attribute 'get_text'
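The traceback means find() returned None: no <p> in the parsed HTML matched the filter (for example, because the class string passed to find() differs from the one in the downloaded page), so desc was None. A hedged sketch of a guard that reports the miss instead of crashing; the helper name extract_bio and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup

def extract_bio(html, css_class="bio profile"):
    """Return the stripped text of the first <p> with css_class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    desc = soup.find("p", class_=css_class)
    if desc is None:
        # find() returns None when nothing matches; calling
        # .get_text() on None raises the AttributeError shown above
        return None
    return desc.get_text().strip()

sample = '<div><p class="bio profile">Chinedu is a good boy</p></div>'
print(extract_bio(sample))                                 # Chinedu is a good boy
print(extract_bio(sample, css_class="bio profile-field"))  # None
```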


2014-03-25 user3455095

Change "bio profile-field" to "bio profile". – Manhattan

You should call the .get_text() method on desc. Using Python 2.7 and BS 4.3.2:

from bs4 import BeautifulSoup as bsoup 

ofile = open("test.html") 
soup = bsoup(ofile) 

desc = soup.find("p", class_="bio profile") 
# or desc = soup.find("p", {"class":"bio profile"}) 
print desc.get_text().strip()

Result:

Chinedu is a good boy 
[Finished in 0.2s]

Hope this helps.
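As an alternative that is not part of the original answer, bs4 also supports CSS selectors through select_one() (backed by the soupsieve package, which recent bs4 releases install). A minimal Python 3 sketch, using an inline string in place of test.html: the selector p.bio.profile requires both classes but does not care about their order, whereas an exact class string match does.

```python
from bs4 import BeautifulSoup

# Inline markup standing in for the answer's test.html
html = """
<div class="bio-container">
    <p class="profile bio">Chinedu is a good boy</p>
    <p class="bio">Some other paragraph</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "p.bio.profile" matches a <p> that has both classes, in any order
desc = soup.select_one("p.bio.profile")
if desc is not None:
    print(desc.get_text(strip=True))  # Chinedu is a good boy
```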


2014-03-25 08:55:48 Manhattan

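What do you mean by "find my exact code below"? My approach extracts the correct tag, doesn't it? – Manhattan

You're right, I have added the exact code to the question. Please help me fix this error. – user3455095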

Use the BeautifulSoup module to extract the text from all the paragraph tags. Contents of script.py:

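from bs4 import BeautifulSoup 
import sys 

# Parse the HTML file given as the first command-line argument
with open(sys.argv[1], 'r') as infile:
    soup = BeautifulSoup(infile, 'html.parser')

# Print the text of every <p> tag, separated by spaces
print(' '.join(p.get_text(strip=True) for p in soup.find_all('p')))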

Run it like this:

python3 script.py infile
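One caveat when collecting paragraph text: a tag's .string is None when the tag has more than one child (for example, text plus a nested <b>), so joining .string values fails with a TypeError, while get_text() concatenates all descendant strings. A small sketch on hypothetical inline markup:

```python
from bs4 import BeautifulSoup

html = "<p>plain text</p><p>mixed <b>bold</b> text</p>"
soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.find_all("p")

# The second <p> has three children ("mixed ", <b>, " text"), so .string is None
print([p.string is None for p in paragraphs])  # [False, True]

# get_text() works for both paragraphs
print(" | ".join(p.get_text() for p in paragraphs))  # plain text | mixed bold text
```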


2014-03-25 08:48:01 newTag

But there are other paragraph tags in the HTML; how can I get only this specific one? – user3455095
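One hedged way to pick out only the paragraph whose class attribute consists of exactly the two classes bio and profile (in any order) is to pass find() a function, which bs4 calls on each tag. A sketch with hypothetical neighbouring paragraphs; the function name is_bio_profile is illustrative:

```python
from bs4 import BeautifulSoup

html = """
<p class="bio">Short bio</p>
<p class="bio profile-field">Field paragraph</p>
<p class="bio profile">Chinedu is a good boy</p>
"""
soup = BeautifulSoup(html, "html.parser")

def is_bio_profile(tag):
    # class is a multi-valued attribute in bs4, so tag.get("class") is a list
    return tag.name == "p" and set(tag.get("class", [])) == {"bio", "profile"}

desc = soup.find(is_bio_profile)
print(desc.get_text(strip=True))  # Chinedu is a good boy
```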

Try this:

from BeautifulSoup import BeautifulSoup as bs 
soup = bs(your_html)  # your_html: placeholder for a string holding your HTML
soup.p.text
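Note that soup.p is shorthand for soup.find('p'), so it returns only the first paragraph in the document, which may not be the bio. A sketch using bs4 (rather than the BeautifulSoup 3 import above) on hypothetical markup:

```python
from bs4 import BeautifulSoup

html = """
<p class="tagline">Welcome</p>
<div class="bio-container">
    <p class="bio profile">Chinedu is a good boy</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.p.text                              # first <p> in the document
bio = soup.find("p", class_="bio profile").text  # the paragraph we actually want

print(first)  # Welcome
print(bio)    # Chinedu is a good boy
```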


2014-03-25 08:50:09 loki
