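How can I extract the text after a <br> in Python?
I am using Python 2.7.8 and I'm a bit surprised, because I get all of the text except the text that comes after the last <br>. My HTML page looks like this: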
<html>
<body>
<div class="entry-content" >
<p>Here is a listing of C interview questions on "Variable Names" along with answers, explanations and/or solutions:
</p>
<p>Which of the following is not a valid C variable name?<br>
a) int number;<br>
b) float rate;<br>
c) int variable_count;<br>
d) int $main;</p> <!--not getting-->
<p> more </p>
<p>Which of the following is true for variable names in C?<br>
a) They can contain alphanumeric characters as well as special characters<br>
b) It is not an error to declare a variable to be one of the keywords(like goto, static)<br>
c) Variable names cannot start with a digit<br>
d) Variable can be of any length</p> <!--not getting -->!
</div>
</body>
</html>
And my code:
import urllib2
from urllib2 import Request
from bs4 import BeautifulSoup, NavigableString, Tag

url = "http://www.sanfoundry.com/c-programming-questions-answers-variable-names-1/"
#url="http://www.sanfoundry.com/c-programming-questions-answers-variable-names-2/"
req = Request(url)
resp = urllib2.urlopen(req)
htmls = resp.read()
soup = BeautifulSoup(htmls)
for br in soup.findAll('br'):
    # The node right after the <br> must be a text node
    next = br.nextSibling
    if not (next and isinstance(next, NavigableString)):
        continue
    # The text is only printed when it is followed by another <br>
    next2 = next.nextSibling
    if next2 and isinstance(next2, Tag) and next2.name == 'br':
        text = str(next).strip()
        if text:
            print "Found:", next.encode('utf-8')
            # print '...........sfsdsds.............',answ[0].encode('utf-8') #
Output:
Found:
a) int number;
Found:
b) float rate;
Found:
c) int variable_count;
Found:
a) They can contain alphanumeric characters as well as special characters
Found:
b) It is not an error to declare a variable to be one of the keywords(like goto, static)
Found:
c) Variable names cannot start with a digit
But I am not getting the last piece of text in each question, for example:
d) int $main
and
d) Variable can be of any length
both of which come after the last <br>.
The output I want to get is:
Found:
a) int number;
Found:
b) float rate;
Found:
c) int variable_count;
Found:
d) int $main
Found:
a) They can contain alphanumeric characters as well as special characters
Found:
b) It is not an error to declare a variable to be one of the keywords(like goto, static)
Found:
c) Variable names cannot start with a digit
d) Variable can be of any length
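Add more print statements. Where you `continue`, print what you are skipping. Add else clauses to your if statements and print what you are skipping there as well.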
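The debugging suggestion above could look roughly like the sketch below. It is only an illustration: the helper name `trace_br_siblings` and the inline `SAMPLE` markup (copied from the first question in the posted HTML) are assumptions, and it parses a string instead of fetching the live page. It uses the bs4 names `find_all` and `next_sibling`, and the `__future__` import keeps the `print` call working on Python 2.7 as well as Python 3.

```python
from __future__ import print_function
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical sample, copied from the first question in the posted HTML.
SAMPLE = """<p>Which of the following is not a valid C variable name?<br>
a) int number;<br>
b) float rate;<br>
c) int variable_count;<br>
d) int $main;</p>"""


def trace_br_siblings(markup):
    """Return (status, text) pairs showing what the loop keeps and what it skips."""
    soup = BeautifulSoup(markup, "html.parser")
    report = []
    for br in soup.find_all("br"):
        node = br.next_sibling
        if not isinstance(node, NavigableString):
            # Corresponds to the `continue` in the original loop.
            report.append(("skipped: no text after <br>", repr(node)))
            continue
        after = node.next_sibling
        if isinstance(after, Tag) and after.name == "br":
            report.append(("found", node.strip()))
        else:
            # The branch the original loop drops without printing anything.
            report.append(("skipped: next sibling is %r" % (after,), node.strip()))
    return report


for status, text in trace_br_siblings(SAMPLE):
    print(status, "->", text)
```

On this sample, the last option shows up in the "skipped" branch with a next sibling of `None`, because it is followed by the closing </p> rather than by another <br>.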
OK, I'm trying it now... – user3440716
Why are you still doing it the old way instead of the way I suggested [here](http://stackoverflow.com/a/34159940/771848)? – alecxe
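In the HTML posted in the question, the last option of each question is followed by the closing </p> rather than by another <br>, so its next sibling is `None` and the `next2.name == 'br'` check never passes. Below is a minimal sketch of one possible fix, assuming the real page has the same structure as the posted snippet. The function name `texts_after_br` and the inline `SAMPLE` string are made up for illustration, and this is not necessarily the approach from the linked answer.

```python
from __future__ import print_function
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical sample, copied from the first question in the posted HTML.
SAMPLE = """<div class="entry-content">
<p>Which of the following is not a valid C variable name?<br>
a) int number;<br>
b) float rate;<br>
c) int variable_count;<br>
d) int $main;</p>
<p> more </p>
</div>"""


def texts_after_br(markup):
    """Collect every non-empty text node that directly follows a <br>."""
    soup = BeautifulSoup(markup, "html.parser")
    found = []
    for br in soup.find_all("br"):
        node = br.next_sibling
        if not isinstance(node, NavigableString):
            continue
        after = node.next_sibling
        # Accept text followed by another <br> OR by nothing at all,
        # which is the case for the last option before </p>.
        if after is None or (isinstance(after, Tag) and after.name == "br"):
            text = node.strip()
            if text:
                found.append(text)
    return found


for text in texts_after_br(SAMPLE):
    print("Found:", text)
```

The only change from the loop in the question is the `after is None` alternative. If the real page puts other tags between the last option and </p>, the condition would need to be relaxed further.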