PyQuery: finding the text of child element nodes
Here is the code:
from pyquery import PyQuery
content = '''<td field="exceptions"><div style="white-space:normal;height:auto;" \
class="datagrid-cell datagrid-cell-c2-exceptions">Traceback (most recent call last):<br>\
File "./crawler.py", line 381, in <module><br> \
crawler.start()<br> File "./crawler.py", line 153, in start<br> \
raise RemoteTransportException(e)<br>RemoteTransportException: \
This socket is already used by another greenlet: <bound method Waiter.\
switch of <gevent.hub.Waiter object at 0x7f64d499d6e0>><br></div></td>'''
pq = PyQuery(content)
for div in pq('td div'):
    print(div.text)  # prints only: Traceback (most recent call last):
for div in pq('td div'):
    for sub in div.getchildren():
        print(sub.text)
# Traceback (most recent call last):
# None
# None
# None
# None
# None
# None
I want the full text content of the td div element, which should be:
Traceback (most recent call last):
File "./crawler.py", line 381, in <module>
crawler.start()
File "./crawler.py", line 153, in start
raise RemoteTransportException(e)
RemoteTransportException: This socket is already used by another greenlet: <bound method Waiter.switch of <gevent.hub.Waiter object at 0x7f64d499d6e0>>
But I only get Traceback (most recent call last):
So how can I get all of the text inside td div, including the text that follows its child tags?
It should be soup.find('td').find('div').text –
Oops, sorry :/ – rofls
Do you need a solution that uses PyQuery? :) – rofls
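One possible explanation for the behaviour in the question: iterating over a PyQuery selection yields lxml elements, and in the ElementTree model that lxml follows, text after a child tag is stored on that child as its tail, not on the parent's text. The sketch below illustrates this with the standard library's xml.etree.ElementTree, not PyQuery itself. The snippet variable is a hypothetical, simplified stand-in for the original markup (br tags self-closed, angle brackets escaped) so that it parses as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the markup in the question:
# <br/> is self-closed and <module> is escaped so it parses as XML.
snippet = (
    '<td field="exceptions"><div>'
    'Traceback (most recent call last):<br/>'
    'File "./crawler.py", line 381, in &lt;module&gt;<br/>'
    'crawler.start()<br/>'
    '</div></td>'
)

td = ET.fromstring(snippet)
div = td.find('div')

# .text only covers the text before the first child element.
first_chunk = div.text

# Text that follows a child tag is stored on that child as .tail.
tails = [br.tail for br in div]

# itertext() yields .text and every .tail in document order.
full_text = '\n'.join(
    chunk.strip() for chunk in div.itertext() if chunk.strip()
)
print(full_text)
```

With PyQuery itself, calling the text() method on the selection, as in pq('td div').text(), should return the combined text rather than only the first chunk. Exact whitespace and line-break handling may differ between PyQuery versions.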