How do I find the correct list index for splitted_page with Python urllib2?

I can't get at the content inside this splitted_page. All I want is the headline, whose text is "Sian Blake partner ...".

Here is my code. It seems to print far more information than I need.

import urllib2 

url="http://www.bbc.co.uk/news/uk-england-london-35412127" 

request = urllib2.Request(url) 

handle = urllib2.urlopen(request) 

content = handle.read() 

splitted_page = content.split("<h1 class=\"story-body\">"); 

splitted_page = splitted_page[0].split("</h1>") 

print splitted_page[0]

Thanks.

Source

2016-01-26 python_starter

You probably have the problem because you are using the wrong class: it has to be story-body__h1.
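A minimal sketch of why the wrong class prints "way too much": when the marker string does not occur in the text, split() returns a one-element list containing the whole text, so [0] is the entire page. The `page` string below is a made-up, shortened stand-in, not the real BBC markup.

```python
# Hypothetical stand-in for the downloaded page (assumption, not the real markup).
page = '<html><body><h1 class="story-body__h1">Some headline</h1><p>Body</p></body></html>'

# The marker from the question is not present in the page,
# so split() has nothing to cut on and returns [page].
wrong = page.split('<h1 class="story-body">')

print(len(wrong))        # 1
print(wrong[0] == page)  # True: index 0 is the whole page
```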

I like requests and lxml, so I used them to create a working example:

import requests 
import lxml, lxml.html 

url="http://www.bbc.co.uk/news/uk-england-london-35412127" 

r = requests.get(url) 

html = lxml.html.fromstring(r.content) 

print(html.cssselect('.story-body__h1')[0].text)

EDIT: your code works now too. You need story-body__h1, and [1] instead of [0]:

import urllib2 

url="http://www.bbc.co.uk/news/uk-england-london-35412127" 

request = urllib2.Request(url) 

handle = urllib2.urlopen(request) 

content = handle.read() 

splitted_page = content.split("<h1 class=\"story-body__h1\">"); 

splitted_page = splitted_page[1].split("</h1>") # [1] instead of [0] 

print splitted_page[0]
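To illustrate the index choice: after a split on a marker that is present, [0] holds everything before the marker and [1] everything after it, so the headline starts at [1]. The sketch below also wraps this in a small helper that returns None instead of raising IndexError when the marker is missing. The name `text_between` and the `sample` string are made up for this illustration and are not part of the original code.

```python
def text_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    parts = text.split(start, 1)  # at most two pieces: before and after the marker
    if len(parts) < 2:            # marker not found, so there is no parts[1]
        return None
    return parts[1].split(end, 1)[0]


# Hypothetical sample markup (assumption, not the real page).
sample = '<div><h1 class="story-body__h1">Example headline</h1><p>More text</p></div>'
marker = '<h1 class="story-body__h1">'

parts = sample.split(marker)
print(parts[0])  # <div>
print(parts[1])  # Example headline</h1><p>More text</p></div>

print(text_between(sample, marker, '</h1>'))                  # Example headline
print(text_between(sample, '<h1 class="missing">', '</h1>'))  # None
```

String splitting like this depends on the exact attribute text in the page, so a parser-based approach such as the lxml example above is usually the more robust choice.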

Source

2016-01-26 21:21:37 furas

Thanks a lot, mate.
