How do I get a specific value from a web page?

I have some <div>s and other things on a site, and in the middle of innumerable divs there is this:

<input name="extWarrantyProds" type="hidden" value="23814298^true"/>

How can I get the "value" part out of this code, given that it sits on one specific line in the middle of a web page full of other things?

I have been trying with urllib, but I don't even know where to start =/

Source

2011-12-05 Shady

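html5lib (http://code.google.com/p/html5lib/) – ephemient

Do you have any control over the contents of the page? Can you reasonably guarantee that it won't change too drastically? If so, simple pattern matching works (see the answers below); otherwise you need to do "real" HTML parsing. – jwd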

The simplest way I can think of:

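import urllib.request

urlStr = "http://www..."

# In Python 3, urlopen() lives in urllib.request and yields bytes, one line at a time
fileObj = urllib.request.urlopen(urlStr)

for rawLine in fileObj:
    line = rawLine.decode("utf-8", errors="replace")
    if '<input name="extWarrantyProds"' in line:
        # Skip past 'value="' (7 characters) to reach the start of the value
        startIndex = line.find('value="') + 7
        endIndex = line.find('"', startIndex)
        print(line[startIndex:endIndex])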
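
The slicing logic can also be tried without a network connection. The following is only a sketch: the helper name `extract_value` and the sample string are assumptions made for illustration and are not part of the original answer. It also guards against a matching line that has no `value` attribute, where `find()` returns -1.

```python
def extract_value(line, marker='<input name="extWarrantyProds"'):
    """Return the value attribute from a line containing marker, or None."""
    marker_pos = line.find(marker)
    if marker_pos == -1:
        return None
    key = 'value="'
    # Search only after the marker so an earlier tag on the same line is ignored
    start = line.find(key, marker_pos)
    if start == -1:
        return None
    start += len(key)
    end = line.find('"', start)
    if end == -1:
        return None
    return line[start:end]


sample = '<input name="extWarrantyProds" type="hidden" value="23814298^true"/>'
print(extract_value(sample))
```

This still assumes the whole tag sits on one line, as the loop over lines does.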

Source

2011-12-05 22:39:52 vdbuilder

There's no need for anything too fancy if that's all you need. Use urllib to download the page and re.findall() to find the value.

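import re
import urllib.request

url = 'http://...'
# Download the whole page and decode the bytes to text
html = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
# Find every matching <input> tag, then pull the value attribute out of each one
matches = re.findall(r'<input name="extWarrantyProds.*?>', html, re.DOTALL)
for i in matches:
    print(re.findall(r'value="(.*?)"', i))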
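
The two findall() steps can probably be folded into one pattern with a capture group. This is a sketch under stated assumptions: the sample HTML is made up for illustration, and the pattern assumes that `name` is the first attribute of the tag and that `value` comes after it, as in the question.

```python
import re

# Hypothetical sample page: one unrelated input and the one we want,
# with the target tag split across two lines
html = '''
<div><input name="other" type="hidden" value="ignore-me"/></div>
<div><input name="extWarrantyProds" type="hidden"
            value="23814298^true"/></div>
'''

# [^>]*? cannot cross a '>' so the match stays inside a single tag;
# [^>] also matches newlines, so re.DOTALL is not required here
pattern = re.compile(r'<input\s+name="extWarrantyProds"[^>]*?\bvalue="([^"]*)"')

values = pattern.findall(html)
print(values)
```

If the attribute order on the real page differs, the pattern would need adjusting, which is the fragility the comments on the question warn about.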

Source

2011-12-05 22:37:27 kichik

import lxml.html as lh 

html = ''' 
<input name="extWarrantyProds" type="hidden" value="23814298^true"/> 
''' 

# If you want to parse from a URL: 
# tree = lh.parse('http://example.com') 

tree = lh.fromstring(html) 

# xpath() returns a list with the value attribute of every matching <input>
print(tree.xpath("//input[@name='extWarrantyProds']/@value"))

Source

2011-12-05 23:40:49 Acorn

Regex + HTML = nightmare. +1 for using a proper parser. If you're going to make a habit of this, I'd also suggest taking a look at http://www.crummy.com/software/BeautifulSoup/. –
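
For anyone who wants a real parser without installing anything, Python's standard-library `html.parser` module may be enough for this case. This is only a sketch and not something proposed in the thread: the class name `InputValueFinder` and the sample HTML are assumptions made for illustration.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Collect the value attribute of every <input> with a given name."""

    def __init__(self, target_name):
        super().__init__()
        self.target_name = target_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <input .../> are also routed here by the default handle_startendtag
        if tag != "input":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == self.target_name and "value" in attr_map:
            self.values.append(attr_map["value"])


html = '<div><input name="extWarrantyProds" type="hidden" value="23814298^true"/></div>'
finder = InputValueFinder("extWarrantyProds")
finder.feed(html)
print(finder.values)
```

Unlike the string and regex approaches, this does not depend on attribute order or on the tag fitting on one line.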
