Parsing HTML form input tags with Beautiful Soup

OK, I need to parse an HTML form. From its "input" tags, I need to extract both the ones whose type is "text" and the ones of any other type.

I have this code:

from BeautifulSoup import BeautifulSoup as beatsop

html_data = open("forms.html")

def html_parser(html_data):
    html_proc = beatsop(html_data)
    # Extract the text inputs.
    txtinput = html_proc.findAll('input', {'type': 'text'})
    # Extract every kind of input that is not text.
    listform = ["radio", "checkbox", "password", "file", "image", "hidden"]
    otrimput = html_proc.findAll('input', {'type': listform})

html_parser(html_data)

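I am using it with a local document, but you could use urllib to request any web page that contains a form. Now the problem: I need to extract the "value" attribute of the non-text inputs and the "name" attribute of the text ones. Does anyone know how I can do that?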

Thanks!

2013-06-21 Asp1r3-At0m

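To access an element's attribute, use element['attribute']. If the attribute may be missing, use element.get('attribute') instead, which returns None rather than raising KeyError.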

from BeautifulSoup import BeautifulSoup as beatsop


def html_parser(html_data):
    html_proc = beatsop(html_data)
    # Extract the text inputs.
    txtinput = html_proc.findAll('input', {'type': 'text'})
    # Extract every kind of input that is not text.
    listform = ["radio", "checkbox", "password", "file", "image", "hidden"]
    otrimput = html_proc.findAll('input', {'type': listform})

    print('Text input names:')
    for elem in txtinput:
        print(elem['name'])

    print('Non-text input values:')
    for elem in otrimput:
        # .get() returns None instead of raising KeyError when 'value' is absent.
        value = elem.get('value')
        if value:
            print(value)
        else:
            print('{} has no value'.format(elem))

with open("forms.html") as html_data:
    html_parser(html_data)
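
For anyone on Python 3, the same idea can be sketched against the newer bs4 package. This is only a rough port under some assumptions: that beautifulsoup4 is installed, that the built-in "html.parser" backend is acceptable, and that returning lists is more convenient than printing. The names parse_form_inputs, NON_TEXT_TYPES and SAMPLE_HTML are made up for the illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed

# Same list of non-text input types as in the code above.
NON_TEXT_TYPES = ["radio", "checkbox", "password", "file", "image", "hidden"]


def parse_form_inputs(html):
    """Return (names of text inputs, values of non-text inputs)."""
    soup = BeautifulSoup(html, "html.parser")
    text_inputs = soup.find_all("input", {"type": "text"})
    # A list used as an attribute filter matches any of the listed values.
    other_inputs = soup.find_all("input", {"type": NON_TEXT_TYPES})
    # .get() yields None for a missing attribute instead of raising KeyError.
    names = [tag.get("name") for tag in text_inputs]
    values = [tag.get("value") for tag in other_inputs]
    return names, values


# Hypothetical form used only to exercise the function.
SAMPLE_HTML = """
<form>
  <input type="text" name="user">
  <input type="password" name="pw">
  <input type="hidden" name="token" value="abc123">
  <input type="checkbox" name="remember" value="yes">
</form>
"""

names, values = parse_form_inputs(SAMPLE_HTML)
print(names)
print(values)
```

The password input in the sample has no value attribute, so its slot in the second list comes back as None rather than stopping the loop.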

2013-06-21 01:00:47 falsetru

Are you a god? This works like a charm. However, I get this error:

    Traceback (most recent call last):
      File "beatusup.py", line 21, in <module>
        html_parser(html_data)
      File "beatusup.py", line 18, in html_parser
        print(elem['value'])
      File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 613, in __getitem__
        return self._getAttrMap()[key]
    KeyError: 'value'

Can you help me with this error? Or is it something with my machine?

@Asp1r3-At0m, I updated the code. – falsetru
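
The KeyError in the traceback comes from indexing an input that has no value attribute, so it is not specific to one machine. A small sketch of the difference between the two access styles, assuming the bs4 package (the traceback above is from the older BeautifulSoup 3 module, where indexing behaves the same way); the one-tag document is invented for the demonstration:

```python
from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed

# Hypothetical input tag that deliberately lacks a value attribute.
soup = BeautifulSoup('<input type="password" name="pw">', "html.parser")
tag = soup.find("input")

# Indexing a missing attribute raises KeyError, as in the traceback.
try:
    tag["value"]
    raised = False
except KeyError:
    raised = True

# .get() accepts a default that is returned when the attribute is absent.
fallback = tag.get("value", "")

# has_attr() checks for the attribute without reading it.
has_value = tag.has_attr("value")

print(raised, repr(fallback), has_value)
```

Which of the two to use probably depends on whether a missing attribute should count as an error or as an ordinary case to skip.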
