2013-01-11 73 views
1

Parsing HTML: pyparsing or BeautifulSoup

Coast (NS/WS) = EUR35.99/US$46.09

Money Object = EUR42.00/US$53.79

<div id="t142_1" class="text" >Data Center</div> 
<div id="t143_1" class="text" >Coast (NS/WS)</div> 
<div id="t144_1" class="text" >EUR35.99/US$46.09</div> 
<div id="t145_1" class="text" >Money Object</div> 
<div id="t146_1" class="text" >EUR42.00/US$53.79</div> 
<div id="t147_1" class="text" >Date</div> 
<div id="t148_1" class="text" >7-Nov-2013/7-Nov-2013</div> 
<div id="t149_1" class="text" >Opinions</div> 

How can I get the values for "Money Object" and "Coast (NS/WS)" from this markup, using pyparsing or BeautifulSoup?

The variables I need (for example):

coast = 'EUR35.99/US$46.09' 

money_obj = 'EUR42.00/US$53.79' 

Edit:

a = soup.find_all(text='Money Object') 
for i in a: 
    print(i.find_next('div').text)

But it returns:

Change 

EUR42.00/US$53.79 

I need only one value (EUR42.00/US$53.79).

Answers

1

Where text is your example HTML:

from bs4 import BeautifulSoup as bs 

soup = bs(text, 'html.parser')  # name the parser explicitly
print(soup.find(text='Money Object').find_next('div').text)
# EUR42.00/US$53.79 

That is: find the node whose text content is Money Object, then take the text of the next div.
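As a minimal sketch of the same idea applied to both labels, the lookup can be wrapped in a small helper. The name `value_after` is hypothetical, and `string=` is the newer spelling of the `text=` argument in recent BeautifulSoup releases; the markup is the sample from the question.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """\
<div id="t142_1" class="text" >Data Center</div>
<div id="t143_1" class="text" >Coast (NS/WS)</div>
<div id="t144_1" class="text" >EUR35.99/US$46.09</div>
<div id="t145_1" class="text" >Money Object</div>
<div id="t146_1" class="text" >EUR42.00/US$53.79</div>
<div id="t147_1" class="text" >Date</div>
<div id="t148_1" class="text" >7-Nov-2013/7-Nov-2013</div>
<div id="t149_1" class="text" >Opinions</div>
"""


def value_after(soup, label):
    """Return the text of the first div after the text node equal to `label`.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    node = soup.find(string=label)  # exact match on the text node
    if node is None:
        return None
    nxt = node.find_next("div")  # next div in document order
    return nxt.get_text(strip=True) if nxt is not None else None


soup = BeautifulSoup(html, "html.parser")
coast = value_after(soup, "Coast (NS/WS)")
money_obj = value_after(soup, "Money Object")
print(coast)
print(money_obj)
```

Returning None for a missing label avoids the AttributeError that chaining `.find_next` on a failed `find` would raise.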


What should I do if the words "Money Object" appear several times? I get a bad value from the "next line" – user1966421


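@user1966421 Then use 'find_all' and loop over the results... Not sure why you would be getting that message, though: it means you don't have "Money Object" as shown in your data sample –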


Thanks! I have updated the question – user1966421
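The `find_all` loop suggested in the comments could look roughly like the sketch below. The markup here is hypothetical (the full page was never posted), and the `PRICE` pattern is an assumption about what a wanted value looks like; the loop keeps the first candidate that matches it and skips neighbours such as "Change".

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the label appears twice, and only one occurrence
# is followed by a price. The real page was not shown in the question.
html = """\
<div class="text">Money Object</div>
<div class="text">Change</div>
<div class="text">Money Object</div>
<div class="text">EUR42.00/US$53.79</div>
"""

# Assumed shape of a wanted value, e.g. EUR42.00/US$53.79.
PRICE = re.compile(r"^EUR\d+\.\d{2}/US\$\d+\.\d{2}$")

soup = BeautifulSoup(html, "html.parser")
money_obj = None
for label in soup.find_all(string="Money Object"):
    nxt = label.find_next("div")  # div following this occurrence of the label
    candidate = nxt.get_text(strip=True) if nxt is not None else ""
    if PRICE.match(candidate):
        money_obj = candidate  # keep the first value that looks like a price
        break

print(money_obj)
```

If the real values can take other forms (different currencies, thousands separators), the pattern would need to be adjusted accordingly.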

0

Using pyparsing:

from pyparsing import makeHTMLTags, SkipTo, withAttribute

data = """\ 
<div id="t142_1" class="text" >Data Center</div> 
<div id="t143_1" class="text" >Coast (NS/WS)</div> 
<div id="t144_1" class="text" >EUR35.99/US$46.09</div> 
<div id="t145_1" class="text" >Money Object</div> 
<div id="t146_1" class="text" >EUR42.00/US$53.79</div> 
<div id="t147_1" class="text" >Date</div> 
<div id="t148_1" class="text" >7-Nov-2013/7-Nov-2013</div> 
<div id="t149_1" class="text" >Opinions</div> 
""" 

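divS,divE = makeHTMLTags("div")

# A div is an opening tag, everything up to the closing tag, and the closing tag.
div = divS + SkipTo(divE).setResultsName("body") + divE
# Only accept the opening tag whose id is t144_1.
divS.setParseAction(withAttribute(id="t144_1"))

for tokens,start,end in div.scanString(data):
    print("cost = " + tokens.body)

# Swap the filter to the id of the Money Object value.
divS.setParseAction(withAttribute(id="t146_1"))
for tokens,start,end in div.scanString(data):
    print("money_obj = " + tokens.body)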

Output:

>>> 
cost = EUR35.99/US$46.09 
money_obj = EUR42.00/US$53.79
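The version above depends on knowing the ids t144_1 and t146_1 in advance. As a hedged alternative sketch, the values can be looked up by their labels instead, assuming each label div is immediately followed by its value div as in the sample. The names `bodies` and `following` are hypothetical, and the camelCase pyparsing names follow the answer above.

```python
from pyparsing import SkipTo, makeHTMLTags

# Sample markup copied from the question.
data = """\
<div id="t142_1" class="text" >Data Center</div>
<div id="t143_1" class="text" >Coast (NS/WS)</div>
<div id="t144_1" class="text" >EUR35.99/US$46.09</div>
<div id="t145_1" class="text" >Money Object</div>
<div id="t146_1" class="text" >EUR42.00/US$53.79</div>
<div id="t147_1" class="text" >Date</div>
<div id="t148_1" class="text" >7-Nov-2013/7-Nov-2013</div>
<div id="t149_1" class="text" >Opinions</div>
"""

div_start, div_end = makeHTMLTags("div")
div = div_start + SkipTo(div_end)("body") + div_end

# Collect the text of every div in document order.
bodies = [tokens.body.strip() for tokens, start, end in div.scanString(data)]

# Map each div's text to the text of the div that follows it.
# Assumption: a label is always directly followed by its value.
following = dict(zip(bodies, bodies[1:]))

coast = following["Coast (NS/WS)"]
money_obj = following["Money Object"]
print("coast = " + coast)
print("money_obj = " + money_obj)
```

A repeated label would overwrite its earlier entry in `following`, so this sketch only suits pages where each label occurs once.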