2017-04-06 101 views
3

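Extracting data from tbody using BeautifulSoup

I want to extract some data from HTML using BeautifulSoup. I only want to return the following attributes: data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7", but I am not getting any results. I am using the code below. Any help would be appreciated.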

parsed = soup.find_all('tbody', class_=re.compile('^data-'))
<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0"> 
<tr class="first-line"> 
    <td class="icon-td"> 
    <div class="icon"> 
    <img alt="Item icon" src="https://web.poecdn.com/image/Art/2DItems/Maps/AtlasMaps/SulphurWastes3.png?scale=1&amp;w=1&amp;h=1&amp;v=48802019c4a2e88af038d75ec1e4b31e3"/> 
    \n 
    <div class="sockets" style="position: absolute;"> 
    \n 
    <div class="sockets-inner" style="position: relative; width:94px;"> 
     \n 
    </div> 
    \n 
    </div> 
    </div> 
    </td> 
    <td class="item-cell"> 
    <h5> 
    <a class="title itemframe0" href="#" onclick="return false;" target="_blank"> 
    Sulphur Wastes Map 
    </a> 
    <span class="found-time-ago"> 
    2 months ago 
    </span> 
    </h5> 
    <ul class="requirements proplist"> 
    <li> 
    <span class="sortable" data-name="ilvl"> 
     ilvl: 80 
    </span> 
    </li> 
    </ul> 
    <span class="sockets-raw" style="display:none"> 
    </span> 
    <ul class="item-mods"> 
    </ul> 
    </td> 
    <td class="table-stats"> 
    <table> 
    <tr class="calibrate"> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    <th> 
    </th> 
    </tr> 
    <tr class="cell-first"> 
    <th class="disabled" colspan="2"> 
     Quality 
    </th> 
    <th class="disabled" colspan="2"> 
     Phys. 
    </th> 
    <th class="disabled" colspan="2"> 
     Elem. 
    </th> 
    <th class="disabled" colspan="2"> 
     APS 
    </th> 
    <th class="disabled" colspan="2"> 
     DPS 
    </th> 
    <th class="disabled" colspan="2"> 
     pDPS 
    </th> 
    <th class="disabled" colspan="2"> 
     eDPS 
    </th> 
    </tr> 
    <tr class="cell-first"> 
    <td class="sortable property " colspan="2" data-name="q" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="pd" data-value="0.0"> 
    </td> 
    <td class="sortable property " colspan="2" data-ed="" data-name="ed" data-value="0.0"> 
    </td> 
    <td class="sortable property " colspan="2" data-name="aps" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="dps" data-value="0.0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="pdps" data-value="0.0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="edps" data-value="0.0"> 
     \xa0 
    </td> 
    </tr> 
    <tr class="cell-second"> 
    <th class="cell-empty"> 
    </th> 
    <th class="disabled" colspan="2"> 
     Armour 
    </th> 
    <th class="disabled" colspan="2"> 
     Evasion 
    </th> 
    <th class="disabled" colspan="2"> 
     Shield 
    </th> 
    <th class="disabled" colspan="2"> 
     Block 
    </th> 
    <th class="disabled" colspan="2"> 
     Crit. 
    </th> 
    <th colspan="2"> 
     Tier 
    </th> 
    </tr> 
    <tr class="cell-second"> 
    <td class="cell-empty"> 
    </td> 
    <td class="sortable property " colspan="2" data-name="armour" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="evasion" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="shield" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="block" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="crit" data-value="0"> 
     \xa0 
    </td> 
    <td class="sortable property " colspan="2" data-name="level" data-value="13"> 
     13 
    </td> 
    </tr> 
    </table> 

Answers

0

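You are searching the tag's class for what are actually separate tag attributes, so nothing matches: data-buyout, data-ign and the rest are attributes in their own right, not class names.

Why not find the element by its id instead? Just make sure the id you search for includes the 0.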
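A minimal sketch of that suggestion, assuming every item container has an id of the form item-container-N (only item-container-0 appears in the question, so the numbering pattern is an assumption). The markup is a trimmed-down stand-in for the posted HTML, and html.parser is used only to avoid an extra dependency:

```python
import re
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question (most attributes omitted).
html = '''
<table>
<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3"
       data-buyout="1 alchemy" data-ign="DanForeverr"
       data-name="Sulphur Wastes Map" id="item-container-0">
<tr class="first-line"><td>Sulphur Wastes Map</td></tr>
</tbody>
</table>
'''

soup = BeautifulSoup(html, "html.parser")

# Match on the id attribute rather than on class.
# Assumption: ids look like "item-container-<number>".
rows = soup.find_all("tbody", id=re.compile(r"^item-container-\d+$"))

for row in rows:
    # row.attrs is a dict of every attribute on the tag; keep only the data-* ones.
    data_attrs = {k: v for k, v in row.attrs.items() if k.startswith("data-")}
    print(data_attrs)
```

Filtering row.attrs by key prefix returns all data-* attributes at once, so they do not have to be listed by name.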

0

Well, you can't really do it that way, but you can extract specific information from a tag like this.

Define what you posted as a string, say x:

from bs4 import BeautifulSoup

x = '''<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0">'''

soup = BeautifulSoup(x, 'lxml')

# Pinpoint the exact tbody by its full class string (you can do it your way).
# Giving the exact key-value pair makes a wrong match unlikely.
this_class = soup.findAll('tbody', {'class': 'item item-live-c324ceb98e25716a0fad0727e0cd64e3'})

for i in this_class:
    # Index the tag like a dict to read a single attribute value.
    print(i['data-buyout'])
    print(i['data-ign'])
    print(i['data-name'])
    print(i['id'])

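This prints the value of each of those attributes. If you print the result of soup.findAll or soup.find directly instead, you get not just the one tag but the whole branch, including all of its children.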
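To illustrate the difference between printing a tag and reading one of its attributes, here is a small sketch. The markup is a shortened stand-in for the posted HTML, and the variable names are only for illustration:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the tbody in the question.
html = '''
<table>
<tbody data-buyout="1 alchemy" id="item-container-0">
<tr><td>Sulphur Wastes Map</td></tr>
</tbody>
</table>
'''

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("tbody")

whole_branch = str(tag)           # the tag plus all of its children, as markup
buyout = tag["data-buyout"]       # a single attribute value
missing = tag.get("data-seller")  # .get() returns None instead of raising KeyError

print(whole_branch)
print(buyout)
print(missing)
```

Using tag.get() rather than tag[...] may be the safer choice when some rows lack an attribute, since indexing a missing attribute raises KeyError.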

0

A combination of the following solved my problem:

parsed = soup.select("tbody[id*=item-container-]") 
for i in parsed: 
    print(i['data-buyout']) 
    print(i['data-ign']) 
    print(i['data-name']) 
    print(i['id']) 
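One thing to be aware of with the selector above: [id*=...] matches the substring anywhere in the id, while [id^=...] matches only ids that start with it. A short sketch of the difference, where the second tbody and its id are hypothetical and not taken from the original page:

```python
from bs4 import BeautifulSoup

# The first tbody mirrors the question; the second one is hypothetical and is
# there only to show how the two selectors differ.
html = '''
<table>
<tbody id="item-container-0" data-name="Sulphur Wastes Map"></tbody>
<tbody id="old-item-container-1" data-name="Hypothetical Other Map"></tbody>
</table>
'''

soup = BeautifulSoup(html, "html.parser")

# *= : the id contains the substring anywhere.
contains = soup.select('tbody[id*="item-container-"]')
# ^= : the id starts with the substring.
starts_with = soup.select('tbody[id^="item-container-"]')

print([t["id"] for t in contains])
print([t["id"] for t in starts_with])
```

If every id on the page really does begin with item-container-, both selectors return the same elements, so the stricter form is only a precaution.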