2017-04-26 252 views

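I want to extract the currency exchange rate data from the table on this page, using Python.

The page is https://www.iceplc.com/travel-money/exchange-rates

I have already tried the following approach, but it does not work: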

    table_id = driver.find_element(By.ID, 'data_configuration_feeds_ct_fields_body0')
    rows = table_id.find_elements(By.TAG_NAME, "tr")  # get all of the rows in the table
    for row in rows:
        col = row.find_elements(By.TAG_NAME, "td")[1]  # note: index starts from 0, so 1 is column 2
        print(col.text)  # prints text from the element

Here is the HTML:

</td> 

        <td valign="top" class="OuterProdCell test"> 

           <table class="ProductCell"> 
            <tr> 
            <td class="rateCountryFlag"> 
             <ul id="prodImages"> 
              <li> 
               <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso" class="flags chilean-peso" ></a> 
              </li> 
             </ul> 
            </td> 

            <td class="ratesName"> 
            <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso"> 
            Chilean Peso</a> 
            </td> 

            <td class="ratesClass"> 
            <a class="orderText" href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">774.8540</a> 
            </td> 
            <td class="orderNow">           
             <ul id="prodImages"> 
              <li> 
               <a class="reserveNow" href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">Order<br/>now</a> 
              </li> 
              <li> 
               <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso" class="flags arrowGreen" ></a> 
              </li> 
             </ul> 
            </td> 
            </tr> 
           </table> 

I also tried the Python Selenium approach below. With it I can get the exchange rate for each currency, but not the name:

    driver.get("https://www.iceplc.com/travel-money/exchange-rates")
    rates = driver.find_elements_by_class_name("ratesClass")

    for rate in rates:
        print(rate.text)

Which name? What is the expected output? –

The output should look like: Euro 1.146 – xys234

The intent is to output the whole table in this format, in order – xys234

Answers

1

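If you just want to get exchange rates, you are better off using an API; see this question. Web scraping leaves your code vulnerable to breaking whenever the target page changes.

If scraping is what you want, you only need to reuse your Selenium approach, but also search for the "ratesName" class.

For example: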

driver.get("https://www.iceplc.com/travel-money/exchange-rates")
names = driver.find_elements_by_class_name("ratesName")
values = driver.find_elements_by_class_name("ratesClass")

# pair each name with the rate found at the same position
for name, value in zip(names, values):
    print("Name: %s, Rate: %s" % (name.text, value.text))
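
One caveat with collecting the two class lists separately is that names and rates are matched only by position. The following is a minimal sketch, not part of the original answer, of a helper that pairs the extracted texts and stops if the two lists differ in length. The name pair_names_and_rates is hypothetical, and the inputs are assumed to be the .text values already read from the elements; the sample values are taken from the markup and output quoted in this thread.

```python
def pair_names_and_rates(names, rates):
    """Pair currency names with rates by position, converting rates to float."""
    if len(names) != len(rates):
        # Different lengths mean positional pairing would mislabel some rates.
        raise ValueError(
            "got %d names but %d rates; the lists cannot be paired by position"
            % (len(names), len(rates))
        )
    return [(name.strip(), float(rate)) for name, rate in zip(names, rates)]


# Sample texts (assumed to come from element.text on the page).
names = ["Chilean Peso", "Euro"]
rates = ["774.8540", "1.1469"]
for name, rate in pair_names_and_rates(names, rates):
    print("Name: %s, Rate: %s" % (name, rate))
```

With Selenium, the two input lists could be built with something like [e.text for e in driver.find_elements_by_class_name("ratesName")], and likewise for "ratesClass".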
1

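Analyzing the structure of the page makes it clear that you have to go through it row by row and pick out the column components you are interested in.

For each row, extract the elements you want using find_element_by_tag_name and find_element_by_class_name.

(Documentation here: http://selenium-python.readthedocs.io/locating-elements.html)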

driver.get("https://www.iceplc.com/travel-money/exchange-rates")
rates = driver.find_elements_by_tag_name('tr')
for i in rates:
    # print the name cell and the rate cell of each row
    print(i.find_element_by_class_name('ratesName').text, i.find_element_by_class_name('ratesClass').text)

The output shows the two elements of interest:

US - Dollar 1.2536 
Croatia - Kuna 8.3997 
Canada - Dollar 1.7006 
Australia - Dollar 1.6647 
Euro - 1.1469 
...
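
The same row-by-row idea can be tried offline, without a browser. The sketch below is not Selenium: it swaps in Python's standard html.parser module and runs against a trimmed copy of the markup quoted in the question, so the structure of the live page is an assumption. RateTableParser and SAMPLE_HTML are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup quoted in the question (one currency row).
SAMPLE_HTML = """
<table class="ProductCell">
 <tr>
  <td class="ratesName">
   <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">
   Chilean Peso</a>
  </td>
  <td class="ratesClass">
   <a class="orderText" href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">774.8540</a>
  </td>
 </tr>
</table>
"""


class RateTableParser(HTMLParser):
    """Collects (name, rate) pairs, one per <tr> that contains both cells."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # finished (name, rate) tuples
        self._cell = None  # class of the <td> being read, if it is one we want
        self._row = {}     # text collected so far for the current <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td":
            css_class = dict(attrs).get("class")
            wanted = ("ratesName", "ratesClass")
            self._cell = css_class if css_class in wanted else None

    def handle_data(self, data):
        # Only keep text that sits inside one of the two wanted cells.
        if self._cell:
            self._row[self._cell] = self._row.get(self._cell, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None
        elif tag == "tr" and len(self._row) == 2:
            name = self._row["ratesName"].strip()
            rate = self._row["ratesClass"].strip()
            self.pairs.append((name, rate))
            self._row = {}


parser = RateTableParser()
parser.feed(SAMPLE_HTML)
for name, rate in parser.pairs:
    print(name, rate)
```

Because the name and the rate are read from the same row, they cannot drift out of step the way two independently collected lists can. To use it on the real page, the string passed to feed could be Selenium's driver.page_source.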