2017-10-11 96 views

How do I get the img src link from this HTML with XPath?

<div class="c" id="M_Fp01sdJgm"> 
    <div> 
     <a class="nk" href="https://weibo.cn/thebs">figre</a> 
      <img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5338.gif" alt="V"/> 
      <img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/> 
     <span class="ctt"> 
        ":"resampling 
        <span class="kt">resampling</span> 
        ":Cleantech entrepreneurs are splicing genes in the search for greener fuels 
       ​</span>&nbsp; 
       [<a href="https://weibo.cn/mblog/picAll/Fp01sdJgm?rl=2">2 pieces of the package</a> 
       </div> 
    <div> 
     <a href="https://weibo.cn/mblog/pic/Fp01sdJgm?rl=1"> 
      <img src="http://wx1.sinaimg.cn/wap180/3ed2e6e8gy1fk7hohl2i5j219s0ps4qp.jpg" alt="images" class="ib" /> 
     </a>&nbsp; 
     <a href="https://weibo.cn/mblog/oripic?id=Fp01sdJgm&amp;u=3ed2e6e8gy1fk7hohl2i5j219s0ps4qp">image</a>&nbsp; 
     <a href="https://weibo.cn/attitude/Fp01sdJgm/add?uid=5757914684&amp;rl=1&amp;st=7b15a6">praise[28094]</a>&nbsp; 
     <a href="https://weibo.cn/repost/Fp01sdJgm?uid=1054009064&amp;rl=1">transmit[1164]</a>&nbsp; 
     <a href="https://weibo.cn/comment/Fp01sdJgm?uid=1054009064&amp;rl=1#cmtfrm" class="cc">comment[4097]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/Fp01sdJgm?rl=1&amp;st=7b15a6">save</a> 
     "<!---->&nbsp;" 
     <span class="ct">10月05日 20:08&nbsp;from iPhone 7 Plus 

I tried the code below. The other fields are extracted correctly, but `img` comes back empty.

import requests
from lxml import etree

def get_user_data(self, start_url):
    html = requests.get(url=start_url, headers=self.headers, cookies=self.cookies).content
    selector = etree.fromstring(html, etree.HTMLParser(encoding='utf-8'))
    # Each post is a <div class="c" id="M_...">
    all_user = selector.xpath('//div[contains(@class,"c") and contains(@id,"M")]')
    for i in all_user:
        user_id = i.xpath('./div[1]/a[@class="nk"]/@href')
        content = i.xpath('./div[1]/span[1]')[0]
        contents = content.xpath('string(.)')
        if i.xpath('./div[2]'):
            img = selector.xpath('./div[2]/a/img/@src')  # img is empty here
            praise_num = i.xpath('./div[2]/a[3]/text()')
            transmit_num = i.xpath('./div[2]/a[4]/text()')
        else:
            img = ''
            praise_num = i.xpath('./div[2]/a[3]/text()')
            transmit_num = i.xpath('./div[2]/a[4]/text()')

How should I write the XPath for `img`? And can I then combine the fields with zip()? I want to save them to MySQL.

Answers


Try this (your image is in div[1]):

img = i.xpath('./div[1]/a/img/@src') 
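A relative XPath such as `./div[2]/...` is evaluated from whichever element `.xpath()` is called on, so the same expression can match from the post element `i` and return nothing from the document root `selector`. The sketch below illustrates this on a trimmed copy of the HTML from the question. The `<html><body>` wrapper and the variable names `post`, `from_root`, `from_post`, and `badges` are assumptions made for the example. In this excerpt the photo (class `ib`) sits inside a link in the second inner div, while the small badge images are direct children of the first inner div.

```python
from lxml import etree

# Trimmed copy of the markup from the question (wrapper tags are an assumption).
HTML = """
<html><body>
<div class="c" id="M_Fp01sdJgm">
  <div>
    <a class="nk" href="https://weibo.cn/thebs">figre</a>
    <img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5338.gif" alt="V"/>
    <span class="ctt">resampling</span>
  </div>
  <div>
    <a href="https://weibo.cn/mblog/pic/Fp01sdJgm?rl=1">
      <img src="http://wx1.sinaimg.cn/wap180/3ed2e6e8gy1fk7hohl2i5j219s0ps4qp.jpg" alt="images" class="ib"/>
    </a>
  </div>
</div>
</body></html>
"""

selector = etree.fromstring(HTML, etree.HTMLParser())
post = selector.xpath('//div[contains(@class,"c") and contains(@id,"M")]')[0]

# Evaluated from the root <html> element: it has no <div> children, so no match.
from_root = selector.xpath('./div[2]/a/img/@src')

# Evaluated from the post element: finds the photo inside the second inner div.
from_post = post.xpath('./div[2]/a/img/@src')

# Badge images are direct children of the first inner div (no <a> in between).
badges = post.xpath('./div[1]/img/@src')

print(from_root)
print(from_post)
print(badges)
```

If the real page differs from the excerpt, the paths will need adjusting; the point is only that the query has to start from the per-post element.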

Sorry, this was a silly question. I made a mistake when I wrote the code and didn't spot it. Thanks. – kerberos
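On the zip() part of the question, which the thread does not address: one possible approach, sketched below, is to make every field list the same length (using a placeholder such as `''` for posts without an image) and then zip the lists into one tuple per post. The helper `first_or_empty`, the sample values, and the second user URL are hypothetical and only illustrate the shape of the data. zip() stops at the shortest input, so a list that skips missing values would silently drop or misalign rows.

```python
def first_or_empty(values):
    """Return the first XPath result, or '' when the result list is empty."""
    return values[0] if values else ""

# Hypothetical per-field lists, one entry per post, already padded with ''.
user_ids = ["https://weibo.cn/thebs", "https://weibo.cn/example"]
imgs = [
    first_or_empty(["http://wx1.sinaimg.cn/wap180/3ed2e6e8gy1fk7hohl2i5j219s0ps4qp.jpg"]),
    first_or_empty([]),  # second post has no image
]
praise = ["praise[28094]", "praise[3]"]

# One tuple per post: (user_id, img, praise).
rows = list(zip(user_ids, imgs, praise))

# zip() truncates to the shortest input, which is why padding matters.
truncated = list(zip(["a", "b"], ["x"]))

for row in rows:
    print(row)
```

A list of tuples like `rows` is the shape that a Python DB-API cursor's `executemany` accepts together with a parameterized INSERT statement; the table and column names would depend on your schema.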