2017-08-09

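Unable to extract resource from JSoup

For some reason, I cannot extract the text I need from my HTML code. For reference, I want to get the "title" attribute from the <a class="a-link-normal s-access-detail-page ..."> element.

HTML code: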

<div id="resultsCol" class='showRightCol'> 
 
    <div id="centerMinus" class='leftCol'> 
 
    <div id="atfResults" class="a-row s-result-list-parent-container"> 
 
     <ul id="s-results-list-atf" class="s-result-list s-col-1 s-col-ws-1 s-result-list-hgrid s-height-equalized s-list-view s-text-condensed"> 
 
     <li id="result_0" data-asin="B01KIZUF7Y" class="s-result-item celwidget "> 
 
      <div class="s-item-container"> 
 
      <div class="a-fixed-left-grid"> 
 
       <div class="a-fixed-left-grid-inner" style="padding-left:218px"> 
 
       <div class="a-fixed-left-grid-col a-col-left" style="width:218px;margin-left:-218px;_margin-left:-109px;float:left;"> 
 
        <div class="a-row"> 
 
        <div aria-hidden="true" class="a-column a-span12 a-text-center"> 
 
         <a class="a-link-normal a-text-normal" href="http://rads.stackoverflow.com/amzn/click/B01KIZUF7Y"> 
 
         <img src="https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US218_.jpg" srcset="https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US218_.jpg 1x, https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US327_FMwebp_QL65_.jpg 1.5x, https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US436_FMwebp_QL65_.jpg 2x, https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US500_FMwebp_QL65_.jpg 2.2935x" 
 
          width="218" height="218" alt="Product Details" class="s-access-image cfMarker" data-search-image-load> 
 
         </a> 
 
         <div class="a-section a-spacing-none a-text-center"> 
 
         </div> 
 
        </div> 
 
        </div> 
 
       </div> 
 
       <div class="a-fixed-left-grid-col a-col-right" style="padding-left:2%;*width:97.6%;float:left;"> 
 
        <div class="a-row a-spacing-small"> 
 
        <div class="a-row a-spacing-none scx-truncate-medium sx-line-clamp-3 s-list-title-long"> 
 
         (want to get the title attribute from here) 
 
         <a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal" title="MSI GAMING Radeon RX 480 GDDR5 4GB CrossFire VR Ready FinFET DirectX 12 Graphics Card (RX 480 GAMING X 4G)" href="http://rads.stackoverflow.com/amzn/click/B01KIZUF7Y"> 
 
         <h2 data-attribute="MSI GAMING Radeon RX 480 GDDR5 4GB CrossFire VR Ready FinFET DirectX 12 Graphics Card (RX 480 GAMING X 4G)" data-max-rows="3" class="a-size-medium s-inline s-access-title a-text-normal">MSI GAMING Radeon RX 480 GDDR5 4GB CrossFire VR Ready FinFET DirectX 12 Graphics Card (RX 480 GAMING X 4G) 
 
         </h2> 
 
         </a> 
 
        </div> 
 
        <div class="a-row a-spacing-none"> 
 
         <span class="a-size-small a-color-secondary">by </span> 
 
         <span class="a-size-small a-color-secondary">MSI</span> 
 
        </div> 
 
        </div> 
 
        <div class="a-row"> 
 
        <div class="a-column a-span7"> 
 
         <div class="a-row a-spacing-mini"> 
 
         <div class="a-row a-spacing-none"> 
 
          <a class="a-size-small a-link-normal a-text-normal" href="http://rads.stackoverflow.com/amzn/click/B01KIZUF7Y"> 
 
          <span class="a-color-secondary a-text-strike"></span> 
 
          <span class="a-size-base a-color-base">$349.99</span> 
 
          <span class="a-letter-space"></span>(6 used &amp new offers)</a> 
 
         </div> 
 
         </div> 
 
        </div> 
 
        <div class="a-column a-span5 a-span-last"> 
 
         <div class="a-row a-spacing-mini"> 
 
         <span name="B01KIZUF7Y">

Java code:

Elements basicLink = doc.select("div.showRightCol") 
         .select("div.leftCol") 
         .select("div.a-row.s-result-list-parent-container") 
         .select("ul.s-result-list.s-col-1.s-col-ws-1.s-result-list-hgrid.s-height-equalized.s-list-view.s-text-condensed") 
         .select("li.s-result-item.celwidget") 
         .select("div.s-item-container") 
         .select("div.a-fixed-left-grid") 
         .select("div.a-fixed-left-grid-inner");//start here to get to everything 

title = basicLink.select("div.a-fixed-left-grid-col.a-col-right") 
        .select("div.a-row.a-spacing-small") 
        .select("div.a-row.a-spacing-none.scx-truncate-medium.sx-line-clamp-3.s-list-title-long") 
        .select("a.a-link-normal.s-access-detail-page.s-color-twister-title-link.a-text-normal") 
        .attr("title"); 

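Interestingly, I actually had this code working, but it stopped working after I changed one line, even though I then reverted that line to the original. It should work, but I don't know whether my selectors are inefficient or whether something is actually wrong. Thank you for your time!

Answer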


The a element has multiple classes. In the selector, put a dot in place of each space between the class names:

Element element = doc.select("a.a-link-normal.s-access-detail-page.s-color-twister-title-link.a-text-normal").first(); 
String title = element.attr("title"); 

And for completeness, since no other element has a title attribute, you could also do:

Element element = doc.select("[title]").first();
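The dot-for-space rule from the answer can be illustrated without jsoup at all. The following is only a minimal sketch: ClassSelectorSketch and toSelector are hypothetical names made up for this example and are not part of the jsoup API. It takes a tag name and the raw value of a class attribute and joins the class names with dots, which gives a string you could pass to doc.select(...).

```java
// Hypothetical helper (not part of jsoup): builds a CSS selector from a tag name
// and the raw value of an HTML class attribute.
final class ClassSelectorSketch {

    static String toSelector(String tag, String classAttribute) {
        StringBuilder selector = new StringBuilder(tag);
        // In HTML, class names are separated by whitespace;
        // in a CSS selector, each class name is prefixed with a dot instead.
        for (String className : classAttribute.trim().split("\\s+")) {
            if (!className.isEmpty()) {
                selector.append('.').append(className);
            }
        }
        return selector.toString();
    }

    public static void main(String[] args) {
        String classes = "a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal";
        // Prints the same selector string that the answer passes to doc.select(...)
        System.out.println(toSelector("a", classes));
    }
}
```

Trailing whitespace in the attribute value, as in class="s-result-item celwidget " from the question's HTML, is trimmed before splitting, so it does not produce an empty class name.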