2016-10-04 38 views
1

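Jsoup: scraping text from the children of a div

I want to extract the product reviews for the Moto X from the linked page using JSoup, but it throws a NullPointerException. I also want the text that is shown after clicking a review's "Read more" link.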

import java.io.*; 
import org.jsoup.*; 
import org.jsoup.nodes.*; 
import org.jsoup.select.*; 

public class JSoupEx 
{ 
    public static void main(String[] args) throws IOException 
    { 
     Document doc = Jsoup.connect("https://www.flipkart.com/moto-x-play-with-turbo-charger-white-16-gb/product-reviews/itmefzwvdejejvth?pid=MOBEFM5HAFRNSJJA").get(); 
     Element ele = doc.select("div[class=qwjRop] > div").first(); 
     System.out.println(ele.text()); 
    } 
} 

Any solutions?

Answers

1

As gherkin suggested, the Network tab of the browser's developer tools shows a request whose response contains the reviews in JSON format:

https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=0 

With a JSON parser such as JSON.simple, we can extract information such as the review author, the helpfulness count, and the text.

Example code

// Imports this snippet needs (jsoup and JSON.simple)
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36";
String reviewApiCall = "https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=";
String xUserAgent = userAgent + " FKUA/website/41/website/Desktop";
String referer = "https://www.flipkart.com/moto-x-play-with-turbo-charger-white-16-gb/product-reviews/itmefzwvdejejvth?pid=MOBEFM5HAFRNSJJA";
String host = "www.flipkart.com";
int numberOfPages = 2; // first two pages of results will be fetched

try {
    // loop over multiple review pages; each page holds 15 reviews (count=15)
    for (int i = 0; i < numberOfPages; i++) {
        // query reviews; "start" is the offset of the first review on the page
        Response response = Jsoup.connect(reviewApiCall + (i * 15)).userAgent(userAgent).referrer(referer).timeout(5000)
                .header("x-user-agent", xUserAgent).header("host", host).ignoreContentType(true).execute();

        System.out.println("Response in JSON format:\n\t" + response.body() + "\n");

        // parse the JSON response; the reviews are under RESPONSE -> data
        JSONObject jsonObject = (JSONObject) new JSONParser().parse(response.body());
        jsonObject = (JSONObject) jsonObject.get("RESPONSE");
        JSONArray jsonArray = (JSONArray) jsonObject.get("data");

        // each array entry wraps one review in its "value" field
        for (Object object : jsonArray) {
            jsonObject = (JSONObject) object;
            jsonObject = (JSONObject) jsonObject.get("value");
            System.out.println("Author: " + jsonObject.get("author") + "\thelpful: "
                    + jsonObject.get("helpfulCount") + "\n\t"
                    + jsonObject.get("text").toString().replace("\n", "\n\t") + "\n");
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}

Output

Response in JSON format: 
    {"CACHE_INVALIDATION_TTL":"132568825671","REQUEST":null,"REQUEST-ID": [...] } 

Author: Flipkart Customer helpful: 140 
    A great phone at an affordable price with 
    -an outstanding camera 
    -great battery life 
    -an excellent display 
    -premium looks 
    the flipkart delivery was also fast and perfect. 

Author: Vaibhav Yadav helpful: 518 
    I m writing this review after using 2 months.. 
    First of all ..I must say this is one of the best product ..camera quality is best in natural lights or daytime..but in low light and in the night..camera quality is not so good but it's ok.. 
    It has good battery backup ..last one day on 3g usage ..while using 4g ..it lasts for about 10-12 hour.. 
    Turbo charges is good..although ..my charger is not working.. 
    Only problem in this phone is ..while charging..this phone heats a lot..this may b becoz of turbo charger..if u r using other charger than it does not heat.. 

Author: KAPIL CHOPRA helpful: 9 
[...] 

Note: the output is truncated ([...]).

+0

It works! Thanks :) – Abhishek

+0

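I need to get all the reviews on the other pages as well. I tried modifying the referer string to access the reviews on page 2 (the 'page' parameter), but it still shows the reviews from the first page. Any suggestions? – Abhishek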

+0

Add the start parameter and set its value to a multiple of 15, so the reviews on the second page are at: https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=15 –
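
The offset arithmetic in this comment can be sketched as below. This is only an illustration: the class name ReviewPages and the helper pageUrl are hypothetical, the base URL is the one quoted in the answer, and the page size of 15 is taken from its count=15 parameter. Nothing here contacts the site; it only builds the URL strings.

```java
public class ReviewPages {
    // Page size used by the request in this thread (count=15).
    static final int PAGE_SIZE = 15;

    // Base URL copied from the answer, ending right before the value of "start".
    static final String BASE = "https://www.flipkart.com/api/3/product/reviews"
            + "?productId=MOBEFM5HAFRNSJJA&count=" + PAGE_SIZE
            + "&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=";

    // Hypothetical helper: maps a zero-based page index to the URL whose
    // start offset is pageIndex * PAGE_SIZE.
    static String pageUrl(int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must not be negative");
        }
        return BASE + (pageIndex * PAGE_SIZE);
    }

    public static void main(String[] args) {
        // Print the URLs for the first three pages (start=0, 15, 30).
        for (int page = 0; page < 3; page++) {
            System.out.println(pageUrl(page));
        }
    }
}
```

Changing the 'page' parameter or the referer has no effect because the offset is carried by start alone, which is why the loop in the answer passes i*15.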

1

JSoup can only parse HTML; it cannot run JavaScript. The content you are looking for is added to the page by JavaScript, so Jsoup never sees it.
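
This is also the likely source of the NullPointerException in the question: when a selector matches nothing, select(...).first() returns null, and calling text() on it fails. Here is a minimal sketch of a null-safe lookup, assuming jsoup is on the classpath. The HTML string is a made-up stand-in for what the server sends before any script runs, and the class and helper names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StaticHtmlCheck {
    // Returns the text of the first element matching cssQuery,
    // or the fallback when the selector matches nothing.
    static String firstTextOrDefault(Document doc, String cssQuery, String fallback) {
        Element match = doc.select(cssQuery).first(); // null when there is no match
        return match != null ? match.text() : fallback;
    }

    public static void main(String[] args) {
        // Made-up static HTML: the review container is empty because a script
        // would fill it in later, and Jsoup does not execute scripts.
        String html = "<html><body><div id=\"container\"></div>"
                + "<script>/* reviews would be rendered here by JavaScript */</script>"
                + "</body></html>";
        Document doc = Jsoup.parse(html);

        // The selector from the question matches nothing in the static HTML,
        // so the fallback is printed instead of throwing.
        System.out.println(firstTextOrDefault(doc, "div[class=qwjRop] > div", "(no match in static HTML)"));
    }
}
```

The guard only avoids the crash; it does not make the missing reviews appear, which still requires either a browser-driven tool or the backend request.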

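In general you would need something like Selenium to get what you are looking for. For this particular site, however, a quick look at its network activity shows that all the content you want is fetched from the backend through an API call. You can call that API yourself, which makes the content easier to access and does not require Jsoup.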

+0

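Flipkart.com has an API for fetching products from the backend, but it has no method for retrieving product reviews. Is there any other way to fetch the reviews without using Selenium? – Abhishek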

+0

Just use Chrome DevTools to analyze the network traffic! – gherkin

+1

I couldn't figure out the request before, but now I can. Thanks for your help :) – Abhishek