2016-11-13 71 views
1

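Parsing PHP data with jsoup

I'm not quite sure how to phrase this question or how to make the title self-explanatory. I'm using jsoup to parse a web page (http://champion.gg/statistics/), and I'm trying to grab the statistics from the table with this code: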

public void connect(String url) { 
    try { 
     Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get(); 
     System.out.println(doc.toString()); 
     Element table = doc.select("table[class=table table-striped]").first(); 
     Element tbody = table.select("tbody").first(); 
     Iterator<Element> rows = tbody.select("tr").iterator(); 
     rows.forEachRemaining(row -> { 
      System.out.println(row.toString()); 
     }); 
    } catch(IOException exception) { 
     if(Settings.DEBUG) { 
      Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception); 
     } 
     Program.alert("Error loading webpage!"); 
    } 
} 

which produces this result:

<tr ng-repeat="champion in filteredChampions = (championData | startsWith:search.title | filter:roleSort | orderBy:[order+sortExpression.sortBy,order+sortExpression.lastSortBy])"> 
<td class="rank">{{indexNumber($index, filteredChampions.length)}}</td> 
<td ng-class="{'selected-column':determineSelected('title')}"> <a href="/champion/{{champion.key}}/{{champion.role}}"> 
    <div class="tsm-tooltip tsm-angular-champion-tt" data-type="champions" data-name="{{champion.key}}" data-id="{{matchupData}}"> 
    <div class="matchup-champion {{champion.key}}"></div> 
    <span class="stat-champ-title">{{champion.title}}</span> 
    </div> </a> </td> 
<td class="stats-role-title" ng-class="{'selected-column':determineSelected('role')}">{{champion.role}}</td> 
<td ng-class="{'selected-column':determineSelected('winPercent')}"> <span ng-class="{'top-half': (champion.general.winPercent >= 50), 'bottom-half': (champion.general.winPercent < 50)}">{{champion.general.winPercent}}%</span> </td> 
<td ng-class="{'selected-column':determineSelected('playPercent')}">{{champion.general.playPercent}}%</td> 
<td ng-class="{'selected-column':determineSelected('banRate')}">{{champion.general.banRate}}%</td> 
<td ng-class="{'selected-column':determineSelected('experience')}">{{champion.general.experience}}</td> 
<td ng-class="{'selected-column':determineSelected('kills')}">{{champion.general.kills}}</td> 
<td ng-class="{'selected-column':determineSelected('deaths')}">{{champion.general.deaths}}</td> 
<td ng-class="{'selected-column':determineSelected('assists')}">{{champion.general.assists}}</td> 
<td ng-class="{'selected-column':determineSelected('largestKillingSpree')}">{{champion.general.largestKillingSpree}}</td> 
<td ng-class="{'selected-column':determineSelected('totalDamageDealtToChampions')}">{{champion.general.totalDamageDealtToChampions}}</td> 
<td ng-class="{'selected-column':determineSelected('totalDamageTaken')}">{{champion.general.totalDamageTaken}}</td> 
<td ng-class="{'selected-column':determineSelected('totalHeal')}">{{champion.general.totalHeal}}</td> 
<td ng-class="{'selected-column':determineSelected('minionsKilled')}">{{champion.general.minionsKilled}}</td> 
<td ng-class="{'selected-column':determineSelected('neutralMinionsKilledEnemyJungle')}">{{champion.general.neutralMinionsKilledEnemyJungle}}</td> 
<td ng-class="{'selected-column':determineSelected('neutralMinionsKilledTeamJungle')}">{{champion.general.neutralMinionsKilledTeamJungle}}</td> 
<td ng-class="{'selected-column':determineSelected('goldEarned')}">{{champion.general.goldEarned}}</td> 
<td ng-class="{'selected-column':determineSelected('overallPosition')}">{{champion.general.overallPosition}}</td> 
<td ng-class="{'selected-column':determineSelected('overallPositionChange')}"><span class="glyphicon" ng-class="{'glyphicon-arrow-up': (champion.general.overallPositionChange > 0), 'glyphicon-arrow-down': (champion.general.overallPositionChange < 0), 'same-position': (champion.general.overallPositionChange === 0)}">{{Math.abs(champion.general.overallPositionChange)}}</span></td> 
</tr> 

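Instead of the average number of kills for a particular champion, the result I get contains the literal text {{champion.general.kills}}. How can I parse the page so that, instead of champion.general.kills, I get an actual value such as 8?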

It looks like the site uses Angular to inject the statistics into the view. Maybe [this answer](http://stackoverflow.com/questions/14904776/parse-javascript-with-jsoup) can help you.
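To make the symptom concrete: jsoup parses the HTML the server sends and does not run JavaScript, so the Angular expressions stay in the cells as literal `{{...}}` text. A small check like the one below can flag a cell that still holds an unrendered expression. This is only a sketch; `TemplateCheck` and `firstExpression` are names made up for illustration and are not part of jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: spots Angular-style {{ ... }} expressions that were
// never rendered because no JavaScript ran on the fetched page.
class TemplateCheck {
    private static final Pattern EXPRESSION =
            Pattern.compile("\\{\\{\\s*(.+?)\\s*\\}\\}");

    /** Returns the first unrendered expression in the text, or null if there is none. */
    static String firstExpression(String cellText) {
        Matcher m = EXPRESSION.matcher(cellText);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A cell as jsoup sees it, followed by a cell that holds a real value.
        System.out.println(firstExpression("{{champion.general.kills}}"));
        System.out.println(firstExpression("6.47"));
    }
}
```

If the check finds an expression, the numbers have to come from wherever the page keeps its raw data rather than from the table itself.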

Answers

0

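When it comes to extracting data from a web page, you have to go where the data is. In this case the data is still in the page, which is good. You need to get the script tag that contains the data and parse that. Note that this sample code assumes the data is in the script tag at index 11.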

public static void main(String[] args) 
{ 
    try 
    { 
     Document doc = Jsoup 
       .connect("http://champion.gg/statistics/") 
       .userAgent(
         "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36") 
       .get(); 
     System.out.println(doc.toString()); 
     Elements table = doc.select("script"); 
     Element script = table.get(11); 
     parseText(script); 
    } 
    catch (IOException exception) 
    { 
     // Report the failure instead of silently swallowing it 
     exception.printStackTrace(); 
    } 
} 

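public static void parseText(Element script) 
{ 
    // Raw text of the script tag that holds the champion data 
    String text = ((DataNode) script.childNode(0)).toString().trim(); 
    int index = text.indexOf("_id"); 
    while (index > 0) 
    { 
     index += 6;// Skip past _id":" to the beginning of the value 
     int endQuote = text.indexOf("\"", index); 
     String id = text.substring(index, endQuote); 
     // key and kills are cut from the start of their markers, 
     // so the printed values still include "key":" and "kills": 
     index = text.indexOf("\"key\":\"", endQuote); 
     endQuote = text.indexOf("\"", index + 8); 
     String key = text.substring(index, endQuote); 
     index = text.indexOf("\"kills\":", endQuote); 
     endQuote = text.indexOf(",", index); 
     String kills = text.substring(index, endQuote); 
     // Drop everything already processed. Note that index still refers to a 
     // position in the old string, so the next search can skip records. 
     text = text.substring(endQuote); 
     index = text.indexOf("_id", index); 
     System.out.println(id + key + kills); 
    } 
} 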

Output:

5812965753fa9743395ee93a"key":"Urgot"kills":6.47

5812965753fa9743395ee93b"key":"Aatrox"kills":5.8

5812965753fa9743395ee93d"key":"Galio"kills":4.58

5812965753fa9743395ee940"key":"Kled"kills":7.3 ...
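The ids in this output are not consecutive (the one ending in 93b is followed by the one ending in 93d), which suggests some records are being skipped. Reading the loop above, a likely cause is that `text` is shortened with `substring` while `index` still refers to a position in the old string, so the next search starts too far ahead. Below is a sketch of the same low-level scan that keeps one running offset into an unchanged string. The sample data and the class name `ChampionScan` are made up for illustration, and the code assumes every record contains `_id`, `key`, and `kills` in that order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not the answer's original code.
class ChampionScan {
    private static final String ID = "\"_id\":\"";
    private static final String KEY = "\"key\":\"";
    private static final String KILLS = "\"kills\":";

    // Assumes every record contains _id, key and kills, in that order.
    static List<String> scan(String text) {
        List<String> rows = new ArrayList<>();
        int pos = text.indexOf(ID);
        while (pos >= 0) {
            int idStart = pos + ID.length();
            int idEnd = text.indexOf('"', idStart);

            int keyStart = text.indexOf(KEY, idEnd) + KEY.length();
            int keyEnd = text.indexOf('"', keyStart);

            int killsStart = text.indexOf(KILLS, keyEnd) + KILLS.length();
            int killsEnd = killsStart;
            // The number ends at the first character that is not a digit, '.' or '-'.
            while (killsEnd < text.length()
                    && "0123456789.-".indexOf(text.charAt(killsEnd)) >= 0) {
                killsEnd++;
            }

            rows.add(text.substring(idStart, idEnd) + " "
                    + text.substring(keyStart, keyEnd) + " "
                    + text.substring(killsStart, killsEnd));

            // Keep searching the same string from where this record's kills value ended.
            pos = text.indexOf(ID, killsEnd);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample in the same general shape as the page's data.
        String sample = "[{\"_id\":\"a1\",\"key\":\"Urgot\",\"general\":{\"kills\":6.47,\"deaths\":5.1}},"
                + "{\"_id\":\"a2\",\"key\":\"Aatrox\",\"general\":{\"kills\":5.8}}]";
        scan(sample).forEach(System.out::println);
    }
}
```

Because the string is never truncated, every offset stays valid for the whole scan, and each value is cut after its marker rather than at it.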

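While this works for 20 champions, I honestly don't fully understand your code. I can follow selecting the script tags, but why do you have to use *.get(11);* and what does it do? I'll try to research it myself in the meantime. I also don't understand what the substrings are for; shouldn't there be an easier way to read the data in the script? It looks like JSON, and I was hoping I could read the data more easily, since it looks like objects inside the script. Thank you very much for your help! – Metorrite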

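.get(11) gets the twelfth script tag on the page; there are 11 other script tags before it. There may be an easier way, but I don't know much about JSON, so I took a low-level approach. – ProgrammersBlock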
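Since `.get(11)` depends on the page keeping exactly eleven scripts ahead of the data, a less brittle variation is to pick the script by what it contains. The sketch below works on plain strings so it runs without any library. It assumes you have already collected each script's contents (jsoup's `Element.data()` returns them), and that the text `"_id"` appears only in the data script, which is a guess based on the output shown earlier. `ScriptPicker`, `firstContaining`, and the sample script bodies are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative sketch: choose the data script by content instead of by position.
class ScriptPicker {
    /** Returns the first script body that contains the marker, if any. */
    static Optional<String> firstContaining(List<String> scriptBodies, String marker) {
        return scriptBodies.stream()
                .filter(body -> body.contains(marker))
                .findFirst();
    }

    public static void main(String[] args) {
        // Made-up script contents; in the real program each entry would come
        // from one <script> element of the parsed document.
        List<String> scripts = Arrays.asList(
                "window.analytics = {};",
                "var stats = [{\"_id\":\"a1\",\"key\":\"Urgot\"}];",
                "console.log('ready');");
        System.out.println(
                firstContaining(scripts, "\"_id\"").orElse("no data script found"));
    }
}
```

Returning an `Optional` makes the "script not found" case explicit, which a fixed index would otherwise turn into an exception or into parsing the wrong script.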

0

I found ProgrammersBlock's answer helpful. After looking through the script data, I converted it from JSON into full Java objects!

package com.databot.web.parser; 

import java.io.IOException; 
import java.io.StringReader; 
import java.util.ArrayList; 
import java.util.List; 
import java.util.logging.Level; 

import org.jsoup.Jsoup; 
import org.jsoup.nodes.DataNode; 
import org.jsoup.nodes.Document; 
import org.jsoup.nodes.Element; 
import org.jsoup.select.Elements; 

import com.databot.Program; 
import com.databot.Settings; 
import com.databot.champions.ChampionStats; 
import com.databot.champions.Champion; 
import com.google.gson.stream.JsonReader; 

public class WebParser { 

public void connect(String url) { 
    try { 
     Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get(); 
     Elements table = doc.select("script"); 
     Element script = table.get(11); 
     parseText(script); 
    } catch(IOException exception) { 
     if(Settings.DEBUG) { 
      Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception); 
     } 
     Program.alert("Error loading webpage!"); 
    } 
} 

public void parseText(Element script) 
{ 
    // Skip the first 22 characters, presumably the JavaScript assignment in front of the JSON array 
    String text = ((DataNode) script.childNode(0)).toString().substring(22).trim(); 
    System.out.println(text); 
    List<Champion> champions = new ArrayList<>(); 
    try { 
     JsonReader reader = new JsonReader(new StringReader(text)); 
     reader.setLenient(true); 
     reader.beginArray(); 
     while(reader.hasNext()) { 
      reader.beginObject(); 
       String id = "", key = "", role = "", title = ""; 
       ChampionStats stats = new ChampionStats(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0); 
      while(reader.hasNext()) { 
       String name = reader.nextName(); 
       if(name.equalsIgnoreCase("_id")) { 
        id = reader.nextString(); 
       } else if(name.equalsIgnoreCase("key")) { 
        key = reader.nextString(); 
       } else if(name.equalsIgnoreCase("role")) { 
        role = reader.nextString(); 
       } else if(name.equalsIgnoreCase("title")) { 
        title = reader.nextString(); 
       } else if(name.equalsIgnoreCase("general")) { 
        double winPercent = 0, playPercent = 0, banRate = 0, experience = 0, kills = 0, deaths = 0, assists = 0, totalDamageDealtToChampions = 0, totalDamageTaken = 0, totalHeal = 0, largestKillingSpree = 0, minionsKilled = 0, neutralMinionsKilledTeamJungle = 0, neutralMinionsKilledEnemyJungle = 0, goldEarned = 0; 
        int overallPosition = 0, overallPositionChange = 0; 
         reader.beginObject(); 
         while(reader.hasNext()) { 
          String gName = reader.nextName(); 
          if(gName.equalsIgnoreCase("winPercent")) { 
           winPercent = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("playPercent")) { 
           playPercent = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("banRate")) { 
           banRate = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("experience")) { 
           experience = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("kills")) { 
           kills = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("deaths")) { 
           deaths = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("assists")) { 
           assists = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("totalDamageDealtToChampions")) { 
           totalDamageDealtToChampions = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("totalDamageTaken")) { 
           totalDamageTaken = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("totalHeal")) { 
           totalHeal = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("largestKillingSpree")) { 
           largestKillingSpree = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("minionsKilled")) { 
           minionsKilled = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("neutralMinionsKilledTeamJungle")) { 
           neutralMinionsKilledTeamJungle = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("neutralMinionsKilledEnemyJungle")) { 
           neutralMinionsKilledEnemyJungle = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("goldEarned")) { 
           goldEarned = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("overallPosition")) { 
           overallPosition = reader.nextInt(); 
          } else if(gName.equalsIgnoreCase("overallPositionChange")) { 
           overallPositionChange = reader.nextInt(); 
          } else { 
           reader.skipValue(); 
          } 
         } 
         reader.endObject(); 
         stats = new ChampionStats(winPercent, playPercent, banRate, experience, kills, deaths, assists, totalDamageDealtToChampions, totalDamageTaken, totalHeal, largestKillingSpree, minionsKilled, neutralMinionsKilledTeamJungle, neutralMinionsKilledEnemyJungle, goldEarned, overallPosition, overallPositionChange); 
       } else { 
        reader.skipValue(); 
       } 
      } 
      reader.endObject(); 
      champions.add(new Champion(id, key, role, title, stats)); 
     } 
     reader.endArray(); 
     reader.close(); 
    } catch (Exception e) { 
     Program.alert("Error reading JSON data!"); 
     e.printStackTrace(); 
    } 
    champions.forEach(champion -> { 
     System.out.println(champion.toString()); 
    }); 
} 
} 

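This is my full WebParser class, in case anyone is interested. I'm sure there's a better or more efficient way to write this, but it's working for me as of now!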
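One fragile spot in the class above is `substring(22)`, which assumes the JSON always starts exactly 22 characters into the script. A possible alternative is to cut from the first `[` to the last `]`. The sketch below assumes the script holds a single top-level array and that no other brackets come before or after it. `JsonSlice` and `arrayLiteral` are illustrative names, and the sample script text is made up rather than taken from the real page.

```java
// Illustrative sketch: isolate the JSON array inside a script's text.
class JsonSlice {
    /** Returns the text from the first '[' to the last ']' inclusive, or null if either is missing. */
    static String arrayLiteral(String scriptText) {
        int start = scriptText.indexOf('[');
        int end = scriptText.lastIndexOf(']');
        if (start < 0 || end < start) {
            return null;
        }
        return scriptText.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // Made-up script text; the variable name is not taken from the real page.
        String script = "  var stats = [{\"_id\":\"a1\",\"general\":{\"kills\":6.47}}];  ";
        System.out.println(arrayLiteral(script));
    }
}
```

The returned string can be handed to `new StringReader(...)` in the same way as the `text` variable in `parseText`.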