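Java: using XPath with Google
Before asking this question I tried several different approaches and, of course, searched Google for directions and answers. I have also looked through StackOverflow and can't seem to find a solution.
Basically, I want to create a tool that returns results based on a URL and an XPath, for example: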
URL: http://www.google.co.uk/search?q=wicked+games
XPath: id('rso')/li/div/h3/a
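should return these results.
I can parse XML data from other URLs just fine, for example when I fetch an actual XML file such as http://renualsoft.com/jordon/person.xml, but I'm not sure how I would do this with Google.
I tried this: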
// Build a namespace-aware DOM parser.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
// Fetch and parse the page; this is the call that throws the IOException.
Document doc = builder.parse("http://www.google.co.uk/search?q=wicked+games");
// Compile the XPath and collect the href attribute of each matching link.
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
XPathExpression expr = xpath.compile("id('rso')/li/div/h3/a/@href");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
But I get this exception:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.google.co.uk/search?q=wicked+games
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1625)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:633)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:189)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:799)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:237)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:300)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
at NewEmptyJUnitTest.query(NewEmptyJUnitTest.java:35)
at NewEmptyJUnitTest.main(NewEmptyJUnitTest.java:77)
Java Result: 1
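Any help or guidance would be greatly appreciated. I have tried looking for answers elsewhere but, as I said, I can't find anything useful.
I just noticed an interesting tag description. Check out the google tag. – keyser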
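This happens because no user agent is set. Google also doesn't want you to fetch their search results this way; it is against their TOS. Using the Google Search API is a better, cleaner way to search. –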
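To illustrate the user-agent point in the comment above, here is a minimal sketch of setting a User-Agent header with plain `java.net` before any request is sent. The class name `UserAgentFetch`, the helper `openWithUserAgent`, and the user-agent string are all hypothetical, chosen only for this example. Note that the response would still be HTML rather than XML, so a strict XML parser may well reject it even without the 403, and the TOS concern from the comment still applies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

public class UserAgentFetch {
    // Hypothetical user-agent string, used only for illustration.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; XPathTool/0.1)";

    /** Prepares a connection with a User-Agent header; nothing is sent yet. */
    static URLConnection openWithUserAgent(String address) throws IOException {
        URLConnection connection = URI.create(address).toURL().openConnection();
        // Request properties must be set before the connection is opened.
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        URLConnection connection =
                openWithUserAgent("http://www.google.co.uk/search?q=wicked+games");
        // Show the header that would be sent with the request.
        System.out.println(connection.getRequestProperty("User-Agent"));
        // Calling connection.getInputStream() would send the request; that
        // stream could then be handed to builder.parse(InputStream) in place
        // of the URL string.
    }
}
```

The sketch stops short of sending the request, since fetching search results this way is exactly what the comment advises against.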
@keyser y. Nice find ;) –
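Separately from the 403, one detail of the XPath in the question may be worth checking. In XPath 1.0 the `id()` function only finds elements whose attribute is declared as type ID in a DTD, so on a document without a DTD `id('rso')` can match nothing. A predicate on the attribute, `//*[@id='rso']`, does not depend on a DTD. The sketch below runs that variant against a small, made-up, well-formed sample (not real Google output); the class name `XPathByIdAttribute` and the helper `hrefs` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathByIdAttribute {
    // Made-up sample markup that mimics the structure the question's XPath expects.
    static final String SAMPLE =
            "<html><body><ol id=\"rso\">"
            + "<li><div><h3><a href=\"http://example.com/a\">A</a></h3></div></li>"
            + "<li><div><h3><a href=\"http://example.com/b\">B</a></h3></div></li>"
            + "</ol></body></html>";

    /** Evaluates an XPath that selects attribute nodes and returns their values. */
    static List<String> hrefs(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Attribute predicate: independent of any DTD.
        System.out.println(hrefs(SAMPLE, "//*[@id='rso']/li/div/h3/a/@href"));
        // Original form from the question, for comparison.
        System.out.println(hrefs(SAMPLE, "id('rso')/li/div/h3/a/@href"));
    }
}
```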