Getting data from a website that has no API?

I want to automatically pull real-estate data from this website:

LINK

However, they don't have an API. How would you usually go about this? I'm grateful for every answer!


2013-03-14 user2051347

The search term you want is "web scraping": you parse the data out of the page itself. – 2013-03-14 07:27:16

Have a look at this: http://stackoverflow.com/questions/2861/options-for-html-scraping – 2013-03-14 07:27:44

If I use a package like that, will the server restrict me? – user2051347 2013-03-14 07:28:39

You will have to download the page yourself and parse out all the information yourself.
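A minimal sketch of "downloading the page yourself" with nothing but the JDK, using the URL class that the next paragraph recommends. It assumes the page is UTF-8 encoded (a real page may declare a different charset in its Content-Type header), and the names PageDownloader, readAll and download are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PageDownloader {

    // Reads a whole byte stream into a String, assuming UTF-8 (hypothetical simplification).
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            // read() returns -1 once the end of the stream is reached
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Sends a plain HTTP GET via URL.openStream() and returns the HTML as a String.
    public static String download(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Pass the page address as the first argument to fetch a real page
            System.out.println(download(args[0]));
        } else {
            // Offline demo: the same reader applied to an in-memory "page"
            InputStream fake = new ByteArrayInputStream(
                    "<html><body>Hallo</body></html>".getBytes(StandardCharsets.UTF_8));
            System.out.println(readAll(fake));
        }
    }
}
```

Without arguments, main only exercises readAll on a fixed string; with a URL argument it performs the actual request.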

You may want to look at the Pattern class and some regular expressions; the URL and String classes will also be very useful.
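For illustration, a small Pattern/Matcher example that pulls prices out of an HTML string. The markup `<span class="price">EUR 350.000</span>` and the names PriceExtractor and extractPrices are hypothetical; you would first have to inspect the real site's HTML and adapt the pattern. Regular expressions break easily when the markup changes, which is why an HTML parsing library is often the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceExtractor {

    // Hypothetical markup: assumes each price sits in <span class="price">...</span>.
    // Group 1 captures the text between the tags, without surrounding whitespace.
    private static final Pattern PRICE =
            Pattern.compile("<span class=\"price\">\\s*([^<]+?)\\s*</span>");

    public static List<String> extractPrices(String html) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(html);
        // find() moves to the next match each time it is called
        while (m.find()) {
            prices.add(m.group(1));
        }
        return prices;
    }

    public static void main(String[] args) {
        String html = "<li><span class=\"price\">EUR 350.000</span></li>"
                + "<li><span class=\"price\"> EUR 1.200 </span></li>";
        System.out.println(extractPrices(html));
    }
}
```

Run as-is, main prints `[EUR 350.000, EUR 1.200]`.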

You can always download an HTML library to make it easier, maybe something like http://htmlparser.sourceforge.net/.

This is a very general question, so obviously I can't provide specific code, but what you are describing is called scraping.


2013-03-14 07:28:45 Austin

Do I have to download it, or is there a way to just send an HTTP request? – user2051347 2013-03-14 07:30:58

@user2051347 You can request any information you want, but it won't magically appear in your data. I'm not sure what you're asking. – Austin 2013-03-14 07:31:35

What I mean is: I just send an HTTP request, get the HTML page back, and search the code for keywords, without actually downloading the page. – user2051347 2013-03-14 07:33:54
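Sending the HTTP request and reading the returned HTML is the download; the page simply ends up in a String in memory, and nothing has to be written to disk. Searching it for a keyword is then ordinary string work. A minimal sketch, assuming a case-insensitive, non-overlapping count is what you want; KeywordSearch and countKeyword are hypothetical names, and the sample HTML stands in for a real response body.

```java
import java.util.Locale;

public class KeywordSearch {

    // Counts non-overlapping, case-insensitive occurrences of keyword in html.
    public static int countKeyword(String html, String keyword) {
        if (keyword.isEmpty()) {
            return 0;
        }
        String haystack = html.toLowerCase(Locale.ROOT);
        String needle = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        int idx;
        // indexOf returns -1 when no further occurrence exists
        while ((idx = haystack.indexOf(needle, from)) != -1) {
            count++;
            from = idx + needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the HTML body of an HTTP response held in memory
        String html = "<h1>Wohnung</h1><p>Helle wohnung mit Balkon</p>";
        System.out.println(countKeyword(html, "Wohnung"));
    }
}
```

Run as-is, main prints `2`.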

Well, this is how you get all the content of the page; after that, you can process it however you want:

package farzi;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class GetXMLTask
{
    public static void main(String[] args)
    {
        // DefaultHttpClient belongs to Apache HttpClient 4.x (deprecated since 4.3)
        HttpClient httpClient = new DefaultHttpClient();
        try
        {
            // The search parameters are in the query string, so a plain GET fetches the results page
            HttpGet httpGet = new HttpGet("http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&Bezirke=&Arten=&AngebotTyp=&timestamp=1363245585829");
            HttpResponse response = httpClient.execute(httpGet);
            System.out.println(response.getStatusLine());

            // Read the whole response body (the page's HTML) into memory
            StringBuilder builder = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
            try
            {
                char[] buf = new char[1000];
                int l;
                // read() returns -1 at the end of the stream
                while ((l = in.read(buf)) != -1)
                {
                    builder.append(buf, 0, l);
                }
            }
            finally
            {
                in.close();
            }
            System.out.println(builder.toString());
        }
        catch (IOException e)
        {
            // ClientProtocolException is a subclass of IOException, so it is caught here too
            System.out.println("IOException :" + e);
            e.printStackTrace();
        }
        finally
        {
            // Release the connections held by the client
            httpClient.getConnectionManager().shutdown();
        }
    }
}


2013-03-14 08:42:48
