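HttpClient POST to a problematic web site
I'm trying to retrieve some data from a web site.
I wrote a Java class that seems to work fine for many web sites, but it doesn't work for this particular site, which makes extensive use of JavaScript in its input form.
As you can see from the code, I specified the input fields using the names taken from the HTML source, but maybe the site doesn't accept this kind of POST request?
How can I simulate user interaction to retrieve the generated HTML?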
package com.transport.urlRetriver;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public class UrlRetriver {

    // POSTs the form parameters to the URL and returns the response body.
    String stationPoller(String url, ArrayList<NameValuePair> params) {
        String result = null;
        DefaultHttpClient httpClient = new DefaultHttpClient();
        try {
            HttpPost postRequest = new HttpPost(url);
            postRequest.setEntity(new UrlEncodedFormEntity(params));
            HttpResponse response = httpClient.execute(postRequest);
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                InputStream inputStream = entity.getContent();
                result = convertStreamToString(inputStream);
            }
        } catch (Exception e) {
            // Log the cause instead of silently discarding it.
            e.printStackTrace();
            result = "We had a problem";
        } finally {
            httpClient.getConnectionManager().shutdown();
        }
        return result;
    }

    // Submits the two address fields of the ATM journey form and saves the reply.
    void ATMtravelPoller() {
        ArrayList<NameValuePair> params = new ArrayList<NameValuePair>();
        String url = "http://www.atm-mi.it/it/Pagine/default.aspx";
        params.add(new BasicNameValuePair("ctl00$SPWebPartManager1$g_afa5adbb_5b60_4e50_8da2_212a1d36e49c$txt_address_s", "Viale romagna 1"));
        params.add(new BasicNameValuePair("ctl00$SPWebPartManager1$g_afa5adbb_5b60_4e50_8da2_212a1d36e49c$txt_address_e", "Viale Toscana 20"));
        params.add(new BasicNameValuePair("sf_method", "POST"));
        String result = stationPoller(url, params);
        saveToFile(result, "/home/rachele/Documents/atm/out4.html");
    }

    // Writes the given text to the file at path pos.
    static void saveToFile(String toFile, String pos) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter(pos));
            try {
                out.write(toFile);
            } finally {
                // Close the output stream even if the write fails.
                out.close();
            }
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Reads the whole stream line by line (platform default charset) into a string.
    private static String convertStreamToString(InputStream is) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(is));
        StringBuilder stringBuilder = new StringBuilder();
        String line = null;
        try {
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line).append("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return stringBuilder.toString();
    }
}
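This is not an answer, just a description of what is happening. You need to submit about 30 parameters, and some of the parameter names/values are generated dynamically to prevent the content from being fetched by scripts or programs. You are hard-coding the parameter names, but they will not be the same each time you fetch the content. – gigadot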
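To illustrate the point about dynamically generated parameters: one common approach is to GET the page first, read the hidden form fields out of the returned HTML, and send them back together with the visible fields in the POST. The sketch below only covers the extraction step and uses plain JDK classes. The class name `HiddenFieldExtractor` is hypothetical, and the field names used in the usage note are an assumption based on the `.aspx` URL, not something confirmed for this site. A regex is a rough tool here: it expects double-quoted attributes and does not decode HTML entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the original code): collects the
// name/value pairs of all <input type="hidden"> tags in an HTML page.
final class HiddenFieldExtractor {

    // Matches one complete <input ...> tag.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the value of a double-quoted attribute, or null if it is absent.
    private static String attribute(String tag, String name) {
        Pattern p = Pattern.compile(
                "\\s" + name + "\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Returns the hidden fields in document order; a missing value becomes "".
    static Map<String, String> extractHiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher m = INPUT_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String type = attribute(tag, "type");
            String name = attribute(tag, "name");
            if ("hidden".equalsIgnoreCase(type) && name != null) {
                String value = attribute(tag, "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }
}
```

Each entry of the returned map could then be added to `params` as a `BasicNameValuePair` before calling `stationPoller`. ASP.NET pages typically carry hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, but whether that is sufficient for this particular site is untested.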
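Not an answer to your JavaScript problem (hence a comment), but... note that for a lot of web sites you need to fake your "User-Agent" from Java, otherwise you won't get the real site. Been there, done that: you **must** fake the user agent ;) – SyntaxT3rr0r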
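For this site, it makes no difference whether you send a User-Agent or not. I tested it by filtering the User-Agent header out of my Firefox requests, and the result was no different. – gigadot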
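For sites where the User-Agent does matter (per gigadot's test, this one is not among them), a minimal sketch of setting the header from Java is shown here. It uses the JDK's `HttpURLConnection` so that it stays self-contained. The class name `UserAgentExample` and the User-Agent string are placeholders chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical example class (not part of the original code).
final class UserAgentExample {

    // Placeholder browser-like value; substitute the string your own browser sends.
    static final String BROWSER_USER_AGENT =
            "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0";

    // Prepares a connection with a browser-like User-Agent header.
    // No network traffic happens until the caller connects or reads the response.
    static HttpURLConnection openWithUserAgent(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestProperty("User-Agent", BROWSER_USER_AGENT);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the Apache HttpClient code from the question, the equivalent would be a call to `postRequest.setHeader("User-Agent", ...)` before `httpClient.execute(postRequest)`.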