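Problems passing cookies to a GET request (after a POST)

I have been stuck on this problem for several days, and my eyes are starting to hurt from the hours spent trying different combinations, all without success. I am building an app that has to fetch data from the internet, parse it, and show it to the user. I have tried several ways of doing this, and JSOUP has been very helpful, especially for parsing the data and extracting the results from it.
However, there is one problem I cannot solve. I have tried both the regular HttpClient and JSOUP, but I cannot get the data I need. Here is my code (JSOUP version):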
    // Uses org.jsoup.Jsoup, org.jsoup.Connection, org.jsoup.Connection.Method,
    // org.jsoup.nodes.Document and org.jsoup.nodes.Element
    public void bht_ht(Context c, int pozivni, int broj) throws IOException {
        // First connection, only to get the cookies (I have also tried without this
        // separate request, but the result is the same)
        Connection.Response resCookie = Jsoup.connect("http://www.bhtelecom.ba/imenik_telefon.html")
                .method(Method.GET)
                .execute();
        // The two session cookies
        String sessionId = resCookie.cookie("PHPSESSID");
        String fetypo = resCookie.cookie("fe_typo_user");

        // The POST request with the search data (some_data is a placeholder)
        Connection.Response res = Jsoup.connect("http://www.bhtelecom.ba/imenik_telefon.html?a=search")
                .data("di", some_data)
                .data("br", some_data)
                .data("btnSearch", "Tra%C5%BEi")
                .cookie("PHPSESSID", sessionId)
                .cookie("fe_typo_user", fetypo)
                .method(Method.POST)
                .execute();
        Document dok = res.parse();

        // The GET request for the page that contains the results; the server
        // redirects to this page with an HTTP 302 response after the POST
        Document doc = Jsoup.connect("http://www.bhtelecom.ba/index.php?id=3226&")
                .cookie("PHPSESSID", sessionId)
                .cookie("fe_typo_user", fetypo)
                .referrer("http://www.bhtelecom.ba/imenik_telefon.html")
                .get();

        Element elemenat = doc.select("div.boxtexter").get(0);
        String ime = elemenat.text();
    }
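So the end result should be a string containing the returned data. But no matter what I try, the parsed text is a "blank" page, even though I have simulated everything the browser sends in its requests.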
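One thing I may be getting wrong is the cookie bookkeeping: my code sends the cookies from the first GET with both later requests, so any value the server sets in its response to the POST would be lost. Below is a minimal JDK-only sketch of the merge I think is needed. The class, the method names, and all cookie values are made up for illustration; only `java.net.HttpCookie` is a real API.

```java
import java.net.HttpCookie;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CookieJarSketch {
    // Overlay the cookies found in Set-Cookie header values onto the jar;
    // a newer value replaces an older one with the same name.
    static void applySetCookie(Map<String, String> jar, List<String> setCookieValues) {
        for (String value : setCookieValues) {
            for (HttpCookie c : HttpCookie.parse(value)) {
                jar.put(c.getName(), c.getValue());
            }
        }
    }

    // Render the jar as the value of a Cookie request header.
    static String cookieHeader(Map<String, String> jar) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : jar.entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Replays the three steps with hypothetical header values and returns
    // the Cookie header that the final GET should carry.
    static String demo() {
        Map<String, String> jar = new LinkedHashMap<>();
        // Step 1: cookies set by the initial GET (hypothetical values).
        applySetCookie(jar, Arrays.asList("PHPSESSID=sess1; path=/", "fe_typo_user=first; path=/"));
        // Step 2: the POST goes out with the jar so far; assume its 302 response
        // sets a new fe_typo_user (hypothetical value).
        applySetCookie(jar, Arrays.asList("fe_typo_user=second; path=/"));
        // Step 3: the GET for the results page carries the merged jar.
        return cookieHeader(jar);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With JSOUP I would expect the same effect from copying `resCookie.cookies()` into a map, putting `res.cookies()` on top of it, and passing the result to `.cookies(...)` on the final request, but I have not confirmed that this is what the site wants.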
Here are the raw headers captured from the browser for the POST and the GET:
(POST)
> POST /imenik_telefon.html?a=search HTTP/1.1
> Host: www.bhtelecom.ba
> Content-Length: 56
> Cache-Control: max-age=0
> Origin: http://www.bhtelecom.ba
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1
> Content-Type: application/x-www-form-urlencoded
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Referer: http://www.bhtelecom.ba/index.php?id=3226&
> Accept-Encoding: gzip,deflate,sdch
> Accept-Language: en-US,en;q=0.8
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
> Cookie: PHPSESSID=opavncj3317uidbt93t9bie980; fe_typo_user=332a76d0b1d4944bdbbcd28d63d62d75; __utma=206281024.1997742542.1319583563.1319583563.1319588786.2; __utmb=206281024.1.10.1319588786; __utmc=206281024; __utmz=206281024.1319583563.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
>
> di=033&br=123456&_uqid=&_cdt=&_hsh=&btnSearch=Tra%C5%BEi
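
One thing I have not ruled out is the request body itself. As a sanity check, here is a minimal JDK-only sketch (class and method names are mine, purely for illustration) that rebuilds the captured body from decoded field values. It assumes that the HTTP library URL-encodes form values before sending them, the way `java.net.URLEncoder` does here. Under that assumption, passing the already-encoded literal "Tra%C5%BEi" would put `Tra%25C5%25BEi` on the wire. The capture also shows three empty fields (`_uqid`, `_cdt`, `_hsh`) that my code does not send.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {
    // URL-encode one form component as UTF-8 (what a form-encoding HTTP
    // client is assumed to do with each name and value).
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Join the fields as name=value pairs separated by '&', encoding both sides.
    static String formBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // The fields of the captured browser POST, in the same order,
    // with the given label for the search button.
    static Map<String, String> capturedFields(String buttonLabel) {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("di", "033");
        f.put("br", "123456");
        f.put("_uqid", "");
        f.put("_cdt", "");
        f.put("_hsh", "");
        f.put("btnSearch", buttonLabel);
        return f;
    }

    public static void main(String[] args) {
        // Decoded label ("Tra" + U+017E + "i"), encoded exactly once.
        String raw = formBody(capturedFields("Tra\u017Ei"));
        // Label that was already percent-encoded, so it gets encoded a second time.
        String pre = formBody(capturedFields("Tra%C5%BEi"));
        System.out.println(raw + " (" + raw.length() + " bytes)");
        System.out.println(pre);
    }
}
```

The first line printed should equal the captured body, and its 56 bytes agree with the `Content-Length: 56` header, which suggests the decoded label is what the library should be given.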
(GET)
> GET /index.php?id=3226& HTTP/1.1
> Host: www.bhtelecom.ba
> Cache-Control: max-age=0
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Referer: http://www.bhtelecom.ba/index.php?id=3226&
> Accept-Encoding: gzip,deflate,sdch
> Accept-Language: en-US,en;q=0.8
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
> Cookie: PHPSESSID=opavncj3317uidbt93t9bie980; __utma=206281024.1997742542.1319583563.1319583563.1319588786.2; __utmb=206281024.1.10.1319588786; __utmc=206281024; __utmz=206281024.1319583563.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fe_typo_user=07745dd2a36a23c64c2297026061a2c2
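The data I need is in the response to this GET, but no combination of parameters, cookies, or anything else I have tried makes the server "think" that I have already sent the POST and now want that data.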
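Comparing the two captures, one detail stands out: the `fe_typo_user` value in the GET is not the one sent with the POST, so the server presumably issued a new one in between, most likely in its 302 response. The sketch below (plain JDK, helper names are illustrative) parses the two captured `Cookie` headers and lists the names whose values differ. The `__utm*` tracking cookies are left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CookieDiffSketch {
    // Cookie values copied from the captured POST and GET requests
    // (tracking cookies omitted).
    static final String POST_COOKIES =
            "PHPSESSID=opavncj3317uidbt93t9bie980; fe_typo_user=332a76d0b1d4944bdbbcd28d63d62d75";
    static final String GET_COOKIES =
            "PHPSESSID=opavncj3317uidbt93t9bie980; fe_typo_user=07745dd2a36a23c64c2297026061a2c2";

    // Split a Cookie request header value ("a=1; b=2") into an ordered name -> value map.
    static Map<String, String> parseCookieHeader(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String part : header.split(";")) {
            String pair = part.trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return cookies;
    }

    // Names present in both maps whose values are different.
    static List<String> changed(Map<String, String> before, Map<String, String> after) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : before.entrySet()) {
            String later = after.get(e.getKey());
            if (later != null && !later.equals(e.getValue())) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(changed(parseCookieHeader(POST_COOKIES), parseCookieHeader(GET_COOKIES)));
    }
}
```

If only `fe_typo_user` shows up as changed, then reusing the value from my very first request for the final GET could be exactly why the server does not connect the GET to the POST.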
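Here is a version of my code without the JSOUP parser. I cannot get it to work either: when I check the cookies they all look fine, and so do the POST and the GET, but still no success.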
    // Classes come from Apache HttpClient 4.x (org.apache.http and its subpackages)
    try {
        DefaultHttpClient client = new DefaultHttpClient();
        String postURL = "http://www.bhtelecom.ba/imenik_telefon.html?a=search";
        HttpPost post = new HttpPost(postURL);
        post.getParams().setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.FALSE);

        // Form fields for the search
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        params.add(new BasicNameValuePair("di", "035"));
        params.add(new BasicNameValuePair("br", "819443"));
        params.add(new BasicNameValuePair("btnSearch", "Tra%C5%BEi"));
        UrlEncodedFormEntity ent = new UrlEncodedFormEntity(params, HTTP.UTF_8);
        post.setEntity(ent);

        HttpResponse responsePOST = client.execute(post);
        HttpEntity resEntity = responsePOST.getEntity();

        // Check the cookies after the POST; they are OK
        List<Cookie> cookies = client.getCookieStore().getCookies();
        if (cookies.isEmpty()) {
            Log.d(TAG, "no cookies");
        } else {
            for (int i = 0; i < cookies.size(); i++) {
                Log.d(TAG, "cookies: " + cookies.get(i).toString());
            }
        }

        // Release the POST connection before reusing the client
        if (resEntity != null) {
            resEntity.consumeContent();
        }

        HttpGet get = new HttpGet("http://www.bhtelecom.ba/index.php?id=3226&");
        get.getParams().setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.FALSE);
        HttpResponse responseGET = client.execute(get);
        HttpEntity entityGET = responseGET.getEntity();

        // Check the cookies again after the GET
        List<Cookie> cookiesGet = client.getCookieStore().getCookies();
        if (cookiesGet.isEmpty()) {
            Log.d(TAG, "no cookies");
        } else {
            for (int i = 0; i < cookiesGet.size(); i++) {
                Log.d(TAG, "cookies GET: " + cookiesGet.get(i).toString());
            }
        }

        // A method to check the data: I pass the InputStream to it and do the
        // operations there. I have tried parsing "manually" and passing the
        // InputStream to JSOUP, without success in either case.
        samplemethod(entityGET.getContent());
        client.getConnectionManager().shutdown();
    } catch (Exception e) {
        e.printStackTrace();
    }
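So, if anyone can spot what I have set up incorrectly, or point me to a way to make these two requests and then get the data as an HTTP entity that I can pass as an InputStream to the lovely JSOUP parser, that would be amazing. Or perhaps I have misunderstood what the page requires and need to make the request with different parameters; any pointers would be appreciated. I used Wireshark and Charles Debugging Proxy to get an idea of the traffic (I tried both, to double-check), and found only the session ID, fe_typo_user, and the other parameters used for tracking time on the site and so on. I have tried passing those as well ("__utma", "__utmb", etc.).
I have some other methods for "simple" sites, where a single POST returns the data in its response, and those work fine, but this particular site is driving me crazy. Thanks in advance for any help.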