2016-09-13 33 views
0

How do I find the best URL format to use when opening a connection? What is the best way to handle URL variations (such as "www" and "https") when using URL.openConnection()?

Many websites return different results depending on whether the URL uses "www" and/or "https".

For example, here is a test I wrote to see some of the different results:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Test {

    private static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";

    public static void main(String[] args)
    {
        String baseURL = "google.com";

        // Try each combination of scheme and "www." prefix.
        String[] prefixes = {"http://", "http://www.", "https://", "https://www."};

        for (String prefix : prefixes)
        {
            countLines(prefix + baseURL);
        }
    }

    // Fetches the given address and prints how many lines the response body contains.
    private static void countLines(String address)
    {
        try
        {
            URL url = new URL(address);
            URLConnection connection = url.openConnection();
            connection.setRequestProperty("User-Agent", USER_AGENT);

            // try-with-resources closes the reader (and the underlying stream) when done.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream())))
            {
                int lineCount = 0;

                while (in.readLine() != null)
                {
                    lineCount++;
                }

                System.out.println(address + " = " + lineCount + " lines");
            }
        }

        catch (Exception ex)
        {
            System.out.println(address + " throws an error");
        }
    }
}

Here are the results of running it against 4 different websites:

http://stackoverflow.com = 4205 lines
http://www.stackoverflow.com = 4205 lines
https://stackoverflow.com = 4205 lines
https://www.stackoverflow.com = 2 lines

http://qvc.com = 2438 lines
http://www.qvc.com = 2438 lines
https://qvc.com throws an error
https://www.qvc.com = 0 lines

http://facebook.com = 0 lines
http://www.facebook.com = 0 lines
https://facebook.com = 25 lines
https://www.facebook.com = 25 lines

http://google.com = 6 lines
http://www.google.com = 6 lines
https://google.com = 343 lines
https://www.google.com = 343 lines

Given a base URL such as "google.com", what is the right way to check which format I should use for that site?

+0

Presumably, the http reply is a redirect to the secure HTTPS protocol.

+0

Check the response code. If you get a redirect, then you probably used the wrong format. For example, 'www.stackoverflow.com' will issue a 301 redirect to 'stackoverflow.com'.

+0

@MarcB - Yes, I figured it would be something like that. Can you post it as an answer? – Pikamander2

Answers

0

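After reading Marc B's answer, several other StackOverflow threads (which I linked in the comments on my original question), and this guide, here is what I came up with: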

// Uses the same imports as the question (java.net.* and java.io.*).
String baseURL = "google.com";

try
{
    java.net.URL url = new java.net.URL("http://" + baseURL);
    java.net.HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");

    // getResponseCode() sends the request and returns the HTTP status code.
    int response = connection.getResponseCode();
    System.out.println("Response code: " + response);

    // 301, 302 and 303 are redirects: report where the server points instead.
    if (response == 301 || response == 302 || response == 303)
    {
        System.out.println("Redirect location: " + connection.getHeaderField("Location"));
    }

    else
    {
        // Not a redirect, so read the body and count its lines.
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));

        String line;
        int lineCount = 0;

        while ((line = in.readLine()) != null)
        {
            lineCount++;
        }

        in.close();

        System.out.println("http://" + baseURL + " = " + lineCount + " lines\n");
    }
}

catch (Exception ex)
{
    System.out.println("http://" + baseURL + " throws an error\n");
}

It outputs this:

Response code: 302 
Redirect location: https://www.google.com/?gws_rd=ssl 

You can also use HttpURLConnection.HTTP_MOVED_TEMP, HttpURLConnection.HTTP_MOVED_PERM, and HttpURLConnection.HTTP_SEE_OTHER instead of the numeric response codes. In fact, that is probably better practice.
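As a rough illustration of the named-constant approach, here is a minimal sketch of the same redirect check. The class name `RedirectCodes` and the helper `isRedirect` are made up for this example; the three constants are the ones defined on `java.net.HttpURLConnection`.

```java
import java.net.HttpURLConnection;

class RedirectCodes
{
    // Hypothetical helper: true for the three redirect codes checked in the answer's code.
    static boolean isRedirect(int responseCode)
    {
        switch (responseCode)
        {
            case HttpURLConnection.HTTP_MOVED_PERM: // 301
            case HttpURLConnection.HTTP_MOVED_TEMP: // 302
            case HttpURLConnection.HTTP_SEE_OTHER:  // 303
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args)
    {
        // Classify a few sample status codes without touching the network.
        for (int code : new int[] {200, 301, 302, 303, 404})
        {
            System.out.println(code + " redirect? " + isRedirect(code));
        }
    }
}
```

With a helper like this, the `if` in the snippet could read `if (RedirectCodes.isRedirect(response))`, which keeps the magic numbers in one place.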

1

Check the HTTP response code. If you get a redirect, then you probably used the wrong format. For example, http://www.stackoverflow.com will do a 301 redirect to just http://stackoverflow.com
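One way to apply this idea to the four variants from the question is sketched below. It is only a sketch under assumptions: the class `UrlVariants` and its methods are hypothetical names, "usable" is taken to mean "answers 200 without redirecting", and no User-Agent header is set, so some sites may respond differently than they do to a browser.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class UrlVariants
{
    // Builds the four scheme/"www." combinations tried in the question, in the same order.
    static List<String> candidates(String baseURL)
    {
        List<String> result = new ArrayList<>();

        for (String scheme : new String[] {"http://", "https://"})
        {
            for (String prefix : new String[] {"", "www."})
            {
                result.add(scheme + prefix + baseURL);
            }
        }

        return result;
    }

    // Returns the HTTP status code of the address without following any redirect.
    static int statusOf(String address) throws IOException
    {
        HttpURLConnection connection = (HttpURLConnection) URI.create(address).toURL().openConnection();
        connection.setInstanceFollowRedirects(false);

        try
        {
            return connection.getResponseCode();
        }
        finally
        {
            connection.disconnect();
        }
    }

    // Returns the first candidate that answers 200 directly, or null if none does.
    static String firstDirectHit(String baseURL)
    {
        for (String address : candidates(baseURL))
        {
            try
            {
                if (statusOf(address) == HttpURLConnection.HTTP_OK)
                {
                    return address;
                }
            }
            catch (IOException ex)
            {
                // Connection or TLS failure: treat this variant as unusable and move on.
            }
        }

        return null;
    }

    public static void main(String[] args)
    {
        // Prints only the candidate list; no request is sent here.
        System.out.println(candidates("google.com"));
    }
}
```

Calling `UrlVariants.firstDirectHit("google.com")` would send the actual requests. Which variant it returns depends on how the site is configured at the time.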

+1

Isn't there a way to tell the request to follow redirects?

+0

Maybe, but I don't do Java, so I don't know what the option would be.

+0

'Isn't there a way to tell the request to follow redirects?' Cast the URLConnection to HttpURLConnection (https://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html) and call 'setFollowRedirects(true);' – copeg
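A caveat worth noting: as far as I know, `HttpURLConnection` already follows redirects by default, but it does not follow one that switches protocol (for example http to https). That fits the 302 to https://www.google.com reported on this page. Below is a minimal sketch of following the chain by hand. The class `RedirectFollower`, its method names, and the cap of 5 hops are assumptions made for this example, not part of any library.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectFollower
{
    // Arbitrary cap (an assumption) so a redirect loop cannot run forever.
    static final int MAX_HOPS = 5;

    // Resolves a Location header value, absolute or relative, against the URL that sent it.
    static URI resolveLocation(URI current, String location)
    {
        return current.resolve(location);
    }

    // Follows 301/302/303 redirects by hand, including hops that switch protocol.
    static URI findFinalUrl(URI start) throws IOException
    {
        URI current = start;

        for (int hop = 0; hop < MAX_HOPS; hop++)
        {
            HttpURLConnection connection = (HttpURLConnection) current.toURL().openConnection();
            connection.setInstanceFollowRedirects(false); // inspect each hop ourselves

            int code = connection.getResponseCode();
            String location = connection.getHeaderField("Location");
            connection.disconnect();

            boolean redirect = code == HttpURLConnection.HTTP_MOVED_PERM
                    || code == HttpURLConnection.HTTP_MOVED_TEMP
                    || code == HttpURLConnection.HTTP_SEE_OTHER;

            if (!redirect || location == null)
            {
                return current; // this URL answered without redirecting
            }

            current = resolveLocation(current, location);
        }

        throw new IOException("Gave up after " + MAX_HOPS + " redirects starting from " + start);
    }

    public static void main(String[] args)
    {
        // Offline demonstration of the resolution step only; no request is sent here.
        System.out.println(resolveLocation(URI.create("http://example.com/a/b"), "/login"));
    }
}
```

Calling `RedirectFollower.findFinalUrl(URI.create("http://google.com"))` would perform the real requests and return the last URL in the chain, which could then be used as the format to open.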
