2012-07-17 45 views
0

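Downloading multiple files from a website using Java

I log in to a secure site through a proxy and want to download all of its files and folders to my local disk. This is what I have so far.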

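EDIT - **Currently the code below starts from the given root directory and downloads all the files in all subdirectories... cool :) But it does not replicate the folder structure, which is what I need. Any help, please?** EDIT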

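First, I take four arguments:

1) the URL of the directory I want to download
2) the username for the secure login
3) the password (all of these can be passed on the command line in Linux)
4) the directory on my local disk where I want to save the files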

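    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.net.Authenticator;
    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.PasswordAuthentication;
    import java.net.Proxy;
    import java.net.URL;
    import java.net.URLConnection;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class ApacheUrl4
    {
    // this is the entry point for what I want the instance of the class to do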
    public static void main(String args[]) throws Exception { 

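     // Validate first: reading args[0..3] before the check would throw
     // ArrayIndexOutOfBoundsException when too few arguments are given
     checkArguments(args); 

     String url = args[0]; 
     final String username = args[1]; 
     final String password1 = args[2]; 
     String directory = args[3]; 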

     ApacheUrl4 max = new ApacheUrl4(); 
     max.process(url, username, password1, directory); 

    } 
    public void process (String url, String username1, String password1, String directory) throws Exception { 

     final char[] password = password1.toCharArray(); 
     final String username = username1; 
     Authenticator.setDefault(new Authenticator(){ 
       protected PasswordAuthentication getPasswordAuthentication(){ 
       PasswordAuthentication p=new PasswordAuthentication(username , password); 
       return p; 
       } 
      }); 


     BufferedInputStream in = null; 
     BufferedInputStream in2 = null; 
     FileOutputStream fout = null; 
    // proxy 
     String proxyip = "000.000.000" ; 
     int proxyport = 8080; 
     Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyip, proxyport)); 
    // URL connection to file 
     URL file = new URL(url); 
     URLConnection connection = file.openConnection(proxy);  
     // getResponseCode() sends the request; a single call is enough
     int reponsecode = ((HttpURLConnection)connection).getResponseCode(); 
     System.out.println("response code " + reponsecode); 


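     // Rejected credentials normally come back as 401 (HTTP_UNAUTHORIZED); 403 means access is forbidden
     if (reponsecode == HttpURLConnection.HTTP_UNAUTHORIZED || reponsecode == HttpURLConnection.HTTP_FORBIDDEN){ 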
      System.out.println("Invalid username or psw"); 
      return; 
     } 
     if (reponsecode != HttpURLConnection.HTTP_OK){ 
      System.out.println("Unable to find response"); 
      return; 
     } 





     //Save the file into the chosen folder 
     in = new BufferedInputStream(connection.getInputStream()); 

     //Create instance of DocumentBuilderFactory 
     DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); 
     //Get the DocumentBuilder 
     DocumentBuilder docBuilder = factory.newDocumentBuilder(); 
     //Using existing XML Document 
     Document doc = docBuilder.parse(in); 

     //create the root element 
     Element root = doc.getDocumentElement(); 
     NodeList nodeList = root.getElementsByTagName("li"); 


     for(int i=0; i<nodeList.getLength(); i++){ 
      Node childNode = nodeList.item(i); 
      if (childNode.getTextContent().contains("/")) { 


      // System.out.println(url + childNode.getTextContent()); 
       process(url + childNode.getTextContent(), username, password1, directory);       

     } 

    if (childNode.getTextContent().contains(".") && !childNode.getTextContent().contains("..")) { 


      String textcon = url + childNode.getTextContent(); 
      System.out.println("aaa " + textcon); 

      if (url.endsWith("/")) { 
       System.out.println("ends with a /");  
      } 

      textcon = textcon.replace(" ", "%20"); 
      URL file2 = new URL(textcon); 

      String[] urlparts = textcon.split("/"); 
      int urllength = urlparts.length; 
      String lastarray = urlparts[urllength-2]; 
      System.out.println("last array " + lastarray); 


      URLConnection connection2 = file2.openConnection(proxy);   
      in2 = new BufferedInputStream(connection2.getInputStream()); 
      String test2 = childNode.getTextContent(); 
      System.out.println("eeee " + childNode.getTextContent()); 

      String filename = (directory + test2); 
       File f=new File(filename); 
        if(f.isDirectory()) 
        continue; 





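       // Copy the response body to the local file in 1 KB chunks;
       // try-with-resources closes both streams even if the copy fails
       try (BufferedInputStream body = in2; OutputStream out = new FileOutputStream(f)) { 
        byte buf[]=new byte[1024]; 
        int len; 
        // read() returns -1 at end of stream
        while((len=body.read(buf)) != -1) { 
         out.write(buf,0,len); 
        } 
       } 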


     } 
    } 
} 




    // this is part of the validation of arguments provided by user 
    private static void checkArguments(String[] args) { 
     // Exactly four arguments are expected: URL, username, password, target directory
     if (args.length != 4 || args[0].isEmpty()) { 
       System.out.println("Please specify four arguments in the following format \n " + 
       " URL USERNAME PASSWORD DIRECTORY " + 
       "EG: \"java ApacheUrl4 http://www.google.com user_name password C:\\path/dir/\" "); 
       System.exit(1); 
     } 
    } 
} 
+0

Does the server you are reading from allow directory browsing? I mean, if you visit it with a browser, do you see a directory listing? – 2012-07-17 11:26:49

Answers

0

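To download the files in a directory, you first need the directory listing. The server generates this automatically, if it is configured to allow it. First, use a browser to check whether that is the case for this server.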

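Then you will need to parse the listing page and download each URL it contains. The bad news is that there is no standard for these pages. The good news is that most of the internet is hosted on Apache or IIS, so if you can handle those two, you have covered a large share of servers.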
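As a rough illustration of the "download each URL" step, the sketch below turns entry names from a listing into full URLs. It assumes an Apache-style listing in which subfolder entries end with a slash, and it takes raw display names (as the question's code does with `getTextContent()`), not hrefs that are already percent-encoded. The class and method names (`ListingEntries`, `isDirectory`, `isChild`, `resolve`) are made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ListingEntries {
    /** Assumption: subfolder entries end with "/", as in Apache-style listings. */
    static boolean isDirectory(String name) {
        return name.endsWith("/");
    }

    /** Skips entries that do not point below the current folder (parent links, absolute paths, sort links). */
    static boolean isChild(String name) {
        return !name.isEmpty()
            && !name.startsWith("..")
            && !name.startsWith("/")
            && !name.startsWith("?");
    }

    /**
     * Resolves a raw entry name against the folder URL. The multi-argument URI
     * constructor percent-encodes illegal characters such as spaces, so no manual
     * replace(" ", "%20") is needed. The base must end with "/", otherwise
     * resolve() replaces its last path segment.
     */
    static URI resolve(URI base, String name) {
        try {
            URI relative = new URI(null, null, name, null);
            return base.resolve(relative);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URL from entry: " + name, e);
        }
    }
}
```

Because the multi-argument constructor always quotes `%`, an href that is already encoded should be resolved with `base.resolve(String)` instead, or it would be encoded twice.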

You could probably just parse the page as XML (XHTML) and use XPath to extract all the URLs.
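A minimal sketch of that idea, using only the JDK's built-in XML and XPath APIs. It assumes the listing page is well-formed XML, which many real HTML listings are not; in that case the parser will reject the page. The class name `ListingLinks` and the method `hrefs` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ListingLinks {
    /** Returns the href value of every <a> element in the page, in document order. */
    static List<String> hrefs(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // "//a/@href" selects the href attribute of all anchors at any depth
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            // Parsing fails if the page is not well-formed XML
            throw new IllegalArgumentException("Listing could not be parsed as XML", e);
        }
    }
}
```

Compared with collecting the text of `li` elements as the question's code does, reading the `href` attribute gives the link target itself, which may differ from the displayed name.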

+0

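Thanks Joeri, this helped me get on the right track. The files now download; I just need a little help creating the folders. Currently the code starts from the given root directory and downloads all the files in all subdirectories... very cool :) But it does not replicate the folder structure, which is what I need. Any help, please? – 2012-08-08 15:44:21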

+1

Just make a method like 'downloadContent(URL source, File target)'. If the directory listing contains a subfolder you want, make a recursive call to 'downloadContent(source + "/" + folderName, new File(target, folderName))'. – 2012-08-09 08:43:49
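The recursion suggested in this comment could look roughly like the sketch below. To keep it runnable without a server, fetching and parsing the listing is hidden behind a hypothetical `Lister` interface, and the method only records where each remote file should be saved instead of downloading it. `MirrorPlanner`, `Lister`, and `plan` are illustrative names, and the trailing-slash test for subfolders is an assumption about the listing format.

```java
import java.io.File;
import java.util.List;
import java.util.Map;

class MirrorPlanner {
    /** Hypothetical stand-in for "fetch the listing at this URL and return its entry names". */
    interface Lister {
        List<String> list(String url);
    }

    /**
     * Walks the remote tree and maps every remote file URL to the local file it
     * should be written to. The key point is that the local target moves down one
     * level together with the remote URL, so the folder structure is preserved.
     */
    static void plan(String source, File target, Lister lister, Map<String, File> out) {
        for (String name : lister.list(source)) {
            if (name.endsWith("/")) {
                // Subfolder: recurse with both the deeper URL and the deeper local folder
                String folderName = name.substring(0, name.length() - 1);
                plan(source + name, new File(target, folderName), lister, out);
            } else {
                // Regular file: it belongs in the local folder for this level
                out.put(source + name, new File(target, name));
            }
        }
    }
}
```

In the question's code the same `directory` string is passed unchanged to every recursive `process` call, which is why all files end up in one folder. A real downloader would also create each local folder before writing into it, for example with `file.getParentFile().mkdirs()`.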