2013-01-21 104 views

Java screen scraping using sockets?

I want to collect the HTML from this website: http://movies.about.com/od/actorsalphalist/Actors_Detailed_Movie_News_Interviews_Websites.htm

I open a socket and try to read and print each line of the HTML page. When I run it, I only get "EOF is false" followed by "1", and then nothing more.

I'm not sure what is wrong, because I know the same approach works in another example... Thanks a lot for your help!

import java.net.*; 
import java.io.*; 
import java.util.*; 

public class Twitter { 

static final int DEFAULT_PORT = 80; 

protected DataInputStream reply = null; 
protected PrintStream send = null; 
protected Socket sock = null; 

// *********************************************************** 
// *** The constructors create the socket and set up the input 
// *** and output channels on that socket. 

public Twitter() throws UnknownHostException, IOException { 
    this(DEFAULT_PORT); 
} 

public Twitter(int port) throws UnknownHostException, IOException { 
    sock = new Socket("movies.about.com", port); 
    System.out.println(sock); 
    reply = new DataInputStream(sock.getInputStream()); 
    System.out.println(); 
    send = new PrintStream(sock.getOutputStream()); 
} 

// *********************************************************** 
// *** forecast uses the socket that has already been created 
// *** to carry on a conversation with the Web server that it 
// *** has been contacted through the socket. 

public void forecast() { 
    int i; 
    String HTMLline; 
    boolean eof, gotone; 

    // *** This issues the same query that a Web browser would issue 
    // *** to the Web server. 

    try { 
     send.println("GET /od/actorsalphalist/Actors_Detailed_Movie_News_Interviews_Websites.htm HTTP/1.1"); 
    } catch (Exception e) { 
     System.out.println("about.com server is down."); 
    } 

    // *** This section parses the response from the Web server. 
    // *** NOTE THAT "real" EOF does not occur until the Web server 
    // *** has closed the connection. 

    eof = false; 
    gotone = false; 
    while (!eof) { 
     System.out.println("EOF is false"); 
     try { 
      System.out.println("1"); 
      HTMLline = reply.readLine(); 
      System.out.println("2"); 
      System.out.println(HTMLline); 
      System.out.println("Here?"); 
      if (HTMLline != null) { 
       System.out.println("its not null"); 
      } 
      if (HTMLline == null) { 
       System.out.println("WTFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"); 
      } else { 
       eof = true; 
       System.out.println("is it?"); 
      } 
     } catch (Exception e) { 
      System.out.println("this exception happend"); 
      e.printStackTrace(); 
      eof = true; 
     } 
    } 
} 

// *********************************************************** 
// *** We need to close the socket when this class is destroyed. 

protected void finalize() throws Throwable { 
    sock.close(); 
} 

// *********************************************************** 
// *** The main program creates a new Twitter class and 
// *** sends that class the command line args (via findNumber). 

public static void main(String[] args) { 
    Twitter aboutCom; 
    DataInputStream cin = new DataInputStream(System.in); 

    try { 
     aboutCom = new Twitter(); 
     aboutCom.forecast(); 
    } catch (Exception e) { 
     e.printStackTrace(); 
    } 
} 
} 

Answers


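You haven't sent a valid HTTP request, so the server is still waiting for you to finish it. That is why your program blocks in readLine() right after printing "1". The GET line must be terminated by \r\n, and it must be followed by another, empty line (a second \r\n) to terminate the request headers.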

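However, you should be using URL, openConnection(), getInputStream(), and so on, rather than needlessly trying to reimplement HTTP yourself. The way you are doing it, you have every chance of getting it wrong.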
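As a rough illustration of the URL-based approach, here is a minimal sketch. The class name UrlFetch and the helper readLines are made up for this example, and the UTF-8 charset is an assumption (the real page may declare a different encoding).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question.
class UrlFetch {

    // Opens the URL and returns the body as a list of lines.
    // URLConnection builds and sends the HTTP request for us.
    static List<String> readLines(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        List<String> lines = new ArrayList<>();
        try (InputStream in = conn.getInputStream();
             BufferedReader reader = new BufferedReader(
                     // Assumption: the content is UTF-8.
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null at end of stream, which ends the loop.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UrlFetch <url>");
            return;
        }
        for (String line : readLines(URI.create(args[0]).toURL())) {
            System.out.println(line);
        }
    }
}
```

You would run it with the page address from the question as the single argument. Request formatting, line terminators, and headers are all handled by the library here, which is the point of the recommendation.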


Ah OK, but this is an assignment that has to use sockets!


@RussellFyfe OK, then use write() instead of println(), send the line terminator as \r\n, and send it twice. – EJP
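For the socket-only version of the assignment, the advice above might look roughly like the sketch below. The class name RawHttpGet and its methods are invented for illustration. The Host and Connection: close headers are additions not mentioned in the answer: HTTP/1.1 requires a Host header, and without Connection: close the server may keep the connection open, so readLine() would not return null promptly. The ISO-8859-1 charset for reading is also an assumption. Note that the read loop keeps going until readLine() returns null, whereas the loop in the question sets eof = true as soon as it gets a non-null line and would therefore stop after the first line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration, not the asker's code.
class RawHttpGet {

    // Builds a minimal GET request. Every line ends with CRLF, and the
    // final empty line (a second CRLF) marks the end of the headers.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"      // addition: required by HTTP/1.1
             + "Connection: close\r\n"       // addition: server closes, so we reach EOF
             + "\r\n";
    }

    // Sends the request over a plain socket and prints the whole
    // response (status line, headers, and body) line by line.
    static void fetch(String host, int port, String path) throws IOException {
        try (Socket sock = new Socket(host, port)) {
            OutputStream out = sock.getOutputStream();
            // write() sends exactly these bytes; println() would append the
            // platform line separator, which is not necessarily CRLF.
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    // Assumption: decode the response as ISO-8859-1.
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            while ((line = in.readLine()) != null) {  // null means the server closed the connection
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java RawHttpGet <host> <path>");
            return;
        }
        fetch(args[0], 80, args[1]);
    }
}
```

Run it with the host and path from the question, for example movies.about.com as the host and the /od/actorsalphalist/Actors_Detailed_Movie_News_Interviews_Websites.htm path as the second argument.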