2013-10-16 61 views

I am trying to fetch the source of a URL with curl, but I get an error. Could you help me, please? How can I fetch URLs like this one?

curl -v http://www.segundamano.es/anuncios-madrid/ -m 10
* About to connect() to www.segundamano.es port 80 (#0)
* Trying 195.77.179.69... 
* Connected to www.segundamano.es (195.77.179.69) port 80 (#0) 
> GET /anuncios-madrid/ HTTP/1.1 
> User-Agent: curl/7.29.0 
> Host: www.segundamano.es 
> Accept: */* 
> 
* Empty reply from server 
* Connection #0 to host www.segundamano.es left intact 
curl: (52) Empty reply from server 

Thank you very much, and sorry for my English!

Answer


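It looks like this domain actively blocks requests from curl (and wget). If you pass a browser's User-Agent string instead, you appear to be able to get around the block. Both tools accept the same long option, --user-agent; the short forms differ (-A for curl, -U for wget). For example: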

This does not work:

C:\>wget http://www.segundamano.es/anuncios-madrid/ 
    SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc 
    syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc 
    --2013-10-16 10:06:13-- http://www.segundamano.es/anuncios-madrid/ 
    Resolving www.segundamano.es... 195.77.179.69, 213.4.96.70 
    Connecting to www.segundamano.es|195.77.179.69|:80... connected. 
    HTTP request sent, awaiting response... 502 Bad Gateway 
    2013-10-16 10:06:15 ERROR 502: Bad Gateway. 

But this does:

C:\>wget --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" http://www.segundamano.es/anuncios-madrid/ 
    SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc 
    syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc 
    --2013-10-16 10:06:29-- http://www.segundamano.es/anuncios-madrid/ 
    Resolving www.segundamano.es... 195.77.179.69, 213.4.96.70 
    Connecting to www.segundamano.es|195.77.179.69|:80... connected. 
    HTTP request sent, awaiting response... 200 OK 
    Length: unspecified [text/html] 
    Saving to: `index.html' 
    [<=>] 178,588  267K/s in 0.7s 

    2013-10-16 10:06:33 (267 KB/s) - `index.html' saved [178588]
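
Since the question used curl rather than wget, here is a rough curl equivalent of the working wget call. It is only a sketch: it assumes the server still filters on the User-Agent header, which may no longer be the case, and the `fetch_page` helper name is made up for illustration. The `-m 10` timeout is taken from the original command.

```shell
# Browser User-Agent string, copied from the wget example above.
UA="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"

# fetch_page URL OUTFILE  (hypothetical helper, not part of curl)
#   -sS  hide the progress meter but still print errors
#   -f   exit non-zero on HTTP errors such as 502
#   -m   give up after 10 seconds, as in the original command
#   -A   send the browser User-Agent instead of curl's default
#   -o   write the response body to OUTFILE
fetch_page() {
    curl -sS -f -m 10 -A "$UA" -o "$2" "$1"
}

# Usage:
#   fetch_page http://www.segundamano.es/anuncios-madrid/ index.html
```

Add `-v` to the curl call if you want to confirm that the `User-Agent:` request header has changed from `curl/7.29.0` to the browser string.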