2010-09-07 262 views
22

I'm trying to make a POST request to retrieve information about a book. Here is the code, which returns HTTP code 302, Moved:

import httplib, urllib 
params = urllib.urlencode({ 
    'isbn' : '9780131185838', 
    'catalogId' : '10001', 
    'schoolStoreId' : '15828', 
    'search' : 'Search' 
    }) 
headers = {"Content-type": "application/x-www-form-urlencoded", 
      "Accept": "text/plain"} 
conn = httplib.HTTPConnection("bkstr.com:80") 
conn.request("POST", "/webapp/wcs/stores/servlet/BuybackSearch", 
      params, headers) 
response = conn.getresponse() 
print response.status, response.reason 
data = response.read() 
conn.close() 

When I try it from a browser, starting from this page: http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackMaterialsView?langId=-1&catalogId=10001&storeId=10051&schoolStoreId=15828, it works. What am I missing in my code?

EDIT: This is what I get when I also print response.msg:

302 Moved
Date: Tue, 07 Sep 2010 16:54:29 GMT 
Vary: Host,Accept-Encoding,User-Agent 
Location: http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch 
X-UA-Compatible: IE=EmulateIE7 
Content-Length: 0 
Content-Type: text/plain; charset=utf-8 

It seems the Location header points to the same URL I was trying to access in the first place?

EDIT2:

I tried using urllib2, as suggested here. Here is the code:

import urllib, urllib2 

url = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch' 
values = {'isbn' : '9780131185838', 
      'catalogId' : '10001', 
      'schoolStoreId' : '15828', 
      'search' : 'Search' } 


data = urllib.urlencode(values) 
req = urllib2.Request(url, data) 
response = urllib2.urlopen(req) 
print response.geturl() 
print response.info() 
the_page = response.read() 
print the_page 

Here is the output:

http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch 
Date: Tue, 07 Sep 2010 16:58:35 GMT 
Pragma: No-cache 
Cache-Control: no-cache 
Expires: Thu, 01 Jan 1970 00:00:00 GMT 
Set-Cookie: JSESSIONID=0001REjqgX2axkzlR6SvIJlgJkt:1311s25dm; Path=/ 
Vary: Accept-Encoding,User-Agent 
X-UA-Compatible: IE=EmulateIE7 
Content-Length: 0 
Connection: close 
Content-Type: text/html; charset=utf-8 
Content-Language: en-US 
Set-Cookie: TSde3575=225ec58bcb0fdddfad7332c2816f1f152224db2f71e1b0474c866f3b; Path=/ 

The 302 response also tells you where the resource was moved to; find that URL and use it. – adamk 2010-09-07 14:39:41
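One way to act on that suggestion is to read the Location header and resolve it against the URL that was requested. The following is a minimal Python 3 sketch, not part of the original thread; the helper name `redirect_target` is made up for illustration. Fed with the headers shown in the question, it shows that the path is unchanged and only the host differs (bkstr.com versus www.bkstr.com).

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def redirect_target(request_url, status, headers):
    """Return the absolute URL a redirect response points to, or None."""
    if status not in REDIRECT_CODES:
        return None
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if location is None:
        return None
    # Location may be relative; urljoin resolves it against the request URL.
    return urljoin(request_url, location.strip())

# URL built from the host and path used in the question's httplib code.
requested = "http://bkstr.com/webapp/wcs/stores/servlet/BuybackSearch"
target = redirect_target(
    requested,
    302,
    {"Location": "http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch "},
)
print(target)
# Compare the hosts: the redirect only adds the "www." prefix.
print(urlsplit(requested).netloc, "->", urlsplit(target).netloc)
```

Following the redirect by hand is not enough in this case, since the urllib2 attempt in EDIT2 already targets the www host and still gets an empty body.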

Answers

26

Their server seems to want you to acquire the proper cookie first. This works:

import urllib, urllib2, cookielib 

cookie_jar = cookielib.CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar)) 
urllib2.install_opener(opener) 

# acquire cookie 
url_1 = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackMaterialsView?langId=-1&catalogId=10001&storeId=10051&schoolStoreId=15828' 
req = urllib2.Request(url_1) 
rsp = urllib2.urlopen(req) 

# do POST 
url_2 = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch' 
values = dict(isbn='9780131185838', schoolStoreId='15828', catalogId='10001') 
data = urllib.urlencode(values) 
req = urllib2.Request(url_2, data) 
rsp = urllib2.urlopen(req) 
content = rsp.read() 

# print result 
import re 
pat = re.compile('Title:.*') 
print pat.search(content).group() 

# OUTPUT: Title:&nbsp;&nbsp;Statics & Strength of Materials for Arch (w/CD)<br /> 
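For readers on Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, the same two-step idea can be sketched as follows. This is an illustrative port, not the answerer's code: the function names `make_session` and `post_after_visit` and the `timeout` parameter are assumptions added here.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

def make_session():
    """Build an opener that stores cookies and sends them back later."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar

def post_after_visit(opener, landing_url, post_url, fields, timeout=30):
    """GET landing_url to pick up cookies, then POST fields to post_url."""
    # First request: its only purpose is to let the server set cookies.
    with opener.open(landing_url, timeout=timeout) as rsp:
        rsp.read()
    # Second request: supplying a body makes urllib send a POST, and the
    # cookie processor attaches the cookies collected above.
    body = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(post_url, data=body, timeout=timeout) as rsp:
        return rsp.read()
```

Calling `post_after_visit(opener, url_1, url_2, values)` with the URLs and form fields from the code above would mirror its "acquire cookie" and "do POST" steps. Note that the result is bytes in Python 3, so it needs decoding before a text regular expression is applied to it.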

It really does work! Thank you very much! – infrared 2010-09-07 21:39:16


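@infrared: Glad to help. I should probably add that one way to solve this kind of problem is to run an HTTP proxy that shows you a trace of the requests and responses. Then make the request with the browser and with your code, and compare the two traces. Usually you are looking for differences in cookies or headers. Sometimes it takes some trial and error. I like to use Fiddler, but any such tool will do. – ars 2010-09-08 07:40:19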
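Once both traces have been captured, the comparison itself can be done by eye or with a few lines of code. The following Python 3 sketch is only an illustration of that step: `diff_headers` is a hypothetical helper, and the browser header values are invented placeholders rather than a real trace of this site.

```python
def diff_headers(browser_headers, script_headers):
    """Compare two header mappings case-insensitively.

    Returns (missing, changed): headers only the browser sent, and
    headers both sent but with different values.
    """
    browser = {k.lower(): v.strip() for k, v in browser_headers.items()}
    script = {k.lower(): v.strip() for k, v in script_headers.items()}
    missing = {k: v for k, v in browser.items() if k not in script}
    changed = {
        k: (v, script[k])
        for k, v in browser.items()
        if k in script and script[k] != v
    }
    return missing, changed

# Placeholder browser trace: these values are invented for illustration.
browser_trace = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Cookie": "JSESSIONID=example",
    "User-Agent": "Mozilla/5.0",
}
# Headers sent by the httplib code in the question.
script_trace = {
    "Content-type": "application/x-www-form-urlencoded",
    "Accept": "text/plain",
}
missing, changed = diff_headers(browser_trace, script_trace)
print("only in browser:", sorted(missing))
print("different values:", changed)
```

Headers that appear only on the browser side, such as Cookie here, are the first candidates to add to the script.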

1
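  1. Maybe that is what the browser gets too, and you simply have to follow the 302 redirect.

  2. If all else fails, you can monitor the conversation between Firefox and the web server with FireBug, tcpdump or wireshark, and see which HTTP headers differ. Possibly it is just the User-Agent: header.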
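If the User-Agent header is the suspect, it can be set explicitly on the request; by default Python's urllib identifies itself as Python-urllib. Below is a minimal Python 3 sketch that builds such a request without sending it. The helper name `build_search_request` and the User-Agent string are placeholders, and nothing in the thread confirms that this header alone would be enough for this site.

```python
import urllib.parse
import urllib.request

def build_search_request(url, fields, user_agent):
    """Build (but do not send) a POST request with an explicit User-Agent."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "User-Agent": user_agent,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )

req = build_search_request(
    "http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch",
    {"isbn": "9780131185838", "catalogId": "10001", "schoolStoreId": "15828"},
    # Placeholder value: copy the real string from your browser's trace.
    "Mozilla/5.0 (placeholder)",
)
print(req.get_method())    # the method is POST because a body was supplied
print(req.header_items())  # headers that would be sent with the request
```

The request could then be passed to `urllib.request.urlopen` or to a cookie-aware opener.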