Python: scraping data after uploading a file

I am trying to extract the response a website returns after a file is uploaded to it. The site has the following form.

<html> 
<head> 
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"> 
    </head> 
    <body> 
    <form method="POST" action="http://somewebsite.com/imgdigest" enctype="multipart/form-data"> 
     quality:<input type="text" name="quality" value="2"><br> 
     category:<input type="text" name="category" value="1"><br> 
     debug:<input type="text" name="debug" value="1"><br> 
     image:<input type="file" name="image"><br> 
     <input type="submit" value="Submit"> 
    </form> 
    </body> 
</html>

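What I want to do is upload a file, submit the form, and extract the response.

I started from an example, and I think I managed to get the upload working, because I get no errors when I run this.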

import urllib2_file 
import urllib2 
import request 
import lxml.html as lh 

data = {'name': 'image',
        'file': open('/user/mydir/21T03NAPE7L._AA75_.jpg')
        }
urllib2.urlopen('http://localhost/imgdigestertest.html', data)

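Unfortunately, I am not making a request here that gets the response back, and I don't know how to do that. Once I have the response, I should be able to extract the data with some pattern matching, which I am comfortable with.

Based on the answer provided, I tried the code below: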

import requests 

url = 'http://somesite.com:61235/imgdigest' 
files = {'file': ('21e1LOPiuyL._SL160_AA115_.jpg',
                  open('/usr/local/21e1LOPiuyL._SL160_AA115_.jpg', 'rb'))}
other_fields = {"quality": "2",
                "category": "1",
                "debug": "0"
                }
headers={'content-type': 'text/html; charset=ISO-8859-1'} 

response = requests.post(url, data=other_fields, files=files, headers=headers) 

print response.text

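Now I get the following error. It seems to say that the image file is somehow not being attached correctly. Do we have to specify the file type?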

Image::Image(...): bufSize = 0. Can not load image data. Image size = 0. DigestServiceProvider.hpp::Handle(...) |


2012-07-13 Null-Hypothesis

What does urllib2.urlopen('http://localhost/imgdigestertest.html', data).read() return? – RickyA 2012-07-13 21:24:15

I get back the same HTML I posted as the result. I tried the answer below and the result is the same – 2012-07-13 22:43:24

What about data2 = urllib.urlencode(data) followed by req = urllib2.Request(url, data2)? – RickyA 2012-07-13 22:48:29

Use the requests library (pip install requests, if you use pip).

For their example, see here: http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file

Customized for your case, it would look like this:

import requests 
url = 'http://localhost:8080/test_meth' 
files = {'file': ('21T03NAPE7L._AA75_.jpg',
                  open('./text.data', 'rb'))}
other_fields = {"quality": "2",
                "category": "1",
                "debug": "1"
                }
response = requests.post(url, data=other_fields, files=files) 
print response.text

On my local system, text.data contains this:

Data in a test file.

I wrote a server.py using cherrypy (pip install cherrypy) to test the client I gave above. Here is the source of server.py:

import cherrypy 
class Hello(object): 
    def test_meth(self, category, debug, quality, file):
        print "Form values:", category, debug, quality
        print "File name:", file.filename
        print "File data:", file.file.read()
        return "More stuff."
    test_meth.exposed = True
cherrypy.quickstart(Hello())

When I run the client.py above, it prints:

More stuff.

As you can see in the server.py example, that is exactly what test_meth returns.

Meanwhile, the server says:

Form values: 1 1 2 
File name: 21T03NAPE7L._AA75_.jpg 
File data: Data in a test file. 

127.0.0.1 - - [14/Jul/2012:00:00:35] "POST /test_meth HTTP/1.1" 200 11 "" "python-requests/0.13.3 CPython/2.7.3 Linux/3.2.0-26-generic"

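So you can see that the client posts the file name given in the code together with the contents of the specified local file.

One thing to point out: at the start of this post I said to use the requests library. It should not be confused with the urllib request you imported in your original question.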
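Adapting the client to the form from the question could look like the sketch below. It rests on two assumptions that I can't check against the real server: the key in files should be image, because that is the name of the file input in the form, and the custom content-type header should be dropped, because requests needs to write the multipart header itself (it carries the boundary string). The io.BytesIO object is a hypothetical stand-in for the opened JPEG, and the request is only prepared, not sent, so the body can be inspected offline.

```python
import io
import requests

# Hypothetical stand-in for open('/usr/local/21e1LOPiuyL._SL160_AA115_.jpg', 'rb');
# any binary file object can be used here.
fake_image = io.BytesIO(b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG")

url = "http://somesite.com:61235/imgdigest"

# The dictionary key matches <input type="file" name="image"> in the form.
# The optional third tuple element sets the Content-Type of that part.
files = {"image": ("21e1LOPiuyL._SL160_AA115_.jpg", fake_image, "image/jpeg")}
other_fields = {"quality": "2", "category": "1", "debug": "0"}

# No headers= argument: requests generates the multipart Content-Type,
# including the boundary that separates the parts of the body.
prepared = requests.Request("POST", url, data=other_fields, files=files).prepare()

print(prepared.headers["Content-Type"])
print(b'name="image"; filename="21e1LOPiuyL._SL160_AA115_.jpg"' in prepared.body)
```

To actually send it, requests.post(url, data=other_fields, files=files) with the same arguments should be enough. Whether this clears the bufSize = 0 error depends on the server, which I have not seen.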


2012-07-13 21:23:21

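Inside the files dictionary, the string you wrote should be the name of the form field rather than '21T03NAPE7L._AA75_.jpg' – 2012-07-13 22:25:57

For some reason I get back the same HTML I posted as the result. – 2012-07-13 22:43:53

I'm not sure what you mean by getting back the same HTML you posted. If response.text is the same as the original HTML form, that means the server is returning that same form as the response to the POST. – 2012-07-14 03:55:38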
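One possible reason for getting the form back is posting to the page that displays the form (imgdigestertest.html) instead of the URL in the form's action attribute. A minimal sketch, using only the standard library, for reading the action and the field names out of the form HTML from the question. FormInspector is a hypothetical helper name, and the sketch assumes the page holds a single form.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The form from the question, pasted in as a string.
FORM_HTML = """
<form method="POST" action="http://somewebsite.com/imgdigest" enctype="multipart/form-data">
 quality:<input type="text" name="quality" value="2"><br>
 category:<input type="text" name="category" value="1"><br>
 debug:<input type="text" name="debug" value="1"><br>
 image:<input type="file" name="image"><br>
 <input type="submit" value="Submit">
</form>
"""


class FormInspector(HTMLParser):
    """Collect the form's action URL and its named inputs (single form assumed)."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}       # non-file inputs: name -> default value
        self.file_fields = []  # names of <input type="file"> elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
        elif tag == "input" and "name" in attrs:
            if attrs.get("type") == "file":
                self.file_fields.append(attrs["name"])
            else:
                self.fields[attrs["name"]] = attrs.get("value", "")


page_url = "http://localhost/imgdigestertest.html"  # where the form was loaded from
inspector = FormInspector()
inspector.feed(FORM_HTML)

# urljoin resolves a relative action against the page URL; an absolute
# action, as in this form, is returned unchanged.
post_url = urljoin(page_url, inspector.action)

print(post_url)
print(inspector.fields)
print(inspector.file_fields)
```

The resulting post_url is the address the POST should go to, fields gives the default values for the data argument, and file_fields gives the key to use in the files dictionary.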
