2012-04-15 53 views
10

How to capture curl's output from a Python script

I want to get information about a web page using curl, but from Python. So far I have this:

os.system("curl --head www.google.com") 

If I run that, it prints:

HTTP/1.1 200 OK 
Date: Sun, 15 Apr 2012 00:50:13 GMT 
Expires: -1 
Cache-Control: private, max-age=0 
Content-Type: text/html; charset=ISO-8859-1 
Set-Cookie: PREF=ID=3e39ad65c9fa03f3:FF=0:TM=1334451013:LM=1334451013:S=IyFnmKZh0Ck4xfJ4; expires=Tue, 15-Apr-2014 00:50:13 GMT; path=/; domain=.google.com 
Set-Cookie: NID=58=Giz8e5-6p4cDNmx9j9QLwCbqhRksc907LDDO6WYeeV-hRbugTLTLvyjswf6Vk1xd6FPAGi8VOPaJVXm14TBm-0Seu1_331zS6gPHfFp4u4rRkXtSR9Un0hg-smEqByZO; expires=Mon, 15-Oct-2012 00:50:13 GMT; path=/; domain=.google.com; HttpOnly 
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info." 
Server: gws 
X-XSS-Protection: 1; mode=block 
X-Frame-Options: SAMEORIGIN 
Transfer-Encoding: chunked 

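What I want to do is match the 200 in it with a regular expression (I don't need help with that part), but I can't find a way to get all of the text above into a string. How do I do that? I tried info = os.system("curl --head www.google.com"), but info is just 0.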

+0

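"The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes." - http://docs.python.org/library/os.html#os.system – 2012-04-15 01:02:21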

Answers

2

Try this:

import httplib 
conn = httplib.HTTPConnection("www.python.org") 
conn.request("GET", "/index.html") 
r1 = conn.getresponse() 
print r1.status, r1.reason 
+8

This doesn't really answer the question of how to capture the output from curl. Often you need curl to send specific cookies and other parameters. – 576i 2014-01-21 10:55:43

17

Try this, using subprocess.Popen():

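import subprocess
# Run curl without a shell and collect its standard output through a pipe.
proc = subprocess.Popen(["curl", "--head", "www.google.com"], stdout=subprocess.PIPE)
# err is None here because stderr was not redirected to a pipe.
(out, err) = proc.communicate()
print(out)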

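As stated in the documentation:

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as: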

os.system 
os.spawn* 
os.popen* 
popen2.* 
commands.* 
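As a rough sketch of the same idea in Python 3.5 or later, the snippet below wraps the capture in a helper and pulls the status code out of the header text. The function names `capture_output` and `status_code` are made up for this illustration, and the regular expression is only one possible way to match the status line.

```python
import re
import subprocess


def capture_output(cmd):
    """Run cmd (a list of arguments) and return its standard output as text."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
    )
    return completed.stdout


def status_code(headers):
    """Return the numeric status from the first HTTP status line, or None."""
    match = re.search(r"^HTTP/\S+\s+(\d{3})", headers, re.MULTILINE)
    return int(match.group(1)) if match else None
```

With curl installed, `status_code(capture_output(["curl", "--head", "www.google.com"]))` would give the status as an integer. Passing the command as a list avoids going through a shell.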
+0

Why? Explain plz – Billjk 2012-04-15 01:03:49

+0

@user1333973: Because `subprocess` works and `os.system()` doesn't. – 2012-04-15 01:04:38

+0

@user1333973 Added a link to the documentation – 2012-04-15 01:06:38

0

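You can use an HTTP library or HTTP client library in Python instead of calling the curl command. In fact, there is a curl library you can install (as long as you have a compiler on your OS).

Other options are httplib2 (recommended), which is a fairly complete HTTP protocol client with caching support, plain httplib, or a library named Requests.

If you really just want to run the curl command and capture its output, you can do that with Popen from the built-in subprocess module, documented here: http://docs.python.org/library/subprocess.html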
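To illustrate the library route without any third-party package, here is a minimal sketch that sends a HEAD request with `http.client`, the Python 3 name for httplib. The helper name `head_status` and its default arguments are assumptions made for this example, not part of any library.

```python
import http.client


def head_status(host, path="/", port=80, timeout=10):
    """Send a HEAD request and return (status, reason, headers)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        # getheaders() returns a list of (name, value) pairs.
        return response.status, response.reason, dict(response.getheaders())
    finally:
        conn.close()
```

Calling `head_status("www.google.com")` would return roughly what `curl --head` prints, but as Python values, so no text parsing is needed to read the status.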

0

Well, there is an easier-to-read but messier way to do this. Here it is:

import os
outfile = ''  # put your file path here
os.system("curl --head www.google.com >> {x}".format(x=outfile))  # Appends curl's output to the file (creating it if it doesn't exist).
readOut = open(outfile, "r")  # Opens the file in read mode.
for line in readOut:
    print(line)  # Prints each line of the file
readOut.close()  # Closes the file
os.remove(outfile)  # Optional: deletes the log file after use.

This should work fine for your needs. :)
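A variation on the file-based approach, sketched here for Python 3, hands the child process a temporary file as its standard output, so there is no shell redirection and no manual cleanup. The function name `capture_via_tempfile` is invented for this example.

```python
import subprocess
import tempfile


def capture_via_tempfile(cmd):
    """Run cmd with stdout sent to a temporary file, then return the file's text."""
    with tempfile.TemporaryFile(mode="w+") as log:
        subprocess.call(cmd, stdout=log)  # the child writes straight into the file
        log.seek(0)                       # rewind before reading it back
        return log.read()
    # The temporary file is deleted automatically when the with block ends.
```

For the command in this thread, the call would be `capture_via_tempfile(["curl", "--head", "www.google.com"])`.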

8

For some reason... I needed to use curl (no pycurl, httplib2, ...). Maybe this helps someone:

import os
# os.popen runs the command through the shell; read() returns everything it printed.
result = os.popen("curl http://google.es").read()
print(result)
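For comparison, a similarly short form exists in the subprocess module. The sketch below assumes Python 3, and the wrapper name `read_command` is hypothetical. Unlike `os.popen`, `subprocess.check_output` raises `CalledProcessError` when the command exits with a non-zero status.

```python
import subprocess


def read_command(cmd):
    """Return the standard output of cmd as text; raise if the command fails."""
    return subprocess.check_output(cmd, universal_newlines=True)
```

The equivalent of the snippet above would be `read_command(["curl", "http://google.es"])`.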
+2

Thanks, this is more intuitive than the other answers and handy for quick and dirty scripts :) – 2016-09-05 18:49:16

2
import os 
cmd = 'curl https://randomuser.me/api/' 
os.system(cmd) 

Result

{"results":[{"gender":"male","name":{"title":"mr","first":"çetin","last":"nebioğlu"},"location":{"street":"5919 abanoz sk","city":"adana","state":"kayseri","postcode":53537},"email":"çetin.nebioğ[email protected]","login":{"username":"heavyleopard188","password":"forgot","salt":"91TJOXWX","md5":"2b1124732ed2716af7d87ff3b140d178","sha1":"cb13fddef0e2ce14fa08a1731b66f5a603e32abe","sha256":"cbc252db886cc20e13f1fe000af1762be9f05e4f6372c289f993b89f1013a68c"},"dob":"1977-05-10 18:26:56","registered":"2009-09-08 15:57:32","phone":"(518)-816-4122","cell":"(605)-165-1900","id":{"name":"","value":null},"picture":{"large":"https://randomuser.me/api/portraits/men/38.jpg","medium":"https://randomuser.me/api/portraits/med/men/38.jpg","thumbnail":"https://randomuser.me/api/portraits/thumb/men/38.jpg"},"nat":"TR"}],"info":{"seed":"0b38b702ef718e83","results":1,"page":1,"version":"1.1"}}
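Note that `os.system` only prints this JSON to the terminal; it does not return it. If the goal is to work with the data, one possible sketch (Python 3, with the hypothetical helper name `fetch_json`) is to capture the output with subprocess and decode it with the standard `json` module.

```python
import json
import subprocess


def fetch_json(cmd):
    """Run cmd, capture its standard output, and decode it as JSON."""
    raw = subprocess.check_output(cmd)  # bytes written by the command
    return json.loads(raw.decode("utf-8"))
```

With curl available, `fetch_json(["curl", "https://randomuser.me/api/"])["info"]` would then give the `info` object shown above as a Python dict.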