2014-01-13 90 views

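I'm new to coding and trying to learn as I go. pycurl infinite loop and getopt problem

I'm trying to create a Python script that fetches and prints the headers of every URL listed in a txt file.

It seems to be getting there, but I'm stuck in an infinite loop on one of the URLs and I don't know why. Also, for some reason "-h" or "--help" does not call usage(). Any help would be appreciated.

Here is what I have so far: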

#!/usr/bin/python

import pycurl
import cStringIO
import sys, getopt

buf = cStringIO.StringIO()
c = pycurl.Curl()

def usage():
    print "-h --help, -i --urlist, -o --proxy"
    sys.exit()

def main(argv):
    iurlist = None
    proxy = None
    try:
        opts, args = getopt.getopt(argv,"hi:o:t",["help", "iurlist=","proxy="])
        if not opts:
            print "No options supplied"
            print "Type -h for help"
            sys.exit()
    except getopt.GetoptError as err:
        print str(err)
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt == ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-i", "--iurlist"):
            iurlist = arg
        elif opt in ("-o", "--proxy"):
            proxy = arg
        else:
            assert False, "Unhandeled option"

    with open(iurlist) as f:
        iurlist = f.readlines()
        print iurlist

    try:
        for i in iurlist:
            c.setopt(c.URL, i)
            c.setopt(c.PROXY, proxy)
            c.setopt(c.HEADER, 1)
            c.setopt(c.FOLLOWLOCATION, 1)
            c.setopt(c.MAXREDIRS, 30)
            c.setopt(c.USERAGENT, 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0')
            c.setopt(c.TIMEOUT, 8)
            c.setopt(c.CONNECTTIMEOUT, 5)
            c.setopt(c.NOBODY, 1)
            c.setopt(c.PROXY, proxy)
            c.setopt(c.WRITEFUNCTION, buf.write)
            c.setopt(c.SSL_VERIFYPEER, 0)
            c.perform()
            print buf.getvalue()
            buf.close

    except pycurl.error, error:
        errno, errstr = error
        print 'An error has occurred: ', errstr

if __name__ == "__main__":
    main(sys.argv[1:])

This is the latest code:

#!/usr/bin/python

import pycurl
import cStringIO
import sys, getopt

c = pycurl.Curl()

def usage():
    print "-h --help, -i --urlist, -o --proxy"
    print "Example Usage: cURLdect.py -i urlist.txt -o http://192.168.1.64:8080"
    sys.exit()

def main(argv):
    iurlist = None
    proxy = None
    try:
        opts, args = getopt.getopt(argv,"hi:o:t",["help", "iurlist=","proxy="])
        if not opts:
            print "No options supplied"
            print "Type -h for help"
            sys.exit()
    except getopt.GetoptError as err:
        print str(err)
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-i", "--iurlist"):
            iurlist = arg
        elif opt in ("-o", "--proxy"):
            proxy = arg
        else:
            assert False, "Unhandeled option"

    with open(iurlist) as f:
        iurlist = f.readlines()
        print iurlist

    try:
        for i in iurlist:
            buf = cStringIO.StringIO()
            c.setopt(c.WRITEFUNCTION, buf.write)
            c.setopt(c.PROXY, proxy)
            c.setopt(c.HEADER, 1)
            c.setopt(c.FOLLOWLOCATION, 1)
            c.setopt(c.MAXREDIRS, 300)
            c.setopt(c.USERAGENT, 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0')
            c.setopt(c.TIMEOUT, 8)
            c.setopt(c.CONNECTTIMEOUT, 5)
            c.setopt(c.NOBODY, 1)
            c.setopt(c.SSL_VERIFYPEER, 0)
            c.setopt(c.URL, i)
            c.perform()
            print buf.getvalue()
            buf.close()
    except pycurl.error, error:
        errno, errstr = error
        print 'An error has occurred: ', errstr

if __name__ == "__main__":
    main(sys.argv[1:])

I've figured out a way around the getopt problem with usage(). I changed the code as follows: for opt, arg in opts: if opt == "-h": usage() sys.exit() elif opt in ("--help"): usage() sys.exit() – LearningCode

You are misusing buf. Writing 'buf.close' without parentheses does not close it; it only returns the function. – xbello

@xbello Sorry, how do I close it? – LearningCode
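
To illustrate xbello's point, here is a minimal sketch. It uses Python 3's io.StringIO as a stand-in for the cStringIO buffer in the question (an assumption, made so the snippet runs on a current interpreter); the difference between referencing a method and calling it is the same in both.

```python
import io

buf = io.StringIO()
buf.write("HTTP/1.1 200 OK\r\n")

# Without parentheses the method is only looked up; nothing is called.
buf.close
still_open = not buf.closed   # the buffer is still open here

# With parentheses the method is actually called and the buffer is closed.
buf.close()
now_closed = buf.closed
```

A closed buffer cannot be written to again, so creating a fresh buffer for each URL, as the latest code in the question does, also keeps the output of earlier requests from piling up in getvalue().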

Answers

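If you are learning, pycurl may not be the best choice. It assumes you are already familiar with the libcurl library. From http://pycurl.sourceforge.net/:

PycURL is targeted at an advanced developer - if you need dozens of concurrent, fast and reliable connections or any of the sophisticated features listed above then PycURL is for you.

The main drawback of PycURL is that it is a relatively thin layer over libcurl without any of those nice Pythonic class hierarchies. This means it has a somewhat steep learning curve unless you are already familiar with libcurl's C API.

This is how they do multiple fetches: https://github.com/pycurl/pycurl/blob/master/examples/retriever-multi.py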


To fetch headers the Python way, install the requests library and just do:

import requests

# list_of_urls: the URLs read from your txt file, one string per URL
for url in list_of_urls:
    r = requests.get(url)
    print(r.headers)
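
The loop above assumes list_of_urls already exists. One way to build it from the txt file might look like the sketch below; read_urls is a hypothetical helper name, and the file is assumed to hold one URL per line. Note that readlines() keeps the trailing newline on every line, so stripping each entry before handing it to an HTTP library is usually what you want.

```python
import os
import tempfile


def read_urls(path):
    """Return the non-empty lines of a text file with whitespace stripped."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Demo with a throwaway file; in practice pass the path of your own URL list.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("http://example.com/\n\n  http://example.org/  \n")

list_of_urls = read_urls(tmp.name)
os.remove(tmp.name)
print(list_of_urls)
```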

To handle command-line arguments, use argparse from the batteries included with Python.
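
As a rough sketch of what that could look like for this script: the option names -i/--iurlist and -o/--proxy are taken from the question, while build_parser and the help texts are made up for illustration. argparse adds -h/--help on its own, so no usage() function is needed.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Print the response headers for each URL in a text file.")
    # required=True makes argparse print an error and exit if -i is missing.
    parser.add_argument("-i", "--iurlist", required=True,
                        help="text file with one URL per line")
    parser.add_argument("-o", "--proxy", default=None,
                        help="optional proxy, e.g. http://192.168.1.64:8080")
    return parser


# An explicit list is passed here for demonstration; call parse_args()
# with no arguments to read the real command line (sys.argv).
args = build_parser().parse_args(
    ["-i", "urlist.txt", "-o", "http://192.168.1.64:8080"])
print(args.iurlist, args.proxy)
```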

I'll give this a try tomorrow, thanks! – LearningCode

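You use

if opt == ("-h", "--help"):

for the help option, but

if opt in (....)

for all the other options. opt is either -h or --help, never both at once, so you need to use in to check whether opt is one of them.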
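
A small sketch of the difference, written for Python 3 with illustrative variable names. It also covers the opt in ("--help") form from the workaround in the comments: parentheses alone do not make a tuple, so that expression is a substring test against the string "--help".

```python
opt = "-h"

# A string never equals a tuple, so this comparison is always False.
equals_tuple = (opt == ("-h", "--help"))

# Membership test: True when opt matches any element of the tuple.
in_tuple = opt in ("-h", "--help")

# ("--help") is just the string "--help", so `in` checks for a substring.
substring_match = "--h" in ("--help")

# A trailing comma makes a real one-element tuple, giving an exact match test.
tuple_match = "--h" in ("--help",)

print(equals_tuple, in_tuple, substring_match, tuple_match)
```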

Thanks. I'll wait for answers to the other issue and then accept the appropriate answer. – LearningCode