Python pycurl WRITEFUNCTION does not work on the second call

I have written a simple script in Python.

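It parses the hyperlinks of a web page and then retrieves those links to extract some information.

I have similar scripts that run and reuse the WRITEFUNCTION without any problem, but this one fails for some reason and I don't understand why.

General curl init: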

import StringIO
import pycurl

storage = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.USERAGENT, USER_AGENT)
c.setopt(pycurl.COOKIEFILE, "")
c.setopt(pycurl.POST, 0)
c.setopt(pycurl.FOLLOWLOCATION, 1)
# Similar scripts work this way; why doesn't this one?
c.setopt(c.WRITEFUNCTION, storage.write)

First call, to retrieve the links:

URL = "http://whatever" 
REFERER = URL 

c.setopt(pycurl.URL, URL) 
c.setopt(pycurl.REFERER, REFERER) 
c.perform() 

#Write page to file 
content = storage.getvalue() 
f = open("updates.html", "w") 
f.writelines(content) 
f.close() 
... Here the magic happens and links are extracted ...

Now loop over these links:

for i, member in enumerate(urls): 
    URL = urls[i] 
    print "url:", URL 
    c.setopt(pycurl.URL, URL) 
    c.perform() 

    #Write page to file 
    #Still the data from previous! 
    content = storage.getvalue() 
    f = open("update.html", "w") 
    f.writelines(content) 
    f.close() 
    #print content 
    ... Gather some information ... 
    ... Close objects etc ...

Source

2013-05-05 honda4life

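You could try 'c.setopt(c.WRITEFUNCTION, f.write)' inside the loop to avoid appending data to the same object. 'Curl()' is reusable, so that may be enough. – jfs 2013-05-05 22:55:46

No, that doesn't work. I tried it before; I think it is just passed by reference. Is it possible that the string from the first page is too long? (The web page is very large compared with the other content I retrieve with Curl and Python.) – honda4life 2013-05-06 17:18:20

If you want to download the URLs to different files sequentially (no concurrent connections):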

for i, url in enumerate(urls):
    c.setopt(pycurl.URL, url)
    with open("output%d.html" % i, "w") as f:
        c.setopt(c.WRITEDATA, f)  # c.setopt(c.WRITEFUNCTION, f.write) also works
        c.perform()

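Notes:

storage.getvalue() returns everything that has been written to storage since the moment it was created. In your case, you should find the output of multiple URLs in it.
open(filename, "w") overwrites the file (the previous content is gone), i.e., update.html contains whatever was in content on the last iteration of the loop.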
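
To make the first note concrete, here is a minimal sketch of how the buffer accumulates and one way to empty it between transfers. It makes no network requests: the storage.write(...) calls stand in for what pycurl would do when it invokes the write callback, and the page contents are made up for illustration. It uses io.BytesIO rather than StringIO.StringIO so it runs on both Python 2 and 3; that swap is my assumption, not part of the original script.

```python
import io

storage = io.BytesIO()

# Simulated first transfer: pycurl would call storage.write() with each chunk.
storage.write(b"<html>first page</html>")
after_first = storage.getvalue()

# Simulated second transfer into the SAME buffer: the data is appended,
# so getvalue() still begins with the first page.
storage.write(b"<html>second page</html>")
after_second = storage.getvalue()

# Empty the buffer before the next transfer: rewind, then drop the contents.
storage.seek(0)
storage.truncate(0)

# Simulated third transfer: now only the new page is in the buffer.
storage.write(b"<html>third page</html>")
after_reset = storage.getvalue()
```

Creating a fresh buffer for each URL and passing its write method to WRITEFUNCTION again should have the same effect as the seek/truncate pair.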

Source

2013-05-06 19:18:35 jfs

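"storage.getvalue() returns everything that has been written to storage since the moment it was created." That is what I wanted to hear. I probably didn't notice it in my other scripts: it may go unnoticed when the file is opened in a browser, but be visible when it is opened in a text editor, or something like that. – honda4life 2013-05-06 19:55:32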