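A web crawler I'm working on spawns 7 threads, each of which queries a unique URL for an XML file. When a query receives its response, it parses that response into an XML tree, like this: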
conn = http.client.HTTPSConnection(host = uHost, port = uPort)
conn.request('GET', url = '/some/url/file.xml')
resp = conn.getresponse()
tree = xml.etree.ElementTree.parse(resp)
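When each thread is started, it is given a queue.Queue() as an argument so that it can put its tree into it; this way __main__ is the only thread that writes to files. Continuing from above: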
__main__
def receive(q):
    while True:
        try:
            uTree = q.get()
            uTree.write('/some/path/file.xml')
        except queue.Empty:
            pass
Spawned thread
conn = http.client.HTTPSConnection(host = uHost, port = uPort)
conn.request('GET', url = '/some/url/file.xml')
resp = conn.getresponse()
tree = xml.etree.ElementTree.parse(resp)
q.put_nowait(tree)
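For reference, the pattern described above can be reduced to a sketch that runs without a network. This is only an illustration under assumptions: the in-memory DOCS mapping, the worker() function, and the (url, tree) tuples are hypothetical stand-ins for the real HTTP responses and are not part of my crawler.

```python
import io
import queue
import threading
import xml.etree.ElementTree as ET

# Hypothetical in-memory documents standing in for the HTTP responses.
DOCS = {f"/url{i}.xml": f"<doc id='{i}'/>".encode() for i in range(7)}

def worker(url, q):
    # ET.parse accepts any file-like object, here a BytesIO instead of a response.
    tree = ET.parse(io.BytesIO(DOCS[url]))
    q.put((url, tree))

def receive(q, expected):
    # Collect exactly `expected` items; q.get() blocks until one is available.
    received = {}
    for _ in range(expected):
        url, tree = q.get()
        received[url] = tree
    return received

q = queue.Queue()
threads = [threading.Thread(target=worker, args=(url, q)) for url in DOCS]
for t in threads:
    t.start()
trees = receive(q, len(threads))
for t in threads:
    t.join()
print(sorted(trees))
```

Note that q.get() blocks by default, so queue.Empty is only raised when it is called with block=False or with a timeout. Counting the expected items, as in this sketch, is one way to let the receiving loop finish.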
However, I started getting AttributeError: 'NoneType' object has no attribute 'write' when calling uTree.write(). Quickly swapping uTree.write() for print(type(uTree)) shows that the objects sometimes arrive intact and at other times are NoneType:
<class 'xml.etree.ElementTree.ElementTree'>
<class 'xml.etree.ElementTree.ElementTree'>
<class 'xml.etree.ElementTree.ElementTree'>
<class 'xml.etree.ElementTree.ElementTree'>
<class 'NoneType'>
<class 'NoneType'>
<class 'xml.etree.ElementTree.ElementTree'>
<class 'xml.etree.ElementTree.ElementTree'>
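A possible way to narrow this down, without assuming anything about the cause, is to check the type on the producing side and tag every queued item with the URL it came from. The sketch below is only that: put_checked() is a hypothetical helper, not something in my code, and the URLs are placeholders.

```python
import queue
import xml.etree.ElementTree as ET

def put_checked(q, url, tree):
    # Refuse to enqueue anything that is not an ElementTree, so that a bad
    # value fails in the producing thread and names the URL it belongs to.
    if not isinstance(tree, ET.ElementTree):
        raise TypeError(f"{url}: expected ElementTree, got {type(tree).__name__}")
    q.put((url, tree))

q = queue.Queue()
put_checked(q, "/url1.xml", ET.ElementTree(ET.Element("doc")))

error = None
try:
    put_checked(q, "/url2.xml", None)  # simulate the unexpected None
except TypeError as exc:
    error = str(exc)

url, tree = q.get_nowait()
print(url, type(tree), error)
```

In the full code further down, the bare except in crawler() would swallow such an exception, so it would have to be logged or narrowed for a check like this to be visible.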
Questions:
Why do objects passed from a threading.Thread() into a queue.Queue() [residing in __main__] inconsistently turn into NoneType?
How can I fix this?
Full code (if needed):
main.py
import queue
import crawl # custom module
import threading

def crawler(query):
    # keep retrying until connect() completes without raising
    while True:
        try:
            query.connect()
            break
        except:
            pass

def receive(q):
    # runs in the main thread, the only one that writes files
    while True:
        try:
            uQuery = q.get()
            uTree = uQuery.tree
            uTree.write('/some/path/file.xml')
        except queue.Empty:
            pass

urls = ['/url1.xml', '/url2.xml', ...]
q = queue.Queue()
queries = [crawl.Query(url, q) for url in urls]
threads = [threading.Thread(target = crawler, args = (query,)) for query in queries]
for t in threads:
    t.start()
receive(q)
crawl.py
import http.client
import xml.etree.ElementTree as ET

class Query:
    def __init__(self, url, q):
        self.url = url
        self.queue = q
        self.tree = None

    def connect(self):
        conn = http.client.HTTPConnection(host = 'something.com', port = 80)
        conn.request('GET', url = self.url)
        resp = conn.getresponse()
        self.tree = ET.parse(resp)
        # hand the whole Query object over to the main thread
        self.queue.put_nowait(self)
        conn.close()