2012-10-14

Python: I'm trying to create a urllib2.build_opener() method that changes the user agent, and I need help combining my classes

Here is what I have so far:

Crawler.py

import urllib, urllib2, cookielib 
from bs4 import BeautifulSoup 
import urlopener 
import re, os 

class Crawler():
    def __init__(self):
        # Web site that contains all the browser headers
        self.url = 'http://somewebsite'
        self.opener = urlopener.opener()
        self.web_page = self.opener.open(self.url)
        self.soup = BeautifulSoup(self.web_page.read())

    def current_browser(self):
        try:
            web_page = self.opener.open(self.url)
            soup = BeautifulSoup(web_page.read())
            return soup.find(id='uas_textfeld').string
        except urllib2.HTTPError:
            print 'ERROR'

urlopener.py:

import cookielib 
import urllib, urllib2 
import linecache, random 

cj=cookielib.CookieJar() 
useragent='Mozilla/5.0 (BlackBerry; U; BlackBerry 9850; en-US) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.0.0.115 Mobile Safari/534.11+' 

def opener(): 
    # Process handlers
    opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 
    opener.addheaders=[ 
        ('User-Agent', useragent), 
        ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'), 
        ('Accept-Language', 'en-gb,en;q=0.5'), 
        ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'), 
        ('Keep-Alive', '115'), 
        ('Connection', 'keep-alive'), 
        ('Cache-Control', 'max-age=0'), 
       ] 
    return opener 

#randomly change browser 
def browser_change(f_path): 
    #f_path is a path to the file that contains browsers 
    # To get the file, uncomment the next lines
    #c=Crawler() 
    #c.get_to_the_mobile_browser_list() 
    f=open(f_path, 'r+') 
    count=0 
    for line in f.xreadlines(): count+=1 
    br_num=random.randint(1,count) 
    useragent=linecache.getline(f_path, br_num) 
    return opener() 

Here is how I test Crawler.py:

c=Crawler() 
print 'Current Browser :\n',c.current_browser() 
f_path='/home/vor/mob_brows.txt' 
opener=urlopener.browser_change(f_path) # The problem is right here!!!!! 
b=Crawler() 
print 'New Browser:\n',b.current_browser() 

In my output, the current browser and the new browser are the same:

Current Browser : 
Mozilla/5.0 (BlackBerry; U; BlackBerry 9850; en-US) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.0.0.115 Mobile Safari/534.11+ 
New Browser: 
Mozilla/5.0 (BlackBerry; U; BlackBerry 9850; en-US) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.0.0.115 Mobile Safari/534.11+ 

The file mob_brows.txt contains entries like these:

Mozilla/5.0 (Linux; U; Android 2.3.3; zh-tw; HTC_Pyramid Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
Mozilla/5.0 (Linux; U; Android 2.3.3; zh-tw; HTC_Pyramid Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari 
Mozilla/5.0 (Linux; U; Android 2.3.3; zh-tw; HTC Pyramid Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
Mozilla/5.0 (Linux; U; Android 2.3.3; ko-kr; LG-LU3000 Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 
Mozilla/5.0 (Linux; U; Android 2.3.3; en-us; HTC_DesireS_S510e Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile 

What is the problem? – root


What do I need to change in my code so that the user agent changes? – Vor

Answer


Modify opener to accept the user agent as a parameter...

def opener(user_agent): 
    # Process handlers
    opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 
    opener.addheaders=[ 
        ('User-Agent', user_agent), 
       # snip... 
       ] 
    return opener 
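For readers on Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar, the same parameterised opener might look like this. make_opener is a hypothetical name, chosen so the function does not share its name with the opener it returns, and only a subset of the question's headers is kept. It also strips the trailing newline that lines read from a file carry, since http.client rejects a header value ending in a bare newline.

```python
import http.cookiejar
import urllib.request

# Shared cookie jar, mirroring the module-level cj in urlopener.
cj = http.cookiejar.CookieJar()


def make_opener(user_agent):
    """Return an opener that sends the given User-Agent (Python 3 sketch)."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    # Assigning addheaders replaces the default Python-urllib User-agent pair.
    opener.addheaders = [
        # Lines read from a file end in '\n'; strip it so the header is valid.
        ("User-Agent", user_agent.strip()),
        ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
        ("Accept-Language", "en-gb,en;q=0.5"),
    ]
    return opener


ua = "Mozilla/5.0 (Linux; U; Android 2.3.3; ko-kr; LG-LU3000 Build/GRI40)\n"
print(dict(make_opener(ua).addheaders)["User-Agent"])
```

Each call builds an independent opener, so two crawlers can hold different agents at the same time without touching any global.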

Then generate a list of openers, one for each user-agent string...

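# this could be nicer, but demonstrates the point
# strip() removes the trailing newline; a header value must not contain one
openers = [opener(agent.strip()) for agent in open('your_f_path') if agent.strip()]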
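A Python 3 sketch of the file-reading step, assuming one user agent per line as in mob_brows.txt. It skips blank lines, strips newlines, and uses random.choice to pick an entry, which also replaces the line-counting plus linecache.getline approach in browser_change. load_user_agents and pick_user_agent are hypothetical helper names, and io.StringIO stands in for the real file so the example runs without it.

```python
import io
import random


def load_user_agents(lines):
    """Return the non-blank user-agent strings from an iterable of lines."""
    # Each file line ends in '\n', which must not reach the header value.
    return [line.strip() for line in lines if line.strip()]


def pick_user_agent(agents, rng=random):
    """Pick one agent; pass a seeded random.Random for reproducible runs."""
    return rng.choice(agents)


# io.StringIO stands in for open('/home/vor/mob_brows.txt').
sample = io.StringIO("UA-one\nUA-two\n\nUA-three\n")
agents = load_user_agents(sample)
print(agents)                   # ['UA-one', 'UA-two', 'UA-three']
print(pick_user_agent(agents))  # one of the three, chosen at random
```

Keeping the strings and building an opener only for the agent you pick avoids creating one opener per line up front, which may matter if the file is long.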

Then use choice from the random module to pick an opener, and assign it in the Crawler class where it currently does self.opener = urlopener.opener()

from random import choice 
use_to_open = choice(openers) 
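One way to make the chosen opener actually reach the crawler is to let Crawler accept an opener instead of calling urlopener.opener() inside __init__. The Python 3 sketch below is a trimmed, hypothetical version of the class: it skips the page fetch and the BeautifulSoup parsing, and its user_agent method only reports the header the opener would send. make_opener is a hypothetical helper, not part of urllib.

```python
import http.cookiejar
import random
import urllib.request

cj = http.cookiejar.CookieJar()


def make_opener(user_agent):
    # Hypothetical helper: one opener per User-Agent, sharing the cookie jar.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


class Crawler:
    def __init__(self, opener, url="http://somewebsite"):
        # The opener is passed in rather than built from a module-level
        # global, so each Crawler instance can use a different agent.
        self.opener = opener
        self.url = url

    def user_agent(self):
        # Report the User-Agent this crawler would send; no network call.
        return dict(self.opener.addheaders).get("User-Agent")


agents = ["UA-one", "UA-two", "UA-three"]
openers = [make_opener(a) for a in agents]
crawler = Crawler(random.choice(openers))
print(crawler.user_agent())
```

Injecting the opener also makes the class easy to test: a crawler built with a known opener reports a known agent, with no dependence on which module-level value happened to be set last.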

Thanks a lot – Vor