2013-02-19 · 28 views · 2 votes

Python - make a script loop until a condition is met, using a different proxy address for each loop

I am the definition of a noob. I know almost nothing about Python and I'm looking for help. I can read code well enough to change variables to suit my needs, but when I try to do something the original code wasn't built for... I get lost.

So here's the deal: I found a Craigslist (CL) flagging script that originally searched every CL site and flagged posts containing a specific keyword (it was written to flag every post mentioning Scientology).

I changed it to search only the CL sites in my general area (15 sites instead of 437), and it still looks for a specific keyword, which I've changed. I want to automatically flag someone who keeps spamming; I do a lot of business on CL, so sorting the real mail from the spam is getting difficult.

I would like the script to keep looping until it can no longer find posts that meet the criteria, changing the proxy server after each loop. Also, where in the script do the proxy/IP addresses go?

I look forward to your replies.

Here is the changed code I have:

#!/usr/bin/env python
# -*- coding: utf-8 -*-


import urllib
from twill.commands import * # gives us go()

areas = ['sfbay', 'chico', 'fresno', 'goldcountry', 'humboldt', 'mendocino',
         'modesto', 'monterey', 'redding', 'reno', 'sacramento', 'siskiyou',
         'stockton', 'yubasutter', 'reno']

def expunge(url, area):
    page = urllib.urlopen(url).read() # <-- and v and vv gets you urls of ind. postings
    page = page[page.index('<hr>'):].split('\n')[0]
    page = [i[:i.index('">')] for i in page.split('href="')[1:-1] if '<font size="-1">' in i]

    for u in page:
        num = u[u.rfind('/')+1:u.index('.html')] # the number of the posting (like 34235235252)
        spam = 'https://post.craigslist.org/flag?flagCode=15&postingID='+num # url for flagging as spam
        go(spam) # flag it


print 'Checking ' + str(len(areas)) + ' areas...'

for area in ['http://' + a + '.craigslist.org/' for a in areas]:
    ujam = area + 'search/?query=james+"916+821+0590"+&catAbb=hhh'
    udre = area + 'search/?query="DRE+%23+01902542+"&catAbb=hhh'
    try:
        jam = urllib.urlopen(ujam).read()
        dre = urllib.urlopen(udre).read()
    except:
        print 'tl;dr error for ' + area

    if 'Found: ' in jam:
        print 'Found results for "James 916 821 0590" in ' + area
        expunge(ujam, area)
        print 'All "James 916 821 0590" listings marked as spam for area'

    if 'Found: ' in dre:
        print 'Found results for "DRE # 01902542" in ' + area
        expunge(udre, area)
        print 'All "DRE # 01902542" listings marked as spam for area'

If you only use 'go', only import 'go': 'from twill.commands import go' – askewchan 2013-02-19 21:17:19


ImportError: No module named go – 2013-02-19 22:18:10


Strange: http://twill.idyll.org/python-api.html says 'from twill.commands import go' – askewchan 2013-02-19 22:24:07

Answers


You can create a perpetual loop like this:

while True: 
    if condition: 
        break 

itertools has tricks for repetition: http://docs.python.org/2/library/itertools.html

In particular, check out itertools.cycle.

(These are pointers in the right direction. You can build a solution out of one, the other, or even both.)
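Putting the two hints together, a loop that draws a different proxy address on every pass could be sketched like this (the proxy list and the remaining counter are made-up placeholders, not part of the original script):

```python
import itertools

# Placeholder proxy addresses -- substitute real ones.
proxies = ['108.60.219.136:8080', '198.144.186.98:3128', '66.55.153.226:8080']
proxy_pool = itertools.cycle(proxies)  # repeats the list forever, wrapping around

remaining = 3  # stand-in for "posts still matching the search"
passes = []
while True:
    proxy = next(proxy_pool)   # next proxy for this pass
    passes.append(proxy)
    remaining -= 1             # pretend flagging removed one match
    if remaining == 0:         # condition met: nothing left to flag
        break

print(passes)                  # each pass used a different proxy
```

Each trip through the `while` pulls the next address from the cycle, so the real script would only have to do its searching and flagging between `next(proxy_pool)` and the break condition.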


Sorry, I don't get it.. I tried adding repeat() into the code, but I keep getting: Traceback (most recent call last): File '/home/quonundrum/Desktop/CL.py', line 43, in repeat('spam,4') NameError: name 'repeat' is not defined >>> – 2013-02-19 21:57:40


'import itertools as it' then call 'it.repeat()' – askewchan 2013-02-19 22:27:31


I tried it.repeat('go,4'), it.repeat('go(spam),4'), it.repeat('expunge'), it.repeat('ujam').. and a whole bunch of others... it doesn't repeat, but it doesn't give any errors either – 2013-02-19 22:54:55


I made some changes to your code. As far as I can tell, the function expunge already loops through all the results in a page, so I'm not sure what you need a loop for, but at the end there's an example of how to check whether results were found, though there is no loop to break out of.

No idea how to change the proxy/IP.

By the way, you have 'reno' twice.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib
from twill.commands import go

areas = ['sfbay', 'chico', 'fresno', 'goldcountry', 'humboldt',
         'mendocino', 'modesto', 'monterey', 'redding', 'reno',
         'sacramento', 'siskiyou', 'stockton', 'yubasutter']
queries = ['james+"916+821+0590"', '"DRE+%23+01902542"']

def expunge(url, area):
    page = urllib.urlopen(url).read() # <-- and v and vv gets you urls of ind. postings
    page = page[page.index('<hr>'):].split('\n')[0]
    page = [i[:i.index('">')] for i in page.split('href="')[1:-1] if '<font size="-1">' in i]

    for u in page:
        num = u[u.rfind('/')+1:u.index('.html')] # the number of the posting (like 34235235252)
        spam = 'https://post.craigslist.org/flag?flagCode=15&postingID='+num # url for flagging as spam
        go(spam) # flag it

print 'Checking ' + str(len(areas)) + ' areas...'

for area in areas:
    for query in queries:
        qurl = 'http://' + area + '.craigslist.org/search/?query=' + query + '+&catAbb=hhh'
        try:
            q = urllib.urlopen(qurl).read()
        except:
            print 'tl;dr error for {} in {}'.format(query, area)
            break

        if 'Found: ' in q:
            print 'Found results for {} in {}'.format(query, area)
            expunge(qurl, area)
            print 'All {} listings marked as spam for area'.format(query)
        elif 'Nothing found for that search' in q:
            print 'No results for {} in {}'.format(query, area)
            break
        else:
            break
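To get the "keep going until nothing is found" behaviour the question asks about, the area/query scan above could be wrapped in an outer loop that stops once a full pass finds no results. A minimal sketch of that control flow, where fake_results stands in for what the searches would return (it is not part of the answer's code):

```python
# fake_results maps an area to how many matching posts are still up;
# a real version would run the area/query scan from above instead.
fake_results = {'sfbay': 2, 'reno': 1, 'chico': 0}

passes = 0
while True:
    passes += 1
    found_any = False
    for area in fake_results:
        if fake_results[area] > 0:
            found_any = True          # this pass found something
            fake_results[area] -= 1   # pretend flagging removed one post
    if not found_any:
        break                         # a whole pass found nothing: stop

print('Done after %d passes' % passes)  # 3 passes for this fake data
```

The key is the found_any flag: it resets each pass, any hit sets it, and the loop only exits after a clean sweep.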

Cool, that looks much better. Is there a way to make it keep running until it doesn't get any more results? – 2013-02-19 23:57:53


You mean you expect the results page to change while the program is running? – askewchan 2013-02-20 00:02:55


It shows in the shell what it found/flagged, so I was wondering if there's a way to have the script keep running until there are no more results for the keywords being searched (i.e. all results keep getting re-flagged until they're removed). – 2013-02-20 00:27:52


I made some changes... not sure how well they work, but I'm not getting any errors. Please let me know if you see anything wrong/missing. - Thanks

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib, urllib2


# Note: each install_opener() call replaces the previous global opener,
# so as written only the last proxy (proxy5) actually handles requests.
proxy = urllib2.ProxyHandler({'https': '108.60.219.136:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
proxy2 = urllib2.ProxyHandler({'https': '198.144.186.98:3128'})
opener2 = urllib2.build_opener(proxy2)
urllib2.install_opener(opener2)
proxy3 = urllib2.ProxyHandler({'https': '66.55.153.226:8080'})
opener3 = urllib2.build_opener(proxy3)
urllib2.install_opener(opener3)
proxy4 = urllib2.ProxyHandler({'https': '173.213.113.111:8080'})
opener4 = urllib2.build_opener(proxy4)
urllib2.install_opener(opener4)
proxy5 = urllib2.ProxyHandler({'https': '198.154.114.118:3128'})
opener5 = urllib2.build_opener(proxy5)
urllib2.install_opener(opener5)


areas = ['sfbay', 'chico', 'fresno', 'goldcountry', 'humboldt',
         'mendocino', 'modesto', 'monterey', 'redding', 'reno',
         'sacramento', 'siskiyou', 'stockton', 'yubasutter']
queries = ['james+"916+821+0590"', '"DRE+%23+01902542"']

def expunge(url, area):
    page = urllib.urlopen(url).read() # <-- and v and vv gets you urls of ind. postings
    page = page[page.index('<hr>'):].split('\n')[0]
    page = [i[:i.index('">')] for i in page.split('href="')[1:-1] if '<font size="-1">' in i]

    for u in page:
        num = u[u.rfind('/')+1:u.index('.html')] # the number of the posting (like 34235235252)
        # urlopen() goes through the installed proxy opener and performs the flagging;
        # the response objects it returns are not URLs, so don't pass them to go()
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=15&postingID='+num) # flag it
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=28&postingID='+num) # flag it
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=16&postingID='+num) # flag it

print 'Checking ' + str(len(areas)) + ' areas...'

for area in areas:
    for query in queries:
        qurl = 'http://' + area + '.craigslist.org/search/?query=' + query + '+&catAbb=hhh'
        try:
            q = urllib.urlopen(qurl).read()
        except:
            print 'tl;dr error for {} in {}'.format(query, area)
            break

        if 'Found: ' in q:
            print 'Found results for {} in {}'.format(query, area)
            expunge(qurl, area)
            print 'All {} listings marked as spam for {}'.format(query, area)
            print ''
            print ''
        elif 'Nothing found for that search' in q:
            print 'No results for {} in {}'.format(query, area)
            print ''
            print ''
            break
        else:
            break
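One caveat about the proxy setup above: since urllib2.install_opener() replaces the single process-wide opener, installing five openers in a row leaves only the last one in effect. To actually rotate proxies, one approach is to keep an opener per proxy and call its open() method directly instead of installing it. Here is a sketch of that rotation logic, with make_opener as a hypothetical stand-in for urllib2.build_opener(urllib2.ProxyHandler({'https': proxy})) so the idea is testable on its own:

```python
import itertools

# The addresses from the code above; they may well be dead by now.
proxy_addresses = ['108.60.219.136:8080', '198.144.186.98:3128',
                   '66.55.153.226:8080']

def make_opener(proxy):
    # Stand-in for urllib2.build_opener(urllib2.ProxyHandler({'https': proxy}));
    # the real opener's .open(url) would fetch through that proxy.
    return {'proxy': proxy}

# Rotate through the openers instead of install_opener()-ing each in turn.
opener_pool = itertools.cycle([make_opener(p) for p in proxy_addresses])

used = [next(opener_pool)['proxy'] for _ in range(4)]
print(used)  # the fourth request wraps around to the first proxy
```

In the real script, each search or flag request would call next(opener_pool) and use that opener's open() instead of urllib.urlopen()/urllib2.urlopen().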
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib, urllib2


# Note: as above, only the last install_opener() call (proxy5) takes effect.
proxy = urllib2.ProxyHandler({'https': '108.60.219.136:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
proxy2 = urllib2.ProxyHandler({'https': '198.144.186.98:3128'})
opener2 = urllib2.build_opener(proxy2)
urllib2.install_opener(opener2)
proxy3 = urllib2.ProxyHandler({'https': '66.55.153.226:8080'})
opener3 = urllib2.build_opener(proxy3)
urllib2.install_opener(opener3)
proxy4 = urllib2.ProxyHandler({'https': '173.213.113.111:8080'})
opener4 = urllib2.build_opener(proxy4)
urllib2.install_opener(opener4)
proxy5 = urllib2.ProxyHandler({'https': '198.154.114.118:3128'})
opener5 = urllib2.build_opener(proxy5)
urllib2.install_opener(opener5)


areas = ['capecod']
queries = ['rent', 'rental', 'home', 'year', 'falmouth', 'lease', 'credit',
           'tenant', 'apartment', 'bedroom', 'bed', 'bath']

def expunge(url, area):
    page = urllib.urlopen(url).read() # <-- and v and vv gets you urls of ind. postings
    page = page[page.index('<hr>'):].split('\n')[0]
    page = [i[:i.index('">')] for i in page.split('href="')[1:-1] if '<font size="-1">' in i]

    for u in page:
        num = u[u.rfind('/')+1:u.index('.html')] # the number of the posting (like 34235235252)
        # urlopen() goes through the installed proxy opener and performs the flagging
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=15&postingID='+num) # flag it
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=28&postingID='+num) # flag it
        urllib2.urlopen('https://post.craigslist.org/flag?flagCode=16&postingID='+num) # flag it

print 'Checking ' + str(len(areas)) + ' areas...'

for area in areas:
    for query in queries:
        qurl = 'http://' + area + '.craigslist.org/search/?query=' + query + '+&catAbb=hhh'
        try:
            q = urllib.urlopen(qurl).read()
        except:
            print 'tl;dr error for {} in {}'.format(query, area)
            break

        if 'Found: ' in q:
            print 'Found results for {} in {}'.format(query, area)
            expunge(qurl, area)
            print 'All {} listings marked as spam for {}'.format(query, area)
            print ''
            print ''
        elif 'Nothing found for that search' in q:
            print 'No results for {} in {}'.format(query, area)
            print ''
            print ''
            break
        else:
            break