Python regex: filtering a list into another list

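I currently have a piece of code that mostly runs as I expect, except that it prints out both the original list and the filtered one. Basically, what I want to do is read the URLs from a web page and store them in a list (called match; this part works fine), then filter that list into a new list (called fltrmtch), because the original contains all the extra href tags etc.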

For example, at the moment it prints out both A and B, but I only want B:

A: Core Development'

B: 'http://docs.python.org/devguide/'),

Here's the code:

import re  # Imports the re module, which allows searching for matches.
import pprint  # This import allows each list item to be printed on a separate line.
from urllib.request import urlopen

url = "URL WOULD BE IN HERE BUT NOT ALLOWED TO POST MULTIPLE LINKS"  # Name of the URL being searched
webpage = urlopen(url)

content = webpage.read().decode("utf-8", errors="replace")  # Places the page contents into the variable content

match = re.findall(r'<a.*href=.*http:.+', content)  # Matches from an <a tag with an http: href to the end of that line


def filterPick(list, filter):
    return [(l, m.group(1)) for l in match for m in (filter(l),) if m]

regex = re.compile(r'"(.+?)"').search
fltrmtch = filterPick(match, regex)

try:
    if match:  # Only runs the block below if there is a match.
        print("The number of URL's found is:", len(match))
        match.sort()
        print("\nAnd here are the URL's found: ")
        pprint.pprint(fltrmtch)
except:
    print("No URL matches have been found, please try again!")

Any help would be much appreciated.

Thanks in advance.

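UPDATE: Thanks for the answers given, but I managed to find the flaw in

return [(l, m.group(1)) for l in match for m in (filter(l),) if m]

I just had to drop the l from (l, m.group(1)), so that each item in the new list is just m.group(1). Thanks again.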
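A minimal offline sketch of that fix, assuming a single hand-written <a> line in place of the scraped page. The sample string is reconstructed from examples A and B above and is hypothetical, as are the helper names filterPick_original and filterPick_fixed. Unlike the code in the question, both helpers iterate over their own lines argument rather than the global match.

```python
import re

# Hypothetical sample input: one scraped line standing in for the page content.
match = ['<a href="http://docs.python.org/devguide/">Core Development</a>']
regex = re.compile(r'"(.+?)"').search


def filterPick_original(lines, search):
    # Returns (whole line, captured URL) pairs, so both A and B get printed.
    return [(l, m.group(1)) for l in lines for m in (search(l),) if m]


def filterPick_fixed(lines, search):
    # Keeps only the first captured group, i.e. just the URL.
    return [m.group(1) for l in lines for m in (search(l),) if m]


print(filterPick_original(match, regex))
print(filterPick_fixed(match, regex))
```

The first call returns one (line, URL) tuple per match, which is why pprint showed both A and B; the second returns only the URLs.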


2013-11-26 user3034104

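It looks like the try/except at the bottom of your code is mostly meant to catch errors from the top part, but an empty result list raises no exception, so an if/else fits better. Also, the regex you pass to re.findall has no capturing groups. Here is a modified example: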

import re
from urllib.request import urlopen

url = "www.site.com"  # place real web address here (including http:// or https://)
# read web page into a string
page = urlopen(url).read().decode("utf-8", errors="replace")
# use regex to extract URLs from <a href=""> patterns:
# group 1 is the opening quote, \1 requires the same closing quote, group 2 is the URL
pattern = r'''<a\s[^>]*?\bhref=(['"])(.+?)\1[^>]*?>'''
# finditer yields match objects, so keep only the second group of each match
matches = sorted(m.group(2) for m in re.finditer(pattern, page, re.IGNORECASE))
# print matches if they exist
if matches:
    print("The number of URL's found is: " + str(len(matches)))
    print("\nAnd here are the URL's found:")
    # print each match on its own line
    print('\n'.join(matches))
else:
    print('No URL matches have been found, please try again!')
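The example above depends on the difference between re.findall and re.finditer. The sketch below runs offline on a hypothetical HTML snippet (the example.com links are made up for illustration) and shows that findall returns whole matched strings when the pattern has no capturing groups but tuples of groups when it has several, while finditer yields match objects on which .group(2) works. It also shows the \1 backreference accepting both single- and double-quoted href values, and re.IGNORECASE matching an uppercase <A HREF=...> tag.

```python
import re

# Hypothetical page content, used instead of a real download so the example runs offline.
page = """
<A HREF="http://example.com/b">B</A>
<a class="nav" href='http://example.com/a'>A</a>
<p>no link here</p>
"""

# Pattern from the answer: group 1 captures the opening quote, \1 requires the
# same quote to close the value, and group 2 captures the URL itself.
pattern = r'''<a\s[^>]*?\bhref=(['"])(.+?)\1[^>]*?>'''

# No capturing groups: findall returns the whole matched text of each opening tag.
whole = re.findall(r'<a\s[^>]*>', page, re.IGNORECASE)

# Two capturing groups: findall returns (group1, group2) tuples, not match objects.
pairs = re.findall(pattern, page, re.IGNORECASE)

# finditer yields match objects, so .group(2) is available.
urls = sorted(m.group(2) for m in re.finditer(pattern, page, re.IGNORECASE))

print(whole)
print(pairs)
print(urls)
```

Calling .group(2) on the items of pairs would raise AttributeError, because they are plain tuples; pairs[i][1] or finditer are the two working options.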


2013-11-27 03:56:23
