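Using re.search in a Python filter function
I can't get re.search to do what I want inside a filter expression. I want to use re.search to extract the href values from a list in which each element is a line of HTML.
Here is what I'm doing: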
>>> filter(lambda html_line: re.search('.*a href=\"([^\"]*).*', html_line), data)
[u'Directory Feb 28 23:57 <b><a href="/MyApp/LogBrowser?type=crawler/2014.02.28">2014.02.28</a></b>',
u'Directory Mar 01 23:59 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.01">2014.03.01</a></b>',
u'Directory Mar 02 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.02">2014.03.02</a></b>',
u'Directory Mar 03 23:59 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.03">2014.03.03</a></b>',
u'Directory Mar 04 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.04">2014.03.04</a></b>',
u'Directory Mar 05 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.05">2014.03.05</a></b>',
u'Directory Mar 06 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.06">2014.03.06</a></b>',
u'Directory Mar 07 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.07">2014.03.07</a></b>',
u'Directory Mar 08 23:50 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.08">2014.03.08</a></b>']
The re.search call itself seems to work correctly.
For example, this works:
>>> for html_line in data:
...     print re.search('.*a href=\"([^\"]*).*', html_line).group(1)
/MyApp/LogBrowser?type=crawler/2014.02.28
/MyApp/LogBrowser?type=crawler/2014.03.01
/MyApp/LogBrowser?type=crawler/2014.03.02
/MyApp/LogBrowser?type=crawler/2014.03.03
/MyApp/LogBrowser?type=crawler/2014.03.04
/MyApp/LogBrowser?type=crawler/2014.03.05
/MyApp/LogBrowser?type=crawler/2014.03.06
/MyApp/LogBrowser?type=crawler/2014.03.07
/MyApp/LogBrowser?type=crawler/2014.03.08
What is the expected output of your 'filter'? – Hyperboreus
A list of the href values, or of the match objects. –
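For reference, a minimal sketch of one way to get either of those lists. filter() only uses the predicate's result to decide which of the original elements to keep, so it hands back the full lines rather than the captured groups. The sketch maps each line to a match object instead and then reads group(1). The names HREF_RE, kept_lines, matches and hrefs are made up for illustration, the sample data is shortened to two of the lines from the question, and the code is written so that it should run on both Python 2 and Python 3.

```python
import re

# Two sample lines in the same shape as the data in the question.
data = [
    'Directory Feb 28 23:57 <b><a href="/MyApp/LogBrowser?type=crawler/2014.02.28">2014.02.28</a></b>',
    'Directory Mar 01 23:59 <b><a href="/MyApp/LogBrowser?type=crawler/2014.03.01">2014.03.01</a></b>',
]

# Hypothetical name. search() scans the whole string, so the leading and
# trailing .* from the original pattern are not needed.
HREF_RE = re.compile(r'a href="([^"]*)"')

# filter() keeps each ORIGINAL element whose predicate result is truthy,
# which is why the question's call returns whole lines.
kept_lines = list(filter(HREF_RE.search, data))

# Map every line to a match object (or None), drop the failed matches,
# then pull out the first capture group.
matches = [m for m in map(HREF_RE.search, data) if m]
hrefs = [m.group(1) for m in matches]

print(hrefs)
```

Here matches is the list of match objects and hrefs is the list of href values. Lines without an href are skipped rather than raising AttributeError on None.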