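I am trying to match currency strings on Amazon item pages using Python and regular expressions, but the regular expression does not match all of the strings.
My current code, as it stands: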
import csv
import requests as rq
import re
import lxml
from bs4 import BeautifulSoup as bs

i = 0
urls = csv.reader(open('/Users/Fuck/Documents/Amazon/HTML_Parsetest/urls.csv'))
for url in urls:
    r = rq.get(url[0], stream=True)
    for chunk in r.iter_content(chunk_size=2048):
        if chunk:
            data = chunk
            soup = bs(data, "lxml")
            elem = soup.find_all('td', attrs={'class': 'a-text-right dp-used-col'})
            print(elem)
            if elem != []:
                i = i + 1
                s = re.findall('(\£\d+\.\d+)+', str(elem[0]))
                print(i, "Price:", s[0].split()[0])
For the first URL, this currently prints:
[<td class="a-text-right dp-used-col">
<a class="a-link-normal" href="/gp/offer-listing/019859660X/ref=tmm_hrd_used_olp_0?ie=UTF8&condition=used&qid=&sr=">
<span>£51.70</span>
</a>
</td>]
1 Price: £51.70
[<td class="a-text-right dp-used-col">
<a class="a-link-normal" href="/gp/offer-listing/0198596790/ref=tmm_pap_used_olp_sr?ie=UTF8&condition=used&qid=&sr=">
<span>£35.15</span>
</a>
</td>]
2 Price: £35.15
For the second URL, it prints:
[<td class="a-text-right dp-used-col">
<a class="a-link-normal" href="/gp/offer-listing/0521254167/ref=tmm_hrd_used_olp_0?ie=UTF8&condition=used&qid=&sr=">
<span>£355.37</span>
</a>
</td>, <td class="a-text-right dp-used-col">
<a class="a-link-normal" href="/gp/offer-listing/0521274249/ref=tmm_pap_used_olp_sr?ie=UTF8&condition=used&qid=&sr=">
<span>£29.93</span>
</a>
</td>]
3 Price: £355.37
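On the second URL, it finds both td blocks together as a single result, whereas on the first URL it found them as separate blocks, and I don't know why. So it seems my regular expression only finds one instance of the string in each block.
How can I find both strings, £355.37 and £29.93, for the second URL?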
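To make the symptom concrete, here is a minimal sketch that uses only the two td blocks printed for the second URL (with the long href attributes left out for brevity). It is an illustration under assumptions, not a confirmed diagnosis: the names `blocks`, `PRICE_RE` and `extract_prices` are hypothetical, and plain strings stand in for the BeautifulSoup tags that `find_all` returns. It suggests that the pattern itself matches both prices, and that only the first one is printed because the script looks at `elem[0]` and `s[0]` alone.

```python
import re

# The two <td> blocks printed for the second URL, as plain strings.
# Assumption: href attributes are omitted here; they do not affect the match.
blocks = [
    '<td class="a-text-right dp-used-col">\n<a class="a-link-normal">\n'
    '<span>£355.37</span>\n</a>\n</td>',
    '<td class="a-text-right dp-used-col">\n<a class="a-link-normal">\n'
    '<span>£29.93</span>\n</a>\n</td>',
]

# Raw string, and no backslash before the pound sign (it is not a special character).
PRICE_RE = re.compile(r'£\d+\.\d+')


def extract_prices(items):
    """Collect every price from every block, not just from the first block."""
    prices = []
    for item in items:
        prices.extend(PRICE_RE.findall(str(item)))
    return prices


# What the original script effectively inspects: only the first block.
first_only = PRICE_RE.findall(str(blocks[0]))

# Looping over all blocks picks up both prices.
all_prices = extract_prices(blocks)

print(first_only)
print(all_prices)
```

In the real script, the same loop could run over `elem` in place of `blocks`. A plausible reason the first URL produced two separate lists is that each 2048-byte chunk is parsed on its own, so the two td blocks happened to land in different chunks; parsing the complete response body once would presumably avoid that, though this has not been verified against the live pages.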
I find that [online regex testers](https://regex101.com/) are usually helpful – miraculixx
@miraculixx The regular expression seems to be fine. – taleinat
Is the price always in '£'? –