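Print a substring using the search method
I have the source code of a web page stored in a string variable. I know that a certain date appears somewhere in that source, and I want to print the first link that appears before that date. The link sits between double quotation marks (""). This is the code I have so far: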
import requests
from datetime import date
import re

link = "https://www.google.com.mx/search?biw=1535&bih=799&tbm=nws&q=%22New+Strong+Buy%22+site%3A+zacks.com&oq=%22New+Strong+Buy%22+site%3A+zacks.com&gs_l=serp.3...1632004.1638057.0.1638325.24.24.0.0.0.0.257.2605.0j15j2.17.0....0...1c.1.64.serp..8.0.0.Nl4BZQWwR3o"
fetch_data = requests.get(link)
content = str((fetch_data.content))
# this is the source code as a string

Months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
today = date.today()
A = ("%s %s" % (Months[today.month - 1], today.day))
a = today.day
B = A in content
if B == True:
    B = ("%s %s" % (Months[today.month - 1], a))
else:
    while B == False:
        a = a - 1
        B = ("%s %s" % (Months[today.month - 1], a))
# the B variable is the string date that will appear in the variable string content

c = ('"https:')
Z = ("%s(.*)%s" % (c, B))
result = re.search(Z, content)
print(result)
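What I expected to get is the string between the variables c and B, but the code finds nothing.
If you look at the source of the link, you will see that today's date, "December 27", appears only once, and the link I am interested in, "https://www.zacks.com/commentary/98986/new-strong-buy-stocks-for-december-27th", appears just before it.
Can anyone help me get Python to locate this link automatically and print it?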
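One way to get the text between a `"https:` prefix and the date without a regular expression is plain string searching. This is only a minimal sketch: `link_before` is a hypothetical helper name, the sample string is made up, and it assumes the link wanted is the closest quoted `https` link that starts before the date.

```python
def link_before(content, marker, prefix='"https:'):
    """Return the closest quoted https link that starts before `marker`, or None."""
    marker_pos = content.find(marker)
    if marker_pos == -1:
        return None  # the date does not occur in the page
    # Search backwards from the date for the nearest opening '"https:'.
    start = content.rfind(prefix, 0, marker_pos)
    if start == -1:
        return None  # no quoted https link before the date
    # The link ends at the next double quote after the opening one.
    end = content.find('"', start + 1)
    if end == -1:
        return None
    return content[start + 1:end]


# Made-up sample, not the real page source.
sample = ('<a href="https://example.com/a">x</a> '
          '<a href="https://example.com/b">y</a> December 27')
print(link_before(sample, "December 27"))
```

Searching backwards with `rfind` avoids the greedy `(.*)` in the original pattern, which would start at the first `"https:` in the whole page rather than the one nearest to the date.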
The `while B == False:` loop never searches for `B` in `content`. – Barmar
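A sketch of what a loop that actually re-checks `content` on every step might look like. The names `latest_date_label`, `MONTHS` and `max_days_back` are hypothetical, and the year in the usage line is arbitrary. It steps back with `timedelta` so that crossing a month boundary still yields a valid date, which `a = a - 1` alone would not.

```python
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def latest_date_label(content, start, max_days_back=31):
    """Return the most recent "Month day" label found in `content`, or None."""
    for offset in range(max_days_back + 1):
        day = start - timedelta(days=offset)
        label = "%s %s" % (MONTHS[day.month - 1], day.day)
        if label in content:  # the membership test runs on every iteration
            return label
    return None  # gave up after max_days_back days


# Made-up content; the year is arbitrary.
print(latest_date_label("posted December 25", date(2016, 12, 27)))
```

Note that a plain substring test can give false positives: "December 2" is also found inside "December 27", so a stricter match may be needed for single-digit days.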
Using regular expressions to parse HTML is generally a bad idea. Use a DOM parser library. – Barmar
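As an illustration of the parser route, here is a rough sketch using the standard library's `html.parser` (an event-based parser rather than a full DOM library, swapped in to keep the example dependency-free). `LinkBeforeText` and `link_before_text` are hypothetical names, the sample markup is invented, and the input is assumed to be decoded text such as `fetch_data.text` rather than `str(fetch_data.content)`.

```python
from html.parser import HTMLParser


class LinkBeforeText(HTMLParser):
    """Remember the most recent <a href> seen before text containing `marker`."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.last_href = None   # href of the latest <a> tag seen so far
        self.result = None      # href recorded when the marker first appears

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.last_href = href

    def handle_data(self, data):
        # Record only the first occurrence of the marker text.
        if self.result is None and self.marker in data:
            self.result = self.last_href


def link_before_text(html, marker):
    parser = LinkBeforeText(marker)
    parser.feed(html)
    parser.close()
    return parser.result


# Invented markup, not the real search results page.
html = ('<div><a href="https://example.com/a">A</a><span>December 26</span>'
        '<a href="https://example.com/b">B</a><span>December 27</span></div>')
print(link_before_text(html, "December 27"))
```

Because the parser only looks at real `<a>` tags and text nodes, quotes or dates that happen to appear inside scripts or attributes are less likely to confuse it than they would a pattern run over the raw source.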