Getting the log file name from a variable in Python


<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>. </body></html>

I want to extract t1.304.log from the line above. I am using print log_name.split(".log",1)[0], but it gives me the whole first part of the string.

Source

2015-09-27 Aquarius24

Could you elaborate on what you mean? Do you want to extract any string that looks like "something.log"? – Leo

Yes, any string ending in .log. It will only occur once. – Aquarius24

By "only once", do you mean you just want the first matching substring? Or do you want to make sure the string contains exactly one match? – Leo

If you just want to do this quickly, you can use the string split() method:

log_name.split("'")[1].split("=")[1]

But to do this in a reusable way, look into a tool like BeautifulSoup.

Edited to add: based on your comment, you could do this:

print(log_name.split(".log",1)[0].rsplit("=",1)[1] + ".log")

Source

2015-09-27 20:45:07 dstudeba

That is not a string; I am taking the value from the page source. – Aquarius24

import urllib url = "http://www.google.com" logfile = urllib.urlopen(url) log = logfile.read() logfile = logfile.split(".log",1)[0].rsplit("=",1)[1] + ".log") – Aquarius24
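
The snippet in the comment above appears to call split() on the response object rather than on the text that was read from it. A minimal sketch for Python 3, assuming the page is UTF-8 encoded and contains an "=<name>.log" fragment; the helper names extract_log_name and fetch_log_name are hypothetical, not part of any answer in this thread:

```python
from urllib.request import urlopen


def extract_log_name(page_source):
    """Return '<name>.log' from the first '=<name>.log' fragment, or None."""
    # urlopen(...).read() returns bytes in Python 3, so decode before splitting.
    if isinstance(page_source, bytes):
        page_source = page_source.decode("utf-8", errors="replace")
    # Everything before the first ".log" ends with the log name's stem.
    head, sep, _ = page_source.partition(".log")
    if not sep or "=" not in head:
        return None  # no ".log" found, or no "=" in front of it
    return head.rsplit("=", 1)[1] + ".log"


def fetch_log_name(url):
    """Download the page at url (caller supplies it) and extract the log name."""
    with urlopen(url) as response:
        return extract_log_name(response.read())
```

Calling extract_log_name() on the sample line from the question should give the same result as the split() one-liner above, while returning None instead of raising when no ".log" is present.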

import re

st = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>. </body></html>"

mo = re.search(r'(t\S*log)', st)

print(mo.group())

Output

t1.304.log

Source

2015-09-27 20:49:00 LetzerWille

You can use a regular expression (with the re module), assuming your string variable is page_source:

>>> import re
>>> re.findall(r'.*=(.*\.log)', page_source)
['t1.304.log']

This gives you a list of all substrings matching "*.log".
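
A slightly stricter variant of that pattern, as a sketch only: it escapes the dot before "log" and excludes quotes, whitespace and "=" from the name, so a match cannot run across attribute boundaries. The name page_source is carried over from this answer, the sample string is the one from the question, and LOG_NAME is a hypothetical name.

```python
import re

page_source = (
    "<!DOCTYPE html><html><head><title>Intro</title></head>"
    "<body><a href='/name=t1.304.log'>Test</a>. </body></html>"
)

# After an "=", capture a run of characters that are not quotes, "=" or
# whitespace and that ends in a literal ".log".
LOG_NAME = re.compile(r"=([^'\"=\s]+\.log)")

log_names = LOG_NAME.findall(page_source)
print(log_names)  # ['t1.304.log']
```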

However, note that parsing HTML with regular expressions is generally discouraged.

In fact, don't do this; use alecxe's answer instead.

Source

2015-09-27 20:50:47 Leo

Why not parse the HTML with an HTML parser?

>>> from bs4 import BeautifulSoup
>>> data = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>. </body></html>"
>>> BeautifulSoup(data, "html.parser").a["href"].split("=")[-1]
't1.304.log'
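
If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library's html.parser module. The class name HrefCollector and the helper log_names_from_html are hypothetical, and the sketch assumes the log name is whatever follows the last "=" in an href that ends in ".log".

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def log_names_from_html(html):
    """Return the part after the last '=' for each href ending in '.log'."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return [h.split("=")[-1] for h in collector.hrefs if h.endswith(".log")]


data = (
    "<!DOCTYPE html><html><head><title>Intro</title></head>"
    "<body><a href='/name=t1.304.log'>Test</a>. </body></html>"
)
print(log_names_from_html(data))  # ['t1.304.log']
```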

Source

2015-09-27 20:51:05 alecxe
