Python: how do I find all matches of a substring?


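I have a large string: an HTML page. I need to find the names of all the flash drives, that is, I need the content between the double quotes in: data-name="USB Flash-drive Leef Fuse 32Gb">. So I need the string between data-name=" and ">. Please don't mention BeautifulSoup; I need to do this without BeautifulSoup, preferably without regular expressions either, although regular expressions are acceptable.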

I tried this:

p = re.compile('(?<=")[^,]+(?=")') 
result = p.match(html_str) 
print(result)

But the result is None. On regex101.com, however, it works:


2016-06-22 George J

What's wrong with using a DOM parser on the HTML to extract the attribute's value?

@Vasili Syrakis I have a specific task: use Python.

FYI, bs4 is Python; see the first paragraph of this link: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Python 2: https://docs.python.org/2/library/htmlparser.html

Python 3: https://docs.python.org/3/library/html.parser.html

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # tag = 'sometag'
        for attr in attrs:
            # attr = ('data-name', 'USB Flash-drive Leef Fuse 32Gb')
            if attr[0] == 'data-name':
                print(attr[1])

parser = MyHTMLParser()
parser.feed('<sometag data-name="USB Flash-drive Leef Fuse 32Gb">hello world</sometag>')

Output:

USB Flash-drive Leef Fuse 32Gb

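I added some comments to the code so you can see what kind of data structures the parser returns.

It should be easy to build on this.

As long as you feed it HTML, it will parse it. Consult the documentation and keep experimenting.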
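Building on the parser above, here is a minimal sketch that collects every data-name value into a list instead of printing it. The class name DataNameCollector, the helper collect_data_names, and the second product name in the sample are made up for illustration; only HTMLParser and its handle_starttag hook come from the standard library.

```python
from html.parser import HTMLParser


class DataNameCollector(HTMLParser):
    """Collects the value of every data-name attribute it encounters."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "data-name" and value is not None:
                self.names.append(value)


def collect_data_names(html_str):
    """Hypothetical helper: parse html_str and return all data-name values."""
    parser = DataNameCollector()
    parser.feed(html_str)
    parser.close()
    return parser.names


# Sample input; the second product name is invented for the example.
sample = (
    '<div data-name="USB Flash-drive Leef Fuse 32Gb">a</div>'
    '<div data-name="USB Flash-drive Kingston 16Gb">b</div>'
)
print(collect_data_names(sample))
```

Returning a list rather than printing makes it easier to post-process the names (deduplicate, sort, write to a file) afterwards.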


2016-06-22 12:31:16

Thank you very much, God bless you.

If you want to do it with basic Python string parsing, here is one way:

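s = "html string"
marker = 'data-name="'
start = s.find(marker) + len(marker)  # skip past the marker itself (assumes it is present)
end = s.find('">', start)             # search for the closing part only after the marker
output = s[start:end]

This is what happens in my Python shell:

>>> s = 'junk...data-name="USB Flash-drive Leef Fuse 32Gb">...junk'
>>> marker = 'data-name="'
>>> start = s.find(marker) + len(marker)
>>> end = s.find('">', start)
>>> output = s[start:end]
>>> output
'USB Flash-drive Leef Fuse 32Gb'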

Let me know if the script works; this part can be used on its own.
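A single find call only locates the first occurrence, and on a real page a search for the closing delimiter that starts from the beginning of the string can hit an earlier `">`, which yields an empty slice. A rough sketch of a loop that walks the whole string is below. The function name find_all_between and its parameters are hypothetical, and it assumes the attribute value ends at the next double quote, which is slightly looser than the `">` delimiter in the question.

```python
def find_all_between(text, prefix='data-name="', suffix='"'):
    """Return every substring that sits between prefix and the next suffix."""
    results = []
    pos = 0
    while True:
        start = text.find(prefix, pos)
        if start == -1:
            break  # no more occurrences of the prefix
        start += len(prefix)  # move past the prefix itself
        end = text.find(suffix, start)  # look for the suffix only after the prefix
        if end == -1:
            break  # unterminated value; stop rather than guess
        results.append(text[start:end])
        pos = end + len(suffix)  # continue searching after this match
    return results


page = 'junk">...data-name="USB Flash-drive Leef Fuse 32Gb">...junk'
print(find_all_between(page))
```

Passing suffix='">' instead reproduces the exact delimiters from the question, at the cost of missing tags where data-name is not the last attribute.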


2016-06-22 12:43:57 user3404344

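Doesn't work, the output is empty

Your example works, but not with my large HTML string

If you're still after an alternative solution, you can paste your long HTML string and I'll test it – user3404344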
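On the regex attempt in the question: re.match only tries to match at the very start of the string, so a pattern that begins with a lookbehind for a double quote returns None there, while regex101.com searches the whole input. Since the question accepts regular expressions, one possible sketch uses findall with a capture group. The names DATA_NAME_RE and find_data_names are invented here, and the pattern assumes double-quoted attribute values with no embedded double quotes.

```python
import re

# Capture everything between data-name=" and the next double quote.
DATA_NAME_RE = re.compile(r'data-name="([^"]*)"')


def find_data_names(html_str):
    """Hypothetical helper: return all data-name values found in html_str."""
    # findall scans the whole string and returns the captured group of each match
    return DATA_NAME_RE.findall(html_str)


html_str = '<li data-name="USB Flash-drive Leef Fuse 32Gb">...</li>'
print(find_data_names(html_str))
```

This remains plain text matching rather than HTML parsing, so it will also pick up data-name="..." inside comments or scripts.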
