How to extract a string from a <script> tag using Beautiful Soup?

In a given .html page, I have a script tag like this:

 <script>jQuery(window).load(function() { 
    setTimeout(function(){ 
    jQuery("input[name=Email]").val("[email protected]"); 
    }, 1000); 
});</script>

How can I extract the email address using Beautiful Soup?

2016-07-24 dundonian

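To add a bit more to @Bob's answer, and assuming you also need to locate the right script tag in HTML that may contain other script tags.

The idea is to define a regular expression that is used both for locating the element with BeautifulSoup and for extracting the email value: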

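import re

from bs4 import BeautifulSoup


# Note: the address shows up as "[email protected]" in this copy of the page;
# the pattern only matches once an actual address (name@domain.tld) is in place.
data = """
<body>
    <script>jQuery(window).load(function() {
        setTimeout(function(){
        jQuery("input[name=Email]").val("[email protected]");
        }, 1000);
    });</script>
</body>
"""
# Simple email pattern: something@something.something inside .val("...");
pattern = re.compile(r'\.val\("([^@]+@[^@]+\.[^@]+)"\);', re.MULTILINE | re.DOTALL)
soup = BeautifulSoup(data, "html.parser")

# Find the <script> element whose text matches the pattern
# (string= is the current name of the older text= argument)
script = soup.find("script", string=pattern)
if script:
    match = pattern.search(script.text)
    if match:
        email = match.group(1)
        print(email)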

Prints: [email protected].

Here we are using a simple regular expression for the email address. We could go further and make it stricter, but I doubt that would actually be needed for this problem.
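For what it's worth, here is a rough sketch of what "stricter" could look like, using only the re module on the script text. The pattern `STRICT_EMAIL`, the helper name `find_email`, and the address `someone@example.com` are my own assumptions for illustration, not something from the question, and this is still far from a full email validator.

```python
import re

# Hypothetical stricter pattern (an assumption, not from the answer above):
# a local part of common characters, a dotted domain, and an alphabetic
# top-level domain of at least two letters, all inside .val("...").
STRICT_EMAIL = re.compile(
    r'\.val\(\s*"([A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,})"\s*\)'
)


def find_email(script_text):
    """Return the first address passed to .val("...") in script_text, or None."""
    match = STRICT_EMAIL.search(script_text)
    return match.group(1) if match else None


# Made-up sample script text; the address is a placeholder.
sample = '''jQuery(window).load(function() {
    setTimeout(function(){
    jQuery("input[name=Email]").val("someone@example.com");
    }, 1000);
});'''

print(find_email(sample))
print(find_email('jQuery("input[name=q]").val("not an email");'))
```

The first call prints the placeholder address and the second prints None, since that value contains no "@". Whether the extra strictness is worth it depends on how messy the other script tags on the page are.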

2016-07-24 07:22:39 alecxe

It's not possible using BeautifulSoup alone, but you can do it with, for example, BS + regular expressions:

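import re
from bs4 import BeautifulSoup as BS

html = """<script> ... </script>"""

# Name the parser explicitly to avoid the "no parser specified" warning
bs = BS(html, "html.parser")

txt = bs.script.get_text()

# re.search rather than re.match: the script spans several lines and "." does not
# cross newlines by default, so a match anchored at the start would fail
email = re.search(r'\.val\("(.+?)"\);', txt).group(1)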

Or like this:

... 

email = txt.split('.val("')[1].split('");')[0]
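One caveat with the split approach: it raises IndexError when `.val("` is missing, and silently returns the rest of the script when the closing `");` is missing. Below is a small sketch of a more defensive variant built on `str.partition`. The helper name `extract_val_argument`, its parameters, and the sample address are hypothetical, added only for illustration.

```python
def extract_val_argument(script_text, start_marker='.val("', end_marker='");'):
    """Return the text between the first start_marker and the next end_marker.

    Returns None if either marker is missing, instead of raising IndexError
    or returning a truncated remainder of the script.
    """
    _, found_start, rest = script_text.partition(start_marker)
    if not found_start:
        return None
    value, found_end, _ = rest.partition(end_marker)
    return value if found_end else None


# Made-up sample script text; the address is a placeholder.
sample = '''jQuery(window).load(function() {
    setTimeout(function(){
    jQuery("input[name=Email]").val("someone@example.com");
    }, 1000);
});'''

print(extract_val_argument(sample))
print(extract_val_argument("console.log('nothing here');"))
```

Like the split version, this takes whatever sits between the two markers and does not check that it looks like an email address.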

2016-07-24 01:34:18 Bob
