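How do I search for a phrase using a regular expression?

Please help me fix this script. How do I search for a phrase using a regular expression?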

import urllib.request
import re 
import os 
import pprint 

import requests 
import bs4 

stringHtml = urllib.request.urlopen('http://forum.saransk.ru/user/2018-sergey-kalinin/').read().decode('utf-8') 
#print(stringHtml) 
stringPattern = 'url\suid"\shref="http://vkontakte.ru/id10550933"' 
result = re.search(stringPattern, stringHtml) 
if result: 
    print(result.group()) 
else: 
    print('no result')

The problem is that the script prints "no result". The regular expression compiles without errors. Please help me find the mistake.


2014-02-20 Sergey

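Why not use bs4, which you already import?

If you want to print the href attribute of every a element that has both the url and uid classes, you can use the select method (which accepts a CSS selector).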

import urllib.request

import bs4

# Passing the raw bytes lets BeautifulSoup detect the encoding itself.
stringHtml = urllib.request.urlopen('http://forum.saransk.ru/user/2018-sergey-kalinin/').read()  # .decode('utf-8')
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = bs4.BeautifulSoup(stringHtml, 'html.parser')
for a in soup.select('a.url.uid'):
    print(a.get('href'))

# If you want to check whether an a tag with `href="http://vkontakte..."` exists,
# use the following lines instead.
# (The CSS selector `a.url.uid[href="..."]` did not work with bs4 at the time of writing.
# bs4 supports the most commonly used CSS selectors, not all of them.)
#print(any(a.get('href') == 'http://vkontakte.ru/id10550933'
#  for a in soup.select('a.url.uid')))

Output:

http://vkontakte.ru/id10550933


2014-02-20 15:34:09 falsetru

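I think he is trying to check whether it exists, but yes, this is a better approach than a regular expression.

@RyanO'Donnell, if that is what the OP wants, print(any(a.get('href') == 'http://vkontakte.ru/id10550933' for a in soup.select('a.url.uid'))) will do the job. – falsetru

Thanks, but I know how to use the BeautifulSoup module. I am curious about a solution with a regular expression. – Sergey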

I am fairly sure the mistake is in your regular expression. The text you are looking for is:

url uid" href="http://vkontakte.ru/id10550933"

Could it be a whitespace problem?
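
One way to test the whitespace guess without fetching the page is to run the pattern against small hand-written samples. The sketch below is only an illustration: both sample strings and the helper `matches` are assumptions made up for this check, not text taken from the forum page. It shows that `\s` accepts exactly one whitespace character, while `\s+` accepts any run of whitespace, including line breaks.

```python
import re

# Pattern from the question, written as a raw string.
# Each \s matches exactly one whitespace character.
STRICT = r'url\suid"\shref="http://vkontakte.ru/id10550933"'
# Variant that accepts one or more whitespace characters at each position.
LOOSE = r'url\s+uid"\s+href="http://vkontakte.ru/id10550933"'

# Hypothetical samples, written by hand for this check.
single_space = '<a class="url uid" href="http://vkontakte.ru/id10550933">'
extra_space = '<a class="url  uid"\n   href="http://vkontakte.ru/id10550933">'


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


print(matches(STRICT, single_space))  # True
print(matches(STRICT, extra_space))   # False
print(matches(LOOSE, extra_space))    # True
```

If the strict pattern already matches a sample with single spaces, whitespace alone does not explain the "no result" output, and the actual markup of the page is worth inspecting.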


2014-02-20 15:27:50 user590028

The source of the page shows

<a class="url uid" rel="external me" href="http://vkontakte.ru/id10550933">http://vkontakte.ru/id10550933</a>

so what you want is something like

import bs4 
import requests 

url = 'http://forum.saransk.ru/user/2018-sergey-kalinin/' 
html = requests.get(url).content 
page = bs4.BeautifulSoup(html, 'html.parser')
link = page.find("a", {"class": "url uid"}) 
print(link["href"])

which gives

http://vkontakte.ru/id10550933
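
The markup quoted in this answer also suggests why the pattern in the question finds nothing: `rel="external me"` sits between the `class` and `href` attributes, so `uid"` is never followed directly by whitespace and `href`. If a regular expression is still wanted, a minimal sketch could allow other attributes in between. The pattern and the name `find_profile_link` below are illustrative assumptions, and the sketch relies on `class` appearing before `href`, as it does in the quoted tag.

```python
import re

# The tag as quoted from the page source in this answer.
html = ('<a class="url uid" rel="external me" '
        'href="http://vkontakte.ru/id10550933">http://vkontakte.ru/id10550933</a>')

# Pattern from the question: it expects href right after the class attribute.
ORIGINAL_PATTERN = r'url\suid"\shref="http://vkontakte.ru/id10550933"'

# [^>]* skips any other attributes inside the same tag;
# the group captures the value of href.
LINK_PATTERN = re.compile(r'<a\s+class="url uid"[^>]*\shref="([^"]*)"')


def find_profile_link(text):
    """Return the href of the first matching <a> tag, or None if there is none."""
    match = LINK_PATTERN.search(text)
    return match.group(1) if match else None


print(re.search(ORIGINAL_PATTERN, html))  # None: rel="..." is in the way
print(find_profile_link(html))            # http://vkontakte.ru/id10550933
```

A pattern like this breaks as soon as the attribute order or quoting changes, which is the main reason the parser-based approach is usually preferred.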


2014-02-20 15:44:13
