Python HTML parser: data not found

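I am building a web "scraper" that parses a web page and then searches it for a word or set of words. Here is my problem: the data I am looking for is contained in the parsed page (I ran it with a specific word as a test), but the program reports that the data was not found.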

from html.parser import HTMLParser
from urllib import *

class dataFinder(HTMLParser):
    def open_webpage(self):
        import urllib.request
        request = urllib.request.Request('https://www.summet.com/dmsi/html/readingTheWeb.html')  # Insert Webpage
        response = urllib.request.urlopen(request)
        web_page = response.read()
        self.webpage_text = web_page.decode()
        return self.webpage_text

    def handle_data(self, data):
        wordtofind = 'PaperBackSwap.com'
        if data == wordtofind:
            print('Match found:', data)
        else:
            print('No matches found')

p = dataFinder()
print(p.open_webpage())
p.handle_data(p.webpage_text)

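I have run the program without the open_webpage function, using the feed method instead, and it worked and found the data, but now it does not work.

Any help with solving this is appreciated.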

2017-08-14 S0lo

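What exactly are you trying to extract from the site? Links from href attributes? –

I am just trying to find text in the page, whether it is in an href or in a p tag – S0lo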

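You are comparing the entire HTML page with a single string. They are of course not equal, so you get "No matches found". To find a string inside another string, you can use the str.find() method. It returns the position of the first occurrence, or -1 if the text is not found.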
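To make the difference concrete, here is a minimal sketch of equality versus substring search. The page_text string is a made-up stand-in for the downloaded page, not the real content of the site.

```python
# Hypothetical stand-in for the fetched page; the real text would come from open_webpage().
page_text = '<p>Visit <a href="http://PaperBackSwap.com">PaperBackSwap.com</a></p>'
word = 'PaperBackSwap.com'

# Equality only succeeds when both strings are identical, so this is False.
is_equal = (page_text == word)

# str.find returns the index of the first occurrence, or -1 if there is none.
first_pos = page_text.find(word)
missing_pos = page_text.find('no-such-word')

# The `in` operator is the usual test when the position itself is not needed.
is_present = word in page_text

print(is_equal, first_pos, missing_pos, is_present)
```

If only a yes/no answer is needed, `word in page_text` is probably the simplest check.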

Corrected code:

from html.parser import HTMLParser
import urllib.request

class dataFinder(HTMLParser):
    def open_webpage(self):
        # Fetch the page and keep its decoded text on the instance
        request = urllib.request.Request('https://www.summet.com/dmsi/html/readingTheWeb.html')  # Insert Webpage
        response = urllib.request.urlopen(request)
        web_page = response.read()
        self.webpage_text = web_page.decode()
        return self.webpage_text

    def handle_data(self, data):
        wordtofind = 'PaperBackSwap.com'
        # str.find returns the index of the first occurrence, or -1 if absent
        position = data.find(wordtofind)
        if position != -1:
            print('Match found position:', position)
        else:
            print('No matches found')

p = dataFinder()
print(p.open_webpage())
p.handle_data(p.webpage_text)
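For comparison, the feed method mentioned in the question behaves differently: HTMLParser.feed() calls handle_data once for each run of text between tags, rather than once for the whole page. The sketch below shows that approach on a small inline sample. The class name WordFinder, the matches list and sample_html are hypothetical names chosen for illustration, not part of the original code.

```python
from html.parser import HTMLParser

class WordFinder(HTMLParser):
    """Collects every text node that contains the target word (illustrative helper)."""

    def __init__(self, word):
        super().__init__()
        self.word = word
        self.matches = []

    def handle_data(self, data):
        # feed() invokes this once per chunk of text found between tags
        if self.word in data:
            self.matches.append(data.strip())

# Made-up sample markup standing in for the downloaded page
sample_html = '<p>Trade books at <a href="#">PaperBackSwap.com</a> today.</p>'

finder = WordFinder('PaperBackSwap.com')
finder.feed(sample_html)
finder.close()
print(finder.matches)
```

With the real page, passing the text returned by open_webpage() to feed() should work the same way. Note that handle_data only sees text content, so a word that appears only inside an attribute such as href would need handle_starttag instead.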

2017-08-14 10:25:13 Mentos

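That worked, and I must thank you for introducing me to this. I am fairly new to programming and have not had the chance to explore the documentation in depth, so if someone could point me to the right place in the docs I would be very grateful. Also, you said it returns the first position it finds; is there any way to make it return all positions of the word? – S0lo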

@S0lo You can use this function to get all positions of a substring: http://code.activestate.com/recipes/499314-find-all-indices-of-substring-in-a-given-string/#c1. You can call it like this: 'allindices(data, wordtofind)' – Mentos
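If you would rather not depend on the linked recipe, something along these lines should do the same job using only str.find. This is a sketch and not the recipe's code; the function name find_all_positions is made up here.

```python
def find_all_positions(text, word):
    """Return the start index of every occurrence of word in text."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        # Resume one character past this hit, so overlapping occurrences are also reported
        start = text.find(word, start + 1)
    return positions

print(find_all_positions('abc PaperBackSwap.com xyz PaperBackSwap.com', 'PaperBackSwap.com'))
```

To skip overlapping matches, resume the search at start + len(word) instead of start + 1.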

I was able to parse HTML content and find text in it with BeautifulSoup; please see whether it works for you. Below is sample code for your case.

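import urllib.request
from bs4 import BeautifulSoup

wordtofind = 'PaperBackSwap.com'

# Fetch the page (same URL as in the question)
web_page = urllib.request.urlopen('https://www.summet.com/dmsi/html/readingTheWeb.html').read()
soup = BeautifulSoup(web_page, 'html.parser')

# Match text nodes that contain the word; findAll(wordtofind) would look for a tag with that name
matches = soup.find_all(string=lambda text: text and wordtofind in text)
if matches:
    for s in matches:
        print('Match found:', s.strip())
else:
    print('No matches found')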

2017-08-14 10:30:06 SeJaPy

Late to the party, but I would strongly advise using the requests module for HTTP interactions. It will make your life much easier.

import requests
from html.parser import HTMLParser

class dataFinder(HTMLParser):
    def open_webpage(self):
        response = requests.get('https://www.summet.com/dmsi/html/readingTheWeb.html')
        self.webpage_text = response.text
        return self.webpage_text

2017-08-14 14:07:11
