2016-05-02 223 views
0

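Python - web scraping problem

I ran into a web scraping problem while trying to get a channel's title. I don't know how to solve it, but after some testing with the channel function it seems that video links work with it as well, whereas only channel links should work with the YoutubeChannel function.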

Any ideas on how to fix it?

#Required Modules 
import urllib 
import re 
import sys #Needed for sys.exit() in the menu loop 

#Defining the YouTube Video function 
def YoutubeVideo(): 
    #Making videoLink equal to whatever the user enters as their video link 
    videoLink = input ('\nWhat is your video link? (In quotations, with http included)\n') 

    #Goes to the video URL, opens it and reads the HTML file 
    htmlfile = urllib.urlopen(videoLink) #Searches for this URL 
    htmltext = htmlfile.read() #Reads the HTML file and sets it to htmltext 

    #Setup for the view counter 
    regexView = "<div class=\"watch-view-count\">(.+?)</div>" #Searches for the view count number and sets it to regexView 
    pattern = re.compile(regexView) 
    viewCount = re.findall(pattern, htmltext) 

    #Setup for the video title 
    regexTitle = "<title>(.+?)</title>" #Searches for the title of the video 
    patternTitle = re.compile(regexTitle) 
    videoTitle = re.findall(patternTitle, htmltext) 

    #Setup for the video upload date 
    regexUpload = "<strong class=\"watch-time-text\">(.+?)</strong>" 
    patternUpload = re.compile(regexUpload) 
    videoUpload = re.findall(patternUpload, htmltext) 

    print ("\n%s" % (videoLink)) #Prints the video link, primarily for testing 
    print ("\nThe title of your video is %s and has %s views.\nIt was %s." % (videoTitle, viewCount, videoUpload)) #Prints the information about the video 


#Defining the YouTube Channel function 
def YoutubeChannel(): 
    #Making channelLink equal to whatever the user enters as their channel link 
    channelLink = input ('\nWhat is your channel link? (In quotations, with http included)\n') 

    #Goes to the channel URL, opens it and reads the HTML file 
    htmlfile = urllib.urlopen(channelLink) #Searches for this URL 
    htmltext = htmlfile.read() #Reads the HTML file and sets it to htmltext 

    #Setup for the channel name 
    regexChannelTitle = "<title>(.+?)</title>" #Searches for the page's <title> tag 
    patternChannelTitle = re.compile(regexChannelTitle) 
    channelTitle = re.findall(patternChannelTitle, htmltext) 

    print (channelTitle) 



ans = True 
while ans: 
    print ("\n[1] Get information regarding a YouTube video.") 
    print ("\n[2] Get information regarding a YouTube channel.") 
    print ("\n[Q] Quit the application.") 

    ans = raw_input("\nWhat would you like to do now? ") 
    if ans == "1": 
        YoutubeVideo() 
    elif ans == "2": 
        YoutubeChannel() 
    elif ans.lower() == "q": #Accepts both "q" and "Q", matching the menu 
        sys.exit(0) 
    elif ans != "": 
        print ("Not a valid choice, try again.") 
+4

At the very least, use 'BeautifulSoup' or something similar instead of regular expressions to parse HTML; it is easy to install. – Pythonista

+3

Also, wouldn't it be easier to use the [Youtube API](https://developers.google.com/youtube/)? I'm sure there are plenty of scripts that would make your life easier. – patrick

Answers

0

I'm not familiar with what you are using to parse the HTML content, but you can use BeautifulSoup, which makes this very easy:

import requests 
from bs4 import BeautifulSoup 

# Channel URL, e.g. https://www.youtube.com/channel/XXXXXX 
url = "your channel link" 
page = requests.get(url) 
plain_text = page.text 
soup = BeautifulSoup(plain_text, "html.parser") 

# The span that wraps the whole channel title (link, small image and text) 
span = soup.find('span', {'class': 'qualified-channel-title-text'}) 

# The link in the page header carries the channel name in its title attribute 
title = soup.find('a', {'class': 'spf-link branded-page-header-title-link yt-uix-sessionlink'}) 
title = title.get('title') 
print(title) 

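As you can see, I look up the span tag that wraps the whole title (a link, a small image and the title text). The link uses the class "spf-link branded-page-header-title-link yt-uix-sessionlink", and I then read its title attribute :)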
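Here is a minimal sketch of the same lookup that also copes with pages lacking the channel header link (a video page, for example), which is the situation the question describes. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra installs. The names `ChannelTitleParser` and `extract_channel_title` and the two HTML snippets are made up for illustration, and the markup is assumed to match what is described above.

```python
from html.parser import HTMLParser


class ChannelTitleParser(HTMLParser):
    """Collects the title attribute of the first <a> tag carrying a given class."""

    def __init__(self, wanted_class="branded-page-header-title-link"):
        super().__init__()
        self.wanted_class = wanted_class
        self.title = None  # stays None when no matching link is found

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.title is not None:
            return
        attr_map = dict(attrs)
        # A class attribute may hold several space-separated class names
        classes = (attr_map.get("class") or "").split()
        if self.wanted_class in classes:
            self.title = attr_map.get("title")


def extract_channel_title(html_text):
    """Return the channel title, or None if the page has no channel header link."""
    parser = ChannelTitleParser()
    parser.feed(html_text)
    return parser.title


# Hypothetical snippets standing in for downloaded pages
channel_html = (
    '<span class="qualified-channel-title-text">'
    '<a class="spf-link branded-page-header-title-link yt-uix-sessionlink" '
    'title="Example Channel" href="/channel/XXXXXX">Example Channel</a></span>'
)
video_html = (
    '<title>Some video - YouTube</title>'
    '<div class="watch-view-count">223 views</div>'
)

print(extract_channel_title(channel_html))
print(extract_channel_title(video_html))
```

Because the function returns `None` when the header link is missing, the caller can report "this does not look like a channel page" instead of printing whatever the `<title>` tag happens to contain.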

I hope this is useful if you decide to run it.

Note that you must install beautifulsoup and requests; both can be installed with pip.
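As for the original concern that the channel function also accepts video links, one option is to check the link's path before fetching anything. This is only a rough sketch: the helper `is_channel_link` is hypothetical, and the list of path prefixes is an assumption (only the `/channel/` form appears in the example URL above).

```python
from urllib.parse import urlparse

# Assumed path prefixes for channel pages; /channel/ comes from the example URL,
# /user/ is an extra assumption. Adjust as needed.
CHANNEL_PREFIXES = ("/channel/", "/user/")


def is_channel_link(link, prefixes=CHANNEL_PREFIXES):
    """Return True if the link points at youtube.com and its path looks like a channel."""
    parsed = urlparse(link)
    host = (parsed.hostname or "").lower()
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return False
    # str.startswith accepts a tuple of candidate prefixes
    return parsed.path.startswith(prefixes)


print(is_channel_link("https://www.youtube.com/channel/XXXXXX"))
print(is_channel_link("https://www.youtube.com/watch?v=XXXXXX"))
```

A check like this could run at the top of the channel function so that video links are rejected before any HTML is downloaded.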