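Python - Web scraping problem
I'm having a web scraping problem when trying to get a channel's title. I'm not sure how to fix it, but from some testing with the channel function, it seems that video links work with it. Only channel links, which are what the YoutubeChannel function is meant to be used with, fail.
Any ideas on how to fix it?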
```python
# Required modules (Python 3: urllib.request replaces Python 2's urllib.urlopen)
import re
import sys
import urllib.request


def read_html(url):
    """Open the URL and return the page's HTML as text."""
    with urllib.request.urlopen(url) as htmlfile:
        return htmlfile.read().decode("utf-8", errors="replace")


# Defining the YouTube Video function
def YoutubeVideo():
    # videoLink is whatever the user enters as their video link
    videoLink = input('\nWhat is your video link? (with http included)\n')
    # Goes to the video URL, opens it and reads the HTML
    htmltext = read_html(videoLink)
    # Setup for the view counter
    regexView = r'<div class="watch-view-count">(.+?)</div>'
    viewCount = re.findall(regexView, htmltext)
    # Setup for the video title
    regexTitle = r'<title>(.+?)</title>'
    videoTitle = re.findall(regexTitle, htmltext)
    # Setup for the video upload date
    regexUpload = r'<strong class="watch-time-text">(.+?)</strong>'
    videoUpload = re.findall(regexUpload, htmltext)
    print("\n%s" % videoLink)  # Prints the video link, primarily for testing
    # Prints the information about the video
    print("\nThe title of your video is %s and has %s views.\nIt was %s."
          % (videoTitle, viewCount, videoUpload))


# Defining the YouTube Channel function
def YoutubeChannel():
    # channelLink is whatever the user enters as their channel link
    channelLink = input('\nWhat is your channel link? (with http included)\n')
    # Goes to the channel URL, opens it and reads the HTML
    htmltext = read_html(channelLink)
    # Setup for the channel name
    regexChannelTitle = r'<title>(.+?)</title>'
    channelTitle = re.findall(regexChannelTitle, htmltext)
    print(channelTitle)


ans = True
while ans:
    print("\n[1] Get information regarding a YouTube video.")
    print("\n[2] Get information regarding a YouTube channel.")
    print("\n[Q] Quit the application.")
    ans = input("\nWhat would you like to do now? ")
    if ans == "1":
        YoutubeVideo()
    elif ans == "2":
        YoutubeChannel()
    elif ans.lower() == "q":
        sys.exit(0)
    elif ans != "":
        print("Not a valid choice, try again.")
```
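One possible explanation, offered as an assumption rather than something confirmed against the live page: on a channel page the text inside `<title>` may be split across several lines, and in Python's `re` module `.` does not match a newline unless `re.DOTALL` is passed, so `<title>(.+?)</title>` finds nothing. The sketch below uses a made-up HTML snippet (not real YouTube markup) to show the difference; `find_title` and `SAMPLE_HTML` are hypothetical names.

```python
import re

# Hypothetical markup: a <title> whose text is split across lines.
SAMPLE_HTML = "<html><head><title>\n  Example Channel\n - YouTube</title></head></html>"

TITLE_PATTERN = r"<title>(.+?)</title>"


def find_title(html, flags=0):
    """Return the <title> text with whitespace collapsed, or None if no match."""
    match = re.search(TITLE_PATTERN, html, flags)
    if match is None:
        return None
    # Collapse the internal newlines and indentation into single spaces.
    return " ".join(match.group(1).split())


# Without DOTALL, '.' stops at '\n', so the lazy group cannot reach </title>.
print(find_title(SAMPLE_HTML))             # None
# With DOTALL, '.' also matches '\n' and the whole title is captured.
print(find_title(SAMPLE_HTML, re.DOTALL))  # Example Channel - YouTube
```

If this is the cause, the equivalent change in `YoutubeChannel()` would be to pass `re.DOTALL` when compiling or searching with the title pattern, then strip the surrounding whitespace from the result.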
At the very least, use `BeautifulSoup` or something similar, which is easy to install, to parse the HTML instead of regular expressions. – Pythonista
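Along the lines of this suggestion, here is a sketch that parses the page instead of pattern-matching it. To stay dependency-free it swaps in the standard library's `html.parser.HTMLParser` rather than BeautifulSoup itself; `TitleParser` and `extract_title` are hypothetical names. Because the parser hands over the text of each element, a title that spans several lines is handled without any special regex flags.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "TITLE" also matches here.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Normalise whitespace so multi-line titles come back on one line.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html):
    """Return the first <title> text in html, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


print(extract_title("<html><head><title>\n  Example Channel\n - YouTube</title></head></html>"))
# Example Channel - YouTube
```

In the question's script, `extract_title(htmltext)` could stand in for the `re.findall` call in `YoutubeChannel()`. It returns a single string rather than a list.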
Also, wouldn't it be easier to use the [Youtube API](https://developers.google.com/youtube/)? I'm sure there are plenty of scripts out there that would make your life easier. – patrick
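As a rough sketch of that route: the YouTube Data API v3 `channels.list` method, called with `part=snippet`, returns a channel's title in `items[0].snippet.title`. The code below assumes you already have an API key and the channel's ID (both placeholders here). It only builds the request URL and reads the title out of the JSON response, so it does not depend on YouTube's HTML at all. `build_channels_url`, `channel_title_from_response` and `fetch_channel_title` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://www.googleapis.com/youtube/v3/channels"


def build_channels_url(channel_id, api_key):
    """Build a channels.list request URL asking only for the snippet part."""
    query = urllib.parse.urlencode({"part": "snippet", "id": channel_id, "key": api_key})
    return API_BASE + "?" + query


def channel_title_from_response(payload):
    """Return the first channel's title from a channels.list JSON body, or None."""
    data = json.loads(payload)
    items = data.get("items", [])
    if not items:
        return None  # unknown channel ID, or no results
    return items[0]["snippet"]["title"]


def fetch_channel_title(channel_id, api_key):
    """Perform the real request; needs network access and a valid API key."""
    with urllib.request.urlopen(build_channels_url(channel_id, api_key)) as response:
        return channel_title_from_response(response.read().decode("utf-8"))
```

With real values, `fetch_channel_title("YOUR_CHANNEL_ID", "YOUR_API_KEY")` would return the title as a plain string. Keeping the URL building and JSON parsing separate from the network call makes both easy to check offline.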