Scraping YouTube user information

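I'm trying to scrape YouTube to retrieve information about a group of users (approximately 200 people). I'm interested in looking for relationships between the users:

contacts
subscribers
subscriptions
what videos they commented on, etc.

I've managed to get the contact information with the following source: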

import gdata.youtube
import gdata.youtube.service
from gdata.service import RequestError
from pub_author import KEY, NAME_REGEX

def get_details(name):
    """Return the titles (contact names) in a user's YouTube contact feed."""
    yt_service = gdata.youtube.service.YouTubeService()
    yt_service.developer_key = KEY
    contact_feed = yt_service.GetYouTubeContactFeed(username=name)
    contacts = [e.title.text for e in contact_feed.entry]
    return contacts
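Once a contact list has been collected for each of the ~200 users (for example by calling `get_details` once per user), the relationships *within* the group can be assembled as an edge list. A minimal Python 3 sketch, assuming the per-user lists are already stored in a dict; the function name `contact_edges` and the sample data are hypothetical:

```python
def contact_edges(contacts_by_user):
    """Return sorted (user, contact) pairs where both ends are in the group.

    contacts_by_user maps each username to the list of contact names
    fetched for that user (hypothetical input, e.g. built by calling
    get_details() for every user in the group).
    """
    group = set(contacts_by_user)
    edges = set()
    for user, contacts in contacts_by_user.items():
        for contact in contacts:
            # Keep only links that stay inside the studied group.
            if contact in group and contact != user:
                edges.add((user, contact))
    return sorted(edges)


if __name__ == "__main__":
    sample = {
        "alice": ["bob", "carol", "outsider"],
        "bob": ["alice"],
        "carol": [],
    }
    print(contact_edges(sample))
```

Filtering to the group keeps the edge list small; contacts outside the 200 users are dropped rather than stored.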

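I can't seem to get the other information I need. The reference guide says I can get an XML feed from http://gdata.youtube.com/feeds/api/users/username/subscriptions?v=2 (for some arbitrary user). However, if I try to get another user's subscriptions, I get a 403 error with the following message: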

User must be logged in to access these subscriptions.
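For reference, the feed that produces this 403 can also be requested directly. A minimal Python 3 sketch using only the standard library, assuming the endpoint behaves as the reference guide quoted above describes; the helper names `subscriptions_feed_url` and `fetch_subscriptions_xml` are hypothetical:

```python
import urllib.error
import urllib.parse
import urllib.request

# URL pattern taken from the reference guide quoted above.
FEED_TEMPLATE = "http://gdata.youtube.com/feeds/api/users/{user}/subscriptions?v=2"


def subscriptions_feed_url(username):
    """Build the subscriptions feed URL for one user (URL-escaping the name)."""
    return FEED_TEMPLATE.format(user=urllib.parse.quote(username, safe=""))


def fetch_subscriptions_xml(username, timeout=10):
    """Return the raw XML feed as bytes, or None if the server answers 403.

    A 403 corresponds to the "User must be logged in to access these
    subscriptions." error: the feed is not public for that user.
    Any other HTTP error is re-raised.
    """
    url = subscriptions_feed_url(username)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            return None
        raise
```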

If I use the gdata API:

sub_feed = yt_service.GetYouTubeSubscriptionFeed(username=name)
sub = [e.title.text for e in sub_feed.entry]

then I get the same error.

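How can I get these subscriptions without logging in? It should be possible, since you can see this information on the YouTube website without logging in.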

Also, there seems to be no feed for a particular user's subscribers. Is this information available through the API?

EDIT

So, it appears this can't be done through the API. I had to do it the quick and dirty way:

for f in `cat users.txt`; do wget "www.youtube.com/profile?user=$f&view=subscriptions" --output-document subscriptions/$f.html; done
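The same download loop can be written in Python 3 if you'd rather keep everything in one script. A sketch assuming the same `users.txt` (one username per line) and `subscriptions/` output directory as the shell command, and the same `profile?user=...&view=subscriptions` page (wget defaults to `http://` when no scheme is given, so that is used here); the function names are hypothetical:

```python
import os
import urllib.parse
import urllib.request

PROFILE_URL = "http://www.youtube.com/profile"


def profile_subscriptions_url(username):
    """URL of the profile page listing a user's subscriptions."""
    query = urllib.parse.urlencode({"user": username, "view": "subscriptions"})
    return PROFILE_URL + "?" + query


def read_usernames(path):
    """Read one username per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def download_all(users_file="users.txt", out_dir="subscriptions"):
    """Save each user's subscriptions page as <out_dir>/<user>.html."""
    # Unlike the wget loop, create the output directory if it is missing.
    os.makedirs(out_dir, exist_ok=True)
    for user in read_usernames(users_file):
        urllib.request.urlretrieve(
            profile_subscriptions_url(user),
            os.path.join(out_dir, user + ".html"),
        )


if __name__ == "__main__":
    download_all()
```

Using `urlencode` also escapes usernames that contain characters such as `&`, which the unquoted shell loop would pass through as-is.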

Then I used this script to extract the usernames from the downloaded HTML files:

"""Extract usernames from a Youtube profile using regex""" 
import re 
def main(): 
    import sys 
    lines = open(sys.argv[1]).read().split('\n') 
    # 
    # The html files has two <a href="..."> tags for each user: once for an 
    # image thumbnail, and once for a text link. 
    # 
    users = set() 
    for l in lines: 
     match = re.search('<a href="/user/(?P<name>[^"]+)" onmousedown', l) 
     if match: 
      users.add(match.group('name')) 
    users = list(users) 
    users.sort() 
    print users 
if __name__ == '__main__': 
    main()
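Since the download loop produces one file per user, it may be convenient to parse them all in one pass and keep the result as a user → subscriptions mapping. A Python 3 sketch that uses the standard-library `html.parser` instead of a line-by-line regex. It assumes, as the script above does, that subscribed channels appear as links of the form `/user/<name>`; it does not require the `onmousedown` attribute, so it may also pick up other `/user/` links on the page (such as the profile owner's own) and some filtering may still be needed. The names `UserLinkParser`, `users_in_html` and `subscriptions_by_user` are hypothetical:

```python
import glob
import os
from html.parser import HTMLParser


class UserLinkParser(HTMLParser):
    """Collect usernames from <a href="/user/NAME"> links."""

    def __init__(self):
        super().__init__()
        self.users = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/user/"):
            # Drop any trailing path segment or query string after the name.
            name = href[len("/user/"):].split("/")[0].split("?")[0]
            if name:
                self.users.add(name)


def users_in_html(html):
    """Return the sorted, de-duplicated usernames linked from an HTML string."""
    parser = UserLinkParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.users)


def subscriptions_by_user(directory="subscriptions"):
    """Map each <user>.html file in directory to the usernames it links to."""
    result = {}
    for path in glob.glob(os.path.join(directory, "*.html")):
        user = os.path.splitext(os.path.basename(path))[0]
        with open(path, encoding="utf-8", errors="replace") as f:
            result[user] = users_in_html(f.read())
    return result
```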


2011-06-04 misha

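To access a user's subscriptions feed without that user being logged in, the user must tick the "Subscribe to a channel" checkbox under his Account Sharing settings.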
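Because each user decides whether their subscriptions are shared, a crawl over the ~200 users should expect some 403s and keep going rather than abort. A Python 3 sketch; `fetch` is any callable that takes a username and returns the feed body, raising `urllib.error.HTTPError` on failure (for instance a urllib-based helper), and the name `collect_public_subscriptions` is hypothetical:

```python
import urllib.error


def collect_public_subscriptions(usernames, fetch):
    """Fetch each user's subscriptions feed, separating shared from private.

    fetch(username) should return the feed body or raise
    urllib.error.HTTPError. A 403 is treated as "subscriptions not shared"
    (the Account Sharing checkbox is off); any other error propagates.
    """
    shared, not_shared = {}, []
    for name in usernames:
        try:
            shared[name] = fetch(name)
        except urllib.error.HTTPError as err:
            if err.code != 403:
                raise
            not_shared.append(name)
    return shared, not_shared
```

Keeping the list of users who returned 403 makes it easy to see how much of the group's subscription graph is missing.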

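Currently, there is no direct way to get a channel's subscribers through the gdata API. In fact, there has been an outstanding feature request for it for over 3 years! See Retrieving a list of a user's subscribers?.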


2011-06-04 17:00:43 Gregg
