How to scrape the "More" section of a Quora profile page?

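To build a list of all topics on Quora, I decided to start by scraping profile pages that list many topics, e.g. http://www.quora.com/Charlie-Cheever/topics. I have scraped the topics from this page, but now I need to scrape the topics from the Ajax content that loads when the "More" button at the bottom of the page is clicked. I tried to find the JavaScript function that runs when "More" is clicked, but had no luck. Here are the snippets from the HTML page that might be relevant: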

<div class=\"pager_next action_button\" id=\"__w2_mEaYKRZ_more\">More</div> 
{\"more_button\": \"mEaYKRZ\"} 

\"dPs6zd5\": {\"more_button\": \"more_button\"} 

new(PagedListMoreButton)(\"mEaYKRZ\",\"more_button\",{},\"live:ld_c5OMje_9424:cls:a.view.paged_list:PagedListMoreButton:/TW7WZFZNft72w\",{})
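The snippets suggest that the "More" div's id (`__w2_mEaYKRZ_more`) wraps the first argument of the `PagedListMoreButton` constructor call. A minimal sketch, assuming that observed pattern holds, of pulling those constructor arguments out of the downloaded page source with regular expressions; `find_more_buttons` is a hypothetical helper, and what each argument means to Quora's JavaScript is a guess. It does not reveal which Ajax URL the button requests; that still has to be observed in the browser.

```python
import re

# Captures the argument list of calls such as
#   new(PagedListMoreButton)("mEaYKRZ","more_button",{},"live:...",{})
# Assumption: the arguments contain no ")" characters, as in the snippet.
CALL_RE = re.compile(r'new\(PagedListMoreButton\)\((.*?)\)')
# Captures each double-quoted argument; the {} arguments are skipped.
STRING_ARG_RE = re.compile(r'"([^"]*)"')


def find_more_buttons(page_source):
    """Return the quoted arguments of every PagedListMoreButton call."""
    # The snippets in the question show quotes escaped as \" (as if the markup
    # sits inside a JavaScript string), so undo that escaping before matching.
    text = page_source.replace('\\"', '"')
    return [STRING_ARG_RE.findall(m.group(1)) for m in CALL_RE.finditer(text)]


snippet = (r'new(PagedListMoreButton)(\"mEaYKRZ\",\"more_button\",{},'
           r'\"live:ld_c5OMje_9424:cls:a.view.paged_list:PagedListMoreButton:/TW7WZFZNft72w\",{})')

for args in find_more_buttons(snippet):
    # Observed pattern only: the More div's id embeds the first argument.
    print(args[0], "->", "__w2_%s_more" % args[0])
    print(args[2])
```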

Does anyone know the name of the JavaScript function that runs when the "More" button is clicked? Any help would be appreciated :)

My Python script at this point (following this tutorial) looks like this:

#!/usr/bin/env python3
# Prints the topics followed by Charlie Cheever, from the first page only.
import httplib2
from bs4 import BeautifulSoup

# Responses are cached on disk in the ".cache" directory.
SCRAPING_CONN = httplib2.Http(".cache")

def fetch(url, method="GET"):
    # Returns a (response, content) tuple.
    return SCRAPING_CONN.request(url, method)

def extractTopic(s):
    # s is an <a class="topic_name"> tag; the name sits in its first child element.
    d = {}
    d['url'] = "http://www.quora.com" + s['href']
    d['topicName'] = s.find_all()[0].string
    return d

def fetch_stories():
    response, content = fetch("http://www.quora.com/Charlie-Cheever/topics")
    soup = BeautifulSoup(content, "html.parser")
    stories = soup.find_all('a', 'topic_name')
    topics = [extractTopic(s) for s in stories]
    for t in topics:
        print("%s, %s" % (t['topicName'], t['url']))
    # Return the list so the caller actually receives the topics.
    return topics

stories = fetch_stories()
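Whatever eventually produces the extra topics (a replayed Ajax response or a page source captured from a browser), the parsing step stays the same, so it helps to have it as a function that takes an HTML string rather than a URL. A minimal sketch that swaps BeautifulSoup for the standard library's `html.parser`, assuming the topic links keep the `topic_name` class used in the script; `TopicLinkParser` and `extract_topics` are hypothetical names, and `sample` is a made-up two-link fragment, not real Quora markup.

```python
from html.parser import HTMLParser


class TopicLinkParser(HTMLParser):
    """Collects topics from <a class="topic_name" href="..."> links.

    Assumption: the topic name is the text inside the link.
    """

    def __init__(self, base_url="http://www.quora.com"):
        super().__init__()
        self.base_url = base_url
        self.topics = []
        self._href = None  # href of the topic link currently open, if any
        self._text = []    # text fragments collected inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "topic_name" in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.topics.append({"topicName": "".join(self._text).strip(),
                                "url": self.base_url + self._href})
            self._href = None


def extract_topics(html):
    parser = TopicLinkParser()
    parser.feed(html)
    parser.close()
    return parser.topics


sample = ('<a class="topic_name" href="/Startups"><span>Startups</span></a>'
          '<a class="topic_name" href="/Quora"><span>Quora</span></a>')
for t in extract_topics(sample):
    print("%s, %s" % (t["topicName"], t["url"]))
```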


2011-09-30 Arman

Hi Arman, I'm doing something similar. Did you find a solution?

You can see it in the event listeners of your browser's DOM inspector. It's an anonymous function that looks like this:

function(){return typeof d!=="undefined"&&!d.event.triggered?d.event.handle.apply(l.elem,arguments):b}

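That looks like jQuery's generic event dispatcher (minified, with `d` standing for jQuery), so the actual click logic is hidden behind it. This looks like a difficult site to scrape; you might consider using Selenium.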
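Following the Selenium suggestion, a minimal sketch of letting a real browser run Quora's JavaScript: keep clicking the "More" control until it is no longer shown, then read the fully expanded page. The selector `div.pager_next` comes from the question's snippet and may not match the live site; `click_more_until_done`, `max_clicks` and `pause` are hypothetical names, and the fixed sleep is a simplification of a proper explicit wait.

```python
import time

# Assumption: the More control keeps the pager_next class from the question.
MORE_SELECTOR = "div.pager_next"


def click_more_until_done(driver, max_clicks=50, pause=1.0):
    """Click "More" until it disappears or max_clicks is reached.

    `driver` is a Selenium WebDriver. The locator strategy is passed as the
    plain string "css selector", which is the value of By.CSS_SELECTOR.
    `pause` gives each Ajax request time to finish. Returns the click count.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements("css selector", MORE_SELECTOR)
        # The button may stay in the DOM but be hidden once the list is exhausted.
        visible = [b for b in buttons if b.is_displayed()]
        if not visible:
            break
        visible[0].click()
        clicks += 1
        time.sleep(pause)
    return clicks
```

Usage, assuming Selenium and a Firefox driver are installed: create `driver = webdriver.Firefox()` (from `selenium import webdriver`), call `driver.get("http://www.quora.com/Charlie-Cheever/topics")`, then `click_more_until_done(driver)`, and pass `driver.page_source` to the same BeautifulSoup parsing used in the question's script instead of the httplib2 response. Call `driver.quit()` when done.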


2011-09-30 23:43:59 pguardiario
