Python scraper for JavaScript?

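Can anyone point me to a good Python library for screen-scraping JavaScript-driven pages (ideally one with good documentation or tutorials)? I'd like to know what options are out there, but above all which one is easiest to learn and gets results fastest, so I'm wondering whether anyone has experience with this. I've heard a bit about SpiderMonkey, but maybe there are better options?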

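Specifically, I've gotten this far using BeautifulSoup and Mechanize, but I need a way to open a JavaScript popup, submit data, and download/parse the results shown in the JavaScript popup.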

<a href="javascript:openFindItem(12510109)" onclick="s_objectID=&quot;javascript:openFindItem(12510109)_1&quot;;return this.s_oc?this.s_oc(e):true">Find Item</a>
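As a rough sketch of the first step, the function name and argument can be pulled out of a `javascript:` link like the one above using only the standard library (BeautifulSoup would work just as well for finding the anchors). The names `JsLinkParser` and `extract_js_calls` are made up for illustration. What `openFindItem` actually does on the page, such as which URL the popup loads, cannot be known from the link itself and would have to be read from the site's own scripts.

```python
import re
from html.parser import HTMLParser

# Matches hrefs such as "javascript:openFindItem(12510109)".
JS_CALL = re.compile(r"^javascript:\s*(\w+)\((.*?)\)\s*;?\s*$")


class JsLinkParser(HTMLParser):
    """Collects (function_name, raw_arguments) for every javascript: link."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = JS_CALL.match(href)
        if match:
            self.calls.append((match.group(1), match.group(2)))


def extract_js_calls(html):
    """Return a list of (function_name, raw_arguments) found in the HTML."""
    parser = JsLinkParser()
    parser.feed(html)
    return parser.calls


sample = (
    '<a href="javascript:openFindItem(12510109)" '
    'onclick="s_objectID=&quot;javascript:openFindItem(12510109)_1&quot;;'
    'return this.s_oc?this.s_oc(e):true">Find Item</a>'
)
print(extract_js_calls(sample))  # prints [('openFindItem', '12510109')]
```

If the popup turns out to be an ordinary URL built from that ID, it may be possible to request it directly with the tools already in use, without running any JavaScript at all.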

I'd like to implement this with Google App Engine and Django. Thanks!


2010-05-28 Diego

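What I usually do in these cases is automate an actual browser and grab the processed HTML from there.

Edit:

Here is an example of automating Internet Explorer to browse to a URL and grab the title and location after the page loads.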

import time
from ctypes import byref, windll
from ctypes.wintypes import MSG  # correct field sizes on 32- and 64-bit Windows

import win32con
from win32com.client import Dispatch

READYSTATE_COMPLETE = 4  # IE reports 4 once the page has finished loading


def wait_until_ready(ie):
    """Pump pending Windows messages until IE has finished loading the page."""
    msg = MSG()

    while True:
        # Drain the message queue so the browser's COM events keep flowing.
        while windll.user32.PeekMessageW(byref(msg), None, 0, 0, win32con.PM_REMOVE) != 0:
            windll.user32.TranslateMessage(byref(msg))
            windll.user32.DispatchMessageW(byref(msg))

        if ie.ReadyState == READYSTATE_COMPLETE:
            break

        time.sleep(0.05)  # avoid spinning at full CPU while waiting


# Start Internet Explorer through COM and make its window visible.
ie = Dispatch("InternetExplorer.Application")
ie.Visible = True

ie.Navigate("http://google.com/")
wait_until_ready(ie)

# The document is now fully loaded, so the processed page can be read.
print("title: %s" % ie.Document.Title)
print("location: %s" % ie.Document.location)


2010-05-28 03:08:49

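Is this similar to Selenium? I tried automating with that approach, but had some trouble with the generated Python source code. I need to follow all JavaScript links of this type and download/parse the data from each one. – Diego 2010-05-28 03:49:06

I just automate the browser directly. On Windows you can do this with Internet Explorer, or you can use WebKit to do it in a cross-platform way. – 2010-05-28 06:01:51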

How can this be solved on Linux? – 2010-11-03 14:42:33
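One option that also runs on Linux is Selenium WebDriver, which came up in an earlier comment. The following is only a sketch under stated assumptions: it uses the current Selenium 4 Python API, assumes Firefox and its driver are installed, and the function names `newly_opened` and `scrape_popup` are made up for illustration. It clicks a link by its visible text, waits for the popup window to appear, and returns the popup's rendered HTML.

```python
def newly_opened(before, after):
    """Return the window handles present in `after` but not in `before`."""
    return [handle for handle in after if handle not in before]


def scrape_popup(url, link_text):
    """Click a link that opens a popup and return the popup's rendered HTML."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        before = list(driver.window_handles)
        driver.find_element(By.LINK_TEXT, link_text).click()

        # Wait up to 10 seconds until a window that was not there before appears.
        new_handles = WebDriverWait(driver, 10).until(
            lambda d: newly_opened(before, d.window_handles)
        )
        driver.switch_to.window(new_handles[0])
        return driver.page_source
    finally:
        driver.quit()
```

A call might look like `html = scrape_popup("http://example.com/items", "Find Item")`, where the URL is a placeholder. The returned HTML can then be handed to BeautifulSoup as usual. Submitting data inside the popup would take further `find_element` and `send_keys` calls that depend on the page's markup.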

I use the Python bindings to WebKit to render basic JavaScript, and Chickenfoot for more advanced interaction. For more information, see this webkit example.


2010-05-28 14:37:41 hoju

You can also use a "programmatic web browser" called Spynner. I found it to be the best solution, and it is relatively easy to use.


2011-05-30 06:13:57
