2015-06-24 72 views
1

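Using Mechanize and Python to process the USPTO website

I just need to do some simple processing of the USPTO trademark site to extract a simple pattern.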

#!/usr/bin/python

import mechanize
import cookielib  # Python 2 module; on Python 3 this is http.cookiejar

# Browser with a cookie jar so cookies persist between requests.
br = mechanize.Browser()
cg = cookielib.LWPCookieJar()
br.set_cookiejar(cg)

#br.set_all_readonly(False)
br.set_handle_robots(False)   # do not fetch or obey robots.txt
br.set_handle_refresh(False)  # do not follow meta-refresh redirects
br.addheaders = [('User-agent', 'Firefox')]

response = br.open("http://uspto.gov/trademarks-application-process/search-trademark-database")

tess = 'TESS'
start_search = 'Basic Word Mark Search (New User)'

assert br.viewing_html()
print(br.title())

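# Find the TESS link; stop if it is missing instead of following the wrong link.
for l in br.links(url_regex='tmsearch'):
    if l.text == tess:
        print(l.url)
        break
else:
    raise SystemExit("TESS link not found")

# follow_link() loads the target page, so it does not need to be opened again.
br.follow_link(l)
newlink = br.geturl()
print(newlink)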
for link in br.links():
    if link.text == start_search:
        print("Found Basic Search")
        print(link.text)
        print(link.url)
        break
else:
    raise SystemExit("Basic search link not found")

# Why do we need the concatenation? Without this it doesn't generate a full URL.
newurl = "http://tmsearch.uspto.gov" + link.url
print(newurl)
response1 = br.open(newurl)

print(response1.read())

#for form in br.forms():
#    print("Form Name: %s" % form.name)

Three questions:

  1. Without manually concatenating the prefix, I do not get a full URL at this step.
  2. At the very end of the program I get some warnings relating to the forms.
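  3. Finally, I want to enter some search text in "Search Term", which I assume is a form, but I cannot reach it, and then submit it. The next step is to follow up on the table that is displayed afterwards.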

Answer

0

Well:

  1. Store the HTTP prefix in a variable and pass it along as newurl = oldurl + link.url. You can always do br.open(oldurl + "w/e goes here") at the start.

  2. Forms are listed by the browser object, not by the response: for form in br.forms(): print("Form name: %s" % form.name)

  3. You need to select the form, fill in the text, and then submit it. Here are some hints:

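    # 'search' is a placeholder for both the form id and the field name;
    # replace it with the values found on the actual page.
    for form in br.forms():
        if form.attrs.get('id') == 'search':
            br.form = form
            break
    br["search"] = "text_search"
    br.submit()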
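
On the first point, the concatenation is needed because link.url holds the href exactly as it is written in the page, which in this case is a relative path. Rather than hard-coding the host, the relative path can be resolved against the URL of the page it came from. Below is a minimal sketch of that idea using only the standard library. The helper name resolve_link and the example paths are hypothetical and chosen purely for illustration; the real TESS paths will differ. In the script itself, br.geturl() would supply the page URL, and mechanize Link objects also carry an absolute_url attribute that should hold the already-resolved address.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as used by the script in the question


def resolve_link(page_url, href):
    """Resolve an href found on page_url into an absolute URL (hypothetical helper)."""
    return urljoin(page_url, href)


# Hypothetical values for illustration only; they are not real TESS paths.
page_url = "http://tmsearch.uspto.gov/example/page.html"
relative_href = "/example/search?state=demo"

full_url = resolve_link(page_url, relative_href)
print(full_url)
```

An href that is already absolute passes through unchanged, so the same call can be applied to every link without first checking whether it is relative.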