Here is my code, which collects item URLs (`link3`) from eBay. How can I make `start_urls` in scrapy pick up URLs generated by another Python function?
    import urllib2
    from bs4 import BeautifulSoup

    def url_soup(url):
        source = urllib2.urlopen(url).read()
        soup = BeautifulSoup(source)
        item_links = []
        for anchor in soup.select('a.ListItemLink'):
            link3 = 'http://www.ebay.com/%s' % anchor['href']
            item_links.append(link3)  # collect every link instead of overwriting link3
        return item_links

    Dept = {"All Departments": "0", "Apparel": "5438", "Auto": "91083", "Baby": "5427",
            "Beauty": "1085666", "Books": "3920", "Electronics": "3944", "Gifts": "1094765",
            "Grocery": "976759", "Health": "976760", "Home": "4044",
            "Home Improvement": "1072864", "Jwelery": "3891", "Movies": "4096",
            "Music": "4104", "Party": "2637", "Patio": "5428", "Pets": "5440",
            "Pharmacy": "5431", "Photo Center": "5426", "Sports": "4125",
            "Toys": "4171", "Video Games": "2636"}

    def gen_url(keyword, domain):
        if domain in Dept:
            main_url = ('http://www.ebay.com/search/search-ng.do?search_query=%s'
                        '&ic=16_0&Find=Find&search_constraint=%s') % (keyword, Dept[domain])
            return url_soup(main_url)

    gen_url('Bags', 'Apparel')
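For reference, the search-URL construction above can be factored into a small, pure function that returns the URL instead of fetching it immediately. This is a sketch in Python 3 (the question's code is Python 2); `build_search_url` is a hypothetical helper name, and the dictionary is a trimmed copy of the question's `Dept`:

```python
from urllib.parse import urlencode

# Subset of the question's Dept dict, for illustration only.
DEPT = {"All Departments": "0", "Apparel": "5438"}

def build_search_url(keyword, domain,
                     base='http://www.ebay.com/search/search-ng.do'):
    """Return the search URL for a keyword within a department, or None
    if the department is unknown. Pure function: no network access."""
    if domain not in DEPT:
        return None
    query = urlencode({'search_query': keyword, 'ic': '16_0',
                       'Find': 'Find', 'search_constraint': DEPT[domain]})
    return '%s?%s' % (base, query)
```

Keeping URL generation separate from fetching makes it easy to hand the resulting URLs to a spider (or to test the generation without touching the network).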
Now I want my spider to pick up `link3` as its `start_urls` each time. P.S. I'm new to scrapy!
Thanks for your help! What is happening now is that the request to the URL in `parse_data` does not work; it just gives me the URLs back as output. That means the crawl isn't happening properly, or there is no response from those particular URLs. – user3488659
@user3488659 I've updated the code to show what I'm using now. There is at least one problem: eBay returns a 404 for the search page in `start_requests`: `http://www.ebay.com/search/search-ng.do...`. Are you sure this is the correct search URL you need? – alecxe
Oh, I'm sorry about that. I just registered, so SO doesn't allow me to post more than two links; I changed them to example.com but still couldn't proceed, so I tried with ebay instead. I'm really sorry: it's actually Walmart. Apologies for the inconvenience. – user3488659