Scrapy is saving URLs with a triple slash ///

I don't know why Scrapy does this, but it happens in two different places.

I think it happens twice because I am trying to add http: to the URL.

item['product_link'] = urljoin(ABS_URL,''.join(item['product_link']).replace('/', '').encode('utf-8').strip())

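ABS_URL is what adds the http:. I also tried adding it at that point, but I always end up with three slashes ///. If I don't add anything, the item has only a single /.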

Source

2017-08-12 Ignacio Art

That is how urljoin works. If the base contains only a scheme (and no domain part), the result will contain three slashes:

>>> urlparse.urljoin('http://', 'foo.html') 
'http:///foo.html' 
>>> urlparse.urljoin('http:', 'foo.html') 
'http:///foo.html' 
>>> urlparse.urljoin('http://foo', 'bar.html') 
'http://foo/bar.html'

From your code, it looks like you are using urljoin only to add the scheme when forming product_link. In that case, simple concatenation is enough:

item['product_link'] = 'http:' + ''.join(item['product_link']).replace('/', '').encode('utf-8').strip()
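
As a rough illustration of the concatenation approach, here is a minimal Python 3 sketch (using urllib.parse rather than the Python 2 urlparse module shown above). The helper name add_scheme and the example.com links are hypothetical and not taken from the question; the sketch assumes the scraped link is either protocol-relative (starting with //) or has no scheme at all.

```python
from urllib.parse import urljoin


def add_scheme(link, scheme="http"):
    """Prefix a scheme onto a scraped link that lacks one (hypothetical helper)."""
    link = link.strip()
    # Already absolute: leave it untouched.
    if link.startswith(("http://", "https://")):
        return link
    # Protocol-relative link such as //example.com/p/1: only the scheme is missing.
    if link.startswith("//"):
        return scheme + ":" + link
    # Bare host/path: drop stray leading slashes so no empty host is created.
    return scheme + "://" + link.lstrip("/")


# Concatenation handles links that already carry their own host.
print(add_scheme("//example.com/p/1"))

# urljoin is the right tool when the base includes a host and the link is a path.
print(urljoin("http://example.com", "p/1"))
```

The distinction is that urljoin expects a base with a host to resolve against, while plain concatenation is enough when the link already carries its own host and only the scheme is missing.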

Source

2017-08-12 06:34:43
