2017-04-19 132 views
0

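Scraping content from a website using BeautifulSoup and requests

UPDATE: My script extracts the text below, but I am still struggling to get the information I need.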

[<button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-ba7189.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-w-ba7212.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-r2-pk-w-ba7560.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-ultraboost-x-bb0879.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/books-all-gone-book-2016.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/converse-ctas-modern-hi-156645c.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/converse-ctas-modern-hi-156646c.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/new-balance-m576-lifestyle-m576-pgw.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-13-retro-low-310810-407.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-4-retro-308497-117.html')" title="Shop Now" 
type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-clyde-cny-fm-363637-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-creeper-white-black-364462-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-creeper-wrinkled-patent-364465-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-zoku-runner-ultk-is-bd5852.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/staple-fila-solid-pique-polo-1702p3795-blk.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/staple-fila-camo-poly-jkt-170203584-camo.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-eqt-support-adv-bb2791.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-eqt-support-adv-pk-ba7496.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-equipment-support-ultra-ba7474.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" 
onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-r2-pk-bb2910.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/asics-gel-kayano-trainer-knit-h7s4n-4545.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-13-retro-414571-122.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-15-retro-881429-400.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-6-retro-384664-113.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-max-woven-boot-921854-002.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-max-woven-boot-921854-001.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-sock-racer-og-875837-001.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-nikelab-air-max-1-pinnacle-859554-400.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-clyde-premium-core-362632-03.html')" title="Shop Now" 
type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-cl-lthr-golden-neutrals-bd3744.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-club-c-85-gum-bs7635.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10346/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10341/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10336/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>] 

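I am currently trying to extract the "form_key" value from the scraped text. In this example the form key is "Ayxpa0t2JpTEfPBd", and that is the text I want to extract and print.

Could you tell me how to extract and print it? Thanks in advance!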

+0

Just to clarify, you are trying to get 'Ayxpa0t2JpTEfPBd' out of the elements in your example? –

+0

Yes mate, if you load the page and view the source, it is in the same place every time. – Larsson

+0

Could you post all of the code you use to get these buttons? I don't want to duplicate work, but I think I can help you! –

Answers

0

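Here you go. This code finds the buttons on the page, picks one, reads its onclick attribute, and then extracts the form key from it. The regex comes from Robert's answer, so be sure to thank him!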

import re

import requests
from bs4 import BeautifulSoup

url = "http://www.urbanjunglestore.com/"

# Send a browser-like User-Agent with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# verify=False skips TLS certificate verification.
req = requests.request("GET", url, headers=headers, verify=False)
response = BeautifulSoup(req.content, "html.parser")

# The title match is case-sensitive; in the scraped output only the
# "SHOP NOW" buttons carry a form_key in their onclick URL.
all_buttons = response.find_all("button", title="SHOP NOW")

one_button = all_buttons[0]

onclick_attribute = one_button['onclick']  # the text of the onclick attribute


def get_form_key_from_onclick_attr(attr_text):
    """Use a regex to extract the form key from the onclick attribute text."""
    results = re.search(r'.*/form_key/([^/]+)/.*', attr_text)
    # Return None instead of raising if the attribute has no form key.
    return results.group(1) if results else None


print(get_form_key_from_onclick_attr(onclick_attribute))
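
The snippet above looks at a single button. As a rough sketch of how the same idea could cover every button at once, the helper below takes a list of onclick strings and returns the distinct form keys it finds, skipping buttons (such as the product-page links) that have none. The name `collect_form_keys` is made up for illustration, and the sample strings are copied from the scraped output in the question so the example runs without a network request.

```python
import re

# Hypothetical helper; same capture idea as the regex in the answer,
# without the leading and trailing ".*" (re.search scans the string anyway).
FORM_KEY_RE = re.compile(r"/form_key/([^/]+)/")


def collect_form_keys(onclick_values):
    """Return the distinct form keys found in onclick strings, in order."""
    keys = []
    for value in onclick_values:
        match = FORM_KEY_RE.search(value)
        if match and match.group(1) not in keys:
            keys.append(match.group(1))
    return keys


# Sample values copied from the scraped output in the question.
sample_onclicks = [
    "setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-ba7189.html')",
    "setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')",
    "setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10346/form_key/Ayxpa0t2JpTEfPBd/')",
]

form_keys = collect_form_keys(sample_onclicks)
print(form_keys)
```

With BeautifulSoup, the input list could be built from the buttons already found, for example `[b['onclick'] for b in all_buttons]`.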
+1

Wow! Thanks, this is much simpler than the method I was using before. It was a real eye-opener for me, I can't thank you enough. – Larsson

+0

No problem! Glad I could help :) –

1

You can extract the form_key with a regular expression:

In [1]: s = 'http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/' 

In [2]: import re 

In [3]: m = re.search('.*/form_key/([^/]+)/.*', s) 

In [4]: m.group(1) 
Out[4]: 'Ayxpa0t2JpTEfPBd' 
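
If a regex feels fragile here, one possible alternative is to split the URL path into segments and take the one that follows `form_key`. This is only a sketch: the function name `form_key_from_url` is invented for illustration, and it assumes the key always sits in the path segment right after `form_key`, as it does in the URL from the question.

```python
from urllib.parse import urlparse


def form_key_from_url(url):
    """Return the path segment that follows 'form_key', or None if absent."""
    segments = urlparse(url).path.split("/")
    if "form_key" in segments:
        index = segments.index("form_key")
        # Make sure a non-empty segment actually follows 'form_key'.
        if index + 1 < len(segments) and segments[index + 1]:
            return segments[index + 1]
    return None


# URL copied from the question.
s = 'http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/'

key = form_key_from_url(s)
print(key)
```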

So, to fit your example, you could do the following for the button:

import re

# One button's markup, copied from the scraped output.
s = """onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')"><span><span>SHOP NOW</span></span></button>"""
m = re.search(r'.*/form_key/([^/]+)/.*', s)

# Only print when the pattern matched; print() works on Python 2 and 3.
if m:
    print(m.group(1))
+0

I am looking to find it in the page source. – Larsson

+0

Is the string you provided in the page source? –

+0

Yes, if you search for "form_key", you will see where it is. – Larsson
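
Since the key appears in the page source itself, another option might be to run the regex over the raw HTML text instead of a single attribute. The sketch below uses a short `page_source` string copied from the scraped output as a stand-in for the real page (in practice it would be something like `req.text`); `re.findall` returns every captured key, and a set collapses the repeats.

```python
import re

# Stand-in for the downloaded HTML; two buttons copied from the question.
page_source = """<button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>
<button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10346/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>"""

# findall returns the captured group for every occurrence in the text.
found_keys = set(re.findall(r"/form_key/([^/]+)/", page_source))

for form_key in sorted(found_keys):
    print(form_key)
```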