2017-03-08

Given the HTML element below, how can I extract a value from its style attribute with Scrapy?

<div style="width: 80.42%;" class="classA"></div> 

With this code I can extract the whole style attribute:

response.xpath("//div[@class='classA']").xpath("@style").extract() 

But I only want the width value from the style attribute, i.e. 80.42%. How can I do that?

Answers


You can use .re(), like this:

response.xpath("//div[@class='classA']").xpath("@style").re(r'width:\s*(\d+(?:\.\d+)?%)')

This works.
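.re() applies a regular expression to each extracted string and returns the captured groups. To try a pattern without a live Scrapy response, the same step can be reproduced with Python's standard re module. This is a minimal sketch that assumes the style string has already been extracted; extract_width and WIDTH_RE are hypothetical names. The pattern only matches width at the start of the string or right after a ";", so a max-width declaration is not mistaken for width.

```python
import re

# Hypothetical pattern: "width" must start the string or follow a ';',
# so "max-width: 50%" is skipped. Accepts integer or decimal percentages.
WIDTH_RE = re.compile(r'(?:^|;)\s*width\s*:\s*(\d+(?:\.\d+)?%)')


def extract_width(style):
    """Return the width percentage from an inline style string, or None (hypothetical helper)."""
    match = WIDTH_RE.search(style)
    return match.group(1) if match else None


print(extract_width("width: 80.42%;"))              # 80.42%
print(extract_width("max-width: 50%; width: 80%"))  # 80%
print(extract_width("height: 10px;"))               # None
```

In Scrapy itself, passing a pattern to .re_first() instead of .re() returns a single string (or None) rather than a list.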


You can use cssutils. Install it first:

$ pip install cssutils 

Then use it in your code:

import cssutils 
... 

# extract_first() returns a single string; parseStyle() expects a string, not a list
css_style = response.xpath("//div[@class='classA']/@style").extract_first()
parsed_css = cssutils.parseStyle(css_style)
print(parsed_css.width)  # 80.42%
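If adding a dependency such as cssutils is not an option, a naive standard-library parser can turn an inline style string into a dictionary. This is a different technique from cssutils (plain string splitting, no CSS grammar), sketched under the assumption that no declaration value contains a ";" (a data: URL, for example, would break it). parse_inline_style is a hypothetical helper name.

```python
def parse_inline_style(style):
    """Split an inline CSS string into a {property: value} dict (hypothetical helper).

    Naive: assumes no ';' appears inside a value.
    """
    declarations = {}
    for declaration in style.split(";"):
        # partition() splits at the first ':' only, so colons later in a value are kept
        name, sep, value = declaration.partition(":")
        if sep:  # skip empty chunks, e.g. the one after a trailing ';'
            declarations[name.strip().lower()] = value.strip()
    return declarations


style = parse_inline_style("width: 80.42%;")
print(style["width"])  # 80.42%
```

With Scrapy, the input would be the string returned by response.xpath("//div[@class='classA']/@style").extract_first().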

I would just treat it as a text string and split it as needed:

text = '<div style="width: 80.42%;" class="classA"></div>' 

if "width:" in text: 
    # split at the first occurrence of "width:" and take everything after it
    text = text.split("width:",1)[1] 
    # split at first semicolon take everything before 
    text = text.split(";",1)[0] 
    # strip whitespace 
    text = " ".join(text.split()) 

print(text)

>>>80.42% 
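The same split logic can be wrapped in a reusable function. A minimal sketch: width_from_text is a hypothetical name, and str.partition stands in for split(..., 1) so the function can return None when there is no "width:" at all, instead of leaving the whole input unchanged. Like the split-based version, it assumes the declaration ends with ";" and would also match "max-width:" if that appears first.

```python
def width_from_text(text):
    """Return the value after the first 'width:' up to the next ';', or None (hypothetical helper)."""
    _, found, rest = text.partition("width:")
    if not found:
        return None
    # keep everything before the first ';' and trim surrounding whitespace
    value, _, _ = rest.partition(";")
    return value.strip()


print(width_from_text('<div style="width: 80.42%;" class="classA"></div>'))  # 80.42%
print(width_from_text('<div class="classA"></div>'))  # None
```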

Or split at the percent sign instead of the semicolon:

text = '<div style="width: 80.42%;" class="classA"></div>' 

if "width:" in text: 
    # split after width 
    text = text.split("width:",1)[1] 
    # split before percent 
    text = text.split("%",1)[0] 
    # add back percent 
    text += '%' 
    # strip whitespace 
    text = " ".join(text.split()) 

print(text)

>>>80.42% 

Or, more concisely:

text = '<div style="width: 80.42%;" class="classA"></div>' 

if "width:" in text: 
    text = " ".join(((text.split("width:",1)[1]).split("%",1)[0]+'%').split()) 

print(text)

>>>80.42%