How do I select data in Scrapy from an HTML element that has both a class and an id?


<div class="section-body" id="section-2"><p>Most people with aortic stenosis do not develop symptoms until the disease is advanced. The diagnosis may have been made when the health care provider heard a heart murmur and performed tests.</p><p>Symptoms of aortic stenosis include:</p><ul><li>Chest discomfort: The chest pain may get worse with activity and reach into the arm, neck, or jaw. The chest may also feel tight or squeezed.</li><li>Cough, possibly bloody.</li><li>Breathing problems when exercising.</li><li>Becoming easily tired.</li><li>Feeling the heartbeat (palpitations).</li><li>Fainting, weakness, or dizziness with activity.</li></ul><p>In infants and children, symptoms include:</p><ul><li>Becoming easily tired with exertion (in mild cases)</li><li>Failure to gain weight</li><li>Poor feeding</li><li>Serious breathing problems that develop within days or weeks of birth (in severe cases)</li></ul><p>Children with mild or moderate aortic stenosis may get worse as they get older. They are also at risk for a heart infection called bacterial endocarditis.</p></div></div></section>

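Given the HTML above, I want to scrape the data that is in the lists. I have already tried the following commands in Scrapy, but they do not work: they return '[]' as output.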

response.css("article div.section-body p").extract() <-- this is giving all info under section body but I want only under section-2 
response.css("article div.section-body.section-2 p::text").extract()
response.xpath("//article/*[contains(@id, 'setion-2')]").extract()

Please help me extract it. Thanks.

Source

2017-03-04 Shubham B.

Try:

response.css("article div.section-body#section-2 p::text").extract()

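div.section-body#section-2 means a div that has both the class section-body and the id section-2.

Note that an id is selected with # and a class is selected with a dot (.), so the CSS selector you posted in your question for selecting the div is wrong.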

Source

2017-03-04 15:50:04 Umair

import scrapy class QuotesSpider(scrapy.Spider): name = "medical" start_urls = ['https://medlineplus.gov/ency/article/000178.htm'] def parse(self, response): yeild { topic: 'response.css('title::text').extract_first text").extract() } When I run this with scrapy crawl medical -o medical.json, it does not write any output to the JSON file. –

Does the Scrapy log in the CLI/terminal show the scraped data? – Umair

No, it does not show any scraped data; it shows an error in the terminal: Traceback (most recent call last): File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks current.result = callback(current.result, *args, **kw) File "F:\tutorial\tutorial\spiders\quotes_spider.py", line 11, in parse yeild NameError: global name 'yeild' is not defined. I tried correcting the indentation, but that did not work. –
