Scrapy: how to clean the response?
This is my code snippet. I am trying to scrape a website with Scrapy and then store the data in Elasticsearch for indexing.
def parse(self, response):
    for news in response.xpath('head'):
        yield {
            'pagetype': news.xpath('//meta[@name="pagetype"]/@content').extract(),
            'description': news.xpath('//div[@class="module__content"]/*/node()/text()').extract(),
        }
My problem is with the value that ends up in the 'description' field:
[u'\n \n ', u'"For\n many of us what we eat on Christmas day isn\'t what we would usually consume and\n that\u2019s perfectly ok," Dr said.', u'"However\n it is not uncommon for festive season celebrations to begin in November and\n continue well in to the New Year.', u'"So\n if health is on the agenda, being mindful about what we put into our bodies\n with a balanced approach, throughout the whole festive season, is important."', u"Dr\n , a lecturer at School\n Sciences, said balancing fresh, healthy food with being physically active was a\n good start.", u'"Whatever\n the celebration, try to limit processed foods, often high in fat, sugar and\n salt," she said.', u'"Taking\n time during holidays to prepare food and make the most of fresh ingredients is\n often a much healthier option than relying on convenience foods and take away.', u'"Being\n mindful about going back for seconds is important too.\xa0 We don\u2019t need to eat until we feel\n uncomfortable and eating the foods we enjoy doesn\'t necessarily mean we need to\n eat copious amounts."', u"Dr\n own healthy tips and substitutes for the Christmas season\n include:", u'But\n just because Dr is a dietitian, doesn\u2019t mean she doesn\u2019t enjoy a\n Christmas treat or two.', u'"I\n would have to say my sister in law\'s homemade rocky road is my favourite\n festive treat. She makes it every Christmas day and it gets better each year," she\n said.', u'"I\n also enjoy a summer cocktail every so often during the festive season and a\n mojito would be one of my favourites on Christmas day. 
We make it with extra\n mint from the garden which is a nice, fresh addition.', u'"Rather\n than focusing on food avoidance, moderation is the best approach.', u'"There\n are definitely some more healthy choices and some less healthy options when it\n comes to the typical Christmas day menu, but it\'s more important to be mindful\n of a healthy, balanced diet throughout the festive period, rather than avoiding\n specific foods on one day of the year."', u'\n ', u'\n \n ', u'\n ', u'\n \n ', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'Related News', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'Search for related news']
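There is a lot of whitespace, many newline codes, and the letter 'u' everywhere.
How can I process this further so that it contains only plain text, free of the extra whitespace, the newline (\n) codes, and the 'u' letters?
I have read that BeautifulSoup works well with Scrapy, but I could not find any example of how to integrate the two. I am also open to any other approach. Any help is much appreciated.
Thanks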
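Related: http://stackoverflow.com/q/21839877/4063051 – glS
The 'u' only tells you that the list contains Unicode text. If you print a single element from the list, you will see the text without the 'u'. – furas
So, to be clear, you just want to remove the newlines and whitespace from these strings? – glS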
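For reference, the cleanup discussed in the comments above (stripping newlines and extra whitespace from each extracted string) can be sketched in plain Python, without BeautifulSoup. This is only a minimal sketch: `clean_fragments` is a hypothetical helper name that is not part of Scrapy, and the sample list is a shortened version of the output shown in the question.

```python
def clean_fragments(fragments):
    """Collapse whitespace runs in each fragment and drop empty fragments.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    cleaned = []
    for fragment in fragments:
        # split() with no arguments splits on any run of whitespace,
        # including '\n' and the non-breaking space '\xa0'.
        text = " ".join(fragment.split())
        if text:  # skip fragments that contained only whitespace
            cleaned.append(text)
    return cleaned


# Shortened sample of the extracted 'description' list from the question.
raw = [
    u'\n \n ',
    u'"Being\n mindful about going back for seconds is important too.\xa0 We don\u2019t need to eat until we feel\n uncomfortable.',
    u'\n ',
    u'Related News',
]

# Join the cleaned fragments into one plain-text string for indexing.
description = " ".join(clean_fragments(raw))
print(description)
```

Inside `parse`, the same helper could be applied to the result of `.extract()` before yielding the item. Whether trailing fragments such as 'Related News' belong in the description depends on the page, so a narrower XPath may still be needed.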