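I'm new to Scrapy and to Python in general, but I really want to learn and I've put a lot of effort into this! I'm trying to crawl eb5info.com, select each regional center, and then copy the phone number and email address for each one. However, when I run the crawl, it reports that 0 pages were crawled. Any help would be greatly appreciated!
Extracting contact information via web scraping with scrapy/python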
Here is my spider:
    from scrapy.item import Item, Field

    class Eb5Item(Item):
        description = Field()

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import HtmlXPathSelector
    from parser_module.items import Eb5Item

    class Eb5Spider(CrawlSpider):
        name = 'eb5'
        allowed_domains = ["eb5info.com"]
        start_urls = ["http://eb5info.com/regional-centers"]
        rules = (Rule(SgmlLinkExtractor(allow=[r'regional-centers/*$']), callback='parse_item'),)

        def parse_item(self, response):
            hxs = HtmlXPathSelector(response)
            sites = hxs.select('//ul/li/a/@href')
            items = []
            for site in sites:
                item = Eb5Item()
                item['url'] = response.url
                item['phone'] = site.select("()").extract()
                items.append(item)
            return (items)
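
One thing worth checking in the spider above is the `allow` pattern. As far as I know, Scrapy's link extractors test each pattern against the URL with a regular-expression search, and in `regional-centers/*$` the `/*` means "zero or more slashes", not "anything after the slash". The minimal sketch below uses only the standard `re` module to show what that pattern accepts. The detail-page URL and the looser `DETAIL_PATTERN` are assumptions made for illustration; the real URLs on the site may look different.

```python
import re

# The pattern passed to `allow` in the spider above.
ALLOW_PATTERN = re.compile(r'regional-centers/*$')

# A looser alternative (an assumption, not from the original post):
# requires a slash and at least one more character after "regional-centers".
DETAIL_PATTERN = re.compile(r'regional-centers/.+')


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None


listing_url = "http://eb5info.com/regional-centers"
# Hypothetical detail-page URL, used only to illustrate the difference.
detail_url = "http://eb5info.com/regional-centers/example-center"

for url in (listing_url, detail_url):
    print(url, matches(ALLOW_PATTERN, url), matches(DETAIL_PATTERN, url))
```

If the regional-center pages do live under `regional-centers/...`, the original pattern would only ever match the listing page itself, which could explain why no item pages are followed.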
And here is my items file:
    from scrapy.item import Item, Field

    class Eb5Item(Item):
        # define the fields for your item here like:
        name = Field()
        email = Field()
        name = Field()
        description = Field()
        phone = Field()
        pass
Thank you so much!
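
For the extraction step itself (pulling a phone number and an email address out of a page), here is a rough sketch using only the standard `re` module. It is not tied to Scrapy and is not the site's actual markup: the sample text, the `extract_contacts` helper, and both patterns are assumptions for illustration, and the phone pattern only covers common US-style formats.

```python
import re

# Simple, deliberately loose patterns (assumptions, not a complete validator).
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
PHONE_RE = re.compile(r'\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}')


def extract_contacts(text):
    """Return every email address and US-style phone number found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


# Hypothetical page text, standing in for the body of a regional-center page.
sample = "Phone: (555) 123-4567 Email: info@example.com"
print(extract_contacts(sample))
```

In a spider, the same helper could be applied to the text extracted from each response, with the results stored in the `phone` and `email` fields of the item.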
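I see you deleted your own question and reposted it. Please don't do that unless there is a significant difference. – Manhattan 2014-10-30 22:00:59
My apologies. Being unfamiliar with this site, I accidentally marked an extremely unhelpful edit as the solution to my question, and I was worried that the question would still be registered as solved. – 2014-10-30 22:08:57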