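Scraping data that contains JavaScript with Scrapy
I am trying to scrape http://virtuacareers.com/new-jersey/staff-nurse/jobid3462987-registered-nurse-%28rn%29-jobs
One piece of data I want from this page is the link, but when I look at my CSV file, the link is: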
javascript:GetApplyClickCount('https://careers-virtua.icims.com/jobs/5587/1024245/job?apply=yes&hashed=58168622', 'http://virtuacareers.com/list.aspx?state=voorhees&category=staff+nurse&jobtitle=registered+nurse+(rn)&jobid=3025458&dmaid=1286&dmaname=voorhees', 'SameWindow', 'scrollbars=1, toolbar=1, resizable=1, location=1, directories=1, status=1, menubar=1, copyhistory=1, fullscreen=1', 'true', '0', '0', 'virtuacareers.com', '', '', '3025458', 'Registered Nurse (RN)','212','True','','False');
What I want to get is only:
https://careers-virtua.icims.com/jobs/5587/1024245/job?apply=yes&hashed=58168622
How should I do this? Here is my code for it:
linker = hxs.select('//div[@class="box jobDesc"]/a')
item["link"] = linker.select('@href').extract()
That raises an exception. TypeError: expected string or buffer – chano
Try `item["link"] = re.search("\'(?P<url>https?://[^\s]+)\'", link[0]).group("url")` –
@alecxe indeed ;) You're welcome –
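The regex suggested in the comments can be sketched as a standalone snippet. This is only a sketch under a few assumptions: the helper name `extract_apply_url` is hypothetical, the sample `href` is a shortened version of the value from the question, and the TypeError is probably caused by passing the list returned by `.extract()` (rather than a single string) to `re.search`.

```python
import re

# Matches the first single-quoted http(s) URL inside the javascript: call.
# Excluding the quote character keeps the match from running past the argument.
APPLY_URL_RE = re.compile(r"'(?P<url>https?://[^\s']+)'")


def extract_apply_url(href):
    """Return the first quoted URL in a javascript: href, or None if absent.

    Hypothetical helper, not part of Scrapy.
    """
    match = APPLY_URL_RE.search(href)
    return match.group("url") if match else None


# Shortened sample of the href value shown in the question.
href = (
    "javascript:GetApplyClickCount("
    "'https://careers-virtua.icims.com/jobs/5587/1024245/job?apply=yes&hashed=58168622', "
    "'http://virtuacareers.com/list.aspx?state=voorhees', 'SameWindow');"
)

apply_url = extract_apply_url(href)

# Inside the spider, .extract() returns a list of strings, so pick one element
# before applying the regex (assumed usage, based on the question's code):
#   hrefs = linker.select('@href').extract()
#   item["link"] = extract_apply_url(hrefs[0]) if hrefs else None
```

Returning `None` when nothing matches avoids an AttributeError on pages where the anchor has no quoted URL.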