2012-09-08 87 views
2

Python list regular expression

I'm getting a list like this from a stock website scraper: [...... 'xlnx>XLNX<', 'yhoo>YHOO<']

How can I get a dictionary with just the symbols in the quotes? I know this is simple, but I could use some help. Thanks

import urllib 
import re 

base_url = 'http://www.nasdaq.com/markets/indices/nasdaq-100.aspx' 
content = urllib.urlopen(base_url).read() 
list = re.findall('http://www.nasdaq.com/symbol/(.*)/a>', content) 
print list 

Answers

2

You have a list, not a dictionary. Also, you shouldn't name a variable list, because that shadows the built-in of the same name.

>>> content 
['xlnx>XLNX<', 'yhoo>YHOO<'] 
>>> tickers = [] 
>>> for s in content: 
...  tickers.append(''.join(i for i in s if i.isupper())) 
... 
>>> tickers 
['XLNX', 'YHOO'] 
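
Since the question asks about regular expressions, the same extraction can also be sketched with the re module in Python 3. The sample list scraped and the helper name extract_ticker are assumptions made up for illustration. The pattern captures whatever sits between '>' and '<' instead of filtering for uppercase characters, so a symbol containing a digit or a dot would survive as well.

```python
import re

# Hypothetical sample mirroring the scraped items shown above.
scraped = ['xlnx>XLNX<', 'yhoo>YHOO<']

# Capture the text between a '>' and the next '<'.
TICKER_RE = re.compile(r'>([^<]+)<')


def extract_ticker(item):
    """Return the text between '>' and '<', or None if the item does not match."""
    match = TICKER_RE.search(item)
    return match.group(1) if match else None


# Skip items that do not contain a '>...<' section.
tickers = [t for t in map(extract_ticker, scraped) if t is not None]
print(tickers)
```

Naming the result tickers rather than list keeps the built-in list available for later use.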
+0

Why not '.isupper()'? – DSM

+0

@DSM Because it's 1 a.m. :) –

1

You should parse HTML with an HTML parser (I always recommend BeautifulSoup), not with regular expressions:

import re, urllib2 
from BeautifulSoup import BeautifulSoup 

url = 'http://www.nasdaq.com/markets/indices/nasdaq-100.aspx' 
soup = BeautifulSoup(urllib2.urlopen(url)) 

for link in soup.findAll('a', href=re.compile('/symbol/'))[1:]: 
    print link.text 

Output:

ATVI 
ADBE 
AKAM 
ALXN 
ALTR 
AMZN 
AMGN 
APOL 
AAPL 
AMAT 
ADSK 
ADP 
AVGO 
BIDU 
BBBY 
BIIB 
BMC 
BRCM 
CHRW 
CA 
CELG 
CERN 
CHKP 
CSCO 
CTXS 
CTSH 
CMCSA 
COST 
DELL 
XRAY 
DTV 
DLTR 
EBAY 
ERTS 
EXPE 
EXPD 
ESRX 
FFIV 
FAST 
FISV 
FLEX 
FOSL 
GRMN 
GILD 
GOOG 
GMCR 
HSIC 
INFY 
INTC 
INTU 
ISRG 
KLAC 
KFT 
LRCX 
LINTA 
LIFE 
LLTC 
MRVL 
MAT 
MXIM 
MCHP 
MU 
MSFT 
MNST 
MYL 
NTAP 
NFLX 
NUAN 
NVDA 
NWSA 
ORLY 
ORCL 
PCAR 
PAYX 
PCLN 
PRGO 
QCOM 
RIMM 
ROST 
SNDK 
STX 
SHLD 
SIAL 
SIRI 
SPLS 
SBUX 
SRCL 
SYMC 
TXN 
VRSN 
VRTX 
VIAB 
VMED 
VOD 
WCRX 
WFM 
WYNN 
XLNX 
YHOO 
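
If installing BeautifulSoup is not an option, the same idea (walk the anchor tags and keep those whose href contains /symbol/) can be sketched with Python 3's standard-library html.parser. This swaps in a different parser from the one used above. The class name SymbolLinkParser and the inline sample_html are assumptions for illustration, and the live page's markup may well differ.

```python
from html.parser import HTMLParser


class SymbolLinkParser(HTMLParser):
    """Collect the text of <a> tags whose href contains '/symbol/'."""

    def __init__(self):
        super().__init__()
        self.symbols = []
        self._in_symbol_link = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get('href') or ''
            self._in_symbol_link = '/symbol/' in href

    def handle_data(self, data):
        # Only keep non-blank text found inside a matching link.
        if self._in_symbol_link and data.strip():
            self.symbols.append(data.strip())

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_symbol_link = False


# Hypothetical markup standing in for the downloaded page.
sample_html = (
    '<a href="http://www.nasdaq.com/symbol/xlnx">XLNX</a>'
    '<a href="http://www.nasdaq.com/about">About</a>'
    '<a href="http://www.nasdaq.com/symbol/yhoo">YHOO</a>'
)

parser = SymbolLinkParser()
parser.feed(sample_html)
parser.close()
print(parser.symbols)
```

In real use, sample_html would be replaced by the decoded response body from the index page.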
+0

You mean BeautifulSoup and a regular expression :-) – 6502

+0

Ha, I need to get BeautifulSoup. Thanks though – user1657121

1

Something like this:

>>> lis=['xlnx>XLNX<', 'yhoo>YHOO<'] 
>>> [x[x.index('>')+1:x.index('<')] for x in lis] 
['XLNX', 'YHOO']
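
One caveat with the slice above: x.index raises ValueError when an item lacks '>' or '<'. A more forgiving variant can be sketched with str.partition in Python 3. The helper name between and the extra 'malformed' item are assumptions added for illustration.

```python
def between(item, start='>', end='<'):
    """Return the text between the first `start` and the next `end`, or None."""
    _, found_start, rest = item.partition(start)
    if not found_start:
        return None  # no opening delimiter
    text, found_end, _ = rest.partition(end)
    return text if found_end else None  # None when the closing delimiter is missing


# Hypothetical input: two well-formed items and one without delimiters.
lis = ['xlnx>XLNX<', 'yhoo>YHOO<', 'malformed']
print([between(x) for x in lis])
```

Whether a bad item should become None or raise an error depends on how much you trust the scraped input.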