Regex with Beautiful Soup: extracting all letters after ':'
I can't seem to get regular expressions to do what I want. When I run this code:
for paragraph in soup.find_all('p'):
    print(paragraph.find_all(text=re.compile(r":*\w*")))
I get this text:
Continuing our series of surfacing 2016 stinkers, here are the 25 Russell 2000 stocks that imploded in 2016. Further down, you'll find the 25 worst stocks excluding pharma. Ophthotech (NASDAQ:OPHT) -94% Galena Biopharma (NASDAQ:GALE) -93% Cempra (NASDAQ:CEMP) -91% Toaki Pharma (NASDAQ:TKAI) -89% Anthera Pharma (NASDAQ:ANTH) -86% Adeptus Health (NYSE:ADPT) -86% CytRx (NASDAQ:CYTR) -86% Novavax (NASDAQ:NVAX) -85%
I only want to extract the stock tickers, so the ideal output would be:
OPHT
GALE
CEMP
TKAI
and so on.
I have tried these variations of the code:
for paragraph in soup.find_all('p'):
    print(paragraph.find_all(text=re.compile(r'(:\w+)')))

for paragraph in soup.find_all('p'):
    print(paragraph.find_all(text=re.compile(r"(:*\w*)")))

for paragraph in soup.find_all('p'):
    print(paragraph.find_all(text=re.compile(r'(:)?\w+')))
But mostly I end up with:
`['Continuing our ', 'series', " of surfacing 2016 stinkers, here are the 25 Russell 2000 stocks that imploded in 2016. Further down, you'll find the 25 worst stocks excluding pharma."]
['Ophthotech (NASDAQ:', 'OPHT', ') -94%']
['Galena Biopharma (NASDAQ:', 'GALE', ') -93%']
['Cempra (NASDAQ:', 'CEMP', ') -91%']
['Toaki Pharma (NASDAQ:', 'TKAI', ') -89%']
['Anthera Pharma (NASDAQ:', 'ANTH', ') -86%']
['Adeptus Health (NYSE:', 'ADPT', ') -86%']
['CytRx (NASDAQ:', 'CYTR', ') -86%']
['Novavax (NASDAQ:', 'NVAX', ') -85%']`
I'm not sure what I'm doing wrong with the output.
Thanks.
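For reference, a minimal sketch of one way to reach the desired output, using only the standard `re` module on the sample text. It rests on two assumptions: that every ticker is a run of uppercase letters between a colon and a closing parenthesis, and that the text is available as a plain string (with Beautiful Soup, something like `paragraph.get_text()` would supply it). The names `text`, `loose`, and `ticker_pattern` are illustrative only. The sketch also shows why the attempts above return whole strings: the `text` argument of `find_all` selects text nodes that the pattern matches rather than returning the matched part, and a pattern such as `:*\w*` can match the empty string, so it accepts every node.

```python
import re

# Sample text copied from the question (shortened to the first four entries).
text = (
    "Ophthotech (NASDAQ:OPHT) -94% Galena Biopharma (NASDAQ:GALE) -93% "
    "Cempra (NASDAQ:CEMP) -91% Toaki Pharma (NASDAQ:TKAI) -89%"
)

# The original pattern can match an empty string, so as a filter it
# accepts any text at all, including fragments with no ticker in them.
loose = re.compile(r":*\w*")
print(loose.search(") -94%") is not None)

# Assumed ticker shape: uppercase letters after ':' and before ')'.
# The capture group makes findall return only the letters.
ticker_pattern = re.compile(r":([A-Z]+)\)")
tickers = ticker_pattern.findall(text)

for ticker in tickers:
    print(ticker)
```

If tickers can contain digits or dots, the character class `[A-Z]` would need widening; that is a guess about the data, not something the sample shows.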
What does the original text you are trying to parse look like? – serk