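How do I extract data from a string with no delimiters?
I need to extract data from four strings that I parsed out of a page with BeautifulSoup. They are: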
Arkansas72.21:59 AM76.29:04 AM5.22977.37:59 AM
Ashley71.93:39 AM78.78:59 AM0.53678.78:59 AM
Bradley72.64:49 AM77.28:59 AM2.41877.28:49 AM
Chicot-40.19:04 AM-40.19:04 AM2.573-40.112:09 AM
The data in the first string, for example, is Arkansas, 72.2, 1:59 AM, 76.2, 9:04 AM, 5.2, 29, 77.3, and 7:59 AM. Is there a simple way to do this?
Edit: full code
import urllib2
from bs4 import BeautifulSoup
import time

def scraper():
    # Arkansas State Plant Board Weather Web data
    url1 = 'http://170.94.200.136/weather/Inversion.aspx'
    # opens url and parses HTML into Unicode
    page1 = urllib2.urlopen(url1)
    soup1 = BeautifulSoup(page1, 'lxml')
    # print(soup.get_text()) gives a single Unicode string of relevant data in strings from the url
    # Without print(), returns everything in without proper spacing
    sp1 = soup1.get_text()
    # datasp1 is the chunk with the website data in it so the search for Arkansas doesn't return the header
    # everything else finds locations for Unicode strings for first four stations
    start1 = sp1.find('Today')
    end1 = sp1.find('new Sys.')
    datasp1 = sp1[start1:end1-10]
    startArkansas = datasp1.find('Arkansas')
    startAshley = datasp1.find('Ashley')
    dataArkansas = datasp1[startArkansas:startAshley-2]
    startBradley = datasp1.find('Bradley')
    dataAshley = datasp1[startAshley:startBradley-2]
    startChicot = datasp1.find('Chicot')
    dataBradley = datasp1[startBradley:startChicot-2]
    startCleveland = datasp1.find('Cleveland')
    dataChicot = datasp1[startChicot:startCleveland-2]
    print(dataArkansas)
    print(dataAshley)
    print(dataBradley)
    print(dataChicot)
Could you also show the relevant part of the BeautifulSoup code? I suspect the problem lies in how you are extracting this data from the HTML. – alecxe
You could use a regular expression. – Copperfield
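A minimal sketch of the regular-expression route, tested only against the four strings quoted in the question. It rests on assumptions the question does not confirm: every temperature has one or two digits before the decimal point and exactly one after it, every time looks like `H:MM AM` or `HH:MM PM`, and the sixth field has exactly one decimal digit. The string is genuinely ambiguous where the integer runs into the following temperature (`5.22977.3` could in principle be split in more than one way), so the integer is matched lazily and the two-digit limit on temperatures decides the split. The names `TEMP`, `TIME`, `ROW`, and `split_row` are illustrative and do not come from the question.

```python
import re

# Assumed field shapes (not confirmed by the question):
TEMP = r'(-?\d{1,2}\.\d)'          # e.g. 72.2 or -40.1
TIME = r'(\d{1,2}:\d{2} [AP]M)'    # e.g. 1:59 AM or 12:09 AM

ROW = re.compile(
    r'^([A-Za-z]+)'                # station name
    + TEMP + TIME                  # first reading and its time
    + TEMP + TIME                  # second reading and its time
    + r'(\d+\.\d)'                 # decimal field, e.g. 5.2
    + r'(\d+?)'                    # integer field, matched lazily
    + TEMP + TIME                  # third reading and its time
    + r'$'
)

def split_row(row):
    """Split one run-together station string into its nine fields."""
    match = ROW.match(row.strip())
    if match is None:
        raise ValueError('row does not fit the assumed layout: %r' % row)
    return list(match.groups())

if __name__ == '__main__':
    samples = [
        'Arkansas72.21:59 AM76.29:04 AM5.22977.37:59 AM',
        'Ashley71.93:39 AM78.78:59 AM0.53678.78:59 AM',
        'Bradley72.64:49 AM77.28:59 AM2.41877.28:49 AM',
        'Chicot-40.19:04 AM-40.19:04 AM2.573-40.112:09 AM',
    ]
    for sample in samples:
        print(split_row(sample))
```

If a reading ever reaches three digits, or the integer field is followed by a single-digit temperature, this pattern will split the row incorrectly, which is one reason to prefer reading the values from the HTML directly.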
@Copperfield: A regular expression fits the bill. But I think alecxe is right that this is an [XY problem](http://www.perlmonks.org/?node=XY+Problem). –
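To illustrate the XY-problem point: the values only run together because `get_text()` concatenates the text of adjacent elements. The sketch below assumes the page lays the data out as an HTML table with one `<tr>` per station and one `<td>` per value. That layout is a guess, since the question does not show the markup, and `SAMPLE_HTML` and `table_rows` are made-up names for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, whose structure
# is not shown in the question.
SAMPLE_HTML = """
<table>
  <tr><td>Arkansas</td><td>72.2</td><td>1:59 AM</td><td>76.2</td>
      <td>9:04 AM</td><td>5.2</td><td>29</td><td>77.3</td><td>7:59 AM</td></tr>
  <tr><td>Chicot</td><td>-40.1</td><td>9:04 AM</td><td>-40.1</td>
      <td>9:04 AM</td><td>2.5</td><td>73</td><td>-40.1</td><td>12:09 AM</td></tr>
</table>
"""

def table_rows(html):
    """Return one list of cell strings per table row that has <td> cells."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # skip rows with no <td> cells, such as header rows
            rows.append(cells)
    return rows

if __name__ == '__main__':
    for row in table_rows(SAMPLE_HTML):
        print(row)
```

Reading cell by cell keeps the field boundaries that the markup already provides, so no guessing about digit counts is needed. If the real page is not a table, passing a separator to `get_text()`, for example `soup.get_text('|', strip=True)`, is another way to keep adjacent values apart.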