Scraping a table row as one line with BeautifulSoup
I'm a Python beginner, and I'm trying to scrape a website with BeautifulSoup. Here is a small part of the page's HTML source:
<table class="swift" width="100%">
<tr>
<th class="no">ID</th>
<th>Bank or Institution</th>
<th>City</th>
<th class="branch">Branch</th>
<th>Swift Code</th>
</tr> <tr>
<td align="center">101</td>
<td>BANK LEUMI ROMANIA S.A.</td>
<td>CONSTANTA</td>
<td>(CONSTANTA BRANCH)</td>
<td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
</tr>
<tr>
<td align="center">102</td>
<td>BANK LEUMI ROMANIA S.A.</td>
<td>ORADEA</td>
<td>(ORADEA BRANCH)</td>
<td align="center"><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td>
</tr>
I managed to scrape the values, but the output looks like this:
ID
Bank or Institution
City
Branch
Swift Code
101
BANK LEUMI ROMANIA S.A.
CONSTANTA
(CONSTANTA BRANCH)
DAFBRO22CTA
102
BANK LEUMI ROMANIA S.A.
ORADEA
(ORADEA BRANCH)
DAFBRO22ORA
when what I actually want is this:
ID, Bank or Institution, City, Branch, Swift Code
101, BANK LEUMI ROMANIA S.A., CONSTANTA, (CONSTANTA BRANCH), DAFBRO22CTA
102, BANK LEUMI ROMANIA S.A., ORADEA, (ORADEA BRANCH), DAFBRO22ORA
This is my code:
import requests
from bs4 import BeautifulSoup

base_url = "https://www.theswiftcodes.com/"
nr = 0
page = 'page'
country = 'Romania'
while nr < 4:
    url_country = base_url + country + '/' + 'page' + "/" + str(nr) + "/"
    pages = requests.get(url_country)
    soup = BeautifulSoup(pages.text, 'html.parser')
    # Remove <script> elements so their contents don't end up in the text
    for script in soup.find_all('script'):
        script.extract()
    # Collect the text of every table on the page into one string
    tabel = soup.find_all("table")
    text = ("".join([p.get_text() for p in tabel]))
    nr += 1
    print(text)
    file = open('swiftcodes.txt', 'a')
    file.write(text)
    file.close()
file = open('swiftcodes.txt', 'r')
for item in file:
    print(item)
file.close()
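Why the output has one value per line: get_text() on a whole table returns every text node inside it, including the newline characters that sit between the tags in the page source, so each cell lands on its own line. A minimal sketch on a trimmed copy of the question's HTML (header row plus one data row), contrasting that with one get_text() call per row, using the separator and strip arguments of BeautifulSoup's get_text(). The names HTML, flat and rows are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question: the header row and one data row.
HTML = """<table class="swift" width="100%">
<tr>
<th class="no">ID</th>
<th>Bank or Institution</th>
<th>City</th>
<th class="branch">Branch</th>
<th>Swift Code</th>
</tr> <tr>
<td align="center">101</td>
<td>BANK LEUMI ROMANIA S.A.</td>
<td>CONSTANTA</td>
<td>(CONSTANTA BRANCH)</td>
<td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
</tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

# What the original code does: one get_text() over the whole table.
# The "\n" text nodes between the tags are kept, so cells end up on separate lines.
flat = table.get_text()

# One get_text() per <tr> instead: ", " joins the pieces, and strip=True
# drops the whitespace-only strings between tags.
rows = [tr.get_text(", ", strip=True) for tr in table.find_all("tr")]

for row in rows:
    print(row)
```

With strip=True each text piece is also trimmed, so stray spaces inside a cell do not leak into the joined line.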
Could you show that in the code? It's a bit hard to follow as written.
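Well, there are only two things going on in the code: iterate over all the 'tr' tags, and inside each 'tr' tag iterate over its 'td' or 'th' tags, storing their text values in a 'result' variable. Print 'result' at the end of each 'tr' iteration. 'strip' is just a string operation to remove the extra comma.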
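The answer's own code isn't included in this thread, so the following is only a reconstruction of the approach the comment describes: an outer loop over 'tr' tags, an inner loop over 'th'/'td' cells building up result, and a strip-family call to trim the leftover separator. It uses rstrip(", ") so that only the trailing comma and space are removed. It runs on the sample rows from the question rather than the live site; apart from result, the names are illustrative.

```python
from bs4 import BeautifulSoup

# Sample input: the header row and two data rows from the question.
HTML = """<table class="swift">
<tr><th class="no">ID</th><th>Bank or Institution</th><th>City</th><th class="branch">Branch</th><th>Swift Code</th></tr>
<tr><td align="center">101</td><td>BANK LEUMI ROMANIA S.A.</td><td>CONSTANTA</td><td>(CONSTANTA BRANCH)</td><td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td></tr>
<tr><td align="center">102</td><td>BANK LEUMI ROMANIA S.A.</td><td>ORADEA</td><td>(ORADEA BRANCH)</td><td align="center"><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td></tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
lines = []
for tr in soup.find_all("tr"):
    result = ""
    # Passing a list of names matches header (th) and data (td) cells alike,
    # in document order.
    for cell in tr.find_all(["th", "td"]):
        result += cell.get_text(strip=True) + ", "
    # Remove the ", " left after the last cell.
    result = result.rstrip(", ")
    print(result)
    lines.append(result)
```

An equivalent and slightly tidier form of the inner loop is ", ".join(...) over the cell texts, which never produces a trailing separator in the first place.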
So that code should go between print(text) and file = open('swiftcodes.txt', 'a').
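A minimal sketch of how the row-by-row approach could slot into the page loop from the question. It swaps the hand-built text file for the standard-library csv module, which quotes any cell that itself contains a comma. The function names table_rows and main and the output filename swiftcodes.csv are hypothetical. It assumes, as in the HTML snippet from the question, that header rows use th cells; the header is kept only for the first page so it isn't repeated four times. Removing script tags is no longer needed, because only the text of table cells is read.

```python
import csv
from bs4 import BeautifulSoup


def table_rows(html, include_header=True):
    """Yield the cell texts of each <tr> in html as a list of strings.

    Rows containing <th> cells are treated as header rows (an assumption based
    on the sample markup) and are skipped when include_header is False.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tr in soup.find_all("tr"):
        if not include_header and tr.find("th") is not None:
            continue
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            yield cells


def main():
    # requests is only needed for the download step, so it is imported here.
    import requests

    base_url = "https://www.theswiftcodes.com/"
    country = "Romania"
    # The csv docs recommend newline="" for files passed to csv.writer.
    with open("swiftcodes.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for nr in range(4):
            url_country = base_url + country + "/page/" + str(nr) + "/"
            page = requests.get(url_country)
            # Keep the header row only from the first page.
            writer.writerows(table_rows(page.text, include_header=(nr == 0)))
```

Calling main() downloads the four pages and writes lines such as 101,BANK LEUMI ROMANIA S.A.,CONSTANTA,(CONSTANTA BRANCH),DAFBRO22CTA. Note that csv.writer separates fields with a bare comma, without the space shown in the desired output.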