Scraping a line with BeautifulSoup

I'm a Python beginner, and all I'm trying to do is scrape a website with BeautifulSoup. Here is a small part of the page's HTML source:

<table class="swift" width="100%"> 
    <tr> 
    <th class="no">ID</th> 
    <th>Bank or Institution</th> 
    <th>City</th> 
    <th class="branch">Branch</th> 
    <th>Swift Code</th> 
    </tr> <tr> 
    <td align="center">101</td> 
    <td>BANK LEUMI ROMANIA S.A.</td> 
    <td>CONSTANTA</td> 
    <td>(CONSTANTA BRANCH)</td> 
    <td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td> 
    </tr> 
    <tr> 
    <td align="center">102</td> 
    <td>BANK LEUMI ROMANIA S.A.</td> 
    <td>ORADEA</td> 
    <td>(ORADEA BRANCH)</td> 
    <td align="center"><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td> 
    </tr>

I managed to get the values, but the output looks like this:

ID 
Bank or Institution 
City 
Branch 
Swift Code 

101 
BANK LEUMI ROMANIA S.A. 
CONSTANTA 
(CONSTANTA BRANCH) 
DAFBRO22CTA 


102 
BANK LEUMI ROMANIA S.A. 
ORADEA 
(ORADEA BRANCH) 
DAFBRO22ORA

when what I actually want is this:

ID, Bank or Institution, City, Branch, Swift Code 

101, BANK LEUMI ROMANIA S.A., CONSTANTA, (CONSTANTA BRANCH), DAFBRO22CTA

102, BANK LEUMI ROMANIA S.A., ORADEA, (ORADEA BRANCH), DAFBRO22ORA

Here is my code:

import requests
from bs4 import BeautifulSoup

base_url = "https://www.theswiftcodes.com/"
nr = 0
page = 'page'
country = 'Romania'
while nr < 4:
    url_country = base_url + country + '/' + page + '/' + str(nr) + '/'
    pages = requests.get(url_country)
    soup = BeautifulSoup(pages.text, 'html.parser')

    # Drop <script> elements so their code does not show up in the text
    for script in soup.find_all('script'):
        script.extract()

    # get_text() on a whole table joins the text of every cell into one string
    tabel = soup.find_all("table")
    text = "".join([p.get_text() for p in tabel])
    nr += 1
    print(text)

    # Append this page's text to the file
    with open('swiftcodes.txt', 'a') as file:
        file.write(text)

    # Read the file back and print it line by line
    with open('swiftcodes.txt', 'r') as file:
        for item in file:
            print(item)
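Why the output comes out one value per line: `get_text()` on a whole `<table>` concatenates every text node inside it, including the newline-and-indent whitespace between tags, so each cell lands on its own line. A minimal sketch, using a trimmed copy of the sample HTML above, that calls `get_text()` once per `<tr>` instead, with `", "` as the separator and `strip=True` to drop the whitespace-only strings between cells:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample table from the question (one data row kept)
html = """<table class="swift" width="100%">
<tr>
<th class="no">ID</th>
<th>Bank or Institution</th>
<th>City</th>
<th class="branch">Branch</th>
<th>Swift Code</th>
</tr> <tr>
<td align="center">101</td>
<td>BANK LEUMI ROMANIA S.A.</td>
<td>CONSTANTA</td>
<td>(CONSTANTA BRANCH)</td>
<td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
</tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

lines = []
for tr in soup.find("table", class_="swift").find_all("tr"):
    # strip=True trims each string and skips the ones that end up empty
    # (the newlines between cells); the separator joins what remains
    lines.append(tr.get_text(", ", strip=True))

print("\n".join(lines))
```

One caveat with this shortcut: because `strip=True` discards empty strings, an empty cell (such as the blank Branch column in the United States listing further down) disappears instead of producing an empty field, and the columns shift. Iterating over the cells one by one avoids that.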

2016-11-29 Nita Alexandru

This should do the trick:

from bs4 import BeautifulSoup

html = """<table class="swift" width="100%">
    <tr>
    <th class="no">ID</th>
    <th>Bank or Institution</th>
    <th>City</th>
    <th class="branch">Branch</th>
    <th>Swift Code</th>
    </tr> <tr>
    <td align="center">101</td>
    <td>BANK LEUMI ROMANIA S.A.</td>
    <td>CONSTANTA</td>
    <td>(CONSTANTA BRANCH)</td>
    <td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
    </tr>
    <tr>
    <td align="center">102</td>
    <td>BANK LEUMI ROMANIA S.A.</td>
    <td>ORADEA</td>
    <td>(ORADEA BRANCH)</td>
    <td align="center"><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td>
    </tr>"""

# Name the parser explicitly; otherwise bs4 guesses one and warns
soup = BeautifulSoup(html, 'html.parser')

for i in soup.find_all("tr"):
    result = ""
    for j in i.find_all("th"):  # find all the header cells
        result += j.text + ", "
    for j in i.find_all("td"):  # find all the data cells
        result += j.text + ", "
    print(result.rstrip(', '))  # drop the trailing ", "

Output:

ID, Bank or Institution, City, Branch, Swift Code 
101, BANK LEUMI ROMANIA S.A., CONSTANTA, (CONSTANTA BRANCH), DAFBRO22CTA 
102, BANK LEUMI ROMANIA S.A., ORADEA, (ORADEA BRANCH), DAFBRO22ORA

2016-11-29 13:03:19

Could you try updating my code with it? It's a bit hard to understand like this.

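There are only two things going on in the code. It iterates over all the 'tr' tags, and inside each 'tr' tag it iterates over the 'th' or 'td' tags, appending their text values to the 'result' variable, which is printed at the end of each 'tr' iteration. 'rstrip' is just a string operation that removes the trailing comma.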
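The same two loops can also be written by collecting the cell texts in a list and calling `str.join`, which only places the separator between items, so there is no trailing `", "` to strip. A sketch on a trimmed copy of the sample table; `row_to_line` is a hypothetical helper name, and `find_all(["th", "td"])` matches both kinds of cell and returns them in document order:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample table (header row plus one data row)
html = """<table class="swift">
<tr><th>ID</th><th>Bank or Institution</th><th>City</th><th>Branch</th><th>Swift Code</th></tr>
<tr><td>102</td><td>BANK LEUMI ROMANIA S.A.</td><td>ORADEA</td><td>(ORADEA BRANCH)</td>
<td><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td></tr>
</table>"""


def row_to_line(tr):
    """Join the text of every header or data cell in one <tr> with ", "."""
    # A list of tag names matches any of them; cells come back in document order
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    # join() puts separators only between items, so no rstrip is needed
    return ", ".join(cells)


soup = BeautifulSoup(html, "html.parser")
for tr in soup.find_all("tr"):
    print(row_to_line(tr))
```

Unlike a single `get_text(..., strip=True)` on the whole row, an empty cell still produces an empty field here, so the columns stay aligned.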

So the code should go in your loop, between print(text) and the line that opens 'swiftcodes.txt' for appending.
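One way to plug the row logic into the paging loop from the question is to wrap it in a function that takes one page's HTML and appends that page's rows to the output file. A minimal sketch; `append_table_rows` is a hypothetical name, and it assumes each page's listing sits in a `<table class="swift">` as in the sample HTML:

```python
from bs4 import BeautifulSoup


def append_table_rows(page_html, path="swiftcodes.txt"):
    """Parse one results page and append each table row to *path*
    as a comma-separated line. Returns the number of rows written."""
    soup = BeautifulSoup(page_html, "html.parser")
    table = soup.find("table", class_="swift")
    if table is None:  # the page has no listing table
        return 0

    written = 0
    with open(path, "a", encoding="utf-8") as out:
        for tr in table.find_all("tr"):
            cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
            if cells:  # skip rows that contain no cells at all
                out.write(", ".join(cells) + "\n")
                written += 1
    return written
```

Inside the `while` loop, a call such as `append_table_rows(pages.text)` would then replace the `tabel = ...`, `text = ...` and file-writing lines. If every page repeats the header row, the header will be written once per page.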

from bs4 import BeautifulSoup 
import requests 
r = requests.get('https://www.theswiftcodes.com/united-states/') 
soup = BeautifulSoup(r.text, 'lxml') 
rows = soup.find(class_="swift").find_all('tr') 
th = [th.text for th in rows[0].find_all('th')] 
print(th) 
for row in rows[1:]: 
    cell = [i.text for i in row.find_all('td', colspan=False)] 
    print(cell)

Output:

['ID', 'Bank or Institution', 'City', 'Branch', 'Swift Code'] 
['1', '1ST CENTURY BANK, N.A.', 'LOS ANGELES,CA', '', 'CETYUS66'] 
['2', '1ST PMF BANCORP', 'LOS ANGELES,CA', '', 'PMFAUS66'] 
['3', '1ST PMF BANCORP', 'LOS ANGELES,CA', '', 'PMFAUS66HKG'] 
['4', '3M COMPANY', 'ST. PAUL,MN', '', 'MMMCUS44'] 
['5', 'ABACUS FEDERAL SAVINGS BANK', 'NEW YORK,NY', '', 'AFSBUS33'] 
[] 
['6', 'ABBEY NATIONAL TREASURY SERVICES LTD US BRANCH', 'STAMFORD,CT', '', 'ANTSUS33'] 
['7', 'ABBOTT LABORATORIES', 'ABBOTT PARK,IL', '', 'ABTTUS44'] 
['8', 'ABBVIE, INC.', 'CHICAGO,IL', '', 'ABBVUS44'] 
['9', 'ABEL/NOSER CORP', 'NEW YORK,NY', '', 'ABENUS3N']
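Two things in this output matter if the rows are written out as comma-separated text: some values contain commas themselves (`1ST CENTURY BANK, N.A.`, `LOS ANGELES,CA`), and one row comes back as `[]`, presumably because all of its cells carry a `colspan` attribute. A sketch using Python's standard `csv` module, which quotes fields that contain commas, on a small hypothetical table modeled on the rows above. It skips empty rows and filters `colspan` cells with `has_attr`, which has the same intent as the answer's `colspan=False`:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample modeled on the United States listing above
html = """<table class="swift">
<tr><th>ID</th><th>Bank or Institution</th><th>City</th><th>Branch</th><th>Swift Code</th></tr>
<tr><td>1</td><td>1ST CENTURY BANK, N.A.</td><td>LOS ANGELES,CA</td><td></td><td>CETYUS66</td></tr>
<tr><td colspan="5">full-width row</td></tr>
<tr><td>2</td><td>1ST PMF BANCORP</td><td>LOS ANGELES,CA</td><td></td><td>PMFAUS66</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", class_="swift").find_all("tr")

header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = []
for row in rows[1:]:
    # Keep only <td> cells that have no colspan attribute
    cells = [td.get_text(strip=True) for td in row.find_all("td")
             if not td.has_attr("colspan")]
    if cells:  # rows made only of colspan cells give [], so skip them
        records.append(cells)

# csv.writer quotes any field that contains a comma; for a real file,
# open it with newline="" as the csv documentation recommends
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(records)
print(buf.getvalue())
```

Reading the result back with `csv.reader` returns the original fields intact, including the embedded commas and the empty Branch values.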

2016-11-30 11:17:52