Using BeautifulSoup to extract ID and report pairs and return a dictionary with the ID as key and the report as value

I'm new to Python programming. I want to use BeautifulSoup to extract case ID and EKG report pairs from an HTML file and return them as a dictionary keyed by ID, where each value is the corresponding report.

I wrote the code below, but I can't get it to work:

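from bs4 import BeautifulSoup 
import urllib2 

def extractReports(filename): 
    report = {} 

    soup3 = BeautifulSoup(urllib2.urlopen(filename)) 
    txt = soup3.get_text()  # one long string with all the page text

    for row in txt:  # iterating over a string yields single characters, not rows
        report[row[0]].append(row[1:])  # KeyError: report is empty, so the key does not exist yet
    return report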
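Two problems in the loop above can be checked in isolation: iterating over the string returned by get_text() yields single characters, and report[key].append(...) needs the key to exist first. The sketch below assumes, hypothetically, that each report line looks like `344|Normal sinus rhythm|...`; the function names and sample lines are illustrative, not taken from the original page.

```python
def build_reports(lines):
    """Map each ID to one report string (later duplicates overwrite earlier ones)."""
    reports = {}
    for line in lines:
        # partition splits on the first "|" only: ("344", "|", "Normal sinus rhythm|...")
        case_id, sep, rest = line.partition("|")
        reports[case_id.strip()] = sep + rest  # keep the leading "|" as in the desired output
    return reports


def group_reports(lines):
    """Map each ID to a list of reports, which is what .append() would need."""
    grouped = {}
    for line in lines:
        case_id, sep, rest = line.partition("|")
        # setdefault inserts an empty list the first time an ID is seen
        grouped.setdefault(case_id.strip(), []).append(sep + rest)
    return grouped


# Hypothetical sample lines; the real page text is not available.
sample = [
    "344|Normal sinus rhythm|Right bundle branch block",
    "345|Normal sinus rhythm|Left axis deviation",
]
print(list(sample[0])[:3])  # iterating a string gives characters: ['3', '4', '4']
print(build_reports(sample))
```

Plain assignment matches the one-report-per-ID output the question asks for; setdefault is only needed if an ID can carry several reports.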

Part of the original HTML file was posted as a screenshot. This is the result I want:

{'344':'|Normal sinus rhythm|Right bundle branch block|Abnor', '345':'|Normal sinus rhythm|Left axis deviation','346':'|Normal sinus rhythm|Normal ECG|When compared with E'....}

Could you please help me fix or improve my code? Thanks a lot.


2014-02-25 neymar

What result are you getting now? –

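If this is the raw HTML file, then you have a problem, and I would not use BS. BS needs tags to build a tree, and then you use those tags to identify the parts of the tree you need. When I first saw this I thought it would be easy, because you would have a table with tr, td and so on, but if this is an intricately designed HTML page you may need to do some text processing instead. Can you post more? I want to see what kind of container holds the text in the brackets. – PyNEwbie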

Please provide the exact HTML so we can help you. –
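If the page really has no table markup, the text-processing route mentioned in the first comment could look roughly like the sketch below. It assumes (unverified, since the exact HTML was never provided) that the page text, for example from soup3.get_text(), contains one case per line, written as a numeric ID followed by a `|`-separated report. The regex and the function name are illustrative.

```python
import re

# Assumed line shape: optional spaces, a numeric ID, optional spaces, then "|..." to end of line.
ID_LINE = re.compile(r"^[ \t]*(\d+)[ \t]*(\|.*)$", re.MULTILINE)


def reports_from_text(text):
    """Build {id: report} from plain text such as soup.get_text() output."""
    # group(1) is the ID, group(2) is the report including its leading "|";
    # rstrip() drops trailing spaces or a stray "\r" from Windows line endings.
    return {m.group(1): m.group(2).rstrip() for m in ID_LINE.finditer(text)}


# Hypothetical page text; lines that do not start with an ID are ignored.
page_text = "Header\n344|Normal sinus rhythm|Right bundle branch block\n345 |Normal sinus rhythm|Left axis deviation\nfooter"
print(reports_from_text(page_text))
```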

Without the HTML source, it looks like you probably need something more like this:

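from bs4 import BeautifulSoup
import urllib2

def extractReports(filename): 
    report = {} 
    soup3 = BeautifulSoup(urllib2.urlopen(filename)) 
    rows = soup3.find_all("tr")  # find_all (or the older findAll), not findall
    for row in rows: 
        children = row.findChildren("td")  # call it on each row, not on the list of rows
        # Example condition: keep rows shaped like <td>ID</td><td>report</td>
        if len(children) >= 2 and children[0].get_text().strip().isdigit():
            # key is the ID cell, value is the report cell
            report[children[0].get_text().strip()] = children[1].get_text()
    return report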

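The key here has two parts: first, use find_all() to get all the rows, then filter down to the rows you want. Once you have those rows, use findChildren() to get the actual contents of the <td> cells you need to fill the dictionary.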
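A self-contained sketch of those two steps, run against a small hypothetical table. The real markup was only posted as a screenshot, so the assumption that each data row is `<td>ID</td><td>report</td>` is a guess. It uses row.find_all("td"); in bs4, findChildren is an older alias of find_all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<table>
  <tr><th>ID</th><th>Report</th></tr>
  <tr><td>344</td><td>|Normal sinus rhythm|Right bundle branch block</td></tr>
  <tr><td>345</td><td>|Normal sinus rhythm|Left axis deviation</td></tr>
</table>
"""


def extract_reports_from_html(html):
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no extra install
    report = {}
    # Step 1: get every <tr> in the document.
    for row in soup.find_all("tr"):
        # Step 2: look at the row's <td> cells (the header row uses <th>, so it has none).
        cells = row.find_all("td")
        # Keep only rows whose first cell is a numeric ID and that have a report cell.
        if len(cells) >= 2 and cells[0].get_text(strip=True).isdigit():
            report[cells[0].get_text(strip=True)] = cells[1].get_text(strip=True)
    return report


print(extract_reports_from_html(SAMPLE_HTML))
```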


2014-02-25 21:33:59

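from bs4 import BeautifulSoup 
import requests 

def extract_reports(url): 
    pg = requests.get(url) 
    bs = BeautifulSoup(pg.content, "html.parser")  # name the parser explicitly
    reports = {} 
    for row in bs.findAll("tr"): 
        # Text of every <td> cell in this row
        cells = [cell.text for cell in row.findAll("td")] 
        try: 
            # First cell must be a numeric ID; second cell is the report
            reports[int(cells[0])] = cells[1] 
        except (IndexError, ValueError): 
            # Skip header rows, rows with too few cells, and non-numeric IDs
            pass 
    return reports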
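The answer above stores int(cells[0]) as the key, so it returns {344: ...} rather than the {'344': ...} shown in the question. A minimal variant that keeps string keys and separates parsing from the HTTP request, so the parsing can be tried on a string, might look like this. The name parse_reports and the sample rows are hypothetical.

```python
from bs4 import BeautifulSoup


def parse_reports(html):
    """Parse {id: report} from HTML text; IDs are kept as strings."""
    soup = BeautifulSoup(html, "html.parser")
    reports = {}
    for row in soup.find_all("tr"):
        cells = [cell.get_text() for cell in row.find_all("td")]
        try:
            # int() validates the ID (ValueError if not numeric, IndexError if no cells);
            # str() turns it back into a key like "344" (note: "0344" also becomes "344").
            case_id = str(int(cells[0]))
            reports[case_id] = cells[1]  # IndexError if there is no report cell
        except (IndexError, ValueError):
            pass
    return reports


sample = (
    "<table><tr><th>ID</th></tr>"
    "<tr><td>344</td><td>|Normal sinus rhythm</td></tr>"
    "<tr><td>x</td><td>y</td></tr><tr><td>345</td></tr></table>"
)
print(parse_reports(sample))
```

With this split, extract_reports(url) only needs to return parse_reports(requests.get(url).content).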


2014-02-25 23:05:48
