A more elegant way to create a list of lines from BeautifulSoup.findAll

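I am writing a web analyzer that uses BeautifulSoup. I build a list of lines from the text nodes returned by bs.findAll(text=True), splitting each node on newlines, and then apply my logic to the resulting lines. html_payload is an arbitrary web page.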

The code I have so far works, but it isn't pretty, and I feel there must be a better, more refined way to write it.

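from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4; the original import was not shown

data_to_parse = BeautifulSoup(html_payload)
lines_to_parse = []

# Collect every non-empty line from every text node
d = data_to_parse.findAll(text=True)
for line in d:
    for line2 in line.strip().split('\n'):
        if line2:
            lines_to_parse.append(line2)

for line in lines_to_parse:
    pass  # here's where I start analyzing results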

Can anyone suggest a better way to do this?

Source

2013-08-23 Petter H

Just get all the text at once and split it into lines.
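
A minimal sketch of that idea, assuming BeautifulSoup 4 (the bs4 package), where get_text() returns all of the document's text as a single string. The sample HTML and the variable names are illustrative and not taken from the original answer. Note that this version strips each line individually, which differs slightly from the question's loop, where only the ends of each text node are stripped.

```python
from bs4 import BeautifulSoup

# Illustrative payload; in the question, html_payload is an arbitrary web page.
html_payload = (
    "<html><body><p>first line\nsecond line</p>"
    "<div>  third line  </div></body></html>"
)

soup = BeautifulSoup(html_payload, "html.parser")

# Join all text nodes with newlines so that separate elements
# do not run together on one line.
text = soup.get_text("\n")

# Split into lines and drop the ones that are empty after stripping.
lines_to_parse = [line.strip() for line in text.splitlines() if line.strip()]

for line in lines_to_parse:
    print(line)  # analysis would go here
```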

Source

2013-08-23 22:11:11

You can use a list comprehension:

lines_to_parse = [line2 for line in data_to_parse.findAll(text=True) for line2 in line.strip().split('\n') if line2]

Or you can actually combine the collecting and analysis steps:

d = data_to_parse.findAll(text=True)
for line in d:
    for line2 in line.strip().split('\n'):
        if line2:
            pass  # analyze here
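
If the analysis should stay separate from the splitting, one option is to wrap the same loop in a generator, so lines are still produced in a single pass without building an intermediate list. This is only a sketch: the function name iter_lines and the sample strings are hypothetical, and the plain strings stand in for the text nodes that data_to_parse.findAll(text=True) would return.

```python
def iter_lines(text_nodes):
    """Yield each non-empty line from an iterable of text chunks."""
    for chunk in text_nodes:
        for line in chunk.strip().split('\n'):
            if line:
                yield line


# Plain strings stand in for the text nodes returned by findAll(text=True).
sample_nodes = ["first\nsecond", "   ", "\nthird\n"]

for line in iter_lines(sample_nodes):
    print(line)  # analyze here
```

With real input the call would look like iter_lines(data_to_parse.findAll(text=True)).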

Or, keeping in mind that you are not making heavy use of BeautifulSoup, xmltodict might help you collect the data into a list; take a look at it.

Hope that helps.

Source

2013-08-23 22:11:22 alecxe

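@downvoter, what is the reason for the downvote? We are here to help and make suggestions: someone offering a better suggestion in the thread is not a reason to downvote. – alecxe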
