2015-11-20 26 views
For some reason, I can't extract the rows from this simple HTML table. How do I extract rows from a simple HTML table?

from bs4 import BeautifulSoup
import requests

def main():
    html_doc = requests.get(
        'http://www.wolfson.cam.ac.uk/old-site/cgi/catering-menu?week=0;style=/0,vertical')

    soup = BeautifulSoup(html_doc.text, 'html.parser')
    table = soup.find('table')
    print(table)


if __name__ == '__main__':
    main()

I get the table, but I don't understand the BeautifulSoup documentation well enough to know how to extract the data. The data is in the tr tags.

The website shows a simple HTML food menu.

I want to output each day of the week together with that day's menu:

Monday: 
    Lunch: some_lunch, Supper: some_food 
Tuesday: 
    Lunch: some_lunch, Supper: some_supper 

and so on for each day of the week. 'Formal Hall' can be ignored.

How do I iterate over the tr tags so that I can create this output?

I just checked the HTML source, and all I can see is a lot of ' '... who wrote that?

Answer

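I usually don't provide direct solutions. You should try some code first and post here if you run into a problem. Anyway, here is what I wrote; it should give you a good start.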

 
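from bs4 import BeautifulSoup
import requests

# Fetch the menu page (same URL as in the question).
r = requests.get(
    'http://www.wolfson.cam.ac.uk/old-site/cgi/catering-menu?week=0;style=/0,vertical')
soup = BeautifulSoup(r.content, 'html.parser')

rows = soup.find_all("tr")
# The first row holds the column headings; index 0 is skipped below.
headers = rows[0].find_all("th")

# Rows 1 to 7 are the days of the week.
for row in rows[1:8]:
    print(row.find("th").text)
    cells = row.find_all("td")
    # Only the first two columns (Lunch and Supper) are printed.
    for j in range(2):
        print(headers[j + 1].text.strip() + ":", end=" ")
        # Each dish sits in its own p tag inside the cell.
        for p in cells[j].find_all("p"):
            print(p.text + ",", end=" ")
        print()
    print()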

The output will look something like this:

 
Monday 
Lunch: Leek and Potato Soup, Spaghetti Bolognese with Garlic Bread, Red Pepper and Chickpea Stroganoff with Brown Rice, Chicken Goujons with Garlic Mayonnaise Dip, Vegetable Grills with Sweet Chilli Sauce, Coffee and Walnut Sponge with Custard, 
Supper: Leek and Potato Soup, Breaded Haddock with Lemon and Tartare Sauce, Vegetable Samosa with Lentil Dahl, Chilli Beef Wraps, Steamed Strawberry Sponge with Custard, 

Tuesday 
Lunch: Tomato and Basil Soup, Pan-fried Harrisa Spiced Chicken with Roasted Vegetables, Vegetarian Spaghetti Bolognese with Garlic Bread, Jacket Potato with Various Fillings, Apple and Plum Pie with Custard, 
Supper: Tomato and Basil Soup, Lamb Tagine with Fruit Couscous, Vegetable Biryani with Naan Bread, Pan-fried Turkey Escalope, Raspberry Shortbread,
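
If you want the exact layout asked for in the question (day, then Lunch and Supper on one indented line), it may be easier to collect the rows into a dictionary first and format it afterwards. The following is only a sketch: SAMPLE_HTML is made-up markup that assumes one tr per day, a th for the day name and one p per dish, and parse_menu and format_menu are hypothetical helper names, so adjust them to whatever the real page contains. It looks up the columns by their heading text, which is how 'Formal Hall' gets skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the structure of the real page may differ.
SAMPLE_HTML = """
<table>
  <tr><th></th><th>Lunch</th><th>Supper</th><th>Formal Hall</th></tr>
  <tr><th>Monday</th>
      <td><p>Leek and Potato Soup</p><p>Spaghetti Bolognese</p></td>
      <td><p>Breaded Haddock</p></td>
      <td><p>ignored</p></td></tr>
  <tr><th>Tuesday</th>
      <td><p>Tomato and Basil Soup</p></td>
      <td><p>Lamb Tagine</p><p>Raspberry Shortbread</p></td>
      <td><p>ignored</p></td></tr>
</table>
"""


def parse_menu(html, meals=("Lunch", "Supper")):
    """Return {day: {meal: [dish, ...]}} for the requested meal columns."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    if not rows:
        return {}
    # Map each wanted heading to its td index (the first th is the day column).
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    columns = {name: i - 1 for i, name in enumerate(headers) if name in meals}
    menu = {}
    for row in rows[1:]:
        day_cell = row.find("th")
        if day_cell is None:
            continue  # not a day row
        cells = row.find_all("td")
        menu[day_cell.get_text(strip=True)] = {
            meal: [p.get_text(strip=True) for p in cells[idx].find_all("p")]
            for meal, idx in columns.items()
            if idx < len(cells)
        }
    return menu


def format_menu(menu):
    """Format the parsed menu as 'Day:' followed by an indented line of meals."""
    lines = []
    for day, by_meal in menu.items():
        lines.append(day + ":")
        parts = [meal + ": " + ", ".join(dishes) for meal, dishes in by_meal.items()]
        lines.append("    " + ", ".join(parts))
    return "\n".join(lines)


print(format_menu(parse_menu(SAMPLE_HTML)))
```

To run it against the live page, pass the downloaded HTML (for example r.text from the requests call) to parse_menu instead of SAMPLE_HTML.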