2016-03-04
2

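Following links in Python

I have to write a program that reads the HTML from this link (http://python-data.dr-chuck.net/known_by_Maira.html), extracts the href= values from the anchor tags, scans for the tag at a particular position relative to the first name in the list, follows that link, repeats the process a number of times, and reports the last name found.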

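I am supposed to find the link at position 18 (the first name is position 1), follow that link, and repeat the process 7 times. The answer is the last name I retrieve.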

Here is the code I found, and it works fine.

import urllib 

from BeautifulSoup import * 

url = raw_input("Enter URL: ") 
count = int(raw_input("Enter count: ")) 
position = int(raw_input("Enter position: ")) 

names = [] 

while count > 0: 
    print "retrieving: {0}".format(url) 
    page = urllib.urlopen(url) 
    soup = BeautifulSoup(page) 
    tag = soup('a') 
    name = tag[position-1].string 
    names.append(name) 
    url = tag[position-1]['href'] 
    count -= 1 

print names[-1] 

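I would really appreciate it if someone could explain what is going on inside the while loop, the way you would explain it to a 10-year-old. I am new to Python and would be grateful for the guidance.

Thank you very much in advance.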

Answers

1
while count > 0:                          # because of `count -= 1` below, the
                                          # loop runs `count` times

    print "retrieving: {0}".format(url)   # just prints out the next web page
                                          # you are going to get

    page = urllib.urlopen(url)            # urls reference web pages (well,
                                          # many types of web content, but
                                          # we'll stick with web pages)

    soup = BeautifulSoup(page)            # web pages are frequently written
                                          # in html, which can be messy. this
                                          # package "unmessifies" it

    tag = soup('a')                       # in html you can highlight text and
                                          # reference other web pages with <a>
                                          # tags. this gets all of the <a> tags
                                          # in a list

    name = tag[position-1].string         # this gets the <a> tag at index
                                          # position-1 (lists count from 0)
                                          # and then gets its text value

    names.append(name)                    # this puts that value in your own
                                          # list

    url = tag[position-1]['href']         # html tags can have attributes. On
                                          # an <a> tag, the href="something"
                                          # attribute references another web
                                          # page. You store it in `url` so that
                                          # it's the page you grab on the next
                                          # iteration of the loop

    count -= 1                            # one fewer page left to follow
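
If it helps to watch the loop's bookkeeping without touching the network, here is a minimal Python 3 sketch. The `FAKE_PAGES` dictionary and the `follow_links` function are hypothetical stand-ins invented for illustration: each "page" is already reduced to its list of (name, href) pairs, which is roughly what `soup('a')` together with `.string` and `['href']` hands you in the real code.

```python
# Hypothetical stand-in for the web: each URL maps to the (name, href)
# pairs of the <a> tags on that page, in page order.
FAKE_PAGES = {
    "page_start": [("Ann", "page_ann"), ("Bob", "page_bob")],
    "page_ann": [("Cid", "page_cid"), ("Dee", "page_dee")],
    "page_bob": [("Eve", "page_eve"), ("Fay", "page_fay")],
}


def follow_links(url, count, position):
    """Follow the link at `position` (1-based) `count` times.

    Returns the list of names picked along the way.
    """
    names = []
    while count > 0:
        print("retrieving: {0}".format(url))
        links = FAKE_PAGES[url]            # stands in for urlopen + BeautifulSoup
        name, href = links[position - 1]   # position 1 is list index 0
        names.append(name)
        url = href                         # the next iteration fetches this page
        count -= 1
    return names


names = follow_links("page_start", count=2, position=2)
print(names[-1])  # the last name found
```

The thing to notice is that `url` is overwritten at the end of every pass, so each pass starts from the page the previous pass pointed to.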

Wow! That is a really good explanation!

0

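You enter the URL, the number of links you want to follow, and the position of the link to pick on each page. Then, on every pass through the loop, the code:

0) prints the URL
1) opens the URL
2) reads the source (see the BeautifulSoup docs)
3) gets every a tag
4) gets the text inside the <a ...></a> tag at the given position
5) adds it to the list names
6) gets the next URL from that same tag, i.e. pulls the href out of <a ...></a>

After the loop finishes:

7) prints the last item in the list names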
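
Steps 3 to 6 can be tried on a single page with a small Python 3 sketch. This is not the code from the question: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra installs, and the `AnchorCollector` class, the `SAMPLE_HTML` string, and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect a (text, href) pair for every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.anchors = []        # finished (text, href) pairs
        self._in_anchor = False  # True while we are between <a> and </a>
        self._href = None        # href of the <a> currently open
        self._text = []          # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.anchors.append(("".join(self._text), self._href))
            self._in_anchor = False


# Made-up page with three links, shaped like the pages in the exercise.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.com/known_by_Ann.html">Ann</a></li>
  <li><a href="http://example.com/known_by_Bob.html">Bob</a></li>
  <li><a href="http://example.com/known_by_Cid.html">Cid</a></li>
</ul>
"""

collector = AnchorCollector()
collector.feed(SAMPLE_HTML)                    # step 3: find every <a> tag

position = 2                                   # 1-based, as in the question
name, href = collector.anchors[position - 1]   # steps 4 and 6
print(name)   # the text inside the chosen <a> tag
print(href)   # the URL the loop would open next
```

In the original code the same two values come from `tag[position-1].string` and `tag[position-1]['href']`.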


Thank you so much. This was really helpful.