Following links in Python

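I have to write a program that reads the HTML from this link (http://python-data.dr-chuck.net/known_by_Maira.html), extracts the href= values from the anchor tags, scans for the tag at a particular position relative to the first name in the list, follows that link, repeats the process a number of times, and reports the last name found.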

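I am supposed to find the link at position 18 (the first name is 1), follow that link, and repeat the process 7 times. The answer is the last name that I retrieve.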

Here is the code I found, and it works just fine.

import urllib 

from BeautifulSoup import * 

url = raw_input("Enter URL: ") 
count = int(raw_input("Enter count: ")) 
position = int(raw_input("Enter position: ")) 

names = [] 

while count > 0: 
    print "retrieving: {0}".format(url) 
    page = urllib.urlopen(url) 
    soup = BeautifulSoup(page) 
    tag = soup('a') 
    name = tag[position-1].string 
    names.append(name) 
    url = tag[position-1]['href'] 
    count -= 1 

print names[-1]

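I would really appreciate it if someone could explain, the way you would to a 10-year-old, what is going on inside the while loop. I am new to Python and would be very grateful for the guidance.

Thank you very much in advance.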

Source

2016-03-04 suyash gautam

while count > 0:                           # because of `count -= 1` below,
                                           # the loop runs `count` times

    print "retrieving: {0}".format(url)    # just prints out the next web page
                                           # you are going to get

    page = urllib.urlopen(url)             # urls reference web pages (well,
                                           # many types of web content, but
                                           # we'll stick with web pages)

    soup = BeautifulSoup(page)             # web pages are frequently written
                                           # in html, which can be messy. this
                                           # package "unmessifies" it

    tag = soup('a')                        # in html you can highlight text and
                                           # reference other web pages with <a>
                                           # tags. this gets all of the <a> tags
                                           # in a list

    name = tag[position-1].string          # this gets the <a> tag at position-1
                                           # and then gets its text value

    names.append(name)                     # this puts that value in your own
                                           # list

    url = tag[position-1]['href']          # html tags can have attributes. On
                                           # an <a> tag, the href="something"
                                           # attribute references another web
                                           # page. You store it in `url` so that
                                           # it's the page you grab on the next
                                           # iteration of the loop

    count -= 1                             # one fewer page left to visit
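
To watch the loop run without touching the network, here is a minimal sketch in Python 3. It is not the code from the question: the PAGES dictionary, the example.com URLs, and the AnchorCollector helper are all made up for illustration, and the standard library's html.parser stands in for BeautifulSoup. The shape of the loop (pick the tag at position-1, remember its text, move to its href, count down) is the same.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. This stands in for urlopen().
PAGES = {
    "http://example.com/a.html": '<a href="http://example.com/b.html">Bob</a>'
                                 '<a href="http://example.com/c.html">Cy</a>',
    "http://example.com/b.html": '<a href="http://example.com/a.html">Ann</a>'
                                 '<a href="http://example.com/c.html">Cy</a>',
    "http://example.com/c.html": '<a href="http://example.com/a.html">Ann</a>'
                                 '<a href="http://example.com/b.html">Bob</a>',
}


class AnchorCollector(HTMLParser):
    """Collects (href, text) for every <a> tag that has an href, in order."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (href, text) pairs
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # pieces of text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text)))
            self._href = None


def follow_links(url, count, position):
    """Follow the link at `position` (counted from 1) `count` times."""
    names = []
    while count > 0:
        parser = AnchorCollector()
        parser.feed(PAGES[url])                   # "download" and parse the page
        url, name = parser.anchors[position - 1]  # next page to visit, link text
        names.append(name)
        count -= 1
    return names


names = follow_links("http://example.com/a.html", count=3, position=2)
print(names)      # ['Cy', 'Bob', 'Cy']
print(names[-1])  # Cy
```

Starting at a.html with position 2, the loop picks "Cy" and moves to c.html, then picks "Bob" and moves to b.html, then picks "Cy" again. The last entry in names is the answer the assignment asks for.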

Source

2016-03-04 07:27:39 tdelaney

Wow! That really is a great explanation!

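You enter the number of URLs you want to retrieve from a page, then the loop:

0) prints the URL
1) opens the URL
2) reads the source (see the BeautifulSoup docs)
3) gets every a tag
4) gets the whole <a ...></a>, I think
5) adds it to the list names
6) gets the URL from the last item of names, i.e. pulls the href from <a ...></a>
7) prints the last item of the names list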
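
Steps 3 to 6 can be checked on a single page with a small sketch. It uses Python 3 and the standard library's xml.etree.ElementTree on a well-formed, made-up snippet rather than BeautifulSoup on a real page, so the calls differ from the question's code: .text here plays the role of .string, and .get("href") plays the role of ['href']. In the question's code, .string yields the text between <a> and </a> rather than the whole tag, and the next URL is read from the same tag, tag[position-1], not from names.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet standing in for a downloaded page.
snippet = """<ul>
  <li><a href="http://example.com/ann.html">Ann</a></li>
  <li><a href="http://example.com/bob.html">Bob</a></li>
  <li><a href="http://example.com/cy.html">Cy</a></li>
</ul>"""

root = ET.fromstring(snippet)
tags = list(root.iter("a"))  # step 3: every <a> element, in document order

position = 2                 # counted from 1, as in the question
tag = tags[position - 1]     # Python lists count from 0, hence the -1

name = tag.text              # step 4: the text between <a> and </a>
url = tag.get("href")        # step 6: the href attribute of that same tag

print(name)  # Bob
print(url)   # http://example.com/bob.html
```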

Source

2016-03-04 07:22:55 KeyWeeUsr

Thank you so much. This was really helpful.
