Returning multiple "href" values

2015-11-23 47 views 0 likes

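I can't get my program to work, and I have been trying for a long time. It should be very simple, but I can't figure it out. It should return anything that contains "html". This is really frustrating. It is a command-line Python 2.x program.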

#!/usr/bin/env python 

import sys 
import re 

#Make this program work both on python 2.x and Python 3.x 
if (sys.version_info[0] == 3): raw_input = input 

import urllib2 
url = urllib2.urlopen('http://makeitwork.com/') 
data = url.read() 
urlsearch = re.findall(r'href=[\'"]?([^\'"]+)' , data) 

for x in urlsearch: 
    line = x.split() 
    print(" %s" %line[0])

Source

2015-11-23 Edward Thorn

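Questions seeking debugging help (**"why isn't this code working?"**) must include the desired behavior, *a specific problem or error*, and *the shortest code necessary* to reproduce it **in the question itself**. Questions without **a clear problem statement** are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). – MattDMo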

Answers

Try BeautifulSoup. Never use regex to parse HTML code:

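import urllib2  # Python 2 only; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup

url = urllib2.urlopen('http://makeitwork.com/')
data = url.read()

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, 'html.parser')
# 'a' must be a string, and the loop variable must match the name used below
for link in soup.find_all('a'):
    print(link.get('href'))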
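
The question asks only for links that contain "html", which the loop above does not filter for. A minimal sketch of that filtering step follows, assuming Python 3 and using only the standard library's `html.parser` in place of BeautifulSoup. The names `HrefCollector`, `find_hrefs`, and `SAMPLE` are made up for illustration, and the sample markup stands in for the downloaded page.

```python
from html.parser import HTMLParser  # Python 3 standard library


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper, not from the thread)."""

    def __init__(self, needle=None):
        super().__init__()
        self.needle = needle  # optional substring the href must contain
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                if self.needle is None or self.needle in value:
                    self.hrefs.append(value)


def find_hrefs(html_text, needle=None):
    """Return href values in document order, optionally filtered by substring."""
    parser = HrefCollector(needle)
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


# Stand-in for the page body; the real program would pass the downloaded text.
SAMPLE = '''
<a href="index.html">Home</a>
<a href='/img/logo.png'>Logo</a>
<a class="x" href="docs/page.html#top">Docs</a>
<a name="anchor">No href</a>
'''

if __name__ == "__main__":
    for href in find_hrefs(SAMPLE, "html"):
        print(href)
```

With BeautifulSoup the same filter would be a plain substring check on each `link.get('href')` that is not `None`.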

Source

2015-11-23 02:11:40

Try using this expression:

r'a\shref="/?(.*)">'

It basically looks for whatever comes after the <a href of the HTML tag and before the closing >.
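
One caveat with this expression is that `(.*)` is greedy, so when a line holds more than one link it runs through to the last `">` on that line. The sketch below compares it with a bounded variant that stops at the closing quote. The sample line and the names `GREEDY` and `BOUNDED` are assumptions for illustration, and the bounded pattern is a suggested adjustment, not part of the original answer.

```python
import re

# Hypothetical sample line with two links, standing in for real page content.
LINE = '<a href="/page.html">Page</a> <a href="other.html">Other</a>'

# The expression from the answer: (.*) is greedy.
GREEDY = re.compile(r'a\shref="/?(.*)">')

# Bounded variant: capture everything up to the next double quote.
BOUNDED = re.compile(r'a\shref="/?([^"]*)"')

greedy_hits = GREEDY.findall(LINE)
bounded_hits = BOUNDED.findall(LINE)

print(greedy_hits)   # one long match spanning both links
print(bounded_hits)  # ['page.html', 'other.html']

# Keep only the links that mention "html", as the question asks.
html_hits = [href for href in bounded_hits if "html" in href]
print(html_hits)
```

Both patterns still assume double-quoted values and a single space between `a` and `href`, so an HTML parser remains the more robust choice.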

Source

2015-11-23 03:39:02 Scott
