urlparse is handy if you also want to, say, get rid of any query string parameters.
import urllib.parse

urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
    'http://www.test.com/page/page/12345?abc=123'
]

for url in urls:
    # urlparse separates the path from the query string and fragment.
    url_parts = urllib.parse.urlparse(url)
    # rpartition('/') splits on the last slash; element 2 is the final segment.
    path_parts = url_parts.path.rpartition('/')
    print('URL: {}\nreturns: {}\n'.format(url, path_parts[2]))
Output:
URL: http://www.test.com/TEST1
returns: TEST1
URL: http://www.test.com/page/TEST2
returns: TEST2
URL: http://www.test.com/page/page/12345
returns: 12345
URL: http://www.test.com/page/page/12345?abc=123
returns: 12345
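The same result can be had by pairing urlparse with posixpath.basename, as the basename suggestion mentioned in the comments hints. The following is only a minimal sketch: the helper name last_path_segment is made up for illustration, and stripping trailing slashes is an assumption that goes beyond the original loop.

```python
import posixpath
from urllib.parse import urlparse


def last_path_segment(url):
    """Return the final path component of url, ignoring query and fragment.

    Hypothetical helper, not part of the original answer.
    """
    path = urlparse(url).path
    # Assumption: a trailing slash should not produce an empty result,
    # so '/page/TEST2/' is treated like '/page/TEST2'.
    return posixpath.basename(path.rstrip('/'))


for url in [
    'http://www.test.com/page/page/12345?abc=123',
    'http://www.test.com/page/TEST2/',
]:
    print(last_path_segment(url))
```

posixpath is used rather than os.path so that the path is always split on '/', whatever platform the script runs on.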
If the URL might contain a query string like '...?foo=bar' and you don't want that, I'd suggest using 'urlparse' in combination with naeg's 'basename' suggestion. – plundra
http://docs.python.org/library/urlparse.html#module-urlparse