How can I manipulate a URL string to extract a single piece?

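Background

My program accepts a URL. I want to extract the username from that URL.

The username is the subdomain. If the subdomain is 'www', the username should be the main part of the domain. The rest of the domain should be discarded (e.g. ".com/", ".org/").

I have tried the following, using strip():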

def get_username_from_url(url):
    if url.startswith(r'http://www.'):
        user = url.replace(r'http://www.', '', 1)
        user = user.split('.')[0]
        return user
    elif url.startswith(r'http://'):
        user = url.replace(r'http://', '', 1)
        user = user.split('.')[0]
        return user

easy_url = "http://www.httpwwwweirdusername.com/"  
hard_url = "http://httpwwwweirdusername.blogger.com/" 

print get_username_from_url(easy_url) 
# output = httpwwwweirdusername (good! expected.) 

print get_username_from_url(hard_url) 
# output = weirdusername (bad! username should = httpwwwweirdusername)

I have tried many other combinations of split() and replace().

Can you tell me how to solve this relatively simple problem?

Source

2014-09-10 BBedit

Cannot reproduce. – vaultah 2014-09-10 18:53:35

Have you tried string patterns and strpos()? They seem like they could help with your problem. – 2014-09-10 18:58:19

Your code works fine for me. – Zenadix 2014-09-10 19:10:00
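
The posted code uses replace() and split(), which is why the commenters cannot reproduce the bad output. The question also mentions strip(), though that attempt is not shown. One plausible cause, offered only as a guess, is that strip() and lstrip() treat their argument as a set of characters rather than as a literal prefix. The sketch below illustrates that pitfall; remove_prefix is a hypothetical helper, not something from the thread.

```python
url = "http://httpwwwweirdusername.blogger.com/"

# lstrip() removes every leading character found in the set
# {'h', 't', 'p', ':', '/'}, so it also eats the "http" at the
# start of the username.
stripped = url.lstrip("http://")
print(stripped)  # wwwweirdusername.blogger.com/


def remove_prefix(text, prefix):
    """Hypothetical helper: drop an exact leading prefix, nothing more."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


print(remove_prefix(url, "http://"))  # httpwwwweirdusername.blogger.com/
```

Slicing off an exact prefix, or replace() with a count of 1 as in the question, leaves the rest of the hostname untouched.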

You could do this with a regular expression (the regex can most likely be modified to be more accurate or efficient).

import re

# Optional "www." prefix (dot escaped), then capture the first run of word characters.
url_pattern = re.compile(r'.*/(?:www\.)?(\w+)')

def get_username_from_url(url):
    match = url_pattern.match(url)
    if match:
        return match.group(1)

easy_url = "http://www.httpwwwweirdusername.com/" 
hard_url = "http://httpwwwweirdusername.blogger.com/" 

print get_username_from_url(easy_url) 
print get_username_from_url(hard_url)

Which gives us:

httpwwwweirdusername 
httpwwwweirdusername
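
As the answer itself hints, the pattern can be tightened. Because `.*/` is greedy, a URL with a path such as "http://httpwwwweirdusername.blogger.com/some/page" would yield "page" instead of the username. The following is one possible sketch, not the answerer's code: the name HOST_LABEL and the exact pattern are assumptions. It anchors the match at the scheme and captures only the first host label after an optional "www.". It does not try to handle credentials in the URL.

```python
import re

# Hypothetical stricter pattern: scheme, "://", optional "www.",
# then the first host label (stops at '.', '/', ':', '?' or '#').
HOST_LABEL = re.compile(r'^[a-z][a-z0-9+.-]*://(?:www\.)?([^./:?#]+)',
                        re.IGNORECASE)


def get_username_from_url(url):
    """Return the first non-"www" host label, or None if nothing matches."""
    match = HOST_LABEL.match(url)
    return match.group(1) if match else None


print(get_username_from_url("http://www.httpwwwweirdusername.com/"))
print(get_username_from_url("http://httpwwwweirdusername.blogger.com/some/page"))
# Both lines print: httpwwwweirdusername
```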

Source

2014-09-10 19:04:59

There is a module named urlparse that is designed for exactly this task:

>>> from urlparse import urlparse 
>>> url = "http://httpwwwweirdusername.blogger.com/" 
>>> urlparse(url).hostname.split('.')[0] 
'httpwwwweirdusername'

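In the case of http://www.httpwwwweirdusername.com/, this would output www, which is not what you want. There are ways to ignore the www part; for example, you can split the hostname and take the first item that is not equal to www: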

>>> from urlparse import urlparse 

>>> url = "http://www.httpwwwweirdusername.com/" 
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www') 
'httpwwwweirdusername' 

>>> url = "http://httpwwwweirdusername.blogger.com/" 
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www') 
'httpwwwweirdusername'
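
The sessions above use the Python 2 urlparse module. In Python 3 the same function lives in urllib.parse. A minimal sketch of the same idea wrapped in a function might look like the following; the None return for URLs without a hostname is an assumption about the desired behaviour, not something the answer specifies.

```python
from urllib.parse import urlparse


def get_username_from_url(url):
    """Return the first hostname label that is not "www", or None."""
    hostname = urlparse(url).hostname  # lowercased; None if no network location
    if not hostname:
        return None
    # Assumption: fall back to None if every label is "www" or empty.
    return next((label for label in hostname.split('.')
                 if label and label != 'www'), None)


print(get_username_from_url("http://www.httpwwwweirdusername.com/"))
print(get_username_from_url("http://httpwwwweirdusername.blogger.com/"))
# Both lines print: httpwwwweirdusername
```

Letting urlparse locate the hostname means paths, ports, and query strings never reach the split() step.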

Source

2014-09-10 18:55:07 alecxe
