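How do I cut the file name out of a URL?

I have a lot of links like http://example.com/2013/1520/i2013i1520p100049.html or http://example.com/2013/89/i2013i89p60003.html. How do I cut the file name out of the URL?

I need to save the HTML files separately: i2013i1520p100049.html in the folder "1520", and i2013i89p60003.html in the folder "89".

I could slice the string at fixed positions, but other URLs have a different length.

P.S. I am using Python.

Source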

2013-07-26 Andrew Tsaryov

With a format this standardized, the fastest approach is find and slice :). A regular expression isn't worth it here.

For example,

>>> a = "http://example.com/2013/1520/i2013i1520p100049.html or http://example.com/2013/89/i2013i89p60003.html" 
>>> lastindex = a.rfind('/') 
>>> a[lastindex+1:] 
'i2013i89p60003.html' 
>>> a[a.rfind('/',0,lastindex)+1:lastindex] 
'89'
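The same two rfind() calls can be wrapped in a small helper so they work on one URL at a time. This is only a sketch: the function name folder_and_name is made up for illustration, and it assumes every URL ends in .../folder/file with no query string.

```python
def folder_and_name(url):
    """Return (folder, file name) taken from the last two path segments."""
    last = url.rfind('/')            # slash just before the file name
    prev = url.rfind('/', 0, last)   # slash just before the folder name
    return url[prev + 1:last], url[last + 1:]


for url in ("http://example.com/2013/1520/i2013i1520p100049.html",
            "http://example.com/2013/89/i2013i89p60003.html"):
    print(folder_and_name(url))
```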

split vs. find on a huge URL (these do exist, but are usually not this large):

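>>> import time 
>>> from random import randint 
>>> a = list(range(10000))  # list() so that insert() also works on Python 3 
>>> for x in range(100): a.insert(randint(0, 10000), '/') 
... 
>>> a = str(a) 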
>>> b = time.time(); a.rfind('/'); time.time()-b 
58493 
1.8835067749023438e-05 
>>> b = time.time(); d=a.split('/'); time.time()-b 
0.00012683868408203125

More importantly, you avoid a huge reallocation/copy into a list, which is no fun when you have thousands of URLs.
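If you want to repeat the comparison yourself, timeit gives steadier numbers than a single time.time() difference. The sketch below assumes Python 3; the long URL and the three function names are invented for the test, and the timings will differ from machine to machine.

```python
import timeit

# An artificially long URL with about 10000 path segments (made up for the test)
url = "http://example.com/" + "/".join(str(n) for n in range(10000)) + "/page.html"


def with_rfind():
    # Locate the last slash and slice once; no intermediate list is built
    return url[url.rfind('/') + 1:]


def with_split():
    # Builds a list of every segment just to keep the last one
    return url.split('/')[-1]


def with_rsplit():
    # Splits once from the right, so only two pieces are created
    return url.rsplit('/', 1)[-1]


for fn in (with_rfind, with_split, with_rsplit):
    print(fn.__name__, timeit.timeit(fn, number=200))
```

All three return the same file name; only the amount of work differs.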

Source

2013-07-26 20:51:29

You can use something like the following (if you want to do more complicated work):

s = 'http://example.com/2013/1520/i2013i1520p100049.html' 

from operator import itemgetter 
from urlparse import urlsplit 

split_url = urlsplit(s) 
path, fname = itemgetter(2, -1)(split_url.path.split('/')) 
print path, fname 
# 1520 i2013i1520p100049.html

Otherwise:

path, fname = s.rsplit('/', 2)[1:]
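On Python 3 the urlparse module became urllib.parse. A rough equivalent might look like the following; the helper name folder_and_name and the query-string example URL are assumptions added for illustration. Going through urlsplit() first means a trailing ?query or #fragment does not end up in the file name.

```python
from urllib.parse import urlsplit


def folder_and_name(url):
    """Return the last two segments of the URL path as (folder, file name)."""
    # .path drops the scheme, host, query string and fragment
    segments = urlsplit(url).path.split('/')
    return segments[-2], segments[-1]


print(folder_and_name('http://example.com/2013/1520/i2013i1520p100049.html'))
print(folder_and_name('http://example.com/2013/89/i2013i89p60003.html?x=1#top'))
```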

Source

2013-07-26 20:52:14

Use split()

url = 'http://example.com/2013/1520/i2013i1520p100049.html' 
parts = url.split('/') 

fn = parts[-1] 
dir = parts[-2]

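Then make the request and save the source:

import os 
import urllib2 

fp = urllib2.urlopen(url).read() 

# Create the target folder first; open() fails if it does not exist
if not os.path.isdir(dir): 
    os.makedirs(dir) 

fullpath_fn = dir + '/' + fn 
with open(fullpath_fn, 'w') as htmlfile: 
    htmlfile.write(fp)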
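For Python 3, where urllib2 became urllib.request, the download-and-save step could be sketched roughly as below. The names local_path and save_page and the root parameter are hypothetical, and the code assumes the URL always ends in .../folder/file.

```python
import os
from urllib.request import urlopen


def local_path(url, root="."):
    """Map .../<folder>/<file> to <root>/<folder>/<file>."""
    folder, name = url.split('/')[-2:]
    return os.path.join(root, folder, name)


def save_page(url, root="."):
    """Download url and write it under root, creating the folder if needed."""
    target = local_path(url, root)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Write bytes so the page is stored exactly as it was received
    with urlopen(url) as response, open(target, 'wb') as out:
        out.write(response.read())
    return target


print(local_path('http://example.com/2013/1520/i2013i1520p100049.html', 'pages'))
```

Calling save_page(url) for each link would then place every file in the folder named after its parent segment.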

Source

2013-07-26 20:52:56 That1Guy

>>> 'http://example.com/2013/1520/i2013i1520p100049.html'.split('/')[-1] 
'i2013i1520p100049.html'

Source

2013-07-26 20:53:43

You can use the split() method:

url = 'http://example.com/2013/1520/i2013i1520p100049.html' 
parts = url.split('/') 
file = parts[-1] 
folder = parts[-2]

Source

2013-07-26 20:54:12 amatellanes

You can use urlparse.urlsplit and os.path.split:

import os 
import urlparse 
s = 'http://example.com/2013/1520/i2013i1520p100049.html' 

path = urlparse.urlsplit(s).path 
print(path) 
# /2013/1520/i2013i1520p100049.html 

dirname, basename = os.path.split(path) 
dirname, basedir = os.path.split(dirname) 
print(basedir) 
# 1520 
print(basename) 
# i2013i1520p100049.html
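On Python 3 a similar result is possible with urllib.parse and pathlib. This is a sketch of an alternative, not the code above: PurePosixPath is used instead of os.path.split so that the URL path is always treated as slash-separated, whatever the operating system.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

s = 'http://example.com/2013/1520/i2013i1520p100049.html'

# Treat the URL path as a POSIX-style path regardless of the local OS
path = PurePosixPath(urlsplit(s).path)

basename = path.name         # last segment: the file name
basedir = path.parent.name   # segment before it: the folder name
print(basedir)
print(basename)
```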

Source

2013-07-26 20:56:58 unutbu

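Just for the sake of it, a regex-based answer:

import re 

string = 'http://example.com/2013/1520/i2013i1520p100049.html'  # the URL to inspect
match = re.search(r'([0-9]+)/([a-z0-9]+\.html)$', string) 
if match: 
    folder = match.group(1) 
    file = match.group(2)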
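The pattern above only accepts numeric folders and lowercase .html names. If your URLs vary more than that, a looser pattern that simply grabs the last two slash-separated segments might be enough. The names LAST_TWO and split_tail are made up for this sketch, and it does not strip query strings.

```python
import re

# Last two segments: anything that is not a slash, anchored at the end
LAST_TWO = re.compile(r'/([^/]+)/([^/]+)$')


def split_tail(url):
    """Return (folder, file name), or None when the URL has no such tail."""
    match = LAST_TWO.search(url)
    return match.groups() if match else None


print(split_tail('http://example.com/2013/89/i2013i89p60003.html'))
print(split_tail('http://example.com/'))
```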

Source

2013-07-26 23:10:45 adbar
