Python link downloading is slow. Is there any way to speed up this script by using lxml or mechanize and cutting out Beautiful Soup entirely?
Python:
import os
import urllib
import urlparse

import lxml.html as html
from BeautifulSoup import BeautifulSoup

print("downloading and parsing bibles...")
root = html.parse('all.html')
# './/a' matches every <a> element in the document
for link in root.findall('.//a'):
    url = link.get('href')
    path = urlparse.urlparse(url).path
    name = path.split('/')[-1]     # e.g. "gen.1.nmv-fas"
    dirname = path.split('.')[-1]  # e.g. "nmv-fas"
    # Download the chapter page
    s = urllib.urlopen(url).read()
    if not os.path.isdir(dirname):
        os.mkdir(dirname)
    # Parse the page and keep only its <article> element
    soup = BeautifulSoup(s)
    converted = str(soup.html.body.article)
    with open(os.path.join(dirname, name), 'w') as out:
        out.write(converted)
    print(name)
print("downloads complete!")
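Before swapping parsers, it may help to measure where the time in the loop above actually goes. A minimal sketch, assuming Python 3: `time_stages` and `fake_fetch` are hypothetical helpers, not part of the original script. The harness wraps any download function and any parse function (for example the `urllib.urlopen(...).read()` and `BeautifulSoup(s)` calls above) and totals the seconds spent in each.

```python
import time


def time_stages(urls, fetch, parse):
    """Return (seconds in fetch, seconds in parse) summed over all urls.

    fetch(url) should return the page body; parse(body) does the parsing
    work. Both are passed in, so the same harness can compare a
    BeautifulSoup parser with an lxml one.
    """
    fetch_total = 0.0
    parse_total = 0.0
    for url in urls:
        start = time.perf_counter()
        body = fetch(url)
        fetch_total += time.perf_counter() - start

        start = time.perf_counter()
        parse(body)
        parse_total += time.perf_counter() - start
    return fetch_total, parse_total


# Hypothetical stand-in for a network download: sleeps to mimic latency
def fake_fetch(url):
    time.sleep(0.05)
    return "<html><body><article>%s</article></body></html>" % url


fetch_s, parse_s = time_stages(["gen.1", "gen.2"], fake_fetch, len)
print("download: %.2fs, parse: %.4fs" % (fetch_s, parse_s))
```

If the download total dominates, as it does with the fake fetcher here, switching from BeautifulSoup to lxml cannot shorten the run by much.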
all.html:
<a href="http://www.youversion.com/bible/gen.1.nmv-fas">http://www.youversion.com/bible/gen.1.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.2.nmv-fas">http://www.youversion.com/bible/gen.2.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.3.nmv-fas">http://www.youversion.com/bible/gen.3.nmv-fas</a>
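Isn't it the downloading that takes the time? – Fenikso 2012-04-25 18:41:52
It takes most of the time, but can't I use lxml instead of Beautiful Soup and improve the speed? – Blainer 2012-04-25 18:42:49
Those are parsers. If the download takes most of the time, the parser doesn't matter. – Fenikso 2012-04-25 18:45:40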
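Since the comments point to the download as the bottleneck, one option not raised in the thread is to fetch several chapters at once. A minimal sketch, assuming Python 3 and that the site tolerates a few parallel requests; `fetch_all`, `fetcher`, `slow_fetch` and `max_workers=8` are hypothetical choices, not anything from the original script. Threads fit this job because each worker spends most of its time waiting on the network rather than using the CPU.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=30):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Download all urls concurrently and return a {url: body} dict."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs them correctly
        return dict(zip(urls, pool.map(fetcher, urls)))


# Offline demo: a hypothetical fetcher that sleeps 0.1 s per "download"
def slow_fetch(url):
    time.sleep(0.1)
    return url.upper()


chapter_urls = ["gen.%d" % i for i in range(1, 9)]
start = time.perf_counter()
pages = fetch_all(chapter_urls, fetcher=slow_fetch)
elapsed = time.perf_counter() - start
print("%d pages in %.2fs" % (len(pages), elapsed))  # sequential would take ~0.8s
```

Each downloaded body could then go through the same parse-and-save step as the original loop.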