This is a script I wrote to fetch Alexa rankings. I want it to display the results as a well-formatted table, split into columns.
#!/usr/bin/env python
import sys
import requests
from lxml import html

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print 'usage: python %s <file-urls>' % (sys.argv[0])
        sys.exit(2)
    filename = sys.argv[1]
    urls = open(filename)
    for site in urls:
        try:
            # Fetch the Alexa siteinfo page for this site and pull the rank values
            url = "http://www.alexa.com/siteinfo/" + site
            content = requests.get(url).content
            tree = html.fromstring(content)
            RANK = tree.xpath('//strong[@class="metrics-data align-vmiddle"]/text()')
            print "Site:", site + "Global Rank:", RANK[0] + "\t" + "Country Rank:", RANK[1]
            # print 'Site:%s Global Rank:%2s Country Rank:%2s' % (site, RANK[0], RANK[1])
        except (KeyboardInterrupt, SystemExit):
            print "Keyboard Interruption!"
            sys.exit(0)
Result:
Site: google.com
Global Rank: 1 Country Rank: 1
Site: yahoo.com
Global Rank: 4 Country Rank: 4
Site: bing.com
Global Rank: 23 Country Rank: 14
This output is not satisfactory. Could you show how to lay out the results better?
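One possible way to get a column layout is to build every row with fixed-width format fields instead of concatenating strings with `+` and `"\t"`. The following is a minimal Python 3 sketch, assuming the ranks have already been scraped into `(site, global_rank, country_rank)` string tuples. The `format_rank_table` name is hypothetical, and the sample numbers are copied from the output above rather than fetched:

```python
def format_rank_table(rows):
    """Return (site, global_rank, country_rank) rows as a column-aligned text table.

    Hypothetical helper: it only formats data, it does not fetch anything.
    """
    header = ("Site", "Global Rank", "Country Rank")
    # Strip every cell so a trailing '\n' on a site name cannot break the layout.
    rows = [tuple(str(cell).strip() for cell in row) for row in rows]
    table = [header] + rows
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(r[i]) for r in table) for i in range(3)]
    lines = []
    for r in table:
        # Left-align the site name, right-align the numeric ranks.
        lines.append("{0:<{w0}}  {1:>{w1}}  {2:>{w2}}".format(
            r[0], r[1], r[2], w0=widths[0], w1=widths[1], w2=widths[2]))
    return "\n".join(lines)


print(format_rank_table([
    ("google.com", "1", "1"),
    ("yahoo.com", "4", "4"),
    ("bing.com", "23", "14"),
]))
```

Computing the widths from the data, rather than hard-coding them, keeps the columns aligned when a longer domain name appears in the input file.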
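I'd like to know why the site name ends up on the line above the ranks, and how to correct it. – MLSC 2014-10-22 12:19:59
Because there is a '\n' at the end of the site variable. Try stripping it. – 2014-10-22 12:23:09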
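To illustrate the comment above: iterating over a file object yields each line including its trailing newline, so `site + "Global Rank:"` prints the newline between the site name and the ranks, and the same newline also ends up at the end of the URL string. Below is a minimal Python 3 sketch of reading the site list with the newline removed and blank lines skipped. `read_sites` is a hypothetical helper name, and `io.StringIO` stands in for the real `open(filename)`:

```python
import io


def read_sites(lines):
    """Yield site names from an iterable of lines, without trailing newlines.

    Blank lines (for example an empty last line in the file) are skipped.
    """
    for line in lines:
        site = line.strip()  # removes '\n', plus any '\r' or stray spaces
        if site:
            yield site


raw = io.StringIO("google.com\nyahoo.com\n\nbing.com\n")  # stands in for open(filename)
for site in read_sites(raw):
    url = "http://www.alexa.com/siteinfo/" + site  # no newline inside the URL now
    print(repr(url))
```

With a real file, `with open(filename) as urls:` followed by `for site in read_sites(urls):` also closes the file when the loop finishes.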