Simple sitemap scanner, saving to .txt

Just a quick question: I want to create a simple spider that visits a site's sitemap.xml and saves the URLs to a .txt file, but the code below only saves 1 URL to the file.

It seems to print everything I need in CMD, but not in the TXT file.

import urllib2 as ur
import re

f = ur.urlopen(u'http://www.site.co.uk/sitemap.xml')
res = f.readlines()
for d in res:
    data = re.findall('<loc>(http:\/\/.+)<\/loc>',d)
    for i in data:
        print i
        file = open("sitemapdata.txt", "w")
        file.write(i)
        file.close()

Thanks in advance for any info.

Source

2015-06-03 BubblewrapBeast

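As soon as I posted this, I realized what was wrong. I had accidentally left out + '\n', and the file mode needed to change from "w" to "a":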

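import urllib2 as ur
import re

f = ur.urlopen(u'http://www.site.co.uk/sitemap.xml')
res = f.readlines()
for d in res:
    # Pull every http:// URL wrapped in <loc>...</loc> out of this line
    data = re.findall('<loc>(http:\/\/.+)<\/loc>',d)
    for i in data:
        print i
        # "a" appends; "w" truncated the file on every pass, leaving only the last URL
        file = open("sitemapdata.txt", "a")
        file.write(i +'\n')
        file.close()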
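
For anyone on Python 3, where urllib2 no longer exists, here is a minimal sketch of the same idea. It is only an illustration under a few assumptions: the names extract_urls, save_urls and scan_sitemap are made up for this example, the regex is loosened to also accept https, and the sitemap is assumed to be UTF-8. The file is opened once, outside the loop, so "w" is safe here and the file does not keep growing across runs the way it does with "a".

```python
import re
from urllib.request import urlopen

# Same idea as the regex above, loosened (an assumption) to accept https
# and to stop at the next tag or whitespace instead of matching greedily.
LOC_PATTERN = re.compile(r"<loc>\s*(https?://[^<\s]+)\s*</loc>")


def extract_urls(xml_text):
    """Return every URL found inside a <loc> element, in document order."""
    return LOC_PATTERN.findall(xml_text)


def save_urls(urls, path):
    """Write one URL per line; the file is opened once and closed by `with`."""
    with open(path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")


def scan_sitemap(sitemap_url, path):
    """Hypothetical helper: download a sitemap and save its URLs to `path`."""
    with urlopen(sitemap_url) as response:
        # Assumes the sitemap is served as UTF-8 text.
        xml_text = response.read().decode("utf-8")
    urls = extract_urls(xml_text)
    save_urls(urls, path)
    return urls
```

Calling scan_sitemap('http://www.site.co.uk/sitemap.xml', 'sitemapdata.txt') should then do the download and the save in one step. A regex is enough for a flat sitemap like this one; a sitemap index or an unusually formatted file would probably be better handled with a real XML parser.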

Source

2015-06-03 13:44:08 BubblewrapBeast
