Storing text in a txt file in Python
2015-09-06

OK, so I'm writing a program that checks whether certain pages on a website are offline or online.

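import urllib2

u = 'http://www.google.com/'

# readlines() keeps the trailing '\n' on each line, so strip it
pages = [line.strip() for line in open('pages.txt', 'r').readlines()]

for page in pages:
    url = u + page
    try:
        req = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        # Only a 404 is reported; other HTTP errors are silently ignored
        if e.code == 404:
            print url + " does not exists"
    else:
        # No exception was raised, so the request succeeded
        print url + " exists"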

and pages.txt contains something like this:

search 
page 
plus 
signin 
account 
security 
lol 
about 
contactus 
someotherpage.html 

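The program works fine now, but I'd like it to store the available pages in a txt file. Can anyone help? It would be great if it listed only the pages that exist and left out the offline ones. Thanks :)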
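One detail worth handling before writing anything out: `readlines()` keeps the trailing newline on every entry, so `u + page` ends with `\n`. A minimal sketch of cleaning the list, assuming a helper named `read_pages` (hypothetical, not from the question) that strips each line and skips blank ones:

```python
import io


def read_pages(lines):
    """Return page names from an iterable of lines, without newlines or blanks."""
    return [line.strip() for line in lines if line.strip()]


# Simulated contents of pages.txt; with the real file use:
#     with open('pages.txt') as f:
#         pages = read_pages(f)
sample = io.StringIO("search\npage\n\nsomeotherpage.html\n")
pages = read_pages(sample)
print(pages)  # ['search', 'page', 'someotherpage.html']
```

Passing any iterable of lines (an open file, a `StringIO`, a list) keeps the helper easy to test without touching the filesystem.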

Answers


How about:

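python your_script > available.txt

(Redirect into a different file than the one the script reads its page list from: the shell empties the target file before the script starts, so the list would be wiped out.)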


Edit

To write to a file:

# The with block closes the file automatically when it ends
with open('Pages.txt', 'w') as f:
    f.write('something')
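A `with` block closes the file when it exits, even if an exception is raised inside it, so no explicit `close()` is needed afterwards. A quick check, run in a temporary directory so nothing in the working directory is touched (the file name `output.txt` is only illustrative):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.txt')
    with open(path, 'w') as f:
        f.write('something\n')
    # The file object is already closed once the with block ends
    print(f.closed)  # True
    with open(path) as f:
        content = f.read()

print(repr(content))  # 'something\n'
```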

Won't that just store everything? I don't want the offline sites to show up.


I think only what you print will actually end up in the file... – Richard


But my prints also include the pages that "does not exist".
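A shell redirect with `>` captures only standard output. One way to make it capture just the available pages is to print existing pages to stdout and the missing ones to stderr. A minimal sketch, assuming a hypothetical `report` function and a fixed set of "existing" URLs standing in for the real HTTP check:

```python
import contextlib
import io
import sys


def report(urls, exists):
    """Print existing URLs to stdout and missing ones to stderr.

    `exists` is any callable returning True/False for a URL; in this sketch
    it stands in for the real HTTP check.
    """
    for url in urls:
        if exists(url):
            print(url)
        else:
            print(url + " does not exist", file=sys.stderr)


urls = ['http://www.google.com/search', 'http://www.google.com/lol']
found = {'http://www.google.com/search'}

# Capture both streams to show where each line goes
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    report(urls, found.__contains__)

print(out.getvalue(), end='')  # http://www.google.com/search
```

Run from the shell as `python your_script > available.txt`, the "does not exist" lines still appear in the terminal, but only the available URLs land in the file.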


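Just write to a file the same way you are reading one. Open an output file (give it a name other than pages.txt, because opening a file in 'w' mode empties it and your page list would be lost):

out = open('available.txt', 'w')

...then, in the else: branch you already have, add:

out.write(url + "\n")

Putting it together:

import urllib2

u = 'http://www.google.com/'
pages = open('pages.txt', 'r').readlines()
# Separate output file, so pages.txt keeps the list of pages to check
out = open('available.txt', 'w')

for page in pages:
    # Strip the '\n' that readlines() leaves on each entry
    url = u + page.strip()
    try:
        req = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.code == 404:
            print url + " does not exists"
    else:
        print url + " exists"
        # Only pages that responded without an error reach this line
        out.write(url + "\n")

out.close()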
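The combined script above is Python 2 (`urllib2`, `print` statements). In Python 3 the same classes live in `urllib.request` and `urllib.error`. A minimal sketch of the existence check, assuming a helper named `page_exists` (hypothetical) that takes the opener as a parameter so it can be exercised without network access; unlike the original, it re-raises HTTP errors other than 404 instead of silently ignoring them:

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def page_exists(url, opener=urlopen):
    """Return True if `url` answers without an HTTP error, False on a 404.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be tested with a fake opener. Other HTTP errors and
    network failures (URLError) propagate to the caller.
    """
    try:
        response = opener(url)
    except HTTPError as e:
        if e.code == 404:
            return False
        raise
    response.close()
    return True
```

With the default opener, `page_exists('http://www.google.com/search')` performs a real request; the loop then only needs to write the URLs for which it returns True.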


Hmm, I'm still new to Python. Could you write out the final code? Thanks a lot :)


Open the output file in append mode, and redirect the print statement so it prints to the new file handle.

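import urllib2

# Ask for the site to check; an empty answer falls back to google.com
u = raw_input('Enter a url: ') or 'http://www.google.com/'

pages = open('pages.txt', 'r').readlines()
# 'a' (append) mode creates available.txt if needed and adds to it on every run
with open('available.txt', 'a') as available:
    for page in pages:
        # page still ends with '\n' from readlines(), which is why "exists"
        # appears on its own line in the output
        url = u.rstrip('\n') + page
        try:
            req = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            if e.code == 404:
                print url + " does not exists"
        else:
            print url + " exists"
            # Python 2 syntax: send this print to the file, without the newline
            print >> available, url.rstrip('\n')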
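`print >> available, ...` is Python 2 syntax for sending a `print` to a file object. In Python 3 the same thing is written with the `file` argument of `print()`. A small sketch that uses an in-memory buffer in place of `available.txt`:

```python
import io

available = io.StringIO()  # stands in for open('available.txt', 'a')
url = 'http://www.google.com/search\n'  # as built from a readlines() entry

# Python 3 equivalent of: print >> available, url.rstrip('\n')
print(url.rstrip('\n'), file=available)
print(repr(available.getvalue()))  # 'http://www.google.com/search\n'
```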

Output:

(availablepages)macbook:availablepages joeyoung$ ls -al 
total 16 
drwxr-xr-x 4 joeyoung staff 136 Sep 7 00:23 . 
drwxr-xr-x 4 joeyoung staff 136 Sep 6 23:54 .. 
-rw-r--r-- 1 joeyoung staff 478 Sep 7 00:20 availablepages.py 
-rw-r--r-- 1 joeyoung staff 70 Sep 6 23:56 pages.txt 
(availablepages)macbook:availablepages joeyoung$ python availablepages.py 
Enter a url: http://www.google.com/ 
http://www.google.com/search 
exists 
http://www.google.com/page 
does not exists 
http://www.google.com/plus 
exists 
http://www.google.com/signin 
does not exists 
http://www.google.com/account 
exists 
http://www.google.com/security 
exists 
http://www.google.com/lol 
does not exists 
http://www.google.com/about 
exists 
http://www.google.com/someotherpage.html 
does not exists 
(availablepages)macbook:availablepages joeyoung$ ls -al 
total 24 
drwxr-xr-x 5 joeyoung staff 170 Sep 7 00:23 . 
drwxr-xr-x 4 joeyoung staff 136 Sep 6 23:54 .. 
-rw-r--r-- 1 joeyoung staff 145 Sep 7 00:23 available.txt 
-rw-r--r-- 1 joeyoung staff 478 Sep 7 00:20 availablepages.py 
-rw-r--r-- 1 joeyoung staff 70 Sep 6 23:56 pages.txt 
(availablepages)macbook:availablepages joeyoung$ cat available.txt 
http://www.google.com/search 
http://www.google.com/plus 
http://www.google.com/account 
http://www.google.com/security 
http://www.google.com/about 
(availablepages)macbook:availablepages joeyoung$ python availablepages.py 
Enter a url: http://www.bing.com/ 
http://www.bing.com/search 
exists 
http://www.bing.com/page 
does not exists 
http://www.bing.com/plus 
does not exists 
http://www.bing.com/signin 
does not exists 
http://www.bing.com/account 
exists 
http://www.bing.com/security 
does not exists 
http://www.bing.com/lol 
does not exists 
http://www.bing.com/about 
does not exists 
http://www.bing.com/someotherpage.html 
does not exists 
(availablepages)macbook:availablepages joeyoung$ ls -al 
total 24 
drwxr-xr-x 5 joeyoung staff 170 Sep 7 00:23 . 
drwxr-xr-x 4 joeyoung staff 136 Sep 6 23:54 .. 
-rw-r--r-- 1 joeyoung staff 200 Sep 7 00:24 available.txt 
-rw-r--r-- 1 joeyoung staff 478 Sep 7 00:20 availablepages.py 
-rw-r--r-- 1 joeyoung staff 70 Sep 6 23:56 pages.txt 
(availablepages)macbook:availablepages joeyoung$ cat available.txt 
http://www.google.com/search 
http://www.google.com/plus 
http://www.google.com/account 
http://www.google.com/security 
http://www.google.com/about 
http://www.bing.com/search 
http://www.bing.com/account 
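Because the file is opened with `'a'`, the second run (against bing.com) adds to the lines left by the first run instead of replacing them, which is why `available.txt` grows from 145 to 200 bytes. A sketch contrasting `'a'` and `'w'` over two simulated runs (the helpers `save` and `read_lines` are hypothetical):

```python
import os
import tempfile


def save(path, urls, mode):
    """Write one URL per line to `path` using the given open() mode."""
    with open(path, mode) as f:
        for url in urls:
            f.write(url + '\n')


def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    appended = os.path.join(tmp, 'appended.txt')
    overwritten = os.path.join(tmp, 'overwritten.txt')
    # Two "runs", one per site
    for run in (['http://www.google.com/search'], ['http://www.bing.com/search']):
        save(appended, run, 'a')
        save(overwritten, run, 'w')
    appended_lines = read_lines(appended)
    overwritten_lines = read_lines(overwritten)

print(appended_lines)     # ['http://www.google.com/search', 'http://www.bing.com/search']
print(overwritten_lines)  # ['http://www.bing.com/search']
```

Use `'w'` instead if each run should start from an empty file.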

Thanks a lot! Just one thing, and maybe it's a silly question: what is available.txt? Is it created automatically, or should I create it myself? Thanks again :)


Sorry, I should have said: it is created automatically. You can change the file name in the code to anything you like.
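Both `'a'` and `'w'` modes create the file if it does not exist yet, so `available.txt` never has to be created by hand. A short check in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'available.txt')
    existed_before = os.path.exists(path)
    # Opening in append mode creates the missing file
    with open(path, 'a') as available:
        available.write('http://www.google.com/search\n')
    existed_after = os.path.exists(path)

print(existed_before, existed_after)  # False True
```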


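OK, it works perfectly! Thanks a lot :) I know I should ask this in a separate question, but can I let the user choose the website? I've marked your answer as useful :)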
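The append-mode script above already asks for the site with `raw_input` and falls back to google.com when the answer is empty; in Python 3, `raw_input` is called `input`. A sketch that also tolerates a missing or doubled slash between site and page, assuming hypothetical helpers `choose_base` and `build_url`:

```python
DEFAULT_SITE = 'http://www.google.com/'


def choose_base(answer):
    """Return the base URL typed by the user, or the default when empty."""
    return answer.strip() or DEFAULT_SITE


def build_url(base, page):
    """Join a base URL and a page name with exactly one '/' between them."""
    return base.rstrip('/') + '/' + page.strip().lstrip('/')


# Interactive use (Python 3):
#     base = choose_base(input('Enter a url: '))
#     url = build_url(base, page)
print(build_url(choose_base(''), 'search'))  # http://www.google.com/search
```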
