2014-04-28

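Python: replace URLs with their titles

I wrote this code to replace URLs with the titles of the pages they point to. It does replace the URLs with the titles as intended, but each title ends up on the next line.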

twfile.txt contains these lines:

link1 http://t.co/HvKkwR1c 
no link line 

Output in tw2file:

link1 
Instagram 
no link line 

But I want the output in this form:

link1 Instagram 
no link line 

What should I do?

My code:

from bs4 import BeautifulSoup
import urllib

output = open('tw2file.txt','w')

with open('twfile.txt','r') as inputf:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    ##print list1[i]
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    ##print list1[i]
                    list1[i] = ''.join(ch for ch in list1[i])
                else:
                    list1[i] = ''.join(ch for ch in list1[i])
            line = ' '.join(list1)
            print line
            output.write(line)
        except:
            pass

inputf.close()
output.close()

Answers


Try this code (look at the three lines marked # here):

from bs4 import BeautifulSoup
import urllib

with open('twfile.txt','r') as inputf, open('tw2file.txt','w') as output:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here
                else:
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here
            line = ' '.join(list1)
            print line
            output.write('{}\n'.format(line)) # here
        except Exception:
            # a URL that fails to load makes the whole line get skipped
            pass

By the way, if you are using Python 2.7.x or later, a single with statement can open both files, which also makes the explicit close() calls unnecessary.


About writing to a file:

fileobject = open("bar", 'w') 
fileobject.write("Hello, World\n") # newline is inserted by '\n' 
fileobject.close() 
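The point of the '\n' in that write call is that write() adds nothing on its own: consecutive write() calls run together unless each string ends with '\n'. A small Python 3 sketch, using an in-memory io.StringIO as a stand-in for a file opened with 'w' (the two strings are just the lines from the question):

```python
import io

# io.StringIO stands in for a text file opened with 'w'.
buf = io.StringIO()
buf.write("link1 Instagram")   # write() adds no newline of its own...
buf.write("no link line")      # ...so this text runs straight on
joined = buf.getvalue()

buf = io.StringIO()
for line in ["link1 Instagram", "no link line"]:
    buf.write(line + "\n")     # end each line explicitly, as in write("Hello, World\n")
separated = buf.getvalue()

print(repr(joined))     # 'link1 Instagramno link line'
print(repr(separated))  # 'link1 Instagram\nno link line\n'
```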

About console output:

Change print line to print line, (with a trailing comma).

In Python 2, print writes a '\n' character at the end unless the print statement ends with a comma.
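In Python 3, print is a function and the trailing-comma form is a syntax error; the end keyword argument plays that role. The sketch below sends the output to an io.StringIO through the file argument (standing in for the console) so the difference can be inspected. Strictly, Python 2's trailing comma sets a "soft space" that can insert a space before the next printed item, so end="" is the closest equivalent rather than an exact one; for lines that already end in '\n' the two behave the same.

```python
import io

# Lines read from a file still end with '\n'.
lines = ["link1 Instagram\n", "no link line\n"]

out = io.StringIO()
for line in lines:
    print(line, file=out)           # default end="\n" adds a second line break
doubled = out.getvalue()

out = io.StringIO()
for line in lines:
    print(line, end="", file=out)   # closest Python 3 counterpart of Python 2's `print line,`
single = out.getvalue()

print(repr(doubled))  # 'link1 Instagram\n\nno link line\n\n'
print(repr(single))   # 'link1 Instagram\nno link line\n'
```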


It doesn't affect the output. –


Why are you printing twice, with both print line and output.write(line)? – Gio


'print' seems to be for the 'console'; the other one seems to be for the 'file'. – emeth