PyQuery in Python not working inside a for loop

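I'm trying to write a program that pulls URLs from each line of a .txt file and runs PyQuery on each one to scrape lyrics data from LyricsWiki. Everything seems to work fine until I actually add the scraping. For example, when I do this: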

full_lyrics = ""   
#open up the input file 
links = open('links.txt') 

for line in links: 
    full_lyrics += line 

print(full_lyrics) 
links.close()

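It prints everything as expected: one big string with all the data in it. However, when I add the actual HTML parsing, it only extracts the lyrics from the last URL and skips all the previous ones.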

import requests, re, sqlite3 
from pyquery import PyQuery 
from collections import Counter 

full_lyrics = ""   
#open up the input file 
links = open('links.txt') 
output = open('web.txt', 'w') 
output.truncate() 

for line in links: 
    r = requests.get(line) 
    #create the PyQuery object and parse text 
    results = PyQuery(r.text) 
    results = results('div.lyricbox').remove('script').text() 
    full_lyrics += (results + " ") 

output.write(full_lyrics) 
links.close() 
output.close()

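I'm writing to a txt file to avoid encoding problems in PowerShell. Anyway, after I run the program and open the txt file, it only shows the lyrics for the last link in links.txt.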

For reference, 'links.txt' should contain several links to LyricsWiki song pages, one per line, like these:

http://lyrics.wikia.com/Taylor_Swift:Shake_It_Off
http://lyrics.wikia.com/Maroon_5:Animals

'web.txt' should be a blank output file.

Why does PyQuery break the for loop? The loop clearly works when it does something simpler, like just concatenating the lines of the file.

2014-10-31 thenorm

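The problem is the extra newline character at the end of each line read from the file (links.txt). Every URL except (presumably) the last one is passed to requests.get with a trailing "\n", so the request does not return the lyrics page you expect. Try adding another line break at the end of links.txt and you will see that even the last entry is not processed.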
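A quick way to see the stray newline, assuming links.txt holds the two example URLs with no line break after the last one. Here io.StringIO stands in for the real file so the snippet runs on its own:

```python
import io

# io.StringIO stands in for links.txt: the first line ends with "\n",
# the last line does not (no newline at end of file).
links = io.StringIO(
    "http://lyrics.wikia.com/Taylor_Swift:Shake_It_Off\n"
    "http://lyrics.wikia.com/Maroon_5:Animals"
)

# Iterating over a file-like object keeps each line's trailing newline.
raw = [line for line in links]
print(raw)  # the first URL ends in '\n', the last one does not

# rstrip() removes the newline (and any other trailing whitespace).
cleaned = [line.rstrip() for line in raw]
print(cleaned)
```

Only the last URL reaches requests.get unchanged, which matches the symptom of only the last song's lyrics appearing in web.txt.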

I suggest stripping the right-hand side of the line variable before using it, like this:

for line in links: 
    line = line.rstrip() 
    r = requests.get(line) 
    ...

That should fix it.
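A fuller sketch of the fixed script, assuming the same file names and the div.lyricbox selector from the question. read_urls is a hypothetical helper introduced here; besides stripping the newline it skips blank lines, so an extra empty line in links.txt no longer produces a bogus request. requests and pyquery are imported inside main() so the helper can be reused without them:

```python
def read_urls(lines):
    """Yield each non-blank line with surrounding whitespace removed.

    Stripping drops the trailing "\\n" that file iteration keeps; skipping
    blank lines means an extra empty line in links.txt is harmless.
    """
    for line in lines:
        url = line.strip()
        if url:
            yield url


def main(links_path="links.txt", output_path="web.txt"):
    # Imported here so read_urls() can be used without these packages.
    import requests
    from pyquery import PyQuery

    parts = []
    with open(links_path) as links:
        for url in read_urls(links):
            r = requests.get(url)
            # Same selector and <script> removal as in the question.
            results = PyQuery(r.text)
            parts.append(results('div.lyricbox').remove('script').text())

    # Writing UTF-8 explicitly avoids depending on the console/locale encoding.
    with open(output_path, "w", encoding="utf-8") as output:
        output.write(" ".join(parts))
```

Call main() to run it. Opening web.txt in "w" mode already truncates the file, so a separate output.truncate() call is not needed, and the with blocks close both files even if a request fails.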

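I also don't think you need requests to fetch the HTML. Try results = PyQuery(url=line) and see whether it works (a plain positional string is parsed as HTML, so the url= keyword is needed).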

2014-12-22 18:13:15 jheyse