IndexError once the index goes above 50

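I want to take each line from an existing .txt file of partial URLs (one per line), strip the %0A from the end of each line, add a prefix to each URL to complete it, and then download the HTML of each complete URL to my hard drive so that I can scrape it with BeautifulSoup later, in the next step.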

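The code below works well except for two problems:

1) Each downloaded HTML file contains the correct HTML when inspected offline (View Source on the file), but when opened in Firefox it shows no visible data apart from the header/banner, and

2) The script raises "oidstripped[j] = str(offenderid[j]) IndexError: list assignment index out of range" at j = 51 every time it runs. It correctly downloads the files for j = 1 to 50, but then crashes and does not continue.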

#snip# 
j = 0
with open('offenderurls.txt') as r:
    offenderid = r.readlines()
    while j < len(offenderid):
        oidstripped = []
        for l in offenderid[j]:
            oidstripped.append(l)
        oidstripped[j] = str(offenderid[j])
        oidstripped[j] = oidstripped[j][:-1]
        res = requests.get('http://www.icrimewatch.net/' + str(oidstripped[j]), stream=True)
        type(res)
        res.raise_for_status()
        with open('Offenderpage' + str(j) + '.html', 'wb') as playFile:
            for chunk in res.iter_content(1024):
                playFile.write(chunk)
            playFile.close()
        j = j + 1

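Please help! I am very new to Python. No need to be gentle; I have thick skin. All suggestions will be considered and appreciated.

An example offenderurls.txt with 55 entries is here: https://pastebin.ca/3886683

Thanks!

Source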

2017-10-12 roamingprofessor

You could try a loop like this: 'f = open('somefile.txt', 'r') for line in f.readlines(): ... //do something with each line ... f.close()' – VTodorov

If I understand your goal correctly, this should help.

You don't need oidstripped at all; try the following

for one_id in offenderid: 
    res = requests.get('http://www.icrimewatch.net/' + one_id.rstrip('\n'), stream=True)

instead of the while loop.
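As for why the original loop stops at j = 51: oidstripped is reset to an empty list on every pass and then filled with the individual characters of the current line, so it only ever has as many slots as that line has characters. Below is a minimal sketch of that behaviour. The 51-character sample line and the helper name rebuild_like_question are made up for illustration; a line of that length would match the crash you describe, but I have not checked your actual file.

```python
def rebuild_like_question(line, j):
    """Mimic one pass of the loop body from the question (illustrative only)."""
    oidstripped = []
    for ch in line:              # iterating over a str yields single characters
        oidstripped.append(ch)
    oidstripped[j] = line        # only valid while j < len(line)
    return oidstripped[j][:-1]   # drop the trailing newline


sample_line = 'x' * 50 + '\n'    # hypothetical line, 51 characters long

ok = rebuild_like_question(sample_line, 50)   # index 50 is the last valid slot

try:
    rebuild_like_question(sample_line, 51)    # one past the end of the list
    failed = False
except IndexError:
    failed = True
```

So the list index has nothing to do with how many URLs are in the file; it depends on the length of each line, which is why dropping oidstripped removes the error.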

I haven't tested the rest.

Source
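For the rest, here is a rough sketch of how the whole script might look with the for loop in place. The helper names build_urls and download_all are hypothetical, the output file names follow the ones in the question, and the download part is untested against the real site.

```python
from pathlib import Path

BASE_URL = 'http://www.icrimewatch.net/'


def build_urls(lines, base=BASE_URL):
    """Strip the newline from each partial URL and add the prefix.

    Blank lines are skipped.
    """
    return [base + line.strip() for line in lines if line.strip()]


def download_all(urls, out_dir='.'):
    """Save each page as Offenderpage<N>.html inside out_dir."""
    # Third-party package; imported here so build_urls can be used without it.
    import requests

    for n, url in enumerate(urls):
        res = requests.get(url, stream=True)
        res.raise_for_status()
        # The with block closes the file, so no explicit close() is needed.
        with open(Path(out_dir) / f'Offenderpage{n}.html', 'wb') as page_file:
            for chunk in res.iter_content(1024):
                page_file.write(chunk)


# Possible usage (not executed here):
# with open('offenderurls.txt') as url_file:
#     download_all(build_urls(url_file))
```

Using enumerate gives you the counter for the file name without keeping a separate j, and strip() removes the trailing newline that was showing up as %0A in the URL.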

2017-10-12 18:42:59 Kajienk
