List index out of range when parsing a website using .lower()

I am parsing a website to count the number of lines that mention a keyword. Everything runs fine with the code below:

import time 
import urllib2 
from urllib2 import urlopen 
import datetime 

website = 'http://www.dailyfinance.com/2014/11/13/market-wrap-seventh-dow-record-in-eight-days/#!slide=3077515' 
topSplit = 'NEW YORK -- ' 
bottomSplit = "<div class=\"knot-gallery\"" 

# Count mentions on newlines 
def main():
    try:
        x = 0
        sourceCode = urllib2.urlopen(website).read()
        sourceSplit = sourceCode.split(topSplit)[1].split(bottomSplit)[0]
        content = sourceSplit.split('\n')  # provides an array

        for line in content:
            if 'gain' in line:
                x += 1

        print x

    except Exception, e:
        print 'Failed in the main loop'
        print str(e)

main()

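However, I want to count every mention of the keyword regardless of case (here, both 'gain' and 'Gain'). To do that, I added .lower() to the line that reads the source code: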

sourceCode = urllib2.urlopen(website).read().lower()

However, this gives me the following error:

Failed in the main loop

list index out of range

I assume .lower() is throwing off the index, but why does this happen?

2015-04-07 Chuck

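You are lowercasing the whole string (that is what lower() does), but you then try to split it on topSplit = 'NEW YORK -- ', which still contains uppercase letters. The delimiter no longer occurs in the lowercased text, so split() returns a list with a single item.

You then try to access index 1, which never exists in that list: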

sourceCode.split(topSplit)[1]
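
The failure can be reproduced without fetching the page. This is a minimal sketch: the `sample` string and the names `top_split`, `parts` and `lowered_parts` are made up for illustration and are not taken from the real site.

```python
# Hypothetical sample standing in for the downloaded page source.
sample = "header NEW YORK -- Stocks gain again"
top_split = 'NEW YORK -- '

# Original text: the delimiter is present, so split() yields two parts.
parts = sample.split(top_split)
print(len(parts))

# Lowercased text: the uppercase delimiter no longer occurs,
# so split() returns the whole string as a single-element list.
lowered_parts = sample.lower().split(top_split)
print(len(lowered_parts))

# Indexing [1] on a one-element list raises IndexError,
# which is the "list index out of range" message from the question.
try:
    lowered_parts[1]
    error_message = None
except IndexError as e:
    error_message = str(e)
print(error_message)
```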

To handle both cases, consider using regular expressions from the re module. Here is an example:

>>> import re
>>> string = "some STRING lol"
>>> re.split("string", string, flags=re.IGNORECASE) 
['some ', ' lol'] 
>>> re.split("STRING", string, flags=re.IGNORECASE) 
['some ', ' lol']
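
Applied to the question's task, the same idea might look like the sketch below. The function name `count_keyword_lines`, its parameters and the `sample` text are assumptions made for illustration. It also takes everything after the first occurrence of the top delimiter, which is a slight simplification of the original `[1]` indexing. `re.escape()` is used so the delimiters are matched literally rather than as regex syntax.

```python
import re

def count_keyword_lines(source, top, bottom, keyword):
    """Count lines between two delimiters that mention keyword, ignoring case."""
    # Split once on the top delimiter, case-insensitively.
    after_top = re.split(re.escape(top), source, maxsplit=1, flags=re.IGNORECASE)
    if len(after_top) < 2:
        return 0  # top delimiter not found: nothing to count

    # Keep only the text before the bottom delimiter.
    body = re.split(re.escape(bottom), after_top[1], maxsplit=1,
                    flags=re.IGNORECASE)[0]

    # Count the lines that contain the keyword in any letter case.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return sum(1 for line in body.split('\n') if pattern.search(line))

# Hypothetical sample standing in for the downloaded page source.
sample = ('intro\nNEW YORK -- Stocks Gain today\nno change\n'
          'big gain again\n<div class="knot-gallery">footer gain')
print(count_keyword_lines(sample, 'NEW YORK -- ', '<div class="knot-gallery"', 'gain'))
```

Because the page text itself is never lowercased, the original casing is preserved for any later processing.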


2015-04-07 11:27:52

Great answer. Following your suggestion, I used topSplit = 'NEW YORK -- '.lower() and got it running. I will also look at the re module, thanks for your help. – Chuck
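
The approach from this comment can be sketched as follows: lowercase the delimiters as well as the text, so that both sides of the split agree. The `sample` string and the variable names are hypothetical, and the real page is not fetched here.

```python
# Lowercase the delimiters so they match the lowercased page text.
top_split = 'NEW YORK -- '.lower()
bottom_split = '<div class="knot-gallery"'.lower()

# Hypothetical sample standing in for the downloaded page source.
sample = ('intro\nNEW YORK -- Stocks Gain today\nflat session\n'
          'another gain\n<div class="knot-gallery">gain')

lowered = sample.lower()
section = lowered.split(top_split)[1].split(bottom_split)[0]

# 'Gain' and 'gain' are both counted because the text is now all lowercase.
count = sum(1 for line in section.split('\n') if 'gain' in line)
print(count)
```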