2012-08-17 23 views

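List index out of range error (even though the value exists)?

I am running a for loop to scrape the contents of some XML, and it works fine until it reaches the 29th iteration. At that point it gives me this error: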

File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 572, in dispatch 
    return self.handle_exception(e, self.app.debug) 
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 570, in dispatch 
    return method(*args, **kwargs) 
File "J:\Art & Graphic Design\Graphic Design\Websites\lawvoter-dev\cron_congressman.py", line 64, in get 
    birthday  = re.findall("<birthday>(.*)</birthday>",element)[0] 
IndexError: list index out of range 

The code is:

for element in members:
    title = re.findall("<title>(.*)</title>", element)[0]
    role = re.findall("<role_type_label>(.*)</role_type_label>", element)[0]
    name_sortable = re.findall("<name_sortable>(.*)</name_sortable>", element)[0]
    firstname = re.findall("<firstname>(.*)</firstname>", element)[0]
    lastname = re.findall("<lastname>(.*)</lastname>", element)[0]
    gender = re.findall("<gender_label>(.*)</gender_label>", element)[0]
    birthday = re.findall("<birthday>(.*)</birthday>", element)[0]
    party = re.findall("<party>(.*)</party>", element)[0]
    state = re.findall("<state>(.*)</state>", element)[0]
    description = re.findall("<description>(.*)</description>", element)[0]
    start_date = re.findall("<startdate>(.*)</startdate>", element)[0]
    end_date = re.findall("<enddate>(.*)</enddate>", element)[0]
    website = re.findall("<website>(.*)</website>", element)[0]
    bioguideid = re.findall("<bioguideid>(.*)</bioguideid>", element)[0]
    osid = re.findall("<osid>(.*)</osid>", element)[0]
    pvsid = re.findall("<pvsid>(.*)</pvsid>", element)[0]
    twitterid = re.findall("<twitterid>(.*)</twitterid>", element)[0]
    youtubeid = re.findall("<youtubeid>(.*)</youtubeid>", element)[0]

    member = Congressman(title=title, role=role, name_sortable=name_sortable, firstname=firstname, lastname=lastname, gender=gender, birthday=birthday, party=party, state=state,
                         description=description, start_date=start_date, end_date=end_date, website=website, bioguideid=bioguideid, osid=osid, pvsid=pvsid, twitterid=twitterid, youtubeid=youtubeid)
    member.put()

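I really have no idea why this error is popping up. It works fine for the first 29 iterations. In case it matters, every property in the data model is also set to "default=None". But when I look at the XML itself and go to the exact place where the error occurs, the value is actually there. Does anyone know why it raises the error even though the value exists?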

Answer

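It looks like the re.findall call in

birthday  = re.findall("<birthday>(.*)</birthday>",element)[0] 

returns an empty list, and you are trying to extract the first element of a list that has none, which raises

IndexError: list index out of range 

Like this: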

>>> l = [] 
>>> l[0] 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
IndexError: list index out of range 
>>> 
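
To make the empty-list case concrete, here is a minimal sketch of how the same pattern behaves on different inputs. The sample strings are made up for illustration and are not taken from the asker's XML. The multi-line case is only one possible cause, since "." does not match a newline unless re.DOTALL is passed.

```python
import re

pattern = "<birthday>(.*)</birthday>"

# Hypothetical inputs, not the asker's real data.
present = "<birthday>1984-10-20</birthday>"
missing = "<name_sortable>Doe, Jane</name_sortable>"
multiline = "<birthday>\n1984-10-20\n</birthday>"

found = re.findall(pattern, present)        # one match
not_found = re.findall(pattern, missing)    # tag absent: empty list
split_lines = re.findall(pattern, multiline)  # "." stops at the newline: empty list
with_dotall = re.findall(pattern, multiline, re.DOTALL)  # "." now matches newlines too

print(found)
print(not_found)
print(split_lines)
print(with_dotall)
```

Indexing either of the two empty results with [0] raises the same IndexError as in the traceback.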

Edit:

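import logging
import re

def findelement(item, element):
    # Return the first match of the pattern `item` in `element`,
    # or an empty string (after logging it) when nothing matches.
    matches = re.findall(item, element)
    if not matches:
        logging.info('no item found for %s with element %s', item, element)
        return ''
    return matches[0]


for element in members:
    title = findelement("<title>(.*)</title>", element)
    ...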
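
As an alternative to regular expressions, the same "return a default when the tag is missing" idea can be sketched with the standard library's xml.etree.ElementTree. This assumes each element in members is a well-formed XML fragment with a single root, which the question does not confirm. The parse_member helper and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_member(fragment, fields):
    """Map each field name to its text, or '' when the tag is absent."""
    root = ET.fromstring(fragment)
    # findtext returns the default instead of raising when no tag matches.
    return {field: root.findtext(".//" + field, default="") for field in fields}

# Hypothetical fragment: it has no <birthday> tag.
sample = "<member><title>Rep.</title><firstname>Jane</firstname></member>"
info = parse_member(sample, ["title", "firstname", "birthday"])
print(info)
```

Here a missing tag yields an empty string rather than an exception, so one incomplete record does not abort the whole loop.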
That's what I thought, but when I print that line out, the iteration prints a list with a value in it. Something like: >>> a = ['1984-10-20'] >>> a[0] IndexError – glitchbox 2012-08-17 13:22:07

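Something like that, or exactly that? You are iterating over members, so it may be that one particular element returns an empty list in its iteration. Try logging the results. – aschmid00 2012-08-17 13:26:30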

I put a print statement after "birthday", and when I look at the XML it shows, the "Status: 500" error appears after the 29th iteration. – glitchbox 2012-08-17 13:31:31
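
The suggestion to log the results can be sketched as follows. This is only an illustration: FIELDS is a shortened, assumed subset of the tags from the question, the two entries in members are invented, and missing_fields and report_missing are hypothetical helper names.

```python
import logging
import re

# Assumed subset of the tags used in the question.
FIELDS = ["title", "birthday", "party"]

def missing_fields(element, fields=FIELDS):
    """Return the tag names whose pattern finds no match in element."""
    missing = []
    for tag in fields:
        pattern = "<%s>(.*)</%s>" % (tag, tag)
        if not re.findall(pattern, element):
            missing.append(tag)
    return missing

def report_missing(members):
    """Log every iteration that lacks a field; return them keyed by iteration number."""
    problems = {}
    for index, element in enumerate(members, start=1):
        missing = missing_fields(element)
        if missing:
            # Log a truncated copy of the element so the record can be located.
            logging.warning("iteration %d is missing %s: %r", index, missing, element[:200])
            problems[index] = missing
    return problems

# Invented sample data: the second entry has no <birthday> tag.
members = [
    "<title>Rep.</title><birthday>1984-10-20</birthday><party>X</party>",
    "<title>Sen.</title><party>Y</party>",
]
result = report_missing(members)
print(result)
```

Running a check like this over the real members list would show which iteration, and which tag, produces the empty list before any [0] lookup is attempted.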