Python file parsing -> IndexError

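I'm parsing an ISI file containing a few hundred records, all of which begin with a 'PT J' tag and end with an 'ER' tag. I'm trying to pull the tagged information from each record in a nested loop, but I keep getting an IndexError. I know why I'm getting it, but does anyone have a better way to identify the start of a new record than checking the first few characters?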

while file:
    while line[1] + line[2] + line[3] + line[4] != 'PT J':
        ...
        # search through and record data from tags
        ...

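I use the same approach to identify the individual tags, so I occasionally run into the same problem there too. If you have any suggestions for that as well, I'd greatly appreciate it!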

Sample data (as you'll see, not every record contains every tag):

PT J
AF Bob Smith
TI Python For Dummies
DT July 4, 2012
ER

PT J
TI Django for Dummies
DT 4/14/2012
ER

PT J
AF Jim Brown
TI StackOverflow
ER
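For the second part of the question (identifying the individual tags), one option is to split each stripped line once on whitespace instead of indexing single characters. This is a minimal sketch that assumes every line is a two-letter tag followed by a space and the value, as in the sample above; the helper name `split_tag` is hypothetical.

```python
def split_tag(line):
    """Split one ISI line into (tag, value).

    Assumes the tag-then-value layout of the sample data; a bare tag
    such as 'ER' yields an empty value instead of raising IndexError.
    """
    parts = line.strip().split(None, 1)  # split on the first run of whitespace only
    if not parts:                         # blank line
        return None, ''
    tag = parts[0]
    value = parts[1] if len(parts) > 1 else ''
    return tag, value


print(split_tag('AF Bob Smith \n'))   # ('AF', 'Bob Smith')
print(split_tag('ER\n'))              # ('ER', '')
print(split_tag('DT July 4, 2012'))   # ('DT', 'July 4, 2012')
```

Because `split(None, 1)` splits only once, values that contain spaces or commas (like dates) stay intact.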

2012-07-06 MTP

I should point out that I convert the file to .txt before reading it in. – MTP 2012-07-06 02:47:56

Doesn't the 'ER' line contain only "ER"? That's why you're getting the IndexError: the line is too short, so indices such as line[4] don't exist.
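A quick way to reproduce this: an 'ER' line read from the file is just `'ER\n'`, three characters long, so the character-by-character test fails as soon as it reaches `line[3]`. The same test written as a slice does not raise; it just returns a shorter string.

```python
line = 'ER\n'  # an end-of-record line as read from the file

try:
    # the question's test indexes characters one at a time
    line[1] + line[2] + line[3] + line[4]
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: string index out of range

# a slice that runs past the end of the string does not raise
print(repr(line[0:4]))  # 'ER\n'
```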

The first thing to try would be:

while not line.startswith('PT J'):

instead of your existing while loop.
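As a sketch of why `startswith` is safer: it simply returns `False` for lines that are too short, so the check can never raise IndexError. Here the same check sits inside a `for` loop rather than a `while` loop; the helper `skip_to_record` is hypothetical, and `io.StringIO` stands in for the real file.

```python
import io


def skip_to_record(lines):
    """Advance an iterator of lines to the next 'PT J' line and return it.

    Returns None if the input runs out before a record starts.
    """
    for line in lines:
        if line.startswith('PT J'):  # safe even for 'ER\n' or blank lines
            return line
    return None


f = io.StringIO('ER\n\nPT J\nTI Django for Dummies\nER\n')
print(repr(skip_to_record(f)))  # 'PT J\n'
print(repr(next(f)))            # 'TI Django for Dummies\n'
```

Because a file object is its own iterator, the caller picks up reading right after the 'PT J' line.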

Also, the character-by-character concatenation is equivalent to a slice:

line[1] + line[2] + line[3] + line[4] == line[1:5]

(a slice includes its start index and excludes its end index)
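To illustrate the slice bounds: `line[a:b]` covers indices `a` through `b - 1`. Assuming the record lines start at column 0 as in the sample data, 'PT J' occupies indices 0 to 3, so the matching slice is `line[:4]`; `line[1:5]` starts one character too late.

```python
line = 'PT J\n'

print(repr(line[1:5]))     # 'T J\n' -> indices 1..4, misses the leading 'P'
print(repr(line[0:4]))     # 'PT J'  -> indices 0..3, index 4 is excluded
print(line[:4] == 'PT J')  # True

short = 'ER\n'
print(repr(short[:4]))     # 'ER\n' -> slicing past the end never raises
```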

2012-07-06 02:51:46 Marius

Yes, the 'ER' (end of record) line usually contains nothing else, not even trailing whitespace. – 2012-07-06 08:15:59

I like your suggestion... I'll have to play around with it some more. – MTP 2012-07-07 02:57:11

You could try an approach like this to read through your file:

with open('data.txt') as f:
    for line in f:
        line = line.split()  # splits your line into a list of character sequences
                             # separated by whitespace (blanks, tabs)
        llen = len(line)
        if llen == 2 and line[0] == 'PT' and line[1] == 'J':  # found start of record
            # process
            # examine line[0] for 'tags', such as "AF", "TI", "DT", and proceed
            # as dictated by your needs.
            # e.g.,
            pass

        if llen > 1 and line[0] == 'AF':  # grab first/last name in line[1] and line[2]
            # The data will be on the same line and
            # accessible via the correct index values.
            pass

        if llen == 1 and line[0] == 'ER':  # found end of record
            pass

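This will certainly need more "programming logic" (most likely nested loops or, better yet, function calls) to put everything in the right order/sequence, but the basic structure is there. I hope it gets you started and gives you some ideas.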
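As one way to supply that extra logic, here is a minimal sketch that wraps the split-based checks in a function and collects each record into a dictionary. The function name `parse_records`, the dict-per-record layout, and rejoining multi-word values with single spaces are assumptions for illustration; a tag that appears twice in one record keeps only its last value here.

```python
import io


def parse_records(f):
    """Yield one dict per 'PT J' ... 'ER' record, keyed by two-letter tag."""
    record = None
    for raw in f:
        fields = raw.split()          # whitespace-separated pieces of the line
        if not fields:
            continue                  # skip blank lines
        if fields == ['PT', 'J']:     # start of record
            record = {}
        elif fields == ['ER']:        # end of record
            if record is not None:
                yield record
            record = None
        elif record is not None:      # a tagged line inside a record
            record[fields[0]] = ' '.join(fields[1:])


sample = io.StringIO('PT J\nAF Jim Brown\nTI StackOverflow\nER\n')
for rec in parse_records(sample):
    print(rec)  # {'AF': 'Jim Brown', 'TI': 'StackOverflow'}
```

Comparing whole field lists (`fields == ['ER']`) avoids indexing into short lines entirely.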

2012-07-06 02:54:00 Levon

with open('data1.txt') as f:
    for line in f:
        if line.strip() == 'PT J':
            for line in f:  # continues reading from the same file iterator
                if line.strip() != 'ER' and line.strip():
                    # do something with data
                    pass
                elif line.strip() == 'ER':
                    # this record ends here; move on to the next record
                    break
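To make the "do something with data" step concrete, the sketch below keeps the nested-loop structure above and collects each record's tagged lines into a dict. The inner `for` loop consumes lines from the same iterator `f` as the outer one, so after `break` the outer loop resumes at the line following 'ER'. The function name `read_records` and the dict output are assumptions.

```python
import io


def read_records(f):
    """Collect records using the nested-loop pattern over one file iterator."""
    records = []
    for line in f:
        if line.strip() == 'PT J':
            record = {}
            for rec_line in f:             # continues from the same position in f
                text = rec_line.strip()
                if text == 'ER':           # this record ends here
                    break
                if text:                   # skip blank lines
                    tag, _, value = text.partition(' ')
                    record[tag] = value.strip()
            records.append(record)
    return records


data = io.StringIO('PT J\nTI Django for Dummies\nDT 4/14/2012\nER\n')
print(read_records(data))
# [{'TI': 'Django for Dummies', 'DT': '4/14/2012'}]
```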

2012-07-06 03:00:14

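I think I see what's going on here, but how would I access different lines to manipulate or test them? Since the lines come from an iterator, we can't write something like line = file.readline() inside the nested 'if' statements. What would replace line = file.readline() to let me get to specific lines? I ask because in some cases there are multiple entries per tag. – MTP 2012-07-07 03:54:39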
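One possible answer to this comment, sketched under assumptions: a file object is its own iterator, so inside the loop you can call the built-in `next(f)` to pull the following line by hand, and multiple entries per tag can be kept in a list per tag with `collections.defaultdict(list)`. The layout for repeated entries is an assumption (it does not appear in the sample data): a line that starts with whitespace is treated as another value for the previous tag. The function name `parse_multi` is hypothetical.

```python
import io
from collections import defaultdict


def parse_multi(f):
    """Yield one dict per record, mapping each tag to a list of values.

    Assumes a line starting with whitespace continues the previous tag
    (hypothetical layout, not shown in the question's sample data).
    """
    for line in f:
        if line.strip() != 'PT J':
            continue
        record = defaultdict(list)
        tag = None
        while True:
            line = next(f, 'ER')           # pull the next line by hand; 'ER' at EOF
            text = line.strip()
            if text == 'ER':
                break
            if not text:
                continue
            if line[0].isspace() and tag:  # continuation line: same tag again
                record[tag].append(text)
            else:
                tag, _, value = text.partition(' ')
                record[tag].append(value.strip())
        yield dict(record)


sample = io.StringIO('PT J\nAF Bob Smith\n   Jim Brown\nTI Python For Dummies\nER\n')
print(list(parse_multi(sample)))
# [{'AF': ['Bob Smith', 'Jim Brown'], 'TI': ['Python For Dummies']}]
```

Passing a default to `next(f, 'ER')` means a file that ends without a final 'ER' line still closes the last record instead of raising StopIteration.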
