2011-05-13

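Parsing data from a live website in Python: enumerate problem!

The script below is supposed to fetch a specific line number from a live website and parse it. It works for about 30 loops, but then enumerate(f) seems to stop working correctly: the "i" in the for loop appears to stop at around line 130 instead of 200-something. Could this be caused by the website I'm fetching data from, or by something else? Thanks!!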

import sgmllib 

class MyParser(sgmllib.SGMLParser):
    "A simple parser class."

    def parse(self, s):
        "Parse the given string 's'."
        self.feed(s)
        self.close()

    def __init__(self, verbose=0):
        "Initialise an object, passing 'verbose' to the superclass."

        sgmllib.SGMLParser.__init__(self, verbose)
        self.divs = []
        self.descriptions = []
        self.inside_div_element = 0

    def start_div(self, attributes):
        "Process a <div> start tag and its 'attributes'."

        for name, value in attributes:
            if name == "id":
                self.divs.append(value)
                self.inside_div_element = 1

    def end_div(self):
        "Record the end of a <div> element."

        self.inside_div_element = 0

    def handle_data(self, data):
        "Handle the textual 'data'."

        if self.inside_div_element:
            self.descriptions.append(data)

    def get_div(self):
        "Return the list of div ids."

        return self.divs

    def get_descriptions(self, check):
        "Return the list of descriptions, dropping the oldest one first if 'check' is 1."
        if check == 1:
            self.descriptions.pop(0)
        return self.descriptions

    def rm_descriptions(self):
        "Remove the most recent description."

        self.descriptions.pop()

import urllib 
import linecache 
import sgmllib 


tempLine = "" 
tempStr = " " 
tempStr2 = "" 
myparser = MyParser() 
count = 0 
user = [''] 
oldUser = ['none'] 
oldoldUser = [' '] 
array = [" ", 0] 
index = 0 
found = 0  
k = 0 
j = 0 
posIndex = 0 
a = 0 
firstCheck = 0 
fCheck = 0 
while a < 1000:

    print a
    f = urllib.urlopen("SITE")
    a = a+1

    for i, line in enumerate(f):

        if i == 187:
            print i
            tempLine = line
            print line

            myparser.parse(line)
            if fCheck == 1:
                result = oldUser[0] is oldUser[1]

                u1 = oldUser[0]
                u2 = oldUser[1]
                tempStr = oldUser[1]
                if u1 == u2:
                    result = 1
            else:
                result = user is oldUser
            fCheck = 1

            user = myparser.get_descriptions(firstCheck)
            tempStr = user[0]
            firstCheck = 1

            if result:

                array[index+1] = array[index+1] +0

            else:
                j = 0

                for z in array:
                    k = j+2

                    tempStr2 = user[0]
                    if k < len(array) and tempStr2 == array[k]:

                        array[j+3] = array[j+3] + 1
                        index = j+2
                        found = 1
                        break
                    j = j+1
                if found == 0:

                    array.append(tempStr)
                    array.append(0)

            oldUser = user
            found = 0
            print array

        elif i > 200:
            print "HERE"
            break

    print array
    f.close()

Answers

Maybe that web page has fewer lines than you think? What does this give you?

print max(i for i, _ in enumerate(urllib.urlopen("SITE"))) 
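One caveat with the one-liner above: max() raises ValueError when the response contains no lines at all. The following is only a sketch of a line-counting helper that tolerates an empty response. The name last_line_index and the io.BytesIO stand-ins are assumptions made for illustration, not part of the original thread, and the code is written to run under both Python 2.6+ and Python 3.

```python
import io

def last_line_index(stream):
    """Return the index of the last line in 'stream', or -1 if it is empty."""
    last = -1
    for last, _ in enumerate(stream):
        pass
    return last

# Hypothetical stand-ins for responses of different lengths from the same URL.
full_page = io.BytesIO(b"line\n" * 564)
empty_page = io.BytesIO(b"")

print(last_line_index(full_page))   # 563
print(last_line_index(empty_page))  # -1
```

Any iterable of lines works here, so the object returned by urlopen could be passed in place of the BytesIO stand-ins.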

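564 lines. The confusing part is that it goes through the if i == 187 branch somewhere between 30 and 35 times, and then never again... I don't understand it. – Michael 2011-05-13 22:21:50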

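Aside: the indentation in your code as posted is broken after the while a < 1000: line. Excessive blank lines and single-letter names don't help you understand your code.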

enumerate is not broken. Instead of speculating like this, inspect your data. Suggestion: replace

for i, line in enumerate(f): 

with

lines = list(f) 
print "=== a=%d linecount=%d ===" % (a, len(lines)) 
for i, line in enumerate(lines): 
    print " a=%d i=%d line=%r" % (a, i, line) 

Examine the output carefully.
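If the full per-line dump is too noisy, a condensed version of the same check might look like the sketch below. It only reports, for each fetch, how many lines arrived and whether the target line exists. The function name check_fetch and the sample data are hypothetical. The 130-line second response merely imitates the symptom described in the question and is not a confirmed diagnosis. The code runs under both Python 2.6+ and Python 3.

```python
def check_fetch(attempt, lines, target_index=187):
    """Summarise one fetch: line count and whether 'target_index' exists."""
    line_count = len(lines)
    has_target = target_index < line_count
    summary = "a=%d linecount=%d has_line_%d=%s" % (
        attempt, line_count, target_index, has_target)
    return summary, has_target

# Hypothetical data: a full page, then a shorter response from the same URL.
fetches = [
    ["<div id='u'>x</div>\n"] * 564,
    ["<html>short</html>\n"] * 130,
]

for attempt, lines in enumerate(fetches, 1):
    summary, has_target = check_fetch(attempt, lines)
    print(summary)
```

In the real loop, lines would be list(f) for each urlopen call. A fetch where has_target is False means the i == 187 branch can never run for that response, whatever enumerate does.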