Why does this script slow down per item as the amount of input grows?


#!/usr/bin/env pypy

import json
import cStringIO
import sys

def main():
    BUFSIZE = 10240
    f = sys.stdin
    decoder = json.JSONDecoder()
    io = cStringIO.StringIO()

    do_continue = True
    while True:
        read = f.read(BUFSIZE)
        if len(read) < BUFSIZE:
            do_continue = False
        io.write(read)
        try:
            data, offset = decoder.raw_decode(io.getvalue())
            print(data)
            rest = io.getvalue()[offset:]
            if rest.startswith('\n'):
                rest = rest[1:]
            decoder = json.JSONDecoder()
            io = cStringIO.StringIO()
            io.write(rest)
        except ValueError, e:
            #print(e)
            #print(repr(io.getvalue()))
            continue
        if not do_continue:
            break

if __name__ == '__main__':
    main()

And here is the test case:

$ yes '{}' | pv | pypy parser-test.py >/dev/null

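As you can see, the script above slows down as you feed it more input. This also happens on CPython. I tried to profile the script with mprof and cProfile, but found no hint as to why. Does anybody have a clue?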


2015-06-12 d33tah

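Why *wouldn't* it get slower with more input? – jonrsharpe

I tried to make it iterative: fetch one object, print it and discard it. I don't expect a memory leak there. Can you see one? – d33tah

I'm not talking about memory leaks! The longer the input, the longer it takes to process, unless your algorithm is `O(1)`. Or do you mean that **each item** takes longer as the input length increases? – jonrsharpe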

Apparently the string operations were slowing it down. Instead of:

data, offset = decoder.raw_decode(io.getvalue())
print(data)
rest = io.getvalue()[offset:]
if rest.startswith('\n'):
    rest = rest[1:]

it is better to do:

data, offset = decoder.raw_decode(io.read())
print(data)
rest = io.getvalue()[offset:]
io.truncate()
io.write(rest)
if rest.startswith('\n'):
    io.seek(1)
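For context on where the per-item cost comes from: the loop in the question reads another 10240 bytes on every pass but decodes only one object, so with an input like `yes '{}'` the buffer keeps growing and each `getvalue()` and slice copies more data. A minimal sketch of one way to avoid that is below, assuming the input is a stream of top-level JSON objects or arrays separated by whitespace. The function name `iter_json_objects` and the plain-string buffer are illustrative choices, not part of the original script, and the sketch is written so that it should also run on Python 3.

```python
import json


def iter_json_objects(chunks):
    """Yield every JSON value found in an iterable of text chunks.

    Hypothetical helper: assumes top-level values are objects or arrays,
    so a value cut off at a chunk boundary always fails to decode.
    """
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        pos = 0
        while True:
            # raw_decode does not skip leading whitespace, so do it by hand.
            while pos < len(buf) and buf[pos].isspace():
                pos += 1
            if pos == len(buf):
                break
            try:
                # Decode in place from index pos; no slicing per object.
                obj, pos = decoder.raw_decode(buf, pos)
            except ValueError:
                # Incomplete value at the end of the buffer: wait for more input.
                break
            yield obj
        # Keep only the unparsed tail, so the buffer stays bounded.
        buf = buf[pos:]
```

To feed it from standard input in fixed-size reads, something like `iter_json_objects(iter(lambda: sys.stdin.read(10240), ""))` should work. Note that genuinely malformed input would make the buffer grow again, since this sketch cannot tell "incomplete" from "invalid".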


2015-06-12 19:53:16 d33tah

You may want to close your StringIO at the end of each iteration (after the write).

io.close()

A StringIO's memory buffer is freed once it is closed, but stays allocated otherwise. This could explain why each additional input slows your script down.


2015-06-12 19:37:46 eleventhend

Like this? https://gist.github.com/d33tah/09144ba0ce596a6b92ba – d33tah

Sure, there or right after the if-else statement. As long as you close it somewhere, each iteration should hold enough of a buffer. – eleventhend
