2013-09-24 73 views
3

我有一個程序,在udp上監聽日誌流量,嘗試解析它,然後將其插入到Redis中。在流量一定的水平,內存似乎「爆炸」(迅速從幾百兆字節提高到千兆字節golang內存爆炸:newdefer

我抓起一堆資料這個發生後不久,它返回如下:

(pprof) top100 -cum 
Total: 1731.3 MB 
    0.0 0.0% 0.0% 1731.3 100.0% gosched0 
    1162.5 67.1% 67.1% 1162.5 67.1% newdefer 
    0.0 0.0% 67.1% 1162.5 67.1% runtime.deferproc 
    0.0 0.0% 67.1% 1162.0 67.1% main.TryParse 
    0.0 0.0% 67.1% 438.0 25.3% runtime.main 
    301.5 17.4% 84.6% 437.5 25.3% main.main 
    136.0 7.9% 92.4% 136.0 7.9% runtime.malg 
    0.0 0.0% 92.4% 136.0 7.9% runtime.newproc 
    0.0 0.0% 92.4% 136.0 7.9% runtime.newproc1 
    1.5 0.1% 92.5% 131.3 7.6% main.RedisCuller 
    0.0 0.0% 92.5% 108.5 6.3% github.com/garyburd/redigo/redis.(*conn).Do 
    0.0 0.0% 92.5% 108.5 6.3% github.com/garyburd/redigo/redis.(*conn).readReply 
    0.0 0.0% 92.5% 108.5 6.3% github.com/garyburd/redigo/redis.(*pooledConnection).Do 
    95.8 5.5% 98.0%  95.8 5.5% cnew 
    0.0 0.0% 98.0%  95.8 5.5% runtime.cnewarray 
    34.0 2.0% 100.0%  34.0 2.0% runtime.convT2E 
    0.0 0.0% 100.0%  0.5 0.0% main.init 
    0.0 0.0% 100.0%  0.5 0.0% net/http/pprof.init 
    0.0 0.0% 100.0%  0.5 0.0% sync.(*Once).Do 
    0.0 0.0% 100.0%  0.5 0.0% syscall.Getenv 
    0.0 0.0% 100.0%  0.5 0.0% time.init 

當節目是「健康」的外形看起來更像是:

(pprof) top20 -cum 
Total: 186.7 MB 
    0.0 0.0% 0.0% 186.7 100.0% gosched0 
    0.5 0.3% 0.3% 122.7 65.7% main.RedisCuller 
    0.0 0.0% 0.3% 103.5 55.4% github.com/garyburd/redigo/redis.(*pooledConnection).Do 
    0.0 0.0% 0.3% 103.0 55.2% github.com/garyburd/redigo/redis.(*conn).Do 
    0.0 0.0% 0.3% 103.0 55.2% github.com/garyburd/redigo/redis.(*conn).readReply 
    88.2 47.2% 47.5%  88.2 47.2% cnew 
    0.0 0.0% 47.5%  88.2 47.2% runtime.cnewarray 
    0.0 0.0% 47.5%  57.0 30.5% main.TryParse 
    57.0 30.5% 78.0%  57.0 30.5% newdefer 
    0.0 0.0% 78.0%  57.0 30.5% runtime.deferproc 
    34.0 18.2% 96.3%  34.0 18.2% runtime.convT2E 
    1.5 0.8% 97.1%  6.5 3.5% main.main 
    0.0 0.0% 97.1%  6.5 3.5% runtime.main 
    5.0 2.7% 99.7%  5.0 2.7% runtime.malg 
    0.0 0.0% 99.7%  5.0 2.7% runtime.newproc 
    0.0 0.0% 99.7%  5.0 2.7% runtime.newproc1 
    0.0 0.0% 99.7%  0.5 0.3% bufio.NewWriter 
    0.0 0.0% 99.7%  0.5 0.3% bufio.NewWriterSize 
    0.0 0.0% 99.7%  0.5 0.3% github.com/garyburd/redigo/redis.(*Pool).get 
    0.0 0.0% 99.7%  0.5 0.3% github.com/garyburd/redigo/redis.(*pooledConnection).get 

唯一的延遲我在我的代碼周圍的解析函數(因爲它可以經常失敗):

for { 
      rlen, _, err := sock.ReadFromUDP(buf[0:]) 
      checkError(err) 
      raw := logrow.RawRecord(string(buf[:rlen])) 
      go TryParse(raw, c) 
    } 

    ... 

    func TryParse(raw logrow.RawRecord, c chan logrow.Record) { 
     defer func() { 
       if r := recover(); r != nil { 
         //log.Printf("Failed Parse due to panic: %v", raw) 
         return 
       } 
     }() 
     rec, ok := logrow.ParseRawRecord(raw) 
     if !ok { 
       return 
       //log.Printf("Failed Parse: %v", raw) 
     } else { 
       c <- rec 
     } 
} 

有沒有人看到任何明顯的我做錯了,可能會導致記憶突然氣球?或者可能提供一些方向來固定它?

編輯(圍繞logrow.Record頻道更多的代碼):

c := make(chan logrow.Record) 
... 
go RedisInserter(c, bucket, retention, pool) 

func RedisInserter(c chan logrow.Record, bucket, retention int, p *redis.Pool) { 
     for rec := range c { 
       logrow.SendToRedis(rec, bucket, retention, p) 
     } 
} 
+0

你正在關閉'c'('logrow.Record'的'chan')嗎?如果是這樣,你是否推遲關閉'c'?不知道還有什麼建議沒有看到更多的代碼抱歉。 – Intermernet

+0

@Intermernet:在該頻道中添加了更多的代碼。我從來沒有關閉它,因爲「應該」發生的是頻道不斷被「go TryParse」填充logrows,並且'RedisInserter'循環應該消耗這些行。現在看看它,我可以看到,也許logrow.SendToRedis可能會被阻止,並且頻道填滿 - 但是看不到這會導致newdefer內存使用率高? –

+0

閉包延遲功能中存在內存泄漏。如果您取消註釋printf,則會是這種情況。: https://codereview.appspot.com/10784043/ 您是否試過使用提示? – ClojureMostly

回答

1

原來是在合攏延遲功能,因爲我們看到(在圍棋本身有問題)泄漏在:

defer func() { 
      if r := recover(); r != nil { 
        //log.Printf("Failed Parse due to panic: %v", raw) 
        return 
      } 
    }() 

因此,用最新版本的Go進行重建可解決問題(參考:https://codereview.appspot.com/10784043/)。但是,聰明的事情當然是設計ParseRecordFunction,所以它不會試圖超越界限並導致恐慌。