2016-11-09

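Does malloc_trim(0) release fastbins of thread arenas?

For the past week or so I have been investigating a problem in an application whose memory usage accumulates over time. I narrowed it down to a line that copies a

std::vector< std::vector< std::vector< std::map< uint, std::map< uint, std::bitset< N> > > > > >

in a worker thread (I realize this is a ridiculous way to organize memory). Periodically, the worker thread is destroyed and recreated, and the new thread copies this memory structure when it starts. The original data being copied is passed to the worker thread by reference from the main thread.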

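Using malloc_stats and malloc_info, I can see that when the worker thread is destroyed, the arena/heap it was using retains the memory used for that structure in its fastbin free lists. This makes sense, because there are a great many individual allocations smaller than 64 bytes.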

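The problem is that when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, so the fastbins of the previous arenas/heaps are never reused. Eventually the system runs out of memory before a previous heap/arena is reused and the fastbins it holds are recycled.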
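One way to watch for the arena growth described above is to count the heap entries in the malloc_info output. The following is a minimal sketch, assuming glibc (malloc_info and open_memstream are not portable) and assuming that each arena appears as one `<heap nr=...>` element, as in the output shown in this question. count_arenas is a hypothetical helper name.

```cpp
#include <malloc.h>   // malloc_info (glibc-specific)
#include <stdio.h>    // open_memstream, fclose
#include <stdlib.h>   // free
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the number of arenas that malloc_info()
// reports, or -1 if the report could not be produced.
int count_arenas()
{
    char* buf = nullptr;
    std::size_t size = 0;
    FILE* fp = open_memstream(&buf, &size);
    if (!fp) return -1;

    const int rc = malloc_info(0, fp);
    fclose(fp);                       // finalizes buf and size
    assert(buf != nullptr);
    if (rc != 0) { free(buf); return -1; }

    const std::string xml(buf, size);
    free(buf);

    // Assumption: one "<heap nr=" element per arena, as in the output above.
    const std::string tag = "<heap nr=";
    int arenas = 0;
    for (std::size_t pos = xml.find(tag); pos != std::string::npos;
         pos = xml.find(tag, pos + tag.size()))
        ++arenas;
    return arenas;
}
```

Logging this value each time the worker thread is recreated should show whether a new arena really is added per thread lifetime.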

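Somewhat by accident, I discovered that calling malloc_trim(0) in my main thread after joining the worker thread causes the fastbins in the thread arenas/heaps to be released. As far as I can tell, this behavior is undocumented. Does anyone have an explanation?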
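Reduced to its core, the pattern looks roughly like this. It is only a sketch, assuming glibc and C++11 std::thread instead of the Boost thread used in the full test program; churn_small_allocations and run_worker_then_trim are hypothetical names. Whether malloc_trim reports that anything was released depends on the allocator version and its state, so the return value is only range-checked.

```cpp
#include <malloc.h>   // malloc_trim (glibc-specific)
#include <cassert>
#include <cstddef>
#include <list>
#include <thread>

// Hypothetical worker: creates and destroys many small heap nodes, which
// end up on the free lists of whatever arena the thread was given.
void churn_small_allocations(std::size_t count)
{
    std::list<int> nodes;   // every node is a separate small allocation
    for (std::size_t i = 0; i < count; ++i)
        nodes.push_back(static_cast<int>(i));
}                           // all nodes are freed here, inside the worker

// Runs the worker to completion, then trims from the calling thread.
int run_worker_then_trim()
{
    std::thread worker(churn_small_allocations, std::size_t(100000));
    worker.join();

    const int released = malloc_trim(0);   // 1 if memory was released, else 0
    assert(released == 0 || released == 1);
    return released;
}
```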

Here is some test code I am using to observe this behavior:

// includes 
#include <stdio.h> 
#include <algorithm> 
#include <vector> 
#include <iostream> 
#include <stdexcept> 
#include <stdlib.h> 
#include <string> 
#include <mcheck.h> 
#include <malloc.h> 
#include <map> 
#include <bitset> 
#include <boost/thread.hpp> 
#include <boost/shared_ptr.hpp> 

// Number of bits per bitset. 
const int sizeOfBitsets = 40; 

// Executes a system command. Used to get output of "free -m". 
std::string ExecuteSystemCommand(const char* cmd) { 
    char buffer[128]; 
    std::string result = ""; 
    FILE* pipe = popen(cmd, "r"); 
    if (!pipe) throw std::runtime_error("popen() failed!"); 
    try { 
     while (!feof(pipe)) { 
      if (fgets(buffer, 128, pipe) != NULL) 
       result += buffer; 
     } 
    } catch (...) { 
     pclose(pipe); 
     throw; 
    } 
    pclose(pipe); 
    return result; 
} 

// Prints output of "free -m" and the output of malloc_stats() and malloc_info(). 
void PrintMemoryStats() 
{ 
    try 
    { 
     char *buf; 
     size_t size; 
     FILE *fp; 

     std::string myCommand("free -m"); 
     std::string result = ExecuteSystemCommand(myCommand.c_str()); 
     printf("Free memory is \n%s\n", result.c_str()); 

     malloc_stats(); 

     fp = open_memstream(&buf, &size); 
     malloc_info(0, fp); 
     fclose(fp); 
     printf("# Memory Allocation Stats\n%s\n#> ", buf); 
     free(buf); 

    } 
    catch(...) 
    { 
     printf("Unable to print memory stats.\n"); 
     throw; 
    } 
} 

void MakeCopies(std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > >& data) 
{ 
    try 
    { 
     // Create copies. 
     std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyA(data); 
     std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyB(data); 
     std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyC(data); 

     // Print memory info. 
     printf("Memory after creating data copies:\n"); 
     PrintMemoryStats(); 
    } 
    catch(...) 
    { 
     printf("Unable to make copies."); 
     throw; 
    } 
} 

int main(int argc, char** argv) 
{ 
    try 
    { 
      // When uncommented, disables the use of fastbins. 
//  mallopt(M_MXFAST, 0); 

     // Print memory info. 
     printf("Memory to start is:\n"); 
     PrintMemoryStats(); 

     // Sizes of original data. 
     int sizeOfDataA = 2048; 
     int sizeOfDataB = 4; 
     int sizeOfDataC = 128; 
     int sizeOfDataD = 20; 
     std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > testData; 

     // Populate data. 
     testData.resize(sizeOfDataA); 
     for(int a = 0; a < sizeOfDataA; ++a) 
     { 
      testData.at(a).resize(sizeOfDataB); 
      for(int b = 0; b < sizeOfDataB; ++b) 
      { 
       for(int c = 0; c < sizeOfDataC; ++c) 
       { 
        std::map<uint, std::bitset<sizeOfBitsets> > dataMap; 
        testData.at(a).at(b).insert(std::pair<uint, std::map<uint, std::bitset<sizeOfBitsets> > >(c, dataMap)); 
        for(int d = 0; d < sizeOfDataD; ++d) 
        { 
         std::bitset<sizeOfBitsets> testBitset; 
         testData.at(a).at(b).at(c).insert(std::pair<uint, std::bitset<sizeOfBitsets> >(d, testBitset)); 
        } 
       } 
      } 
     } 

     // Print memory info. 
     printf("Memory to after creating original data is:\n"); 
     PrintMemoryStats(); 

     // Start thread to make copies and wait to join. 
     { 
      boost::shared_ptr<boost::thread> makeCopiesThread = boost::shared_ptr<boost::thread>(new boost::thread(&MakeCopies, boost::ref(testData))); 
      makeCopiesThread->join(); 
     } 

     // Print memory info. 
     printf("Memory to after joining thread is:\n"); 
     PrintMemoryStats(); 

     malloc_trim(0); 

     // Print memory info. 
     printf("Memory to after malloc_trim(0) is:\n"); 
     PrintMemoryStats(); 

     return 0; 

    } 
    catch(...) 
    { 
     // Log warning. 
     printf("Unable to run application."); 

     // Return failure. 
     return 1; 
    } 

    // Return success. 
    return 0; 
} 

Interesting output from before and after the malloc_trim call (look for "LOOK HERE!"):

#> Memory to after joining thread is: 
Free memory is 
       total  used  free  shared buff/cache available 
Mem:   257676  7361  246396   25  3918  249757 
Swap:   1023   0  1023 

Arena 0: 
system bytes  = 1443450880 
in use bytes  = 1443316976 
Arena 1: 
system bytes  = 35000320 
in use bytes  =  6608 
Total (incl. mmap): 
system bytes  = 1478451200 
in use bytes  = 1443323584 
max mmap regions =   0 
max mmap bytes =   0 
# Memory Allocation Stats 
<malloc version="1"> 
<heap nr="0"> 
<sizes> 
<size from="241" to="241" total="241" count="1"/> 
<size from="529" to="529" total="529" count="1"/> 
</sizes> 
<total type="fast" count="0" size="0"/> 
<total type="rest" count="2" size="770"/> 
<system type="current" size="1443450880"/> 
<system type="max" size="1443459072"/> 
<aspace type="total" size="1443450880"/> 
<aspace type="mprotect" size="1443450880"/> 
</heap> 
<heap nr="1"> 
<sizes> 
<size from="33" to="48" total="48" count="1"/> 
<size from="49" to="64" total="4026531712" count="62914558"/> <-- LOOK HERE! 
<size from="65" to="80" total="160" count="2"/> 
<size from="81" to="96" total="301989888" count="3145728"/> <-- LOOK HERE! 
<size from="33" to="33" total="231" count="7"/> 
<size from="49" to="49" total="1274" count="26"/> 
<unsorted from="0" to="49377" total="1431600" count="6144"/> 
</sizes> 
<total type="fast" count="66060289" size="4328521808"/> 
<total type="rest" count="6177" size="1433105"/> 
<system type="current" size="4329967616"/> 
<system type="max" size="4329967616"/> 
<aspace type="total" size="35000320"/> 
<aspace type="mprotect" size="35000320"/> 
</heap> 
<total type="fast" count="66060289" size="4328521808"/> 
<total type="rest" count="6179" size="1433875"/> 
<total type="mmap" count="0" size="0"/> 
<system type="current" size="5773418496"/> 
<system type="max" size="5773426688"/> 
<aspace type="total" size="1478451200"/> 
<aspace type="mprotect" size="1478451200"/> 
</malloc> 

#> Memory to after malloc_trim(0) is: 
Free memory is 
       total  used  free  shared buff/cache available 
Mem:   257676  3269  250488   25  3918  253850 
Swap:   1023   0  1023 

Arena 0: 
system bytes  = 1443319808 
in use bytes  = 1443316976 
Arena 1: 
system bytes  = 35000320 
in use bytes  =  6608 
Total (incl. mmap): 
system bytes  = 1478320128 
in use bytes  = 1443323584 
max mmap regions =   0 
max mmap bytes =   0 
# Memory Allocation Stats 
<malloc version="1"> 
<heap nr="0"> 
<sizes> 
<size from="209" to="209" total="209" count="1"/> 
<size from="529" to="529" total="529" count="1"/> 
<unsorted from="0" to="49377" total="1431600" count="6144"/> 
</sizes> 
<total type="fast" count="0" size="0"/> 
<total type="rest" count="6146" size="1432338"/> 
<system type="current" size="1443459072"/> 
<system type="max" size="1443459072"/> 
<aspace type="total" size="1443459072"/> 
<aspace type="mprotect" size="1443459072"/> 
</heap> 
<heap nr="1"> <---------------------------------------- LOOK HERE! 
<sizes> <-- HERE! 
<unsorted from="0" to="67108801" total="4296392384" count="6208"/> 
</sizes> 
<total type="fast" count="0" size="0"/> 
<total type="rest" count="6208" size="4296392384"/> 
<system type="current" size="4329967616"/> 
<system type="max" size="4329967616"/> 
<aspace type="total" size="35000320"/> 
<aspace type="mprotect" size="35000320"/> 
</heap> 
<total type="fast" count="0" size="0"/> 
<total type="rest" count="12354" size="4297824722"/> 
<total type="mmap" count="0" size="0"/> 
<system type="current" size="5773426688"/> 
<system type="max" size="5773426688"/> 
<aspace type="total" size="1478459392"/> 
<aspace type="mprotect" size="1478459392"/> 
</malloc> 

#> 

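There is little or no documentation on the output of malloc_info, so I was not sure whether the entries I pointed out were really fastbins. To verify that they are indeed fastbins, I uncommented the line of code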

mallopt(M_MXFAST, 0); 

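to disable the use of fastbins. With that change, the memory usage of heap 1 after joining the thread, before calling malloc_trim(0), looks like it does with fastbins enabled after calling malloc_trim(0). Most importantly, disabling fastbins returns the memory to the system immediately after the thread is joined. Calling malloc_trim(0) after joining the thread, with fastbins enabled, also returns the memory to the system.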
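For reference, the fastbin switch used in this experiment can be wrapped with a check of its return value. This is a minimal sketch, assuming glibc, where mallopt returns 1 on success and 0 on error; disable_fastbins is a hypothetical name. On newer glibc versions that have a per-thread cache (tcache), small freed chunks may also be held there, so this switch alone may not reproduce the glibc 2.17 behavior described here.

```cpp
#include <malloc.h>   // mallopt, M_MXFAST (glibc-specific)
#include <cassert>

// Hypothetical helper: sets the largest fastbin-eligible request size to 0,
// which disables fastbins. Best called before the first allocation that
// matters. Returns true if mallopt accepted the setting.
bool disable_fastbins()
{
    return mallopt(M_MXFAST, 0) == 1;
}
```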

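The documentation for malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here? I am running on CentOS 7 with glibc version 2.17.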

Answer


The documentation for malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here?

This can be called "outdated" or "incorrect" documentation. Glibc has no documentation of the malloc_trim function, and Linux uses the man pages from the man-pages project. The malloc_trim man page (http://man7.org/linux/man-pages/man3/malloc_trim.3.html) was written as a new page in 2012 by the maintainer of man-pages. He probably used some comments from the glibc malloc/malloc.c source code (http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#675):

676 malloc_trim(size_t pad); 
677 
678 If possible, gives memory back to the system (via negative 
679 arguments to sbrk) if there is unused memory at the `high' end of 
680 the malloc pool. You can call this after freeing large blocks of 
681 memory to potentially reduce the system-level memory requirements 
682 of a program. However, it cannot guarantee to reduce memory. Under 
683 some allocation patterns, some large free blocks of memory will be 
684 locked between two used chunks, so they cannot be given back to 
685 the system. 
686 
687 The `pad' argument to malloc_trim represents the amount of free 
688 trailing space to leave untrimmed. If this argument is zero, 
689 only the minimum amount of memory to maintain internal data 
690 structures will be left (one page or less). Non-zero arguments 
691 can be supplied to maintain enough trailing space to service 
692 future expected allocations without having to re-obtain memory 
693 from the system. 
694 
695 Malloc_trim returns 1 if it actually released any memory, else 0. 
696 On systems that do not support "negative sbrks", it will always 
697 return 0. 

The actual implementation in glibc is __malloc_trim, which iterates over all arenas and trims each one with the mtrim() (mTRIm()) function:

http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#4552

4552 int 
4553 __malloc_trim (size_t s) 

4560 mstate ar_ptr = &main_arena; 
4561 do 
4562 { 
4563  (void) mutex_lock (&ar_ptr->mutex); 
4564  result |= mtrim (ar_ptr, s); 
4565  (void) mutex_unlock (&ar_ptr->mutex); 
4566 
4567  ar_ptr = ar_ptr->next; 
4568 } 
4569 while (ar_ptr != &main_arena); 
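The shape of that loop can be modeled in isolation. The sketch below is a toy, not glibc code: ToyArena, toy_trim, and toy_malloc_trim are hypothetical stand-ins, and locking is omitted. It only illustrates that a do/while over a circular list starting at the main arena visits every arena exactly once and ORs the per-arena results together.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an arena; not glibc's malloc_state.
struct ToyArena {
    bool has_free_pages;   // pretend trimming this arena can release memory
    bool visited;          // set once the arena has been trimmed
    ToyArena* next;        // circular link, like ar_ptr->next
};

// Toy per-arena trim: records the visit and reports whether it released memory.
int toy_trim(ToyArena& arena)
{
    arena.visited = true;
    return arena.has_free_pages ? 1 : 0;
}

// Same control flow as the excerpt above: start at the main arena, follow
// next pointers, and stop when the walk returns to the starting point.
int toy_malloc_trim(ToyArena& main_arena)
{
    int result = 0;
    ToyArena* ar = &main_arena;
    do {
        result |= toy_trim(*ar);
        ar = ar->next;
    } while (ar != &main_arena);
    return result;
}

// Builds a ring of n arenas in which only the last one has free pages and
// checks that every arena was visited and that the result is 1.
bool toy_ring_is_fully_visited(std::size_t n)
{
    assert(n >= 1);
    std::vector<ToyArena> ring(n);
    for (std::size_t i = 0; i < n; ++i)
        ring[i] = ToyArena{i + 1 == n, false, &ring[(i + 1) % n]};

    const int result = toy_malloc_trim(ring[0]);
    bool all_visited = true;
    for (const ToyArena& a : ring) all_visited = all_visited && a.visited;
    return all_visited && result == 1;
}
```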

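Every arena is trimmed by mtrim(), which calls malloc_consolidate() to convert all free chunks held in fastbins (they are not coalesced when freed, which is what makes them fast) into normal free chunks (which are coalesced with adjacent chunks):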

4498 /* Ensure initialization/consolidation */ 
4499 malloc_consolidate (av); 

4111 malloc_consolidate is a specialized version of free() that tears 
4112 down chunks held in fastbins. 

1581 Fastbins 
1591 Chunks in fastbins keep their inuse bit set, so they cannot 
1592 be consolidated with other free chunks. malloc_consolidate 
1593 releases all chunks in fastbins and consolidates them with 
1594 other free chunks. 
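To make the effect of consolidation concrete, here is a toy model, not glibc's data structures: the heap is a flat sequence of chunks, and ToyChunk, State, and toy_consolidate are hypothetical names. Chunks marked Fast stand for chunks parked in fastbins; consolidation turns them into ordinary free chunks and merges runs of adjacent free chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy chunk states. Fast means "freed, but parked in a fastbin".
enum class State { InUse, Fast, Free };

struct ToyChunk {
    std::size_t size;
    State state;
};

// Toy consolidation: every Fast chunk becomes Free, and neighbouring Free
// chunks are merged into one larger chunk. In-use chunks are left alone.
std::vector<ToyChunk> toy_consolidate(const std::vector<ToyChunk>& heap)
{
    std::vector<ToyChunk> out;
    for (ToyChunk c : heap) {
        if (c.state == State::Fast) c.state = State::Free;
        if (c.state == State::Free && !out.empty() &&
            out.back().state == State::Free)
            out.back().size += c.size;   // coalesce with the previous free chunk
        else
            out.push_back(c);
    }
    return out;
}
```

For example, the layout Fast 64, Fast 64, InUse 96, Fast 64, Free 32 becomes Free 128, InUse 96, Free 96. Larger contiguous free regions of this kind are what can then be handed back to the system.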

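The problem is that when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, so the fastbins of the previous arenas/heaps are never reused.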

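This is strange. By design, the maximum number of arenas in glibc malloc is limited to cpu_core_count * 8 (on 64-bit platforms) or cpu_core_count * 2 (on 32-bit platforms), or by the environment variable MALLOC_ARENA_MAX / the mallopt parameter M_ARENA_MAX.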
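The limits above can be written down as a small sketch. It assumes glibc for the mallopt call; default_arena_limit and limit_arenas are hypothetical helper names, and the formula simply restates the multipliers given in this answer rather than querying glibc.

```cpp
#include <malloc.h>   // mallopt, M_ARENA_MAX (glibc-specific)
#include <cassert>
#include <cstddef>

// Default cap as described above: 8 arenas per core on 64-bit platforms,
// 2 per core on 32-bit platforms. pointer_bytes is 8 or 4 respectively.
std::size_t default_arena_limit(std::size_t cpu_core_count,
                                std::size_t pointer_bytes = sizeof(void*))
{
    return cpu_core_count * (pointer_bytes == 4 ? 2 : 8);
}

// Hypothetical helper: asks malloc to use at most max_arenas arenas.
// Should be called early, before worker threads start allocating.
bool limit_arenas(int max_arenas)
{
    assert(max_arenas > 0);
    return mallopt(M_ARENA_MAX, max_arenas) == 1;
}
```

With a small limit, recreated worker threads have to share or reuse existing arenas instead of getting a fresh one each time, at the cost of more lock contention between threads.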

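You can limit the number of arenas for your application, call malloc_trim() periodically, or call malloc() with a "large" size (which calls malloc_consolidate) and then free() it, from your threads just before they are destroyed: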

3319 _int_malloc (mstate av, size_t bytes) 
3368 if ((unsigned long) (nb) <= (unsigned long) (get_max_fast())) 
// fastbin allocation path 
3405 if (in_smallbin_range (nb)) 
// smallbin path; malloc_consolidate may be called 
3437  If this is a large request, consolidate fastbins before continuing. 
3438  While it might look excessive to kill all fastbins before 
3439  even seeing if there is space available, this avoids 
3440  fragmentation problems normally associated with fastbins. 
3441  Also, in practice, programs tend to have runs of either small or 
3442  large requests, but less often mixtures, so consolidation is not 
3443  invoked all that often in most programs. And the programs that 
3444  it is called frequently in otherwise tend to fragment. 
3445 */ 
3446 
3447 else 
3448 { 
3449  idx = largebin_index (nb); 
3450  if (have_fastchunks (av)) 
3451  malloc_consolidate (av); 
3452 } 
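The last suggestion could look roughly like the sketch below, to be called from the worker thread just before it exits. It assumes that a request of 64 KiB counts as "large" for the glibc version in use (the excerpt above only requires it to be outside the smallbin range); nudge_consolidation is a hypothetical name. The pointer is volatile so that the compiler does not optimize the malloc/free pair away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: makes one large request in the calling thread so that
// the large-request path shown above runs for this thread's arena, then
// gives the block straight back. Returns false if the allocation failed.
bool nudge_consolidation(std::size_t large_bytes = 64 * 1024)
{
    void* volatile p = std::malloc(large_bytes);   // volatile: keep the call
    if (p == nullptr) return false;
    std::free(p);
    return true;
}
```

Whether this actually shrinks the arena should be confirmed with malloc_info on the target system, since the internals differ between glibc versions.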

PS: There is a comment in the malloc_trim man page (https://github.com/mkerrisk/man-pages/commit/a15b0e60b297e29c825b7417582a33e6ca26bf65):

+.SH NOTES 
+This function only releases memory in the main arena. 
+.\" malloc/malloc.c::mTRIm(): 
+.\" return result | (av == &main_arena ? sYSTRIm (pad, av) : 0); 

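And yes, there is a check for main_arena, but it is at the very end of mTRIm(), the implementation of malloc_trim, and it only guards the call to sbrk() with a negative offset. Since 2007 (glibc 2.9 and newer) there is another method of returning memory to the OS: madvise(MADV_DONTNEED), which is used in all arenas (and was not documented by the author of the glibc patch or by the man page author). Consolidation is called for every arena. There is also code for trimming (munmapping) the top chunk of mmap-ed heaps (heap_trim/shrink_heap, called from the slow path of free()), but it is not called from malloc_trim.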
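The effect of madvise(MADV_DONTNEED) itself can be seen outside of malloc. This is a minimal sketch of the system call on Linux, not of how glibc uses it internally; dontneed_drops_page_contents is a hypothetical name. It relies on the Linux behavior that a private anonymous page reads back as zero-filled after MADV_DONTNEED, because the kernel has dropped the physical page while the address range stays mapped.

```cpp
#include <sys/mman.h>   // mmap, madvise, munmap
#include <unistd.h>     // sysconf
#include <cassert>
#include <cstddef>
#include <cstring>

// Maps one anonymous page, dirties it, advises the kernel that it is no
// longer needed, and checks that the old contents are gone.
bool dontneed_drops_page_contents()
{
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return false;
    const std::size_t len = static_cast<std::size_t>(page);

    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;

    unsigned char* bytes = static_cast<unsigned char*>(mem);
    std::memset(bytes, 0xAB, len);   // touch the page so it is really backed

    bool ok = (madvise(mem, len, MADV_DONTNEED) == 0);
    // The mapping is still valid, but the page is now zero-filled on demand.
    ok = ok && bytes[0] == 0 && bytes[len - 1] == 0;

    munmap(mem, len);
    return ok;
}
```

This is why memory released this way shows up as a drop in the "used" column of free -m even though the system-bytes figures reported for the arena do not shrink.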