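Storing large maps in memory

2016-08-12

First, some background: I have a very large graph that takes about 4GB of storage, with about 3M nodes and 34M edges. My program takes this large graph and recursively builds smaller graphs from it. At each level of the recursion I have two graphs: the original graph and the graph created from it. The recursion continues until the graph has been reduced to a very small one, e.g. about 10 nodes.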

Because I need these graphs for the entire execution of the program, memory efficiency is critical for my application.

Now, here is the problem I am currently running into. This is the algorithm that creates a smaller graph from a larger one:

public static Graph buildByTriples(Graph g, ArrayList<Integer> seeds) { 
    ArrayList<Edge> edges = new ArrayList<>(g.getEdgeCount());
    for (int i = 0; i < g.size(); i++) { 
     for (Edge e : g.adj(i)) { 
      int v = e.getEndpoint(i); 
      if (i < v) { 
       edges.add(e); 
      } 
     } 
    } 

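    //the second argument (expectedCellsPerRow) presizes EVERY row map; passing seeds.size() there
    //can allocate a very large map per row, so a small per-row hint is used instead (8 is an assumed value)
    Table<Integer, Integer, Double> coarseEgdes = HashBasedTable.create(seeds.size(), 8);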
    //compute coarse weights 
    edges.stream().forEach((e) -> { 
     int v = e.getV(); 
     int u = e.getU(); 
     if (g.isC(u) && g.isC(v)) { 
      addToTable(coarseEgdes, u, v, e.getWeight()); 
     }else if(!g.isC(u) && g.isC(v)){ //F-C 
      for(Edge cEdge: g.cAdj(u)){//get coarse neighbors of the fine edges 
       int nb = cEdge.getEndpoint(u); 
       if(nb != v){ 
        addToTable(coarseEgdes, v, nb, cEdge.getPij() * e.getWeight()); 

       } 
      } 
     }else if(g.isC(u) && !g.isC(v)){//C-F 
      for(Edge cEdge: g.cAdj(v)){//get coarse neighbors of the fine edges 
       int nb = cEdge.getEndpoint(v); 
       if(nb != u){ 
        addToTable(coarseEgdes, u, nb, cEdge.getPij() * e.getWeight()); 
       } 
      } 
     }else{//F-F 
      for(Edge cEdgeU: g.cAdj(u)){//get coarse neighbors of the fine edges 
       int uNb = cEdgeU.getEndpoint(u); 
       for(Edge cEdgeV: g.cAdj(v)){ 
        int vNb = cEdgeV.getEndpoint(v); 
        if(uNb != vNb){ 
         addToTable(coarseEgdes, uNb, vNb, cEdgeU.getPij() * e.getWeight() * cEdgeV.getPij()); 
        } 
       } 
      } 
     } 
    }); 

    return createGraph(g, coarseEgdes); //use the edges to build the new graph: loops through coarseEdges and adds each edge and weight to the new graph.
} 

private static void addToTable(Table<Integer, Integer,Double> tbl, int r, int c, double val){
    int mn = Math.min(r, c);//the smaller of the two nodeIds
    int mx = Math.max(r, c);//the larger of the two nodeIds
    Double old = tbl.get(mn, mx);//single lookup instead of contains() followed by get()
    tbl.put(mn, mx, old == null ? val : old + val);
}
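Reading the coarse-weight loop in `buildByTriples`, it appears to accumulate the quantity below. This is an interpretation of the code, not something the question states, and the symbols are introduced here only for illustration. Write $w_{uv}$ for the weight of edge $\{u,v\}$. Write $p_{fc}$ for `getPij()` on the edge joining a fine node $f$ to a coarse neighbour $c$. Set $p_{cc} = 1$ for every coarse node $c$, and $p_{ua} = 0$ for every other pair. For coarse nodes $a \neq b$, the table entry then ends up as

$$
\hat w_{ab} \;=\; \sum_{\{u,v\} \in E} \bigl( p_{ua}\, w_{uv}\, p_{vb} \;+\; p_{ub}\, w_{uv}\, p_{va} \bigr)
$$

If this reading is right, the result is the off-diagonal part of $\hat W = P^{\mathsf T} W P$ with $P = (p_{ua})$. The formula also suggests where the size comes from. A single edge $\{u,v\}$ can touch up to $d_C(u)\, d_C(v)$ distinct table entries, where $d_C(u)$ is 1 for a coarse $u$ and the number of coarse neighbours (`cDegree(u)`) for a fine one. F-F edges between nodes with many coarse neighbours are therefore a likely source of the growth of `coarseEdges`.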

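Now, when I run this, I run out of memory very quickly. I profiled the application with YourKit: memory usage goes through the roof (over 6GB before it runs out), and so does CPU usage. `coarseEdges` can become very large. Is there a better in-memory map implementation that scales to large data sets? Or is there a better way to do this without storing `coarseEdges`?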

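PS: Note that my graph cannot retrieve an edge (u,v) in constant time. It is basically a list of lists, which gives better performance for other critical parts of my application.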

**Also see my graph implementation code below:**
public class Graph{ 
    private final int SIZE; 
    private final EdgeList[] nodes; 
    private final float[] volumes; 
    private final double[] weightedSum; 
    private final double[] weightedCoarseSum; 
    private final int[] nodeDegrees; 
    private final int[] c_nodeDegrees; 
    private int edge_count=0; 
    private final boolean[] coarse; 
    private final EdgeList[] coarse_neighbors; 
    public Graph(int SIZE){ 
     this.SIZE =SIZE; 
     nodes = new EdgeList[SIZE]; 
     coarse_neighbors = new EdgeList[SIZE]; 

     volumes = new float[SIZE]; 
     coarse = new boolean[SIZE]; 

     //initialize data 
     weightedSum = new double[SIZE]; 
     weightedCoarseSum = new double[SIZE]; 
     nodeDegrees= new int[SIZE]; 
     c_nodeDegrees = new int[SIZE]; 

     for(int i=0;i<SIZE;i++){ 
      nodes[i]=new EdgeList(); 
      coarse_neighbors[i] = new EdgeList(); 
      volumes[i]=1; 
     } 
    } 

    public void addEdge(int u, int v, double w){ 
     //graph is undirected 
     //In order to traverse edges in order such that u < v. We store edge u,v such that u<v 
     Edge e=null; 
     if(u<v){ 
      e = new Edge(u,v,w); 
     }else if(u>v){ 
      e = new Edge(v,u,w); 
     }else{ 
      throw new UnsupportedOperationException("Self loops not allowed in graph"); //TODO: Need a graph validation routine 
     } 

     nodes[u].add(e); 
     nodes[v].add(e); 

     //update the weighted sum of each edge 
     weightedSum[u] += w; 
     weightedSum[v] += w; 

     //update the degree of each edge 
     ++nodeDegrees[u]; 
     ++nodeDegrees[v]; 

     ++edge_count; 
    } 

    public int size(){ 
     return SIZE; 
    } 

    public EdgeList adj(int v){ 
     return nodes[v]; 
    } 

    public EdgeList cAdj(int v){ 
     return coarse_neighbors[v]; 
    } 

    public void sortAdj(int u, Comparator<Edge> c){ 
     nodes[u].sort(c); 
    } 

    public void sortCoarseAdj(int u, Comparator<Edge> c){ 
     coarse_neighbors[u].sort(c); 
    } 

    public void setCoarse(int node, boolean c){ 
     coarse[node] = c; 
     if(c){ 
      //update the neighborHood of node 
      for(Edge e: adj(node)){ 
       int v = e.getEndpoint(node); 
       coarse_neighbors[v].add(e); 
       weightedCoarseSum[v] += e.getWeight(); 
       ++c_nodeDegrees[v]; 
      } 
     } 
    } 

    public int getEdgeCount(){ 
     return edge_count; 
    } 

    public boolean isC(int id){ 
     return coarse[id]; 
    } 

    public double weightedDegree(int node){ 
     return weightedSum[node]; 
    } 

    public double weightedCoarseDegree(int node){ 
     return weightedCoarseSum[node]; 
    } 

    public int degree(int u){ 
     return nodeDegrees[u]; 
    } 

    public int cDegree(int u){ 
     return c_nodeDegrees[u]; 
    } 

    public Edge getCNeighborAt(int u,int idx){ 
     return coarse_neighbors[u].getAt(idx); 
    } 

    public float volume(int u){ 
     return volumes[u]; 
    } 

    public void setVolume(int node, float v){ 
     volumes[node] = v; 
    } 

    @Override 
    public String toString() { 
     return "Graph[nodes:"+SIZE+",edges:"+edge_count+"]"; 
    } 

} 


//Edges are first class objects. 
public class Edge { 
    private boolean deleted=false; 
    private int u; 
    private int v; 
    private double weight; 
    private double pij; 
    private double algebraicDist = (1/Constants.EPSILON); 

    public Edge(int u, int v, double weight) { 
     this.u = u; 
     this.v = v; 
     this.weight = weight; 
    } 

    public Edge() { 
    } 

    public int getU() { 
     return u; 
    } 

    public void setU(int u) { 
     this.u = u; 
    } 

    public int getV() { 
     return v; 
    } 

    public void setV(int v) { 
     this.v = v; 
    } 

    public int getEndpoint(int from){ 
     if(from == v){ 
      return u; 
     } 

     return v; 
    } 

    public double getPij() { 
     return pij; 
    } 

    public void setPij(double pij) { 
     this.pij = pij; 
    } 

    public double getAlgebraicDist() { 
     return algebraicDist; 
    } 

    public void setAlgebraicDist(double algebraicDist) { 
     this.algebraicDist = algebraicDist; 
    } 

    public boolean isDeleted() { 
     return deleted; 
    } 

    public void setDeleted(boolean deleted) { 
     this.deleted = deleted; 
    } 

    public double getWeight() { 
     return weight; 
    } 

    public void setWeight(double weight) { 
     this.weight = weight; 
    } 

    @Override 
    public String toString() { 
     return "Edge[u:"+u+", v:"+v+"]"; 
    } 
} 


// The Edge iterable 
public class EdgeList implements Iterable<Edge>{ 
    private final ArrayList<Edge> data = new ArrayList<>();

    public void add(Edge e){ 
     data.add(e); 
    } 

    @Override 
    public Iterator<Edge> iterator() { 
     Iterator<Edge> it = new IteratorImpl(); 
     return it; 
    } 

    private class IteratorImpl implements Iterator<Edge> { 

     public IteratorImpl() { 
     } 
     private int currentIndex = 0; 
     private final int N = data.size(); 
     @Override 
     public boolean hasNext() { 

      //skip deleted 
      while(currentIndex < N && data.get(currentIndex).isDeleted()){ 
       currentIndex++; 
      } 

      return currentIndex < N; 
     } 

     @Override 
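     public Edge next() {
      //call hasNext() so deleted edges are skipped and exhaustion is reported correctly
      //even when the caller does not call hasNext() first
      if(!hasNext()){
       throw new java.util.NoSuchElementException();
      }
      return data.get(currentIndex++);
     }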

     @Override 
     public void remove() { 
      throw new UnsupportedOperationException(); 
     } 
    } 

    public Edge getAt(int idx){ 
     return data.get(idx); 
    } 

    public void sort(Comparator<Edge> c){ 
     data.sort(c); 
    } 
} 

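Which `Graph` implementation are you using? There is a `Graph` interface in `com.google.guava:guava:20.0-SNAPSHOT`, but its API doesn't have these methods. I find the algorithm a bit hard to follow without understanding what the different methods do. – mfulton26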


The graph is my own implementation. I've included the graph implementation code for your reference. – unekwu


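This question is fine here, but it would also fit on [CR](http://codereview.stackexchange.com/questions/tagged/java), where you could get some useful general advice. Note: your `next` and `hasNext` are broken (although they work in the common scenario). – maaartinus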

Answer


A few stabs in the dark here; you would need to implement them to see how much each one helps.

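1) Instead of a Guava Table, you might consider a HashMap with a composite (int, int) key. For storing edge weights it should definitely be more efficient. If you need to query the edges coming out of a given vertex, the choice is less clear-cut, and you would need to look at the CPU-versus-memory trade-off.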
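Point 1 can be sketched as follows, assuming Java 8+. `EdgeKey`, `addToMap` and `CompositeKeyDemo` are hypothetical names. They are not part of the question's code or of Guava. The key stores the pair in (smaller, larger) order, so both orientations of an undirected edge land on the same entry. This is a minimal sketch, not a tuned implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one flat HashMap keyed by a normalised (int, int) pair. */
public class CompositeKeyDemo {

    /** Immutable pair of node ids, stored as (smaller, larger). */
    static final class EdgeKey {
        final int lo;
        final int hi;

        EdgeKey(int a, int b) {
            this.lo = Math.min(a, b);
            this.hi = Math.max(a, b);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EdgeKey)) return false;
            EdgeKey k = (EdgeKey) o;
            return lo == k.lo && hi == k.hi;
        }

        @Override
        public int hashCode() {
            return 31 * lo + hi;
        }
    }

    /** Adds val to the accumulated weight of the undirected pair (r, c). */
    static void addToMap(Map<EdgeKey, Double> weights, int r, int c, double val) {
        weights.merge(new EdgeKey(r, c), val, Double::sum);
    }

    public static void main(String[] args) {
        Map<EdgeKey, Double> weights = new HashMap<>();
        addToMap(weights, 5, 2, 1.5);
        addToMap(weights, 2, 5, 0.5);   // same undirected pair, other orientation
        addToMap(weights, 2, 7, 3.0);
        System.out.println(weights.size());                  // 2
        System.out.println(weights.get(new EdgeKey(2, 5)));  // 2.0
    }
}
```

Each entry still costs a key object plus a boxed `Double`. Compared with a `Table`, which keeps a separate map per row, this mainly removes the per-row overhead. Boxing remains, and point 3 addresses it.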

2) If you use a plain hashmap, you could consider an off-heap implementation. Take a look at https://github.com/OpenHFT/Chronicle-Map, for example; it might be
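Point 2 names Chronicle Map, whose API is not reproduced here. The sketch below is only a dependency-free illustration of what "off-heap" means. The hypothetical class `OffHeapWeights` is not Chronicle Map. It keeps a long-to-double accumulator in a direct `ByteBuffer`, so the data sits outside the Java heap. It therefore does not count toward `-Xmx`, and the garbage collector does not trace it. Direct memory is still bounded, by `-XX:MaxDirectMemorySize`. The sketch assumes a fixed power-of-two capacity, linear probing, no resizing, no persistence, and keys that never equal `Long.MIN_VALUE`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-capacity long -> double accumulator stored in a direct (off-heap) buffer. */
public class OffHeapWeights {
    private static final int SLOT = 16;               // 8-byte key followed by 8-byte value
    private static final long EMPTY = Long.MIN_VALUE; // sentinel for an unused slot
    private final ByteBuffer buf;
    private final int capacity;                       // number of slots, a power of two
    private int size;

    public OffHeapWeights(int capacityPow2) {
        if (Integer.bitCount(capacityPow2) != 1 || capacityPow2 > (1 << 26))
            throw new IllegalArgumentException("capacity must be a power of two <= 2^26");
        capacity = capacityPow2;
        buf = ByteBuffer.allocateDirect(capacity * SLOT).order(ByteOrder.nativeOrder());
        for (int i = 0; i < capacity; i++) buf.putLong(i * SLOT, EMPTY);
    }

    private int slot(long key) {
        return (int) ((key * 0x9E3779B97F4A7C15L) >>> 32) & (capacity - 1); // multiplicative hashing
    }

    /** Adds delta to the value stored for key, inserting the key if absent. */
    public void addTo(long key, double delta) {
        for (int i = slot(key); ; i = (i + 1) & (capacity - 1)) {
            int off = i * SLOT;
            long k = buf.getLong(off);
            if (k == key) { buf.putDouble(off + 8, buf.getDouble(off + 8) + delta); return; }
            if (k == EMPTY) {
                // keep at most half the slots full so probing always reaches an empty slot
                if (size + 1 > capacity / 2) throw new IllegalStateException("table too full");
                buf.putLong(off, key);
                buf.putDouble(off + 8, delta);
                size++;
                return;
            }
        }
    }

    /** Returns the value stored for key, or 0.0 if the key is absent. */
    public double get(long key) {
        for (int i = slot(key); ; i = (i + 1) & (capacity - 1)) {
            long k = buf.getLong(i * SLOT);
            if (k == key) return buf.getDouble(i * SLOT + 8);
            if (k == EMPTY) return 0.0;
        }
    }

    public int size() { return size; }

    public static void main(String[] args) {
        OffHeapWeights w = new OffHeapWeights(1 << 4);
        long key = (2L << 32) | 5L;   // pair (2, 5) packed into one long
        w.addTo(key, 1.5);
        w.addTo(key, 0.5);
        System.out.println(w.size() + " " + w.get(key)); // 1 2.0
    }
}
```

A maintained library adds what this sketch leaves out, such as resizing, concurrency and persistence to a file.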

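3) If you stay in memory and want to squeeze out some extra space, you can play some dirty tricks with primitive maps. Use a long->double map, e.g. http://labs.carrotsearch.com/download/hppc/0.4.1/api/com/carrotsearch/hppc/LongDoubleMap.html or http://trove4j.sourceforge.net/javadocs/gnu/trove/map/hash/TLongDoubleHashMap.html. Encode your pair of int vertex ids as a single long, and see how much it helps. On a 64-bit JVM an Integer can take 16 bytes (assuming compressed oops) and a Double 24 bytes. For two Integer keys plus one Double, that makes 32 + 24 = 56 bytes per entry, compared to 8 + 8 with a primitive map.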
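A dependency-free sketch of point 3 follows, assuming non-negative node ids. `PackedEdgeWeights` is a hypothetical class written for illustration. It is neither HPPC's `LongDoubleMap` nor Trove's `TLongDoubleHashMap` API. It uses plain linear probing, grows at 50% load, supports no removal, and is not sized for more than about 2^29 entries. The pair is packed as `(min << 32) | max`. Each stored entry therefore occupies one `long` slot and one `double` slot (the 8 + 8 bytes above), plus the empty slots the load factor keeps free.

```java
import java.util.Arrays;

/** Hypothetical long -> double accumulator for packed (u, v) pairs; not the HPPC or Trove API. */
public class PackedEdgeWeights {
    private static final long EMPTY = -1L; // never a packed key when ids are non-negative

    private long[] keys;
    private double[] vals;  // vals[i] stays 0.0 while keys[i] == EMPTY
    private int size;

    public PackedEdgeWeights(int expected) {
        int cap = Integer.highestOneBit(Math.max(4, expected * 2) - 1) << 1; // next power of two
        keys = new long[cap];
        vals = new double[cap];
        Arrays.fill(keys, EMPTY);
    }

    /** Packs an undirected pair: smaller id in the high 32 bits, larger id in the low 32 bits. */
    public static long pack(int a, int b) {
        return ((long) Math.min(a, b) << 32) | (Math.max(a, b) & 0xFFFFFFFFL);
    }

    public static int first(long key)  { return (int) (key >>> 32); }
    public static int second(long key) { return (int) key; }

    private static int slot(long key, int mask) {
        return (int) ((key * 0x9E3779B97F4A7C15L) >>> 32) & mask; // multiplicative hashing
    }

    /** Adds delta to the weight of (a, b), inserting the pair if it is absent. */
    public void addTo(int a, int b, double delta) {
        if (2 * (size + 1) > keys.length) grow();
        long key = pack(a, b);
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (keys[i] != EMPTY && keys[i] != key) i = (i + 1) & mask;
        if (keys[i] == EMPTY) { keys[i] = key; size++; }
        vals[i] += delta;
    }

    /** Returns the accumulated weight of (a, b), or 0.0 if the pair was never added. */
    public double get(int a, int b) {
        long key = pack(a, b);
        int mask = keys.length - 1;
        for (int i = slot(key, mask); keys[i] != EMPTY; i = (i + 1) & mask) {
            if (keys[i] == key) return vals[i];
        }
        return 0.0;
    }

    public int size() { return size; }

    /** Doubles the table and re-inserts every stored pair. */
    private void grow() {
        long[] oldKeys = keys;
        double[] oldVals = vals;
        keys = new long[oldKeys.length * 2];
        vals = new double[oldVals.length * 2];
        Arrays.fill(keys, EMPTY);
        int mask = keys.length - 1;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] == EMPTY) continue;
            int i = slot(oldKeys[j], mask);
            while (keys[i] != EMPTY) i = (i + 1) & mask;
            keys[i] = oldKeys[j];
            vals[i] = oldVals[j];
        }
    }

    public static void main(String[] args) {
        PackedEdgeWeights w = new PackedEdgeWeights(2);
        w.addTo(5, 2, 1.5);
        w.addTo(2, 5, 0.5);   // same undirected pair
        w.addTo(2, 7, 3.0);
        long k = pack(7, 2);
        System.out.println(w.size() + " " + w.get(5, 2) + " " + first(k) + " " + second(k)); // 2 2.0 2 7
    }
}
```

Using HPPC or Trove instead keeps the same `pack` idea and gives a tested, general-purpose map.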


Renumbering the nodes and switching to `ArrayList >` made a difference; I did not try the other options. – unekwu
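Renumbering can be sketched as follows, assuming the coarse graph's nodes are exactly the `seeds` list passed to `buildByTriples`. `RenumberDemo`, `renumber`, `newRows` and `add` are hypothetical helpers. The per-row `List<Map<Integer, Double>>` layout is an assumption made for illustration, not necessarily the exact type used in the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: dense renumbering of the seed (coarse) nodes plus one small map per row. */
public class RenumberDemo {

    /** toCoarse[original id] = dense id in [0, seeds.size()), or -1 if the node is not a seed. */
    static int[] renumber(int n, List<Integer> seeds) {
        int[] toCoarse = new int[n];
        Arrays.fill(toCoarse, -1);
        for (int i = 0; i < seeds.size(); i++) {
            toCoarse[seeds.get(i)] = i;
        }
        return toCoarse;
    }

    /** One initially empty map per dense row id. */
    static List<Map<Integer, Double>> newRows(int k) {
        List<Map<Integer, Double>> rows = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            rows.add(new HashMap<>());
        }
        return rows;
    }

    /** Adds w to the undirected pair (a, b) of dense ids: row min(a, b), column max(a, b). */
    static void add(List<Map<Integer, Double>> rows, int a, int b, double w) {
        rows.get(Math.min(a, b)).merge(Math.max(a, b), w, Double::sum);
    }

    public static void main(String[] args) {
        List<Integer> seeds = Arrays.asList(40, 3, 17);   // original ids of the coarse nodes
        int[] toCoarse = renumber(50, seeds);
        List<Map<Integer, Double>> rows = newRows(seeds.size());
        add(rows, toCoarse[40], toCoarse[17], 1.0);
        add(rows, toCoarse[17], toCoarse[40], 2.0);      // same pair, other orientation
        System.out.println(toCoarse[3] + " " + toCoarse[5] + " " + rows.get(0).get(2)); // 1 -1 3.0
    }
}
```

With dense ids, rows can be indexed directly in a list or array rather than looked up in an outer hash map. Dense ids also keep the packed keys of point 3 within a small range.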