2014-09-28 48 views
0

我具有表示如以下的層級關係的平坦數據:找到所有的後代深深樹結構的根據平面數據

ID Name PID 
0 A  NULL 
1 B  0 
2 C  0 
4 D  1 
5 E  1 
6 F  4 
3 G  0 

此表代表「數據表」,其中PID表示的父元素。 例如,在第一行中,我們看到A具有PID null,而B具有PID 0,這意味着B的父代是A,因爲0是A的ID,A是根元素,因爲它沒有PID。同樣,C有父A,因爲C也有PID0,0是A的ID。

我創建了一個DataTable類來表示上表。我還實現了方法processDataTable

public Map<String, List<String>> processDataTable() 

返回的映射使用元素作爲鍵,並將後代節點的集合保存爲值。例如,地圖中的第一項對應於元素A,其具有許多後代,而元素C沒有後代。輸出中的成員順序並不重要。

public static void main(String...arg) { 

    DataTable dt = newDataTable(); 

    dt.addRow(0, "A", null); 
    dt.addRow(1, "B", 0); 
    dt.addRow(2, "C", 0); 
    dt.addRow(4, "D", 1); 
    dt.addRow(5, "E", 1); 
    dt.addRow(6, "F", 4); 
    dt.addRow(3, "G", 0); 

    System.out.println("Output:"); 
    System.out.println(dt.processDataTable()); 
} 

Output: 
{D=[F], A=[B, C, G, D, E, F], B=[D, E, F]} 
or 
{D=[F], E=null, F=null, G=null, A=[B, C, G, D, E, F], B=[D, E, F], C=null} 

下面是我實現的數據表:

public class DataTable { 

    private List<Record> records = new ArrayList<>(); 
    private Map<Integer, Integer> indexes = new HashMap<>(); 
    private static final int PROCESSORS = Runtime.getRuntime().availableProcessors(); 

    /** 
    * Add new record into DataTable. 
    * 
    * @param id 
    * @param name 
    * @param parentId 
    */ 
    public void addRow(Integer id, String name, Integer parentId) { 
     if (indexes.get(id) == null) { 
      Record rec = new Record(id, name, parentId); 
      records.add(rec); 
      indexes.put(id, records.size() - 1); 
     } 
    } 

    public List<Record> getRecords() { 
     return records; 
    } 

    /** 
    * Process DataTable and return a Map of all keys and its children. The 
    * main algorithm here is to divide big record set into multiple parts, compute 
    * on multi threads and then merge all result together. 
    * 
    * @return 
    */ 
    public Map<String, List<String>> processDataTable() { 
     long start = System.currentTimeMillis(); 
     int size = size(); 

     // Step 1: Link all nodes together 
     invokeOnewayTask(new LinkRecordTask(this, 0, size)); 

     Map<String, List<String>> map = new ConcurrentHashMap<>(); 

     // Step 2: Get result 
     invokeOnewayTask(new BuildChildrenMapTask(this, 0, size, map)); 

     long elapsedTime = System.currentTimeMillis() - start; 

     System.out.println("Total elapsed time: " + elapsedTime + " ms"); 

     return map; 
    } 

    /** 
    * Invoke given task one way and measure the time to execute. 
    * 
    * @param task 
    */ 
    private void invokeOnewayTask(ForkJoinTask<?> task) { 
     long start = System.currentTimeMillis(); 
     ForkJoinPool pool = new ForkJoinPool(PROCESSORS); 
     pool.invoke(task); 
     long elapsedTime = System.currentTimeMillis() - start; 
     System.out.println(task.getClass().getSimpleName() + ":" + elapsedTime + " ms"); 
    } 

    /** 
    * Find record by id. 
    * 
    * @param id 
    * @return 
    */ 
    public Record getRecordById(Integer id) { 
     Integer pos = indexes.get(id); 
     if (pos != null) { 
      return records.get(pos); 
     } 
     return null; 
    } 

    /** 
    * Find record by row number. 
    * 
    * @param rownum 
    * @return 
    */ 
    public Record getRecordByRowNumber(Integer rownum) { 
     return (rownum < 0 || rownum > records.size() - 1) ? null:records.get(rownum); 
    } 

    public int size() { 
     return records.size(); 
    } 

    /** 
    * A task link between nodes 
    */ 
    private static class LinkRecordTask extends RecursiveAction { 

    private static final long serialVersionUID = 1L; 
    private DataTable dt; 
    private int start; 
    private int end; 
    private int limit = 100; 

    public LinkRecordTask(DataTable dt, int start, int end) { 
     this.dt = dt; 
     this.start = start; 
     this.end = end; 
    } 

    @Override 
    protected void compute() { 
     if ((end - start) < limit) { 
     for (int i = start; i < end; i++) { 
      Record r = dt.records.get(i); 
      Record parent = dt.getRecordById(r.parentId); 
      r.parent = parent; 
      if(parent != null) { 
       parent.children.add(r); 
      } 
     } 
     } else { 
      int mid = (start + end)/2; 
      LinkRecordTask left = new LinkRecordTask(dt, start, mid); 
      LinkRecordTask right = new LinkRecordTask(dt, mid, end); 
      left.fork(); 
      right.fork(); 
      left.join(); 
      right.join(); 
     } 
    } 

    } 

    /** 
    * Build Map<String, List<String>> result from given DataTable. 
    */ 
    private static class BuildChildrenMapTask extends RecursiveAction { 

     private static final long serialVersionUID = 1L; 
     private DataTable dt; 
     private int start; 
     private int end; 
     private int limit = 100; 
     private Map<String, List<String>> map; 

     public BuildChildrenMapTask(DataTable dt, int start, int end, Map<String, List<String>> map) { 
      this.dt = dt; 
      this.start = start; 
      this.end = end; 
      this.map = map; 
     } 

     @Override 
     protected void compute() { 
      if ((end - start) < limit) { 
       computeDirectly(); 
      } else { 
       int mid = (start + end)/2; 
       BuildChildrenMapTask left = new BuildChildrenMapTask(dt, start, mid, map); 
       BuildChildrenMapTask right = new BuildChildrenMapTask(dt, mid, end, map); 
       left.fork(); 
       right.fork(); 
       left.join(); 
       right.join(); 
      } 
     } 

     private void computeDirectly() { 
      for (int i = start; i < end; i++) { 
       Record rec = dt.records.get(i); 
       List<String> names = new ArrayList<String>(); 

       loadDeeplyChildNodes(rec, names); 

       if(!names.isEmpty()) { 
        map.put(rec.name, names); 
       } 
      } 
     } 

     private void loadDeeplyChildNodes(Record r, List<String> names) { 
      Collection<Record> children = r.children; 
      for(Record rec:children) { 
       if(!names.contains(rec.name)) { 
        names.add(rec.name); 
       } 
       loadDeeplyChildNodes(rec, names); 
      } 
     } 

    } 

} 

我的記錄類:

/** 
* Represents a structure of a record in DataTable. 
*/ 
public class Record { 

    public Integer id; 
    public String name; 
    public Integer parentId; 
    public Record parent; 
    public Collection<Record> children; 

    public Record(Integer id, String name, Integer parentId) { 
     this(); 
     this.id = id; 
     this.name = name; 
     this.parentId = parentId; 
    } 

    public Record() { 
     children = Collections.newSetFromMap(new ConcurrentHashMap<Record, Boolean>()) 
    } 

    public Collection<Record> getChildren() { 
     return children; 
    } 

    public Record getParent() { 
     return parent; 
    } 

    public Integer getParentId() { 
     return parentId; 
    } 

    @Override 
    public String toString() { 
     return "Record{" + "id=" + id + ", name=" + name + ", parentId=" + parentId + '}'; 
    } 

    /* (non-Javadoc) 
    * @see java.lang.Object#hashCode() 
    */ 
    @Override 
    public int hashCode() { 
     final int prime = 31; 
     int result = 1; 
     result = prime * result + ((id == null) ? 0 : id.hashCode()); 
     result = prime * result + ((name == null) ? 0 : name.hashCode()); 
     result = prime * result + ((parentId == null) ? 0 : parentId.hashCode()); 
     return result; 
    } 

    /* (non-Javadoc) 
    * @see java.lang.Object#equals(java.lang.Object) 
    */ 
    @Override 
    public boolean equals(Object obj) { 
    if (this == obj) { 
     return true; 
    } 
    if (obj == null) { 
     return false; 
    } 
    if (!(obj instanceof Record)) { 
     return false; 
    } 
    Record other = (Record) obj; 
    if (id == null) { 
     if (other.id != null) { 
     return false; 
     } 
    } else if (!id.equals(other.id)) { 
     return false; 
    } 
    if (name == null) { 
     if (other.name != null) { 
     return false; 
     } 
    } else if (!name.equals(other.name)) { 
     return false; 
    } 
    if (parentId == null) { 
     if (other.parentId != null) { 
     return false; 
     } 
    } else if (!parentId.equals(other.parentId)) { 
     return false; 
    } 
    return true; 
    } 

} 

我的算法是:

- Link all parent and child of each record 
    - Build the map 

On each step I apply fork join to divide the dataset into smaller parts and run in parellel. 

我不知道什麼是錯的做與此實現。有人能給我一些建議嗎?如果線性層次結構5K記錄(項目1是項目2的根目錄和父項目,項目2是項目3的父項目,項目3是項目4的父項目,等等),則此實現會發生OutOfmemory錯誤。它獲得了OutOfmemory,因爲它多次調用遞歸方法。

什麼是這個問題的好算法,或者我應該修改哪些數據結構以使其更好?

+0

在LinkChildrenTask的compute()中,爲什麼要向每個祖先的子列表添加一個節點?您是否打算將節點的每個子節點添加到該節點的子節點列表中,以便快速瞭解其子節點中的某個位置?這可以使用非常大量的內存。在你的例子中,大約有1250萬個引用。 – 2014-09-28 02:45:28

+0

我第一次實現我只是直接添加每個節點的子節點,而不是所有的子節點。但是在BuildChildrenMapTask中,它必須多次遞歸才能找到所有嵌套子元素,而且在該任務中也會失敗。有了這個測試用例,每個節點上都有很多子節點:(:(:(:( – Barcelona 2014-09-28 02:48:30

+0

這看起來明顯不同,它是否在loadDeeplyChildNodes()中出現錯誤?考慮到您的根節點將有一個堆棧5000調用深,根的孩子將有一個堆棧4999深等 – 2014-09-28 03:20:37

回答

4

你似乎已經開始祈禱寫更多的代碼,而不是做你想做的事情。鑑於您的數據,我們可以寫一個簡單的樹狀結構,讓你做的祖先和後代搜索:

import java.util.HashMap; 
import java.util.ArrayList; 

class Node { 
    // static lookup table, because we *could* try to find nodes by walking 
    // the node tree, but the ids are uniquely identifying: this way we can 
    // do an instant lookup. Efficiency! 
    static HashMap<Long, Node> NodeLUT = new HashMap<Long, Node>(); 

    // we could use Node.NodeLUT.get(...), but having a Node.getNode(...) is nicer 
    public static Node getNode(long id) { 
    return Node.NodeLUT.get(id); 
    } 

    // we don't call the Node constructor directly, we just let this factory 
    // take care of that for us instead. 
    public static Node create(long _id, String _label) { 
    return new Node(_id, _label); 
    } 

    public static Node create(long _id, String _label, long _parent) { 
    Node parent = Node.NodeLUT.get(_parent), node; 
    node = new Node(_id, _label); 
    parent.addChild(node); 
    return node; 
    } 

    // instance variables and methods 

    Node parent; 
    long id; 
    String label; 
    ArrayList<Node> children = new ArrayList<Node>(); 

    // again: no public constructor. We can only use Node.create if we want 
    // to make Node objects. 
    private Node(long _id, String _label) { 
    parent = null; 
    id = _id; 
    label = _label; 
    Node.NodeLUT.put(id, this); 
    } 

    // this is taken care of in Node.create, too 
    private void addChild(Node child) { 
    children.add(child); 
    child.setParent(this); 
    } 

    // as is this. 
    private void setParent(Node _parent) { 
    parent = _parent; 
    } 

    /** 
    * Find the route from this node, to some descendant node with id [descendentId] 
    */ 
    public ArrayList<Node> getDescendentPathTo(long descendentId) { 
    ArrayList<Node> list = new ArrayList<Node>(), temp; 
    list.add(this); 
    if(id == descendentId) { 
     return list; 
    } 
    for(Node n: children) { 
     temp = n.getDescendentPathTo(descendentId); 
     if(temp != null) { 
     list.addAll(temp); 
     return list; 
     } 
    } 
    return null; 
    } 

    /** 
    * Find the route from this node, to some ancestral node with id [descendentId] 
    */ 
    public ArrayList<Node> getAncestorPathTo(long ancestorId) { 
    ArrayList<Node> list = new ArrayList<Node>(), temp; 
    list.add(this); 
    if(id == ancestorId) { 
     return list; 
    } 
    temp = parent.getAncestorPathTo(ancestorId); 
    if(temp != null) { 
     list.addAll(temp); 
     return list; 
    } 
    return null; 
    } 

    public String toString() { 
    return "{id:"+id+",label:"+label+"}"; 
    } 
} 

那麼讓我們來測試是爲了確保它的工作原理,通過在標準public static void main(String[] args)方法加入,爲方便起見,一函數將節點ArrayLists變成可讀的東西:

public static String stringify(ArrayList<?> list) { 
    String listString = ""; 
    for (int s=0, l=list.size(); s<l; s++) { 
     listString += list.get(s).toString(); 
     if(s<l-1) { listString += ", "; } 
    } 
    return listString; 
    } 

    public static void main(String[] args) { 
    // hard coded data based on your question-supplied example data 
    Node.create(0, "A"); 
    Node.create(1, "B", 0); 
    Node.create(2, "C", 0); 
    Node.create(4, "D", 1); 
    Node.create(5, "E", 1); 
    Node.create(6, "F", 4); 
    Node.create(3, "G", 0); 

    // let's see what we get! 
    Node root = Node.getNode(0); 
    Node f = Node.getNode(6); 
    System.out.println("From root to F: " + stringify(root.getDescendentPathTo(6))); 
    System.out.println("From F to root: " + stringify(f.getAncestorPathTo(0))); 
    } 

輸出?

From root to F: {id:0,label:A}, {id:1,label:B}, {id:4,label:D}, {id:6,label:F} 
From F to root: {id:6,label:F}, {id:4,label:D}, {id:1,label:B}, {id:0,label:A} 

完美。

所以我們所需要做的就是編寫將你的「平面定義」變成Node.create調用的部分,然後完成。記住:不要過分複雜的事情。如果你的數據是一棵平坦的樹,你需要的只是一個樹結構。所有你需要編寫一個樹結構是一個單獨的Node類。

+0

插入順序不一定是父母先,然後是子女,它可以是隨機的。那麼我們需要另一個任務來鏈接每個節點的所有父節點,比如上面的LinkRecordtask? – Barcelona 2014-09-28 05:06:55

+0

如果可以,請訂購平面數據,以便始終保證在插入小孩之前存在父項。如果這是不可能的,那麼你可以通過改變'create'工廠來檢查'parent',並且如果它不存在,就可以使父綁定「後期綁定」,並且如果它不存在,則調用一個(由您新定義的)第二個私有構造函數,父母的ID爲'long _parentid'或其他東西。然後你可以創建一個靜態的'resolveParents'來完成真正的交聯,一旦你插入了所有的節點,你就可以調用Node.resolveParents()。 – 2014-09-28 05:23:53

+0

如何在線性層次結構中有效地加載節點深處的所有子節點,但不知道結束點的ID? – Barcelona 2014-09-28 06:27:41