Finding all descendants in a deep tree structure from flat data

I have flat data representing a hierarchy, like the following:
ID Name PID
0 A NULL
1 B 0
2 C 0
4 D 1
5 E 1
6 F 4
3 G 0
This table represents the "data table", where PID is the ID of an element's parent. For example, in the first row A has PID NULL, so A is the root element because it has no parent. B has PID 0, and since 0 is A's ID, A is B's parent. Likewise, C's parent is also A, because C also has PID 0.
I created a DataTable class to represent the table above, and I implemented the method processDataTable:
public Map<String, List<String>> processDataTable()
The returned map uses element names as keys and holds the collection of descendant nodes as values. For example, the first entry in the map corresponds to element A, which has many descendants, while element C has no descendants. The order of members in the output does not matter.
public static void main(String...arg) {
DataTable dt = new DataTable();
dt.addRow(0, "A", null);
dt.addRow(1, "B", 0);
dt.addRow(2, "C", 0);
dt.addRow(4, "D", 1);
dt.addRow(5, "E", 1);
dt.addRow(6, "F", 4);
dt.addRow(3, "G", 0);
System.out.println("Output:");
System.out.println(dt.processDataTable());
}
Output:
{D=[F], A=[B, C, G, D, E, F], B=[D, E, F]}
or
{D=[F], E=null, F=null, G=null, A=[B, C, G, D, E, F], B=[D, E, F], C=null}
Here is my DataTable implementation:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;

public class DataTable {
private List<Record> records = new ArrayList<>();
private Map<Integer, Integer> indexes = new HashMap<>();
private static final int PROCESSORS = Runtime.getRuntime().availableProcessors();
/**
* Add new record into DataTable.
*
* @param id
* @param name
* @param parentId
*/
public void addRow(Integer id, String name, Integer parentId) {
if (indexes.get(id) == null) {
Record rec = new Record(id, name, parentId);
records.add(rec);
indexes.put(id, records.size() - 1);
}
}
public List<Record> getRecords() {
return records;
}
/**
 * Process the DataTable and return a map of every element and its
 * descendants. The main algorithm is to divide the big record set into
 * multiple parts, compute on multiple threads, and then merge all
 * results together.
*
* @return
*/
public Map<String, List<String>> processDataTable() {
long start = System.currentTimeMillis();
int size = size();
// Step 1: Link all nodes together
invokeOnewayTask(new LinkRecordTask(this, 0, size));
Map<String, List<String>> map = new ConcurrentHashMap<>();
// Step 2: Get result
invokeOnewayTask(new BuildChildrenMapTask(this, 0, size, map));
long elapsedTime = System.currentTimeMillis() - start;
System.out.println("Total elapsed time: " + elapsedTime + " ms");
return map;
}
/**
* Invoke given task one way and measure the time to execute.
*
* @param task
*/
private void invokeOnewayTask(ForkJoinTask<?> task) {
long start = System.currentTimeMillis();
ForkJoinPool pool = new ForkJoinPool(PROCESSORS);
pool.invoke(task);
long elapsedTime = System.currentTimeMillis() - start;
System.out.println(task.getClass().getSimpleName() + ":" + elapsedTime + " ms");
}
/**
* Find record by id.
*
* @param id
* @return
*/
public Record getRecordById(Integer id) {
Integer pos = indexes.get(id);
if (pos != null) {
return records.get(pos);
}
return null;
}
/**
* Find record by row number.
*
* @param rownum
* @return
*/
public Record getRecordByRowNumber(Integer rownum) {
return (rownum < 0 || rownum > records.size() - 1) ? null : records.get(rownum);
}
public int size() {
return records.size();
}
/**
 * A task that links parent and child nodes together.
 */
private static class LinkRecordTask extends RecursiveAction {
private static final long serialVersionUID = 1L;
private DataTable dt;
private int start;
private int end;
private int limit = 100;
public LinkRecordTask(DataTable dt, int start, int end) {
this.dt = dt;
this.start = start;
this.end = end;
}
@Override
protected void compute() {
if ((end - start) < limit) {
for (int i = start; i < end; i++) {
Record r = dt.records.get(i);
Record parent = dt.getRecordById(r.parentId);
r.parent = parent;
if(parent != null) {
parent.children.add(r);
}
}
} else {
int mid = (start + end)/2;
LinkRecordTask left = new LinkRecordTask(dt, start, mid);
LinkRecordTask right = new LinkRecordTask(dt, mid, end);
left.fork();
right.fork();
left.join();
right.join();
}
}
}
/**
* Build Map<String, List<String>> result from given DataTable.
*/
private static class BuildChildrenMapTask extends RecursiveAction {
private static final long serialVersionUID = 1L;
private DataTable dt;
private int start;
private int end;
private int limit = 100;
private Map<String, List<String>> map;
public BuildChildrenMapTask(DataTable dt, int start, int end, Map<String, List<String>> map) {
this.dt = dt;
this.start = start;
this.end = end;
this.map = map;
}
@Override
protected void compute() {
if ((end - start) < limit) {
computeDirectly();
} else {
int mid = (start + end)/2;
BuildChildrenMapTask left = new BuildChildrenMapTask(dt, start, mid, map);
BuildChildrenMapTask right = new BuildChildrenMapTask(dt, mid, end, map);
left.fork();
right.fork();
left.join();
right.join();
}
}
private void computeDirectly() {
for (int i = start; i < end; i++) {
Record rec = dt.records.get(i);
List<String> names = new ArrayList<String>();
loadDeeplyChildNodes(rec, names);
if(!names.isEmpty()) {
map.put(rec.name, names);
}
}
}
private void loadDeeplyChildNodes(Record r, List<String> names) {
Collection<Record> children = r.children;
for(Record rec:children) {
if(!names.contains(rec.name)) {
names.add(rec.name);
}
loadDeeplyChildNodes(rec, names);
}
}
}
}
My Record class:
/**
* Represents a structure of a record in DataTable.
*/
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.ConcurrentHashMap;

public class Record {
public Integer id;
public String name;
public Integer parentId;
public Record parent;
public Collection<Record> children;
public Record(Integer id, String name, Integer parentId) {
this();
this.id = id;
this.name = name;
this.parentId = parentId;
}
public Record() {
children = Collections.newSetFromMap(new ConcurrentHashMap<Record, Boolean>());
}
public Collection<Record> getChildren() {
return children;
}
public Record getParent() {
return parent;
}
public Integer getParentId() {
return parentId;
}
@Override
public String toString() {
return "Record{" + "id=" + id + ", name=" + name + ", parentId=" + parentId + '}';
}
/* (non-Javadoc)
* @see java.lang.Object#hashCode()
*/
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((id == null) ? 0 : id.hashCode());
result = prime * result + ((name == null) ? 0 : name.hashCode());
result = prime * result + ((parentId == null) ? 0 : parentId.hashCode());
return result;
}
/* (non-Javadoc)
* @see java.lang.Object#equals(java.lang.Object)
*/
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (!(obj instanceof Record)) {
return false;
}
Record other = (Record) obj;
if (id == null) {
if (other.id != null) {
return false;
}
} else if (!id.equals(other.id)) {
return false;
}
if (name == null) {
if (other.name != null) {
return false;
}
} else if (!name.equals(other.name)) {
return false;
}
if (parentId == null) {
if (other.parentId != null) {
return false;
}
} else if (!parentId.equals(other.parentId)) {
return false;
}
return true;
}
}
My algorithm is:
- Link the parent and children of each record
- Build the map
On each step I apply fork/join to divide the dataset into smaller parts and run them in parallel.
I don't know what is wrong with this implementation. Can somebody give me some advice? With a linear hierarchy of 5K records (item 1 is the root and the parent of item 2, item 2 is the parent of item 3, item 3 is the parent of item 4, and so on), this implementation hits an OutOfMemoryError. It runs out of memory because it invokes the recursive method so many times.
What is a good algorithm for this problem, or which data structures should I change to make it better?
In LinkChildrenTask's compute(), why do you add a node to the child list of every ancestor? Did you intend to add each of a node's children to that node's child list, so you can quickly find a node among its children? This can use a very large amount of memory; in your example, there are roughly 12.5 million references. – 2014-09-28 02:45:28
In my first implementation I just added each node's direct children, not all of its descendants. But then BuildChildrenMapTask has to recurse many times to find all the nested children, and it fails in that task too. With this test case there are a lot of children on every node :( :( :( – Barcelona 2014-09-28 02:48:30
That looks distinctly different. Does it fail inside loadDeeplyChildNodes()? Consider that your root node will have a call stack 5000 deep, the root's child will have a stack 4999 deep, and so on. – 2014-09-28 03:20:37
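As the last comment suggests, the deep recursion in loadDeeplyChildNodes() is the likely culprit on a 5000-deep linear chain. One common fix is to replace the recursion with an explicit stack so traversal depth is bounded by heap, not by the call stack. Below is a minimal, self-contained sketch of that idea; the Node class here is a hypothetical stand-in for Record (only name and children are used), not the original class:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for Record: only the fields the traversal needs.
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
}

public class IterativeDescendants {

    // Collect all descendant names without recursion, using an explicit
    // stack instead of the call stack. Each node is pushed exactly once,
    // so a linear chain of depth N needs O(N) heap but O(1) stack frames.
    static List<String> descendants(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            for (Node child : current.children) {
                names.add(child.name);
                stack.push(child);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Build a linear chain 5000 deep: n0 -> n1 -> ... -> n4999,
        // the shape that blows the call stack in the recursive version.
        Node root = new Node("n0");
        Node tail = root;
        for (int i = 1; i < 5000; i++) {
            Node next = new Node("n" + i);
            tail.children.add(next);
            tail = next;
        }
        System.out.println(descendants(root).size()); // 4999
    }
}
```

The same pattern could be applied inside computeDirectly() in place of loadDeeplyChildNodes(). Note that even with an iterative traversal, materializing every node's full descendant list for a deep linear chain is inherently O(n²) names (about 12.5 million for 5K records, as the first comment points out), so memory pressure may remain regardless of the traversal style.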