我正在研究在Java堆中添加值的不同可能性。我正在使用PriorityHeap
類。當我在應用程序中注意到運行時間緩慢時,我決定給這個看看。我添加了幾千個,有時是數百萬個自定義條目(我有一個自定義類,它有3個字段:int,LongWritable和Text,都來自hadoop.io; this儀器代理說我的記錄平均有200個字節)。將add()與addAll()插入到Java PriorityHeap中
是很明顯,使用的addAll()
代替add()
方法把條目堆將提高性能,只是因爲這會避免幾個heapify
操作?
我試着用不同的策略使用以下新例如:
package Sorting;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
public class Main {
private static final int HEAP_SIZE = 1000000;
private static final int BULK_LIST_SIZE = HEAP_SIZE/10;
private static String normal;
private static String bulk;
private static String fullBulk;
public static void main(String[] args) throws IOException {
normal = "";
bulk = "";
fullBulk = "";
long time = 0;
warmup();
normal = "";
bulk = "";
fullBulk = "";
for (int i = 0; i < 100; i++) {
// Normal add time
System.out.println("Normal starts...");
time = normalExecution();
System.out.println("Normal add time " + time);
// Bulk add time
System.out.println("Bulk starts...");
time = bulk();
System.out.println("Bulk add time " + time);
// Bulk add time with list and heap with same size
System.out.println("Full Bulk starts...");
time = fullBulk();
System.out.println("Full Bulk add time " + time);
}
System.out.println(normal);
System.out.println(bulk);
System.out.println(fullBulk);
}
private static long fullBulk() {
long time;
long start;
List<Double> fullBulkList = new ArrayList<Double>(HEAP_SIZE);
PriorityQueue<Double> fullBulkHeap = new PriorityQueue<Double>(HEAP_SIZE);
start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
if (fullBulkList.size() == HEAP_SIZE) {
fullBulkHeap.addAll(fullBulkList);
fullBulkList.clear();
}
}
fullBulkHeap.addAll(fullBulkList);
time = System.nanoTime() - start;
fullBulk = fullBulk + "\t" + time;
fullBulkList = null;
fullBulkHeap = null;
return time;
}
private static long bulk() {
long time;
long start;
List<Double> bulkList = new ArrayList<Double>(BULK_LIST_SIZE);
PriorityQueue<Double> bulkHeap = new PriorityQueue<Double>(HEAP_SIZE);
start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
if (bulkList.size() == BULK_LIST_SIZE) {
bulkHeap.addAll(bulkList);
bulkList.clear();
}
}
bulkHeap.addAll(bulkList);
time = System.nanoTime() - start;
bulk = bulk + "\t" + time;
bulkList = null;
bulkHeap = null;
return time;
}
private static long normalExecution() {
long time;
long start;
PriorityQueue<Double> normalHeap = new PriorityQueue<Double>(HEAP_SIZE);
start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
normalHeap.add(Double.MAX_VALUE);
}
time = System.nanoTime() - start;
normal = normal + "\t" + time;
normalHeap = null;
return time;
}
private static void warmup() {
System.out.println("Starting warmup");
for (int i = 0; i < 1000; i++) {
normalExecution();
bulk();
fullBulk();
}
for (int i = 0; i < 1000; i++) {
bulk();
fullBulk();
normalExecution();
}
for (int i = 0; i < 1000; i++) {
fullBulk();
normalExecution();
bulk();
}
System.out.println("Warmup finished");
}
}
這導致以下結果:
中的第11次迭代的巨大峯值正常添加方法由GC調用解釋:[GC 1347684K->31354K(1446400K), 0.0331610 secs]
。
Mediam值分別爲16049669,783724和800276。 ST開發者是3512492.89,244374.17和33344.17。
僅從算法的角度來看,即使會有堆的批量插入,也需要篩選每個新添加的元素以保留堆屬性。你的基準也有缺陷(你的瓶頸是IO不是堆),請看http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java –
非常感謝這個鏈接。 –