我編寫了一個C#應用程序,該應用程序對索引編制索引並對該索引執行布爾文本檢索算法。文章結尾處的類展示了兩者的實現,構建索引並在其上執行算法。重新執行的大量差異
的代碼是通過下列方式一個GUI按鈕叫:
private void Execute_Click(object sender, EventArgs e)
{
Stopwatch s;
String output = "-----------------------\r\n";
String sr = algoChoice.SelectedItem != null ? algoChoice.SelectedItem.ToString() : "";
switch(sr){
case "Naive search":
output += "Naive search\r\n";
algo = NaiveSearch.GetInstance();
break;
case "Boolean retrieval":
output += "boolean retrieval\r\n";
algo = BooleanRetrieval.GetInstance();
break;
default:
outputTextbox.Text = outputTextbox.Text + "Choose retrieval-algorithm!\r\n";
return;
}
output += algo.BuildIndex("../../DocumentCollection/PilzFuehrer.txt") + "\r\n";
postIndexMemory = GC.GetTotalMemory(true);
s = Stopwatch.StartNew();
output += algo.Start("../../DocumentCollection/PilzFuehrer.txt", new String[] { "Pilz", "blau", "giftig", "Pilze" });
s.Stop();
postQueryMemory = GC.GetTotalMemory(true);
output += "\r\nTime elapsed:" + s.ElapsedTicks/(double)Stopwatch.Frequency + "\r\n";
outputTextbox.Text = output + outputTextbox.Text;
}
開始的第一次執行(...)運行約700μS,每次重播只需要<爲10μs。 該應用程序使用Visual Studio 2010和默認的「Debug」build配置進行編譯。
我試了很多找到包括分析和不同實現的原因,但效果始終保持不變。
如果有人可以給我一些新的想法,我會嘗試甚至解釋,我會很生氣。
class BooleanRetrieval:RetrievalAlgorithm
{
protected static RetrievalAlgorithm theInstance;
List<String> documentCollection;
Dictionary<String, BitArray> index;
private BooleanRetrieval()
: base("BooleanRetrieval")
{
}
public override String BuildIndex(string filepath)
{
documentCollection = new List<string>();
index = new Dictionary<string, BitArray>();
documentCollection.Add(filepath);
for(int i=0; i<documentCollection.Count; ++i)
{
StreamReader input = new StreamReader(documentCollection[i]);
var text = Regex.Split(input.ReadToEnd(), @"\W+").Distinct().ToArray();
foreach (String wordToIndex in text)
{
if (!index.ContainsKey(wordToIndex))
{
index.Add(wordToIndex, new BitArray(documentCollection.Count, false));
}
index[wordToIndex][i] = true;
}
}
return "Index " + index.Keys.Count + "words.";
}
public override String Start(String filepath, String[] search)
{
BitArray tempDecision = new BitArray(documentCollection.Count, true);
List<String> res = new List<string>();
foreach(String searchWord in search)
{
if (!index.ContainsKey(searchWord))
return "No documents found!";
tempDecision.And(index[searchWord]);
}
for (int i = 0; i < tempDecision.Count; ++i)
{
if (tempDecision[i] == true)
{
res.Add(documentCollection[i]);
}
}
return res.Count>0 ? res[0]: "Empty!";
}
public static RetrievalAlgorithm GetInstance()
{
Contract.Ensures(Contract.Result<RetrievalAlgorithm>() != null, "result is null.");
if (theInstance == null)
theInstance = new BooleanRetrieval();
theInstance.Executions++;
return theInstance;
}
}
http://en.wikipedia.org/wiki/Just-in-time_compilation –
看起來像JIT編譯開銷。 – ChaosPandion
是的,似乎是正確的想法:System.Runtime.CompilerServices.RuntimeHelpers.PrepareMethod(algo.GetType()。GetMethod(「Start」)。MethodHandle);將問題減少到因子2. – B1ANCHi