Dynamically storing and retrieving a collection of 3,000,000+ words in C#/.NET

How can I dynamically store and retrieve 3,000,000+ words without using SQL?

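Take a word from a document, then check whether that word is already known.

If it is, increment its count for the corresponding document.

If it is not, i.e. it is a new word, create a new column, increment the count for that document, and put zero for all the other documents.

For example:

I have 93,000 documents, each containing roughly 5,000 words. Whenever a new word appears, a new column is added. In this way about 960,000 words came up.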

              word1  word2  word3  word4  word5  ...  new word  ...  word96000
Document1         2     19     45     16      7  ...       ...  ...
Document2         4      6      3     56      3  ...         0  ...
Document3        56     34      1     67      4  ...         0  ...
Document4         7     45      9     45      6  ...         0  ...
Document5        56     43    234     87     46  ...         0  ...
Document6        56      6      2      5     23  ...         0  ...
...
Document1000      5      9      9     89     34  ...         1  ...

The counts of the added words are updated dynamically in the entry for the corresponding document.

Source

2010-10-11 Mind Dead

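Have you tried anything yet? Do you want the result to fit in memory so that you can query it, or do you just want to write the result out to a file? There are many ways to solve this, but the best one depends heavily on the end result you want. – 2010-10-11 14:46:17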

A sparse matrix like this is usually best implemented as a dictionary of dictionaries.

Dictionary<string, Dictionary<string, int>> index;

But the question is missing too many details to give more specific advice.
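To make the dictionary-of-dictionaries idea concrete, here is a minimal sketch. The class name `SparseWordCounts` and its methods `Increment` and `GetCount` are made up for illustration, and it assumes the outer key is the document name and the inner key is the word. Cells that were never written are treated as zero, so no "new column" has to be filled in for the other documents.

```csharp
using System.Collections.Generic;

// Sparse document-by-word count matrix: only non-zero cells are stored.
// (Hypothetical helper class, not part of any library.)
public class SparseWordCounts
{
    // Outer key: document name; inner key: word; value: number of occurrences.
    private readonly Dictionary<string, Dictionary<string, int>> index =
        new Dictionary<string, Dictionary<string, int>>();

    // Adds one occurrence of `word` to `document`, creating entries on demand.
    public void Increment(string document, string word)
    {
        Dictionary<string, int> row;
        if (!index.TryGetValue(document, out row))
        {
            row = new Dictionary<string, int>();
            index[document] = row;
        }

        int current;
        row.TryGetValue(word, out current); // current is 0 if the word is new
        row[word] = current + 1;
    }

    // Returns the stored count, or 0 for a cell that was never written.
    public int GetCount(string document, string word)
    {
        Dictionary<string, int> row;
        if (!index.TryGetValue(document, out row))
        {
            return 0;
        }

        int count;
        return row.TryGetValue(word, out count) ? count : 0;
    }
}
```

Calling `Increment("Document1", "word2")` twice and then `GetCount("Document1", "word2")` gives 2, while `GetCount("Document2", "word2")` gives 0 without anything having been stored for Document2.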

Source

2010-10-11 14:51:36

To avoid wasting memory, I suggest the following:

class Document { 
    public List<int> words = new List<int>(); // indices of the words in this document
} 
List<Document> documents;

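If you have 1,000 documents, create List<Document> documents = new List<Document>(1000); (the argument only reserves capacity, so each Document still has to be added to the list).
Now, if document 1 contains the words word2, word19 and word45, add those word indices to your document: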

documents.Add(new Document()); // document 1; the capacity argument alone creates no elements
documents[0].words.Add(2); 
documents[0].words.Add(19); 
documents[0].words.Add(45);

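You can modify the code to store the words themselves.
To see how often word2 turns up (here, in how many documents), you can loop through the whole document list and check whether each document contains the word index.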

int count = 0; // number of documents that contain word2
foreach (Document d in documents) { 
    if (d.words.Contains(2)) { 
        count++; 
    } 
}
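If per-document counts are needed rather than just presence, one possible variation (a sketch, not what the answer above prescribes) is to map each word to an integer index once and keep a small dictionary of counts per document. The names `Vocabulary`, `CountedDocument` and `CorpusQueries` are hypothetical and exist only for this example.

```csharp
using System.Collections.Generic;

// Maps each distinct word to a compact integer index (hypothetical helper).
public class Vocabulary
{
    private readonly Dictionary<string, int> ids = new Dictionary<string, int>();

    // Returns the existing index of `word`, or assigns the next free one.
    public int GetOrAdd(string word)
    {
        int id;
        if (!ids.TryGetValue(word, out id))
        {
            id = ids.Count;
            ids[word] = id;
        }
        return id;
    }
}

// Stores only the words a document actually contains, with their counts.
public class CountedDocument
{
    public readonly Dictionary<int, int> Counts = new Dictionary<int, int>();

    public void AddWord(int wordId)
    {
        int current;
        Counts.TryGetValue(wordId, out current); // 0 if not seen yet
        Counts[wordId] = current + 1;
    }
}

public static class CorpusQueries
{
    // Total occurrences of a word index across all documents.
    public static int TotalOccurrences(IEnumerable<CountedDocument> docs, int wordId)
    {
        int total = 0;
        foreach (CountedDocument d in docs)
        {
            int n;
            if (d.Counts.TryGetValue(wordId, out n))
            {
                total += n;
            }
        }
        return total;
    }
}
```

Storing integer indices instead of strings in every document keeps each entry small, and a word that a document never contains costs nothing for that document.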

Source

2010-10-11 15:45:11

// requires System.IO, System.Linq and System.Text.RegularExpressions
var nWords = (from Match m in Regex.Matches(File.ReadAllText("big.txt").ToLower(), "[a-z]+")
              group m.Value by m.Value)
             .ToDictionary(gr => gr.Key, gr => gr.Count());

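This gives you a dictionary keyed by word, with the count as the value. I believe you could save that information after each file has been read in and then build your final list. Maybe?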
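As a rough sketch of that "count each file, then combine" idea: the helper below counts the words of one document's text and then folds those counts into running totals. `WordCounter`, `CountWords` and `MergeInto` are names invented for this example, and it assumes, like the snippet above, that a word is simply a run of the letters a to z.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Counts the lowercase a-z words in one document's text.
    public static Dictionary<string, int> CountWords(string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), "[a-z]+")
            .Cast<Match>()
            .GroupBy(m => m.Value)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Adds one document's counts into the running corpus totals.
    public static void MergeInto(Dictionary<string, int> totals,
                                 Dictionary<string, int> counts)
    {
        foreach (KeyValuePair<string, int> pair in counts)
        {
            int current;
            totals.TryGetValue(pair.Key, out current); // 0 for a new word
            totals[pair.Key] = current + pair.Value;
        }
    }
}
```

In practice the text would come from something like `File.ReadAllText(path)` for each file; processing one file at a time means only one document's text needs to be in memory alongside the accumulated counts.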

Source

2010-10-11 16:21:20
