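Java anagram generator running out of memory
I am trying to solve the age-old anagram problem. Thanks to the many tutorials out there, I can iterate through a set of strings, recursively find all permutations, and then compare them against a list of English words. The problem is that after about three words (usually on something like "anamorphosis") I get an OutOfMemory error. I tried splitting my batches into small sets, because the recursive part seems to be what consumes all my memory. But even "anamorphosis" on its own locks it up...
Here I read the words from a file into a collection: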
Scanner scanner = new Scanner(resource.getInputStream());
while (scanner.hasNext()) {
    String s = scanner.nextLine();
    uniqueWords.add(s.toLowerCase());
}
Then I break them into smaller sets and call a class to generate the anagrams:
List<List<String>> subSets = Lists.partition(new ArrayList<>(uniqueWords), SET_SIZE);
for (List<String> set : subSets) {
    // tried creating this as a class attribute and via injection; no difference
    AnagramGenerator anagramGenerator = new AnagramGenerator();
    List<Word> anagrams = anagramGenerator.createWordList(set);
    wordsRepository.save(anagrams);
    LOGGER.info("Inserted {} records into the database", anagrams.size());
}
And finally, my generator:
public class AnagramGenerator {
    private Map<String, List<String>> map = new Hashtable<>();

    public List<Word> createWordList(List<String> dictionary) {
        buildAnagrams(dictionary);
        List<Word> words = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : map.entrySet()) {
            words.add(new Word(entry.getKey(), entry.getValue()));
        }
        return words;
    }

    private Map<String, List<String>> buildAnagrams(List<String> dictionary) {
        for (String str : dictionary) {
            String key = sortString(str);
            if (map.get(key) != null) {
                map.get(key).add(str.toLowerCase());
            } else {
                if (str.length() < 2) {
                    map.put(key, new ArrayList<>());
                } else {
                    // Every permutation of the word is materialized here
                    Set<String> permutations = permutations(str);
                    Set<String> anagramList = new HashSet<>();
                    for (String temp : permutations) {
                        // contains() on a List is a linear scan of the current batch
                        if (dictionary.contains(temp) && !temp.equalsIgnoreCase(str)) {
                            anagramList.add(temp);
                        }
                    }
                    map.put(key, new ArrayList<>(anagramList));
                }
            }
        }
        return map;
    }

    // Recursively builds the full set of permutations of str
    private Set<String> permutations(String str) {
        if (str.isEmpty()) {
            return Collections.singleton(str);
        } else {
            Set<String> set = new HashSet<>();
            for (int i = 0; i < str.length(); i++)
                for (String s : permutations(str.substring(0, i) + str.substring(i + 1)))
                    set.add(str.charAt(i) + s);
            return set;
        }
    }
}
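A side note on the batching shown above: since each batch is passed to the generator as its own dictionary, dictionary.contains(temp) can only see words in the same batch, so anagram pairs that land in different batches would not be matched. The following is a minimal sketch of that effect, not the code from the question. The class and method names (PartitionDemo, wordsWithAnagrams, batchedCount) are hypothetical, and it uses subList instead of Guava's Lists.partition to stay dependency-free.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionDemo {

    // Canonical key: the word's letters in sorted order
    static String sortLetters(String word) {
        char[] letters = word.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Counts the words that have at least one other anagram inside the given list
    static int wordsWithAnagrams(List<String> words) {
        Map<String, Integer> groupSizes = new HashMap<>();
        for (String w : words) {
            groupSizes.merge(sortLetters(w), 1, Integer::sum);
        }
        int count = 0;
        for (String w : words) {
            if (groupSizes.get(sortLetters(w)) > 1) {
                count++;
            }
        }
        return count;
    }

    // Same count, but each batch is examined in isolation (as in the batching loop)
    static int batchedCount(List<String> words, int batchSize) {
        int count = 0;
        for (int i = 0; i < words.size(); i += batchSize) {
            int end = Math.min(i + batchSize, words.size());
            count += wordsWithAnagrams(words.subList(i, end));
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("listen", "apple", "silent", "cat");
        System.out.println("whole dictionary: " + wordsWithAnagrams(all)); // 2
        System.out.println("batches of two:   " + batchedCount(all, 2));   // 0
    }
}
```

Here "listen" and "silent" fall into different batches, so the batched run finds no anagrams at all, while the run over the whole list finds both.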
Edit: Based on the excellent feedback, I have changed my generator from generating permutations to a word lookup:
public class AnagramGenerator {
    private Map<String, Set<String>> groupedByAnagram = new HashMap<>();
    private Set<String> dictionary;

    public AnagramGenerator(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    public List<Word> searchAlphabetically() {
        List<Word> words = new ArrayList<>();
        for (String word : dictionary) {
            // Words that are anagrams of each other share the same sorted key
            String key = sortString(word);
            if (!groupedByAnagram.containsKey(key)) {
                groupedByAnagram.put(key, new HashSet<>());
            }
            if (!word.equalsIgnoreCase(key)) {
                groupedByAnagram.get(key).add(word);
            }
        }
        for (Map.Entry<String, Set<String>> entry : groupedByAnagram.entrySet()) {
            words.add(new Word(entry.getKey(), new ArrayList<>(entry.getValue())));
        }
        return words;
    }

    private String sortString(String goodString) {
        char[] letters = goodString.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }
}
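It took a little more tuning so that I don't add a word as its own anagram, but otherwise this appears to be very fast. And the code is much cleaner. Thanks, everyone!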
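For readers who want to try the lookup idea without the Word class or the repository, here is a self-contained sketch of the same grouping technique. It is an illustration under assumptions rather than the exact code above: the names AnagramGroups, sortLetters, group and anagramsOf are hypothetical, and the word itself is removed at query time instead of while the map is built.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AnagramGroups {

    // Canonical key: the word's letters in sorted order
    static String sortLetters(String word) {
        char[] letters = word.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // One pass over the dictionary; each word is sorted once and no permutations are generated
    static Map<String, Set<String>> group(Collection<String> dictionary) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String word : dictionary) {
            groups.computeIfAbsent(sortLetters(word), k -> new TreeSet<>())
                  .add(word.toLowerCase());
        }
        return groups;
    }

    // All dictionary words sharing the key, minus the queried word itself
    static Set<String> anagramsOf(String word, Map<String, Set<String>> groups) {
        Set<String> result = new TreeSet<>(
                groups.getOrDefault(sortLetters(word), Collections.<String>emptySet()));
        result.remove(word.toLowerCase());
        return result;
    }

    public static void main(String[] args) {
        List<String> dictionary =
                Arrays.asList("listen", "silent", "enlist", "google", "act", "cat");
        Map<String, Set<String>> groups = group(dictionary);
        System.out.println(anagramsOf("listen", groups)); // [enlist, silent]
        System.out.println(anagramsOf("act", groups));    // [cat]
        System.out.println(anagramsOf("google", groups)); // []
    }
}
```

The work per word is one sort of its letters plus one hash lookup, so memory grows with the size of the dictionary rather than with the factorial of the word length.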
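Where are you getting the error? Do you have a stack trace? –
You are creating a lot of collections there.. – SpaceCowboy
Using recursion to find permutations carries a lot of overhead and usually means increasing the heap space allocated to your program. I would suggest finding another way to create all the permutations. –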
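To put a rough number on that overhead: a word with n letters has up to n! orderings, reduced by k! for each letter that occurs k times. The sketch below only counts them. The class name PermutationCount and the sample words are chosen for illustration, with "anamorphosis" standing in for a typical 12-letter word.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

class PermutationCount {

    // Distinct orderings of the letters: n! divided by k! for every letter occurring k times
    static BigInteger distinctPermutations(String word) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : word.toLowerCase().toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        BigInteger result = factorial(word.length());
        for (int k : counts.values()) {
            result = result.divide(factorial(k));
        }
        return result;
    }

    static BigInteger factorial(int n) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            f = f.multiply(BigInteger.valueOf(i));
        }
        return f;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"cat", "listen", "anamorphosis"}) {
            System.out.println(w + " -> " + distinctPermutations(w));
        }
        // cat -> 6
        // listen -> 720
        // anamorphosis -> 59875200
    }
}
```

Holding tens of millions of 12-character strings in a HashSet, on top of the intermediate sets built at every recursion level, is likely to exhaust a default-sized heap however the permutations are generated.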