Word association counting

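I am new to Java. I need to count how the words in a sentence are associated with each other. For example, for the sentence "Dog is a Dog and Cat is a Cat", the final association counts for the first row would be: Dog-Dog (0), Dog-is (2), Dog-a (2), Dog-and (1), Dog-Cat (2)

and so on.

This amounts to building an association matrix. Any suggestions on how to implement it?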

2010-12-19 Rushdi Shams

Interesting! Can you elaborate on what this is used for, and why the count for "Dog-is" is 2? See if this process helps: http://it.toolbox.com/blogs/enterprise-solutions/building-an-association-matrix-15499 – 2010-12-19 00:39:22

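@Pangea: Well, in the sentence, "Dog" appears together with two occurrences of "is", which is why the Dog-is pair gets the value 2. Building the matrix by hand in a table is easy, but I got lost in the implementation. – 2010-12-19 06:50:52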

I am sorry, but I see "Dog is" occurring only once in "Dog is a Dog and Cat is a Cat". – 2010-12-19 11:59:22
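One reading that reproduces the numbers in the question is: the cell for (row word, column word) holds the number of times the column word occurs in the sentence, with 0 on the diagonal. This rule is an assumption and is not confirmed by the asker; the class and method names below are hypothetical. A minimal sketch under that assumption:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AssociationMatrixSketch {

    // Assumed rule (a guess at the asker's intent): cell (row, col) is the number
    // of times `col` occurs in the sentence, and the diagonal is 0.
    static Map<String, Map<String, Integer>> build(String sentence) {
        String[] words = sentence.trim().split("\\s+");

        // Count occurrences of each distinct word, keeping first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }

        // One row per distinct word, one cell per distinct word.
        Map<String, Map<String, Integer>> matrix = new LinkedHashMap<>();
        for (String row : counts.keySet()) {
            Map<String, Integer> cells = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> col : counts.entrySet()) {
                cells.put(col.getKey(), row.equals(col.getKey()) ? 0 : col.getValue());
            }
            matrix.put(row, cells);
        }
        return matrix;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> m = build("Dog is a Dog and Cat is a Cat");
        System.out.println("Dog -> " + m.get("Dog")); // Dog -> {Dog=0, is=2, a=2, and=1, Cat=2}
    }
}
```

Other rules fit the same five numbers (for example, taking the smaller of the two word counts), so the definition should be pinned down before relying on any implementation.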

1. Split the sentence into separate words.
2. Generate pairs.
3. Merge identical pairs.

It is quite simple:

// Requires: import java.util.*;
// `sentence` is the input string, e.g. "Dog is a Dog and Cat is a Cat".
String[] words = sentence.split("\\s"); // first step: split on whitespace
List<List<String>> pairs =
    new ArrayList<List<String>>(words.length * (words.length - 1) / 2); // n(n-1)/2 pairs
for (int i = 0; i < words.length - 1; i++) {
    for (int j = i + 1; j < words.length; j++) {
        List<String> pair = Arrays.asList(words[i], words[j]);
        Collections.sort(pair); // sort so that word order within a pair is ignored
        pairs.add(pair);
    }
} // second step: generate pairs
Map<List<String>, Integer> pair2count = new LinkedHashMap<List<String>, Integer>();
for (List<String> pair : pairs) {
    if (pair2count.containsKey(pair)) {
        pair2count.put(pair, pair2count.get(pair) + 1);
    } else {
        pair2count.put(pair, 1);
    }
} // third step: merge identical pairs

// output
System.out.println(pair2count);

2010-12-19 00:31:53 Roman

Thanks, Roman. I can split the sentence into words:

// Requires: import java.text.BreakIterator; import java.util.Locale;
String sentence = null;
String target = "Dog is a Dog and Cat is a Cat";
int index = 0;
Locale currentLocale = new Locale("en", "US");
BreakIterator wordIterator = BreakIterator.getWordInstance(currentLocale);
// Creating the sentence iterator
BreakIterator bi = BreakIterator.getSentenceInstance();
bi.setText(target);

while (bi.next() != BreakIterator.DONE) {
    // Text between the previous sentence boundary and the current one
    sentence = target.substring(index, bi.current());
    System.out.println(sentence);
    wordIterator.setText(sentence);
    int start = wordIterator.first();
    int end = wordIterator.next();

    while (end != BreakIterator.DONE) {
        String word = sentence.substring(start, end);
        // Skip tokens that are whitespace or punctuation
        if (Character.isLetterOrDigit(word.charAt(0))) {
            System.out.println(word);
        }
        start = end;
        end = wordIterator.next();
    }
    index = bi.current();
}

But I did not understand your other two points. Thanks.
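To connect the word splitting to the remaining two steps (generating pairs and merging identical pairs), here is a minimal sketch. The class name, the method names and the "-" separator used to build pair keys are hypothetical choices for illustration, and the sketch assumes that words themselves contain no "-".

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PairCountSketch {

    // Step 1: collect the words of one sentence, skipping spaces and punctuation.
    static List<String> words(String sentence) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(sentence);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = sentence.substring(start, end);
            if (Character.isLetterOrDigit(token.charAt(0))) {
                result.add(token);
            }
        }
        return result;
    }

    // Steps 2 and 3: visit every pair of positions and count identical unordered pairs.
    static Map<String, Integer> countPairs(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < words.size() - 1; i++) {
            for (int j = i + 1; j < words.size(); j++) {
                String a = words.get(i);
                String b = words.get(j);
                // Order the two words so that "is"/"Dog" and "Dog"/"is" share one key.
                String key = a.compareTo(b) <= 0 ? a + "-" + b : b + "-" + a;
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countPairs(words("Dog is a Dog and Cat is a Cat")));
    }
}
```

Note that pairing every position with every later position gives Dog-is a count of 4 (two occurrences of "Dog" times two of "is") and Dog-Dog a count of 1, so these numbers do not match the table in the question. If the table is the target, the counting rule needs to be adjusted accordingly.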

2010-12-19 00:43:26

+1 for using BreakIterator – 2010-12-19 00:58:21

It is overkill, IMHO. 'target.split("\\s")' should be enough, and it can replace all of this overly complicated code. – Roman 2010-12-19 01:01:58

Another +1 for BreakIterator. – orangepips 2010-12-19 01:19:14
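The disagreement in these comments mostly matters once the text contains punctuation. The sketch below compares the two approaches on a punctuated variant of the example sentence; the variant and the class and method names are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TokenizeComparison {

    // Whitespace splitting: punctuation stays attached to the neighbouring word.
    static List<String> bySplit(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // BreakIterator: punctuation becomes its own token, which is then filtered out.
    static List<String> byBreakIterator(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            if (Character.isLetterOrDigit(token.charAt(0))) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Dog is a Dog, and Cat is a Cat.";
        System.out.println(bySplit(text));         // contains "Dog," and "Cat."
        System.out.println(byBreakIterator(text)); // contains "Dog" and "Cat" only
    }
}
```

For input without punctuation, such as the sentence in the question, both approaches yield the same nine words, so the simpler split is sufficient there.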
