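Removing duplicate key-value pairs where the value is a list

Below is my code for detecting abbreviations and their long forms. The code loops over each line of the document, loops over each word in that line, and identifies abbreviation candidates. For each candidate it then loops over every line of the document again to find an appropriate long form. My problem is that if an abbreviation occurs several times in the document, my output contains several entries for it. I want to print each abbreviation only once, with all of its possible long forms. Here is my code: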

public static void main(String[] args) throws FileNotFoundException 
    { 
     BufferedReader in = new BufferedReader(new FileReader("D:\\Workspace\\resource\\SampleSentences.txt")); 
     String str=null; 
     ArrayList<String> lines = new ArrayList<String>(); 
     String matchingLongForm; 
     List <String> matchingLongForms = new ArrayList<String>() ; 
     List <String> shortForm = new ArrayList<String>() ; 
     Map<String, List<String>> abbreviationPairs = new HashMap<String, List<String>>(); 


     try 
     { 
      while((str = in.readLine()) != null){ 
       lines.add(str); 
      } 
     } 
     catch (IOException e) 
     { 
      // TODO Auto-generated catch block 
      e.printStackTrace(); 
     } 
     String[] linesArray = lines.toArray(new String[lines.size()]); 




     // document wide search for abbreviation long form and identifying several appropriate matches 
     for (String line : linesArray){ 
      for (String word : (Tokenizer.getTokenizer().tokenize(line))){ 
       if (isValidShortForm(word)){ 
        for (int i = 0; i < linesArray.length; i++){ 
         matchingLongForm = extractBestLongForm(word, linesArray[i]); 
         //shortForm.add(word); 
         if (matchingLongForm != null && !(matchingLongForms.contains(matchingLongForm))){ 
          matchingLongForms.add(matchingLongForm); 

          //System.out.println(matchingLongForm); 
          abbreviationPairs.put(word, matchingLongForms); 
          //matchingLongForms.clear(); 
         } 
        } 

        if (abbreviationPairs != null){ 
         //for(abbreviationPairs.) 
         System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs); 
         abbreviationPairs.clear(); 
         matchingLongForms.clear(); 
         //System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew); 
        } 


        else 
         continue; 
       } 
      } 
     } 
    }

Here is the current output:

Abbreviation Pair: {GLBA=[Gramm Leach Bliley act]} 
Abbreviation Pair: {NCUA=[National credit union administration]} 
Abbreviation Pair: {FFIEC=[Federal Financial Institutions Examination Council]} 
Abbreviation Pair: {CFR=[comments for the Report]} 
Abbreviation Pair: {CFR=[comments for the Report]} 
Abbreviation Pair: {CFR=[comments for the Report]} 
Abbreviation Pair: {CFR=[comments for the Report]} 
Abbreviation Pair: {OFAC=[Office of Foreign Assets Control]}

Source

2016-11-18 serendipity

Is `Map<String, Set<String>> abbreviationPairs` an option? – bradimus

Note the existence of `Files.readAllLines` (https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#readAllLines(java.nio.file.Path,%20java.nio.charset.Charset)). By reinventing the wheel you are wasting your time... Besides, you can simply write `for (String line : lines) {...` without copying the contents of the List into an array. – Holger
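A minimal sketch of what Holger's comment suggests, assuming Java 7 or later. The class name `ReadLinesSketch`, the helper `readLines`, and the temporary file that stands in for `SampleSentences.txt` are hypothetical and not part of the original code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class ReadLinesSketch {

    // Reads every line of a text file into a List with a single call,
    // replacing the manual BufferedReader loop.
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary file stands in for SampleSentences.txt.
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, Arrays.asList("first line", "second line"),
                StandardCharsets.UTF_8);

        List<String> lines = readLines(tmp);

        // Iterate over the List directly; no copy into a String[] is needed.
        for (String line : lines) {
            System.out.println(line);
        }

        Files.delete(tmp);
    }
}
```

The charset is passed explicitly because the two-argument overload is the one available since Java 7; the file's actual encoding is an assumption here.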

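You want key-value pairs of abbreviation and text, so you should use a Map. A map cannot contain duplicate keys; each key maps to at most one value.

The problem lies in where the output is printed, not in the map. You print inside the loop, so the map is displayed multiple times.

Move this code outside the loop: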

if (abbreviationPairs != null){ 
    //for(abbreviationPairs.) 
    System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs); 
    abbreviationPairs.clear(); 
    matchingLongForms.clear(); 
    //System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew); 
}

Source

2016-11-18 14:20:12

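More importantly, the map is cleared in every loop iteration, which makes it impossible to detect duplicate keys. Either way, moving the printing code out of the loop is the right fix. Take care to create a new `matchingLongForms` list for each mapping; the `clear()` calls then become obsolete. – Holger

Thank you very much! I used a combination of your answers: I moved the printing code outside the loop and create a new list for matchingLongForms each time. – serendipity

Try using java.util.Set to store your matching short forms and long forms. From the Javadoc of that class:

... If this set already contains the element, the call leaves the set unchanged and returns false. In combination with the restriction on constructors, this ensures that sets never contain duplicate elements ...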
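The reason a fresh list is needed per mapping can be sketched as follows. `Map.put` stores a reference to the list, not a copy, so reusing and clearing one list changes every value that points to it. The class and method names are hypothetical; the sample strings are taken from the output in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FreshListSketch {

    // Problematic pattern: one list instance is reused for every key.
    static Map<String, List<String>> sharedList() {
        Map<String, List<String>> pairs = new HashMap<>();
        List<String> shared = new ArrayList<>();

        shared.add("Gramm Leach Bliley act");
        pairs.put("GLBA", shared);

        shared.clear(); // also empties the value already stored under GLBA
        shared.add("Office of Foreign Assets Control");
        pairs.put("OFAC", shared);

        return pairs; // both keys now refer to the same one-element list
    }

    // Suggested pattern: a new list per key, so no clear() is required.
    static Map<String, List<String>> freshLists() {
        Map<String, List<String>> pairs = new HashMap<>();

        List<String> glba = new ArrayList<>();
        glba.add("Gramm Leach Bliley act");
        pairs.put("GLBA", glba);

        List<String> ofac = new ArrayList<>();
        ofac.add("Office of Foreign Assets Control");
        pairs.put("OFAC", ofac);

        return pairs;
    }

    public static void main(String[] args) {
        System.out.println("shared: " + sharedList());
        System.out.println("fresh:  " + freshLists());
    }
}
```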

Source

2016-11-18 14:07:23
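The `Set.add` behaviour quoted in the answer above can be illustrated with a small sketch. `SetAddSketch` and `dedupe` are hypothetical names; `LinkedHashSet` is chosen here only so that the insertion order is preserved when printing.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SetAddSketch {

    // Collects long forms into a Set; repeated values are ignored by add().
    static Set<String> dedupe(String... longForms) {
        Set<String> unique = new LinkedHashSet<>();
        for (String longForm : longForms) {
            // add() returns false when the element is already present,
            // so no explicit contains() check is needed.
            boolean added = unique.add(longForm);
            if (!added) {
                System.out.println("skipped duplicate: " + longForm);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Set<String> result = dedupe(
                "comments for the Report",
                "comments for the Report",
                "Office of Foreign Assets Control");
        System.out.println(result);
    }
}
```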

Here is the solution

Thanks to code_angel and Holger

Move the printing code outside the loop and create a new list for each matching long form.

for (String line : linesArray){ 
     for (String word : (Tokenizer.getTokenizer().tokenize(line))){ 
      if (isValidShortForm(word)){ 
       for (int i = 0; i < linesArray.length; i++){ 
        matchingLongForm = extractBestLongForm(word, linesArray[i]); 
        List <String> matchingLongForms = new ArrayList<String>() ; 
        if (matchingLongForm != null && !(matchingLongForms.contains(matchingLongForm))&& !(abbreviationPairs.containsKey(word))){ 
         matchingLongForms.add(matchingLongForm); 
         //System.out.println(matchingLongForm); 
         abbreviationPairs.put(word, matchingLongForms); 
         //matchingLongForms.clear(); 
        } 
       } 

      } 
     } 
    } 
    if (abbreviationPairs != null){ 
     System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs); 
     //abbreviationPairs.clear(); 
     //matchingLongForms.clear(); 

    } 

}

New output:

Abbreviation Pair: {NCUA=[National credit union administration], FFIEC=[Federal Financial Institutions Examination Council], OFAC=[Office of Foreign Assets Control], MSSP=[Managed Security Service Providers], IS=[Information Systems], SLA=[Service level agreements], CFR=[comments for the Report], MIS=[Management Information Systems], IDS=[Intrusion detection systems], TSP=[Technology Service Providers], RFI=[risk that FIs], EIC=[Examples of in the cloud], TIER=[The institution should ensure], BCP=[Business continuity planning], GLBA=[Gramm Leach Bliley act], III=[It is important], FI=[Financial Institutions], RFP=[Request for proposal]}

Source

2016-11-21 06:24:56 serendipity
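One thing to note about the accepted code: because of the `containsKey` check, only the first long form found for each abbreviation is kept. If all possible long forms are wanted, as the question asks, a `Map<String, Set<String>>` along the lines of bradimus's comment is one option. The following is a minimal sketch assuming Java 8 or later. The (short form, long form) pairs are passed in directly because `Tokenizer`, `isValidShortForm` and `extractBestLongForm` are not shown in the thread, and the second long form for CFR is a made-up illustration, not output from the original program.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AbbreviationMapSketch {

    // Groups long forms by short form. Each key appears once, and each
    // long form appears once per key because the values are Sets.
    static Map<String, Set<String>> collect(List<String[]> matches) {
        Map<String, Set<String>> pairs = new LinkedHashMap<>();
        for (String[] match : matches) {
            String shortForm = match[0];
            String longForm = match[1];
            // Create the Set on first sight of the key, then add to it.
            pairs.computeIfAbsent(shortForm, k -> new LinkedHashSet<>())
                 .add(longForm);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical matches standing in for the results of the nested loops.
        List<String[]> matches = new ArrayList<>();
        matches.add(new String[] {"CFR", "comments for the Report"});
        matches.add(new String[] {"CFR", "comments for the Report"});
        matches.add(new String[] {"CFR", "Code of Federal Regulations"}); // illustrative only
        matches.add(new String[] {"OFAC", "Office of Foreign Assets Control"});

        // Print once, after all matches have been collected.
        System.out.println("Abbreviation Pair:" + "\t" + collect(matches));
    }
}
```

With this structure neither the `contains` check nor the `containsKey` check is needed, and printing happens a single time after the loops finish.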
