2017-02-13

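Comparing two LinkedHashMaps with values as lists

I have asked this question a few times in different ways. Every time I make a breakthrough, I run into another problem. Part of the reason is that I am not proficient in Java and have trouble with collections such as Maps, so please bear with me.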

I have two maps like this:

Map1 -{ORGANIZATION=[Fulton Tax Commissioner 's Office, Grady Hospital, Fulton Health Department], LOCATION=[Bellwood, Alpharetta]} 

Map2 - {ORGANIZATION=[Atlanta Police Department, Fulton Tax Commissioner, Fulton Health Department], LOCATION=[Alpharetta], PERSON=[Bellwood, Grady Hospital]} 

The maps are defined as: LinkedHashMap<String, List<String>> sampleMap = new LinkedHashMap<String, List<String>>();

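I am comparing these two maps by their values, and there are only 3 keys: ORGANIZATION, PERSON and LOCATION. Map1 is my gold standard, against which I compare Map2. The problem I am facing is this: when I iterate over the values of the ORGANIZATION key in Map1 and look for matches in Map2, my first entry does have a partial match in Map2 (Fulton Tax Commissioner), but because the first entry of Map2 (Atlanta Police Department) is not a match, I get an incorrect result (I am looking for both exact and partial matches). The result of each comparison increments the true positive, false positive and false negative counters, which ultimately let me compute precision and recall for named entity recognition.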
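
For reference, a minimal sketch of how such counters could be turned into precision and recall. The class and method names (PrecisionRecall, precision, recall) are hypothetical and not part of my code, and the counts in main are illustrative only:

```java
/** Hypothetical helper, not part of the code in this question. */
final class PrecisionRecall {

    // precision = TP / (TP + FP); defined here as 0 when nothing was predicted
    static double precision(int truePositives, int falsePositives) {
        int predicted = truePositives + falsePositives;
        return predicted == 0 ? 0.0 : (double) truePositives / predicted;
    }

    // recall = TP / (TP + FN); defined here as 0 when the gold standard is empty
    static double recall(int truePositives, int falseNegatives) {
        int expected = truePositives + falseNegatives;
        return expected == 0 ? 0.0 : (double) truePositives / expected;
    }

    public static void main(String[] args) {
        // Illustrative counts only: TP = 2, FP = 1, FN = 1
        System.out.printf("precision=%.3f recall=%.3f%n", precision(2, 1), recall(2, 1));
    }
}
```

Returning 0 for an empty denominator is an assumption made for this sketch; some evaluations treat that case as undefined instead.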

EDIT

The result I expect from this is:

Organization: 
True Positive Count = 2 
False Negative Count = 1 
False Positive Count = 1 

Person: 
False Positive Count = 2 

Location: 
True Positive Count = 1 
False Negative Count = 1 

The output I currently get is:

Organization: 
    True Positive Count = 1 
    False Negative Count = 2 
    False Positive Count = 0 

    Person: 
    True Positive Count = 0 
    False Negative Count = 0 
    False Positive Count = 2 

    Location: 
    True Positive Count = 0 
    False Negative Count = 1 
    False Positive Count = 0 

CODE

private static List<Integer> compareMaps(LinkedHashMap<String, List<String>> annotationMap, LinkedHashMap<String, List<String>> rageMap) 
    { 
     List<Integer> compareResults = new ArrayList<Integer>(); 

     if (!annotationMap.entrySet().containsAll(rageMap.entrySet())){ 
       for (Entry<String, List<String>> rageEntry : rageMap.entrySet()){ 
        if (rageEntry.getKey().equals("ORGANIZATION") && !(annotationMap.containsKey(rageEntry.getKey()))){ 
         for (int j = 0; j< rageEntry.getValue().size(); j++) { 
          orgFalsePositiveCount++; 
         } 
       } 
        if (rageEntry.getKey().equals("PERSON") && !(annotationMap.containsKey(rageEntry.getKey()))){ 
         // System.out.println(rageEntry.getKey()); 
         // System.out.println(annotationMap.entrySet()); 
         for (int j = 0; j< rageEntry.getValue().size(); j++) { 
          perFalsePositiveCount++; 
         } 
       } 
        if (rageEntry.getKey().equals("LOCATION") && !(annotationMap.containsKey(rageEntry.getKey()))){ 
         for (int j = 0; j< rageEntry.getValue().size(); j++) { 
          locFalsePositiveCount++; 
        } 
       } 
       } 
      } 



       for (Entry<String, List<String>> entry : annotationMap.entrySet()){ 

        int i_index = 0; 
        if (rageMap.entrySet().isEmpty()){ 
         orgFalseNegativeCount++; 
         continue; 
        } 

        // for (Entry<String, List<String>> rageEntry : rageMap.entrySet()){ 

        if (entry.getKey().equals("ORGANIZATION")){ 
         for(String val : entry.getValue()) { 
          if (rageMap.get(entry.getKey()) == null){ 
           orgFalseNegativeCount++; 
           continue; 
         } 
      recusion:  for (int i = i_index; i< rageMap.get(entry.getKey()).size();){ 
           String rageVal = rageMap.get(entry.getKey()).get(i); 
           if(val.equals(rageVal)){ 
            orgTruePositiveCount++; 
            i_index++; 
            break recusion; 
         } 

          else if((val.length() > rageVal.length()) && val.contains(rageVal)){ //|| dataB.get(entryA.getKey()).contains(entryA.getValue())){ 
           orgTruePositiveCount++; 
           i_index++; 
           break recusion; 
         } 
          else if((val.length() < rageVal.length()) && rageVal.contains(val)){ 
           orgTruePositiveCount++; 
           i_index++; 
           break recusion; 
          } 

          else if(!val.contains(rageVal)){ 
           orgFalseNegativeCount++; 
           i_index++; 
           break recusion; 
          } 
          else if(!rageVal.contains(val)){ 
           orgFalsePositiveCount++; 
           i_index++; 
           break recusion; 
          } 


         } 
        } 
        } 

        ......................... //(Same for person and location) 


        compareResults.add(orgTruePositiveCount); 
        compareResults.add(orgFalseNegativeCount); 
        compareResults.add(orgFalsePositiveCount); 
        compareResults.add(perTruePositiveCount); 
        compareResults.add(perFalseNegativeCount); 
        compareResults.add(perFalsePositiveCount); 
        compareResults.add(locTruePositiveCount); 
        compareResults.add(locFalseNegativeCount); 
        compareResults.add(locFalsePositiveCount); 

        System.out.println(compareResults); 
        return compareResults; 

      } 
+1

You should describe more formally what result you want to receive – Andremoniy

+0

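@Andremoniy Done! My code does most of the work, but I guess what I am looking for is an answer to a question like "Do the maps need to be sorted before they are compared?" Or is there something else I should do to prevent this problem? – serendipity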

+0

What is the value type? Is it a Set or a List? – vanje

Answers

1

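I came up with a simplified version. The string comparison could also be implemented using, for example, the Levenshtein distance. This is the output I get: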

Organization: 
    False Positive: Atlanta Police Department 
    True Positive: Fulton Tax Commissioner 
    True Positive: Fulton Health Department 
    False Negative: Grady Hospital 

Person: 
    False Positive: Bellwood 
    False Positive: Grady Hospital 

Location: 
    True Positive: Alpharetta 
    False Negative: Bellwood 

[2, 1, 1, 0, 0, 2, 1, 1, 0] 

Here is the code I created:


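import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

public class MapCompare { 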

    public static boolean listContains(List<String> annotationList, String value) { 
     if(annotationList.contains(value)) { 
      // 100% Match 
      return true; 
     } 
     for(String s: annotationList) { 
      if (s.contains(value) || value.contains(s)) { 
       // Partial Match 
       return true; 
      } 
     } 
     return false; 
    } 

    public static List<Integer> compareLists(List<String> annotationList, List<String> rageList){ 
     List<Integer> compareResults = new ArrayList<Integer>(); 
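     // Treat a missing key as an empty list so the other side's entries are still counted
     if(annotationList == null) annotationList = Arrays.asList();
     if(rageList == null) rageList = Arrays.asList();
     int truePositiveCount = 0; 
     int falseNegativeCount = 0; 
     int falsePositiveCount = 0; 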

     for(String r: rageList) { 
      if(listContains(annotationList, r)) { 
       System.out.println("\tTrue Positive: " + r); 
       truePositiveCount ++; 
      } else { 
       System.out.println("\tFalse Positive: " + r); 
       falsePositiveCount ++; 
      } 
     } 

     for(String s: annotationList) { 
      if(!listContains(rageList, s)){ 
       System.out.println("\tFalse Negative: " + s); 
       falseNegativeCount ++; 
      } 
     } 

     compareResults.add(truePositiveCount); 
     compareResults.add(falseNegativeCount); 
     compareResults.add(falsePositiveCount); 

     System.out.println(); 

     return compareResults; 
    } 

    private static List<Integer> compareMaps(LinkedHashMap<String, List<String>> annotationMap, LinkedHashMap<String, List<String>> rageMap) { 
     List<Integer> compareResults = new ArrayList<Integer>(); 
     System.out.println("Organization:"); 
     compareResults.addAll(compareLists(annotationMap.get("ORGANIZATION"), rageMap.get("ORGANIZATION"))); 
     System.out.println("Person:"); 
     compareResults.addAll(compareLists(annotationMap.get("PERSON"), rageMap.get("PERSON"))); 
     System.out.println("Location:"); 
     compareResults.addAll(compareLists(annotationMap.get("LOCATION"), rageMap.get("LOCATION"))); 
     System.out.println(compareResults); 
     return compareResults; 
    } 

    public static void main(String[] args) { 
     LinkedHashMap<String, List<String>> Map1 = new LinkedHashMap<>(); 
     List<String> m1l1 = Arrays.asList("Fulton Tax Commissioner's Office", "Grady Hospital", "Fulton Health Department"); 
     List<String> m1l2 = Arrays.asList("Bellwood", "Alpharetta"); 
     List<String> m1l3 = Arrays.asList(); 
     Map1.put("ORGANIZATION", m1l1); 
     Map1.put("LOCATION", m1l2); 
     Map1.put("PERSON", m1l3); 

     LinkedHashMap<String, List<String>> Map2 = new LinkedHashMap<>(); 
     List<String> m2l1 = Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department"); 
     List<String> m2l2 = Arrays.asList("Alpharetta"); 
     List<String> m2l3 = Arrays.asList("Bellwood", "Grady Hospital"); 

     Map2.put("ORGANIZATION", m2l1); 
     Map2.put("LOCATION", m2l2); 
     Map2.put("PERSON", m2l3); 

     compareMaps(Map1, Map2); 

    } 

} 

Hope this helps!

1

If I understood this correctly, the following may help.

I created a custom string class that overrides equals to allow partial matches:

import java.util.Objects;

public class MyCustomString { 

    private String myString; 

    public MyCustomString(String myString) { 
     this.myString = myString; 
    } 

    public String getMyString() { 
     return myString; 
    } 

    public void setMyString(String myString) { 
     this.myString = myString; 
    } 

    @Override 
    public boolean equals(Object obj) { 
     if (obj == null) { 
      return false; 
     } 
     if (getClass() != obj.getClass()) { 
      return false; 
     } 
     final MyCustomString other = (MyCustomString) obj; 
     if (!Objects.equals(this.myString, other.myString) && !other.myString.contains(this.myString)) { 
      return false; 
     } 
     return true; 
    } 

    // add getter and setter for myString 
    // or delegate needed methods to myString object. 
    @Override 
    public int hashCode() { 
     int hash = 3; 
     hash = 47 * hash + Objects.hashCode(this.myString); 
     return hash; 
    } 
} 

And here is the code I tried with the first part of your maps:

LinkedHashMap<String, List<MyCustomString>> sampleMap1 = new LinkedHashMap<String, List<MyCustomString>>(); 
     sampleMap1.put("a", new ArrayList<>()); 
     sampleMap1.get("a").add(new MyCustomString("Fulton Tax Commissioner 's Office")); 
     sampleMap1.get("a").add(new MyCustomString("Grady Hospital")); 
     sampleMap1.get("a").add(new MyCustomString("Fulton Health Department")); 

     LinkedHashMap<String, List<MyCustomString>> sampleMap2 = new LinkedHashMap<String, List<MyCustomString>>(); 
     sampleMap2.put("a", new ArrayList<>()); 
     sampleMap2.get("a").add(new MyCustomString("Atlanta Police Department")); 
     sampleMap2.get("a").add(new MyCustomString("Fulton Tax Commissioner")); 
     sampleMap2.get("a").add(new MyCustomString("Fulton Health Department")); 

     HashMap<String, Integer> resultMap = new HashMap<String, Integer>(); 

     for (Map.Entry<String, List<MyCustomString>> entry : sampleMap1.entrySet()) { 
      String key1 = entry.getKey(); 
      List<MyCustomString> value1 = entry.getValue(); 
      List<MyCustomString> singleListOfMap2 = sampleMap2.get(key1); 
      if (singleListOfMap2 == null) { 
       // all entries are false negatives; skip the loop below to avoid a NullPointerException
       System.out.println("Number of false N: " + value1.size()); 
       continue; 
      } 
      int truePositives = 0; 
      for (MyCustomString singleStringOfMap2 : singleListOfMap2) { 
       if (value1.contains(singleStringOfMap2)) { 
        // true positive (exact or partial match against the gold list)
        System.out.println("true"); 
        truePositives++; 
       } else { 
        // not in the gold list, so this is a false positive
        System.out.println("false P"); 
       } 
      } 
      int size = singleListOfMap2.size(); 
      System.out.println(truePositives + " of " + size + " - number of true"); 
      // false positive = size - true 
     } 
     for (String string : sampleMap2.keySet()) { 
      if (sampleMap1.get(string) == null) { 
       //all these are false positive 
       System.out.println("number of false P: " + sampleMap2.get(string).size()); 
      } 
     } 
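
One caveat with this approach: an equals that accepts partial matches is not symmetric (a.equals(b) can be true while b.equals(a) is false) and no longer agrees with hashCode, so hash-based collections may behave unexpectedly. A minimal sketch of an alternative that keeps plain String values and moves the partial-match rule into a helper; the names PartialMatch, matches and containsMatch are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; leaves String.equals untouched. */
final class PartialMatch {

    // True for an exact match or when either string contains the other.
    static boolean matches(String a, String b) {
        return a.contains(b) || b.contains(a);
    }

    // True if any element of the list matches the value exactly or partially.
    static boolean containsMatch(List<String> values, String value) {
        for (String candidate : values) {
            if (matches(candidate, value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> gold = Arrays.asList(
                "Fulton Tax Commissioner 's Office", "Grady Hospital", "Fulton Health Department");
        System.out.println(containsMatch(gold, "Fulton Tax Commissioner"));   // partial match
        System.out.println(containsMatch(gold, "Atlanta Police Department")); // no match
    }
}
```

Unlike the overridden equals, this check is symmetric, so it gives the same answer whichever map is iterated first.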
+0

Ah, I noticed afterwards that the first map may be missing an entire key, such as PERSON – Zeromus

+0

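Thanks a lot for the code! I am trying to see how to incorporate it into my code and test it. To answer your question about the missing PERSON key: the first part of my code takes care of that. Anything extra in map2 that is not in map1 is a false positive. – serendipity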

+0

OK, I fixed that as well ;) – Zeromus

1

I wrote this class to compare the maps:

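import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapComparison<K, V> { 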
    private final Map<K, Collection<ValueCounter>> temp; 
    private final Map<K, Collection<V>> goldMap; 
    private final Map<K, Collection<V>> comparedMap; 
    private final BiPredicate<V, V> valueMatcher; 

    public MapComparison(Map<K, Collection<V>> mapA, Map<K, Collection<V>> mapB, BiPredicate<V, V> valueMatcher) { 
     this.goldMap = mapA; 
     this.comparedMap = mapB; 
     this.valueMatcher = valueMatcher; 

     this.temp = new HashMap<>(); 

     goldMap.forEach((key, valueList) -> { 
      temp.put(key, valueList.stream().map(value -> new ValueCounter(value, true)).collect(Collectors.toList())); 
     }); 

     comparedMap.entrySet().stream().forEach(entry -> { 

      K key = entry.getKey(); 
      Collection<V> valueList = entry.getValue(); 

      if(temp.containsKey(key)) { 
       Collection<ValueCounter> existingMatches = temp.get(key); 

       Stream<V> falsePositives = valueList.stream().filter(v -> existingMatches.stream().noneMatch(mv -> mv.match(v))); 

       falsePositives.forEach(fp -> existingMatches.add(new ValueCounter(fp, false))); 
      } else { 
       temp.putIfAbsent(key, valueList.stream().map(value -> new ValueCounter(value, false)).collect(Collectors.toList())); 
      } 
     }); 
    } 

    public String formatMatchedCounters() { 
     StringBuilder sb = new StringBuilder(); 

     for(Entry<K, Collection<ValueCounter>> e : temp.entrySet()) { 
      sb.append(e.getKey()).append(":"); 

      int[] counters = e.getValue().stream().collect(() -> new int[3], (a, b) -> { 
       a[0] += b.truePositiveCount; 
       a[1] += b.falsePositiveCount; 
       a[2] += b.falseNegativeCount; 
      }, (c, d) -> { 
       c[0] += d[0]; 
       c[1] += d[1]; 
       c[2] += d[2]; 
      }); 
      sb.append(String.format("\ntruePositiveCount=%s\nfalsePositiveCount=%s\nfalseNegativeCount=%s\n\n", counters[0], counters[1], counters[2])); 
     } 
     return sb.toString(); 
    } 


    private class ValueCounter { 
     private final V goldValue; 

     private int truePositiveCount = 0; 
     private int falsePositiveCount = 0; 
     private int falseNegativeCount = 0; 

     ValueCounter(V value, boolean isInGoldMap) { 
      this.goldValue = value; 

      if(isInGoldMap) { 
       falseNegativeCount = 1; 
      } else { 
       falsePositiveCount = 1; 
      } 
     } 

     boolean match(V otherValue) { 
      boolean result = valueMatcher.test(goldValue, otherValue); 

      if(result) { 
       truePositiveCount++; 

       falseNegativeCount = 0; 
      } 
      return result; 
     } 
    } 
} 

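What it basically does is create a union of the items of both maps, where each item has its own mutable counters for the matched values. The method formatMatchedCounters() just iterates over these counters and sums them for each key.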
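
The same idea for a single key can be sketched without the mutable counter objects. This is a simplified variant under my own assumptions, not the class above: the names ListCounts and count are hypothetical, and a compared value marks every gold value it matches rather than only the first one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical, simplified variant for the lists of one key. */
final class ListCounts {

    // Returns {truePositives, falsePositives, falseNegatives}.
    static <V> int[] count(List<V> gold, List<V> compared, BiPredicate<V, V> matcher) {
        int truePositives = 0;
        int falsePositives = 0;
        boolean[] goldMatched = new boolean[gold.size()];

        for (V candidate : compared) {
            boolean found = false;
            for (int i = 0; i < gold.size(); i++) {
                if (matcher.test(gold.get(i), candidate)) {
                    goldMatched[i] = true; // this gold value is no longer a false negative
                    found = true;
                }
            }
            if (found) {
                truePositives++;
            } else {
                falsePositives++;
            }
        }

        int falseNegatives = 0;
        for (boolean matched : goldMatched) {
            if (!matched) {
                falseNegatives++;
            }
        }
        return new int[] {truePositives, falsePositives, falseNegatives};
    }

    public static void main(String[] args) {
        List<String> gold = Arrays.asList(
                "Fulton Tax Commissioner", "Grady Hospital", "Fulton Health Department");
        List<String> compared = Arrays.asList(
                "Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department");
        // prints [2, 1, 1]
        System.out.println(Arrays.toString(count(gold, compared, String::equalsIgnoreCase)));
    }
}
```

Because every compared value is checked against the whole gold list, the order of the entries in either list does not affect the counts.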

The following test for MapComparison:

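import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

import org.junit.Before;
import org.junit.Test;

public class MapComparisonTest { 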

    private Map<String, Collection<String>> goldMap; 
    private Map<String, Collection<String>> comparedMap; 
    private BiPredicate<String, String> valueMatcher; 

    @Before 
    public void initMaps() { 
     goldMap = new HashMap<>(); 
     goldMap.put("ORGANIZATION", Arrays.asList("Fulton Tax Commissioner", "Grady Hospital", "Fulton Health Department")); 
     goldMap.put("LOCATION", Arrays.asList("Bellwood", "Alpharetta")); 

     comparedMap = new HashMap<>(); 
     comparedMap.put("ORGANIZATION", Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department")); 
     comparedMap.put("LOCATION", Arrays.asList("Alpharetta")); 
     comparedMap.put("PERSON", Arrays.asList("Bellwood", "Grady Hospital")); 

     valueMatcher = String::equalsIgnoreCase; 
    } 

    @Test 
    public void test() { 
     MapComparison<String, String> comparison = new MapComparison<>(goldMap, comparedMap, valueMatcher); 

     System.out.println(comparison.formatMatchedCounters()); 
    } 
} 

produces this result:

ORGANIZATION: 
truePositiveCount=2 
falsePositiveCount=1 
falseNegativeCount=1 

LOCATION: 
truePositiveCount=1 
falsePositiveCount=0 
falseNegativeCount=1 

PERSON: 
truePositiveCount=0 
falsePositiveCount=2 
falseNegativeCount=0 

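Note that I don't know how you want to compare similar values (e.g. "Fulton Tax Commissioner 's Office" versus "Fulton Tax Commissioner"), so I decided to leave that decision to the caller by putting it in the signature (in this example, a BiPredicate parameter). For example, with a Levenshtein distance threshold: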

valueMatcher = (s1, s2) -> StringUtils.getLevenshteinDistance(s1, s2) < 5;
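
StringUtils.getLevenshteinDistance here is presumably the one from Apache Commons Lang. If that dependency is not available, the distance can be computed by hand. A minimal sketch, where the names Levenshtein and distance are hypothetical and the threshold of 5 is taken from the line above:

```java
import java.util.function.BiPredicate;

/** Hypothetical stand-alone edit-distance helper. */
final class Levenshtein {

    // Classic dynamic-programming edit distance, keeping only two rows of the table.
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int substitution = previous[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        BiPredicate<String, String> valueMatcher = (s1, s2) -> distance(s1, s2) < 5;
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(valueMatcher.test("Grady Hospital", "Grady Hospitals"));
    }
}
```

Note that "Fulton Tax Commissioner 's Office" and "Fulton Tax Commissioner" are 10 edits apart, so a threshold of 5 would not treat that pair as a match; the threshold or the matching rule would need to be adjusted for such cases.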