2016-12-05 66 views
-2

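Parsing a number with commas as a float?

I am trying to identify numbers and their corresponding magnitudes in text. I ran into the following error: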

UNABLE TO PARSE MAGNITUDE: 6,700

Here is a snippet from a larger code base to help you understand what I am doing:

for(Quantity quantity: originalQuantities){ 
    y = Math.round(quantity.getMagnitude()); 

    if ((roleStrings.get(SemanticRole.TIME) != null && (roleStrings.get(SemanticRole.TIME)).contains(String.valueOf(y)))) 
     continue; 
......................... 

Quantity is a class defined as follows:

public class Quantity 
{ 
    private Float  magnitude; 
    private String  multiplier; 
    private String  unit; 
    private UnitType type; 
    private Float  absoluteMagnitude; 

enum UnitType 
{ 
    TIME, MONEY, WEIGHT, VOLUME, NUMBER 
} 
public Quantity(String strMagnitude, String multiplier, String unit, 
      String strType) 
    { 
     this.setMagnitude(strMagnitude); 
     this.multiplier = multiplier; 
     this.unit = unit; 
     this.setType(strType); 
    } 

    public Float getMagnitude() 
    { 
     return magnitude; 
    } 

    public String getMultiplier() 
    { 
     return multiplier; 
    } 

    public String getUnit() 
    { 
     return unit; 
    } 

    public UnitType getType() 
    { 
     return type; 
    } 

How can I solve this? I tried conversions using Locale and parseFloat, but could not fix the problem.

Here is the code that parses the magnitude:

public static List<Quantity> getQuantitiesFromString(String str) throws ParseException 
{ 
    List<Quantity> quantities = new ArrayList<Quantity>(); 
    //final String REGEX = "^(\\+|-)?([1-9]\\d{0,2}|0)?(,\\d{3}){0,}(\\.\\d+)?"; 
    //NumberFormat numberFormat = NumberFormat.getNumberInstance(Locale.US); 
    //String numberAsString = numberFormat.format(number); 
    // optional +/- sign followed by numbers separated with a decimal 

    Pattern pattern = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+"); 
    Pattern pattern1 = Pattern.compile("^[0-9][0-9,-]*-[0-9,-]*[0-9]"); 



    List<String> tokens = Arrays.asList(str.split(" ")); 

    for (int i = 0; i < tokens.size(); i++) 
    { 
     String magnitude = ""; 
     String multiplier = ""; 
     String unit = ""; 
     String type = ""; 

     boolean numFound = false; 

     String token = tokens.get(i); 

     // append all numbers matching pattern into a String 
     Matcher matcher = pattern.matcher(token); 
     Matcher matcher1 = pattern1.matcher(token); 

     while (matcher.find()) 
     { 
      numFound = true; 
      magnitude += matcher.group(); 
     } 

     //ignore for number ranges (e.g. 0-10) 
     while (matcher1.find()) 
     { 
      numFound = false; 
      continue; 
     } 

     if (numFound) 
     { 
      // loop through all words starting from current word 
      // keep adding valid unit words until an invalid unit word is 
      // encountered 
      for (int j = i; j < tokens.size(); j++) 
      { 
       // strip non-alphabetic chars from word 
       String word = tokens.get(j).replaceAll("[^a-zA-Z$%]", "") 
         .toLowerCase(); 

       // see if the stripped word is a unit 
       boolean validUnitWord = false; 
       if (getUnitTypesMap().keySet().contains(word)) 
       { 
        validUnitWord = true; 

        if (getUnitTypesMap().get(word).equalsIgnoreCase(
          "number")) 
        { 
         multiplier += multiplier.isEmpty() ? word : " " 
           + word; 
        } 
        else 
        { 
         unit += unit.isEmpty() ? word : " " + word; 
         type = getUnitTypesMap().get(word); 
        } 
       } 

       // break if invalid unit word; else keep searching in next 
       // words 

       // except for current word (index = i), in which case keep 
       // searching regardless 
       if (!validUnitWord && j != i) 
        break; 
      } 

      quantities.add(new Quantity(magnitude, multiplier, unit, type)); 
     } 
    } 

    return quantities; 
} 

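EDIT

The "unable to parse magnitude" error appeared while I was experimenting with Locale.US.

I reverted to the old code, and now for a string like: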

debentures amounting to Rs 6,700 crore

the output I get from getQuantitiesFromString is:

QUANTITY: [[magnitude=6.0, multiplier=crore, unit=, type=NUMBER, absoluteMagnitude=null]]

Everything after the comma is ignored. I tried this regex to detect numbers such as 22,00.15 and 22000353:

"^(\+|-)?([1-9]\d{0,2}|0)?(,\d{3}){0,}(\.\d+)?"

but for some reason it does not work in my code.

+3

Where is the code that parses anything? – f1sh

+2

Are you passing the correct locale? The notation for time, dates, money and weight differs by region. – Tschallacka
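
To illustrate the locale point, here is a minimal sketch, assuming the magnitude string uses US-style grouping (comma as thousands separator). The class and method names are hypothetical and not part of the asker's code. It relies on NumberFormat.getNumberInstance(Locale.US), which accepts grouping commas, whereas Float.parseFloat("6,700") throws a NumberFormatException.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class LocaleParseDemo {
    // Hypothetical helper: parses a US-formatted number such as "6,700".
    // Note that NumberFormat.parse reads as much of the prefix as it can
    // and silently ignores trailing text it cannot parse.
    static float parseUsMagnitude(String text) throws ParseException {
        NumberFormat format = NumberFormat.getNumberInstance(Locale.US);
        return format.parse(text).floatValue();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseUsMagnitude("6,700")); // prints 6700.0
    }
}
```

This only helps if the regex hands the whole token "6,700" to the parser; if the regex has already cut the string down to "6", the locale makes no difference.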

+0

Another possibility for fixing the parse is to replace ',' with '.'; then it should be parseable – XtremeBaumer

Answer

0

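Because of the ^, your pattern "^[-+]?[0-9]*\\.?[0-9]+" only matches at the start of the string 6,700. So it finds 6 and never finds 700. If you remove the ^, the matcher also finds 700 after the comma, the loop concatenates the two matches, and your method passes 6700 to your constructor.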
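
The difference can be reproduced in isolation. The following is a minimal sketch, assuming the same concatenate-every-match loop as in getQuantitiesFromString; the class name and the collect helper are hypothetical and exist only for this demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MagnitudeRegexDemo {
    // The pattern from the question: ^ allows a match only at index 0.
    static final Pattern ANCHORED = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");
    // The same pattern with the ^ removed.
    static final Pattern UNANCHORED = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+");

    // Concatenates every match found in the token, mirroring the
    // while (matcher.find()) loop in the question.
    static String collect(Pattern pattern, String token) {
        StringBuilder magnitude = new StringBuilder();
        Matcher matcher = pattern.matcher(token);
        while (matcher.find()) {
            magnitude.append(matcher.group());
        }
        return magnitude.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect(ANCHORED, "6,700"));   // prints 6
        System.out.println(collect(UNANCHORED, "6,700")); // prints 6700
    }
}
```

One caveat: without the ^, digits anywhere in a token are collected, so a token such as "abc123" would also yield "123". Whether that is acceptable depends on the input text.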