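Parsing a number with commas as a float?
I am trying to identify numbers and their corresponding magnitudes in text. I get the following error: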
UNABLE TO PARSE MAGNITUDE: 6,700
Here is a snippet from a larger codebase to show what I am doing:
for (Quantity quantity : originalQuantities) {
    y = Math.round(quantity.getMagnitude());
    if ((roleStrings.get(SemanticRole.TIME) != null && (roleStrings.get(SemanticRole.TIME)).contains(String.valueOf(y))))
        continue;
    .........................
Quantity is a class defined as follows:
public class Quantity
{
    private Float magnitude;
    private String multiplier;
    private String unit;
    private UnitType type;
    private Float absoluteMagnitude;

    enum UnitType
    {
        TIME, MONEY, WEIGHT, VOLUME, NUMBER
    }

    public Quantity(String strMagnitude, String multiplier, String unit,
            String strType)
    {
        this.setMagnitude(strMagnitude);
        this.multiplier = multiplier;
        this.unit = unit;
        this.setType(strType);
    }

    public Float getMagnitude()
    {
        return magnitude;
    }

    public String getMultiplier()
    {
        return multiplier;
    }

    public String getUnit()
    {
        return unit;
    }

    public UnitType getType()
    {
        return type;
    }
How can I solve this? I tried conversions using Locale, parseFloat, and so on, but could not fix the problem.
Here is the code that parses the magnitude:
public static List<Quantity> getQuantitiesFromString(String str) throws ParseException
{
    List<Quantity> quantities = new ArrayList<Quantity>();
    //final String REGEX = "^(\\+|-)?([1-9]\\d{0,2}|0)?(,\\d{3}){0,}(\\.\\d+)?";
    //NumberFormat numberFormat = NumberFormat.getNumberInstance(Locale.US);
    //String numberAsString = numberFormat.format(number);
    // optional +/- sign followed by numbers separated with a decimal
    Pattern pattern = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");
    Pattern pattern1 = Pattern.compile("^[0-9][0-9,-]*-[0-9,-]*[0-9]");
    List<String> tokens = Arrays.asList(str.split(" "));
    for (int i = 0; i < tokens.size(); i++)
    {
        String magnitude = "";
        String multiplier = "";
        String unit = "";
        String type = "";
        boolean numFound = false;
        String token = tokens.get(i);
        // append all numbers matching pattern into a String
        Matcher matcher = pattern.matcher(token);
        Matcher matcher1 = pattern1.matcher(token);
        while (matcher.find())
        {
            numFound = true;
            magnitude += matcher.group();
        }
        // ignore for number ranges (e.g. 0-10)
        while (matcher1.find())
        {
            numFound = false;
            continue;
        }
        if (numFound)
        {
            // loop through all words starting from current word
            // keep adding valid unit words until an invalid unit word is
            // encountered
            for (int j = i; j < tokens.size(); j++)
            {
                // strip non-alphabetic chars from word
                String word = tokens.get(j).replaceAll("[^a-zA-Z$%]", "")
                        .toLowerCase();
                // see if the stripped word is a unit
                boolean validUnitWord = false;
                if (getUnitTypesMap().keySet().contains(word))
                {
                    validUnitWord = true;
                    if (getUnitTypesMap().get(word).equalsIgnoreCase(
                            "number"))
                    {
                        multiplier += multiplier.isEmpty() ? word : " "
                                + word;
                    }
                    else
                    {
                        unit += unit.isEmpty() ? word : " " + word;
                        type = getUnitTypesMap().get(word);
                    }
                }
                // break if invalid unit word; else keep searching in next
                // words
                // except for current word (index = i), in which case keep
                // searching regardless
                if (!validUnitWord && j != i)
                    break;
            }
            quantities.add(new Quantity(magnitude, multiplier, unit, type));
        }
    }
    return quantities;
}
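Edit
The "unable to parse magnitude" error appeared while I was experimenting with Locale.US. I have reverted to the old code, and now for a string like:
debentures amounting to Rs 6,700 crore
the output I get from getQuantitiesFromString is:
QUANTITY: [[magnitude=6.0, multiplier=crore, unit=, type=NUMBER, absoluteMagnitude=null]]
Everything after the comma is ignored. I wanted this regex to detect numbers such as 22,00.15, 22000353, etc.:
"^(\+|-)?([1-9]\d{0,2}|0)?(,\d{3}){0,}(\.\d+)?"
but for some reason it does not work in my code.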
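To make the regex behaviour described above concrete, here is a minimal sketch that runs both patterns from the question against single tokens. The class name, the firstMatch helper, and the COMMA_TOLERANT pattern are assumptions made for illustration and are not part of the original code. It shows that the active pattern stops at the first comma, and that the commented-out regex consists only of optional groups, so it also "finds" an empty match in a non-numeric token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MagnitudePatternSketch {
    // Pattern currently active in getQuantitiesFromString: commas are not allowed.
    static final Pattern ORIGINAL = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");

    // The commented-out REGEX from the question: every group is optional.
    static final Pattern ALL_OPTIONAL =
            Pattern.compile("^(\\+|-)?([1-9]\\d{0,2}|0)?(,\\d{3}){0,}(\\.\\d+)?");

    // Hypothetical alternative: digits, then comma-separated digit groups of
    // any size, then an optional fraction. Group sizes are not validated.
    static final Pattern COMMA_TOLERANT =
            Pattern.compile("^[-+]?[0-9]+(,[0-9]+)*(\\.[0-9]+)?");

    // Returns the first match found in the token, or null if there is none.
    static String firstMatch(Pattern pattern, String token) {
        Matcher matcher = pattern.matcher(token);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(ORIGINAL, "6,700"));          // 6
        System.out.println(firstMatch(ALL_OPTIONAL, "6,700"));      // 6,700
        System.out.println(firstMatch(ALL_OPTIONAL, "22,00.15"));   // 22
        // Empty string, not null: find() succeeds on a word with no digits.
        System.out.println("[" + firstMatch(ALL_OPTIONAL, "crore") + "]"); // []
        System.out.println(firstMatch(COMMA_TOLERANT, "6,700"));    // 6,700
        System.out.println(firstMatch(COMMA_TOLERANT, "22,00.15")); // 22,00.15
        System.out.println(firstMatch(COMMA_TOLERANT, "crore"));    // null
    }
}
```

Because find() succeeds with an empty match, swapping the all-optional regex into the loop would set numFound for every token, which may be one reason it appeared not to work. A matched string that still contains commas also has to be normalised before it is converted to a Float.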
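Where is the code that actually parses anything? – f1sh
Are you passing in the correct locale? The notation for time, dates, money, and weights varies by region. – Tschallacka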
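As an illustration of the locale point, the following sketch parses a token with java.text.NumberFormat for an explicit locale. The class and method names are hypothetical, and how Quantity.setMagnitude actually parses its argument is not shown in the question, so this is only one possible approach. A ParsePosition is used so that a token which is only partly numeric is rejected rather than silently truncated.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class LocaleParseSketch {
    // Parses the whole token using the number format of the given locale.
    // Returns null when the token is not entirely a number in that locale.
    static Float parseMagnitude(String token, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number parsed = format.parse(token, position);
        if (parsed == null || position.getIndex() != token.length()) {
            return null;
        }
        return parsed.floatValue();
    }

    public static void main(String[] args) {
        // In Locale.US the comma is a grouping separator.
        System.out.println(parseMagnitude("6,700", Locale.US));      // 6700.0
        // In Locale.GERMANY the comma is the decimal separator.
        System.out.println(parseMagnitude("6,700", Locale.GERMANY)); // 6.7
        System.out.println(parseMagnitude("crore", Locale.US));      // null
    }
}
```

The same input yields different values depending on the locale, so the locale should be chosen to match the text being analysed rather than left to the JVM default.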
Another way to fix the parsing is to replace ',' with '.'; then it should be parseable – XtremeBaumer
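A minimal sketch of the string-replacement idea, assuming the comma in the input is a thousands separator as in "6,700 crore". The class and method names are hypothetical. Note that replacing the comma with a dot makes the string parseable but changes the value, whereas removing the comma keeps it.

```java
public class StripGroupingSketch {
    // Removes grouping commas, then parses the remainder as a Float.
    // Float.valueOf throws NumberFormatException if the result is not numeric.
    static Float parseMagnitude(String strMagnitude) {
        String normalized = strMagnitude.replace(",", "");
        return Float.valueOf(normalized);
    }

    public static void main(String[] args) {
        // Comma removed: the magnitude is preserved.
        System.out.println(parseMagnitude("6,700"));                  // 6700.0
        // Comma replaced by a dot: parseable, but read as six point seven.
        System.out.println(Float.valueOf("6,700".replace(',', '.'))); // 6.7
    }
}
```

This only helps if the full token, including the comma, reaches the parsing step; with the pattern currently used in getQuantitiesFromString the magnitude string is already cut down to "6" before any conversion happens.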