Split string containing Chinese or Japanese or English into words
How can I split a string containing Chinese, Japanese, or English text into words, using a regex or a utility class?
Example 1:
根據從2013年的一項研究,由一羣來自美國俄亥俄州立大學的研
Output 1:
根據從2013 年的一項研究,由一羣來自美國俄亥俄州立大學的研
Example 2:
According to a 2013 study by a research group from the US to
Output 2:
According, to, a, 2013, study, by, a, research, group, from, the, US, to
The input string is guaranteed not to mix English with Japanese; the two always arrive as separate strings. An English string should still be split as well, using this piece of code:
String[] words = input.split("[ ./()\\[\\]=,<>;\"']+");
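To make the English case concrete, here is a minimal runnable sketch of that split applied to Example 2. The class name `RegexSplitDemo` and the method name `splitWords` are placeholders chosen for illustration; only the regex comes from the line above.

```java
public class RegexSplitDemo {

    // Splits on runs of spaces and the punctuation characters listed in the character class.
    // The helper name is hypothetical; the pattern is the one from the question.
    static String[] splitWords(String input) {
        return input.split("[ ./()\\[\\]=,<>;\"']+");
    }

    public static void main(String[] args) {
        String input = "According to a 2013 study by a research group from the US to";
        String[] words = splitWords(input);
        // Prints: According, to, a, 2013, study, by, a, research, group, from, the, US, to
        System.out.println(String.join(", ", words));
    }
}
```

Note that if the input starts with one of the delimiter characters, `String.split` returns an empty string as the first element, so that case may need filtering.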
If this is not possible in Java, please suggest whether the non-English input strings could instead be split on whitespace characters only.
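As for the "utility class" route, one candidate in the standard library is `java.text.BreakIterator`. The following is only a sketch under assumptions: the class name `BreakIteratorDemo` and the helper `words` are hypothetical, and I have not verified how well the JDK's built-in rules segment Chinese or Japanese text. A dictionary-based implementation (for example the `BreakIterator` in ICU4J) may be needed for those languages.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorDemo {

    // Walks the word boundaries reported by BreakIterator and keeps only the
    // segments that start with a letter or digit, dropping spaces and punctuation.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // English input; for Chinese or Japanese, pass Locale.CHINESE or Locale.JAPANESE
        // and inspect the result, since segmentation quality there is not guaranteed.
        System.out.println(words("According to a 2013 study", Locale.ENGLISH));
    }
}
```

The locale is passed in explicitly because the English and non-English strings arrive separately, so the caller can pick the matching locale per string.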
May I ask why there is no space in-between "年的", "項研究" and "羣來" in output 1? – Pang
Sorry, I don't know Chinese, so that mistake crept in. – Kishore