Using LibShortText with other languages?

LibShortText is an open-source tool for short-text classification and analysis: http://www.csie.ntu.edu.tw/~cjlin/libshorttext/

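I am trying to find out whether it also works for languages other than English (for example German), but I have not found any hint either way.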

Does anyone know the answer? Thanks in advance.

2016-08-30 NewbieXXL

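I think so (though it may need some additional preprocessing). LIBSVM and LIBLINEAR are both language-agnostic. Since LibShortText is built on top of LIBLINEAR, it should be applicable to any language as well.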

According to the paper, it has built-in preprocessing methods for extracting features:

libshorttext.converter: For given short texts, LibShortText follows 
the bag-of-word model to generate features. Users apply procedures in 
this library to pre-process short texts by tokenization, stemming 
(optional), and stop-word removal (optional). The library also allows 
users to choose between unigram and bigram features.

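However, it looks like its stemming and stop-word removal only support English. So if you want to extract better features from non-English text, you may need to do your own preprocessing, for example with nltk.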
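
A minimal sketch of what such a preprocessing step could look like for German, done before the text is handed to LibShortText. The stop-word list here is a tiny made-up sample, the function names `preprocess` and `to_line` are hypothetical, and the `label<TAB>text` line layout is my assumption about the input format, so check it against the LibShortText README:

```python
import re

# Tiny illustrative German stop-word list (assumption: a real list is much longer).
GERMAN_STOPWORDS = {"der", "die", "das", "dem", "und", "ist", "ein", "eine", "mit"}

# \w+ matches Unicode word characters in Python 3, so umlauts are kept.
TOKEN_RE = re.compile(r"\w+")


def preprocess(text, stopwords=GERMAN_STOPWORDS, stemmer=None):
    """Lowercase, tokenize, drop stop words, and optionally stem each token.

    `stemmer` is any callable mapping a token to its stem (hypothetical hook).
    """
    tokens = TOKEN_RE.findall(text.lower())
    tokens = [t for t in tokens if t not in stopwords]
    if stemmer is not None:
        tokens = [stemmer(t) for t in tokens]
    return " ".join(tokens)


def to_line(label, text, **kwargs):
    """Format one example as label<TAB>text (assumed input layout)."""
    return f"{label}\t{preprocess(text, **kwargs)}"


if __name__ == "__main__":
    print(to_line("tiere", "Der Hund und die Katze spielen mit dem Ball"))
```

With nltk installed, you could pass `SnowballStemmer("german").stem` from `nltk.stem.snowball` as `stemmer`, and use `nltk.corpus.stopwords.words("german")` as the stop-word list (after `nltk.download("stopwords")`). Since the text is already stemmed and filtered at that point, LibShortText's own English stemming and stop-word removal should be left turned off.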

2016-09-03 01:42:25 greeness
