Stemming algorithm that takes a simple String as input

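I have been looking for word stemming algorithms, such as the Porter algorithm, but everything I have found so far processes a file as its input.

Is there an existing algorithm that lets me simply pass a string to a stemmer and get the stemmed string back?

Something like: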

String toBeStemmed = "The man worked tirelessly"; 
Stemmer s = new Stemmer(); 

String stemmed = s.stem(toBeStemmed);

Source

2014-03-25 user3163073

A good site about the Porter stemmer is http://tartarus.org/martin/PorterStemmer/ – aloisdg

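The algorithm itself does not take a file. The code around it probably opens the file, reads it in as a series of strings, and feeds those strings to the algorithm. You only need to find the part of the code that reads strings from the file and pass your own strings in the same way.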
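As a rough illustration of that idea, the sketch below reads words from any `Reader`, so a `StringReader` can stand in for a file. The names `ReaderStemDemo`, `stemWords`, and `toyStem` are made up for this example, and `toyStem` is only a placeholder that strips a trailing "ed"; it is not the Porter algorithm, and a real per-word stemmer would be plugged in where it is used.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class ReaderStemDemo {

    // Reads characters from any Reader, cuts them into lower-cased words,
    // and passes each word to the supplied per-word stemmer.
    static List<String> stemWords(Reader in, UnaryOperator<String> wordStemmer) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isLetter(c)) {
                    word.append(Character.toLowerCase((char) c));
                } else if (word.length() > 0) {
                    out.add(wordStemmer.apply(word.toString()));
                    word.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Flush the last word if the input did not end with a separator.
        if (word.length() > 0) {
            out.add(wordStemmer.apply(word.toString()));
        }
        return out;
    }

    // Placeholder only (an assumption for this sketch): strips a trailing "ed".
    // This is NOT the Porter algorithm.
    static String toyStem(String w) {
        return (w.length() > 4 && w.endsWith("ed")) ? w.substring(0, w.length() - 2) : w;
    }

    public static void main(String[] args) {
        // A StringReader takes the place of the FileReader the original code would use.
        Reader source = new StringReader("The man worked tirelessly");
        System.out.println(stemWords(source, ReaderStemDemo::toyStem));
    }
}
```

The only change from a file-based version is the `Reader` that gets passed in; the loop that feeds the stemmer stays the same.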

Source

2014-03-25 14:26:02

In your example, toBeStemmed is a sentence, which you will want to tokenize first. Then you stem the individual tokens/words, such as 'worked' or 'tirelessly'.
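A minimal sketch of that tokenize-then-stem step, assuming plain whitespace tokenization is good enough for the input. `SentenceStemmer` and `stemSentence` are hypothetical names, and the per-word stemmer is passed in as a function, so any real stemmer can be supplied; the stand-in used in `main` only lower-cases the token.

```java
import java.util.StringJoiner;
import java.util.function.UnaryOperator;

class SentenceStemmer {

    // Splits the sentence on whitespace, applies the per-word stemmer to each
    // token, and joins the results back into one string with single spaces.
    static String stemSentence(String sentence, UnaryOperator<String> wordStemmer) {
        StringJoiner joined = new StringJoiner(" ");
        for (String token : sentence.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                joined.add(wordStemmer.apply(token));
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a real stemmer (assumption): lower-casing only.
        UnaryOperator<String> standIn = String::toLowerCase;
        System.out.println(stemSentence("The man worked tirelessly", standIn));
    }
}
```

Punctuation stays attached to the tokens with this split, so text with commas or full stops would likely need a proper tokenizer before stemming.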

Here is a nice morphological analyzer that I have used as a stemmer in a few projects.

Stemmer JAR: https://code.google.com/p/hunglish-webapp/source/browse/trunk/#trunk%2Flib%2Fnet%2Fsf%2Fjhunlang%2Fjmorph%2F1.0
Stemmer source: https://code.google.com/p/j-morph/source/checkout
Language resource files: https://code.google.com/p/hunglish-webapp/source/browse/trunk/#trunk%2Fsrc%2Fmain%2Fresources%2Fresources-lang%2Fjmorph
How I used it with Lucene: https://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/java/hu/mokk/hunglish/jmorph/
Properties file: https://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/resources/META-INF/spring/stemmer.properties

Usage example:

import java.util.List;

import net.sf.jhunlang.jmorph.lemma.Lemma;
import net.sf.jhunlang.jmorph.lemma.Lemmatizer; 
import net.sf.jhunlang.jmorph.analysis.Analyser; 
import net.sf.jhunlang.jmorph.analysis.AnalyserContext; 
import net.sf.jhunlang.jmorph.analysis.AnalyserControl; 
import net.sf.jhunlang.jmorph.factory.Definition; 
import net.sf.jhunlang.jmorph.factory.JMorphFactory; 
import net.sf.jhunlang.jmorph.parser.ParseException; 
import net.sf.jhunlang.jmorph.sample.AnalyserConfig; 
import net.sf.jhunlang.jmorph.sword.parser.EnglishAffixReader; 
import net.sf.jhunlang.jmorph.sword.parser.EnglishReader; 
import net.sf.jhunlang.jmorph.sword.parser.SwordAffixReader; 
import net.sf.jhunlang.jmorph.sword.parser.SwordReader; 

AnalyserConfig acEn = new AnalyserConfig(); 
//TODO: set path to the English affix file 
String enAff = "src/main/resources/resources-lang/jmorph/en.aff"; 
Definition affixDef = acEn.createDefinition(enAff, "utf-8", EnglishAffixReader.class); 
//TODO set path to the English dict file 
String enDic = "src/main/resources/resources-lang/jmorph/en.dic"; 
Definition dicDef = acEn.createDefinition(enDic, "utf-8", EnglishReader.class); 
int enRecursionDepth = 3; 
acEn.setRecursionDepth(affixDef, enRecursionDepth); 
JMorphFactory jf = new JMorphFactory(); 
Analyser enAnalyser = jf.build(new Definition[] { affixDef, dicDef }); 
// Renamed from acEn, which would clash with the AnalyserConfig variable declared above
AnalyserControl controlEn = new AnalyserControl(AnalyserControl.ALL_COMPOUNDS);
AnalyserContext analyserContextEn = new AnalyserContext(controlEn);
boolean enStripDerivates = true; 
Lemmatizer enLemmatizer = new net.sf.jhunlang.jmorph.lemma.LemmatizerImpl(enAnalyser, enStripDerivates, analyserContextEn); 


//After somewhat complex initializing, here we go 
List<Lemma> lemmas = enLemmatizer.lemmatize("worked"); 
for (Lemma lemma : lemmas) { 
    System.out.println(lemma.getWord()); 
}

Source

2014-03-26 12:01:46 bpgergo
