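Scala MapReduce framework gives a type mismatch
I have a MapReduce framework in Scala built on several org.apache.hadoop libraries. It works for a simple wordcount program. However, I want to apply it to something useful and have hit a roadblock. I want to take a csv file (or a file with any other delimiter), treat whatever is in the first column as the key, and then count the occurrences of each key.
The mapper code looks like this: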
class WordCountMapper extends Mapper[LongWritable, Text, Text, LongWritable] with HImplicits {
  protected override def map(lnNumber: LongWritable, line: Text, context: Mapper[LongWritable, Text, Text, LongWritable]#Context): Unit = {
    line.split(",", -1)(0) foreach (context.write(_,1)) //Splits data
  }
}
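The problem is in the 'line.split' code. When I try to compile it, I get the following error:
found: char required: org.apache.hadoop.io.Text
line.split ... should return a String, which is passed to the _ in write(_,1), but for some reason the compiler thinks it is a char. I even added .toString to make it a String explicitly, but that did not work either.
Any ideas are appreciated. Let me know what additional details I can provide.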
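The "found: char" message matches how `foreach` behaves on a String: `split(",", -1)(0)` already selects a single String, and calling `foreach` on that String visits its characters one by one, so the `_` in `context.write(_,1)` is a Char. Appending `.toString` to the String does not change this. The following is a minimal sketch in plain Scala, without Hadoop, that shows the effect; the object name `SplitForeachDemo` and the sample line are hypothetical and chosen only for illustration.

```scala
object SplitForeachDemo {
  // Hypothetical sample line; any comma-separated string behaves the same way.
  val line: String = "alice,30,NYC"

  // split returns Array[String]; indexing with (0) yields one String, not a collection of Strings.
  val firstColumn: String = line.split(",", -1)(0)

  // foreach on a String iterates over its characters, so every element is a Char.
  val seen: List[Char] = {
    val buf = scala.collection.mutable.ListBuffer.empty[Char]
    firstColumn.foreach(c => buf += c)
    buf.toList
  }

  def main(args: Array[String]): Unit = {
    println(firstColumn) // the whole first column as one String
    println(seen)        // the individual characters that foreach visited
  }
}
```

Dropping the `foreach` and passing the first column on directly avoids the Char elements altogether.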
Update:
Here is the list of imports:
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{Reducer, Job, Mapper}
import org.apache.hadoop.conf.{Configured}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConversions._
import org.apache.hadoop.util.{ToolRunner, Tool}
Here is the build.sbt code:
import AssemblyKeys._ // put this at the top of the file
assemblySettings
organization := "scala"
name := "WordCount"
version := "1.0"
scalaVersion:= "2.11.2"
scalacOptions ++= Seq("-no-specialization", "-deprecation")
libraryDependencies ++= Seq("org.apache.hadoop" % "hadoop-client" % "1.2.1",
"org.apache.hadoop" % "hadoop-core" % "latest.integration" exclude ("hadoop-core", "org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.class") ,
"org.apache.hadoop" % "hadoop-common" % "2.5.1",
"org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.5.1",
"commons-configuration" % "commons-configuration" % "1.9",
"org.apache.hadoop" % "hadoop-hdfs" % "latest.integration")
jarName in assembly := "WordCount.jar"
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{case s if s.endsWith(".class") => MergeStrategy.last
case s if s.endsWith(".xsd") => MergeStrategy.last
case s if s.endsWith(".dtd") => MergeStrategy.last
case s if s.endsWith(".xml") => MergeStrategy.last
case s if s.endsWith(".properties") => MergeStrategy.last
case x => old(x)
}
}
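Could you provide your imports and your build.sbt or dependency list so that I can try to compile it? – 2014-11-05 21:25:19
'line' is a Hadoop Writable 'Text'; you *need* to call 'toString' to get a Java String that supports split. You should show us the error you get when you call it. – 2014-11-05 21:42:20
@ThomasJungblut, do you mean using "line.split(",", -1)(0).toString"? That produces the same error as above. – 2014-11-06 03:03:45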
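One possible way to write the mapper so that the first column is emitted as a single key is sketched below. It is only a sketch under assumptions: it does not rely on `HImplicits` (whose contents are not shown in the question), it converts between `Text` and `String` explicitly, and the names `FirstColumnMapper`, `firstColumn` and `delimiter` are hypothetical. It uses only `Text.toString`, `Text.set` and `context.write` from the Hadoop API.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Hypothetical helper: extracts the first column of a delimited line.
object FirstColumnMapper {
  def firstColumn(line: String, delimiter: String = ","): String =
    // Pattern.quote keeps delimiters such as "|" from being read as regex syntax.
    line.split(java.util.regex.Pattern.quote(delimiter), -1)(0)
}

class FirstColumnMapper extends Mapper[LongWritable, Text, Text, LongWritable] {
  // Writable instances reused across records to avoid per-record allocation.
  private val one = new LongWritable(1L)
  private val outKey = new Text()

  override protected def map(
      offset: LongWritable,
      line: Text,
      context: Mapper[LongWritable, Text, Text, LongWritable]#Context): Unit = {
    // Convert the Writable to a String explicitly, then take the first column as a whole.
    outKey.set(FirstColumnMapper.firstColumn(line.toString))
    // Emit (key, 1) once per input line; no foreach, so no Char elements are involved.
    context.write(outKey, one)
  }
}
```

With this shape, the existing wordcount reducer that sums the `LongWritable` values per key should work unchanged, assuming it already expects `Text` keys and `LongWritable` values.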