Scala MapReduce framework gives a type mismatch

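I have a MapReduce framework in Scala based on several org.apache.hadoop libraries. It works for a simple wordcount program. However, I want to apply it to something useful, and I have hit a roadblock. I want to take a csv file (or any delimiter), use whatever is in the first column as the key, and then count the occurrences of each key.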
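For reference, here is a minimal local sketch of what the job is meant to compute, written in plain Scala with no Hadoop types. It assumes the input lines are already `String`s and that the delimiter is a comma; `FirstColumnCount`, `firstColumn` and `count` are hypothetical names used only for illustration.

```scala
// Minimal local sketch (no Hadoop): count occurrences of the first-column key.
// Assumption: lines are plain Strings and the delimiter is a comma.
object FirstColumnCount {
  // Key = everything before the first comma; limit -1 keeps empty trailing fields.
  def firstColumn(line: String): String = line.split(",", -1)(0)

  // "Map" each line to (key, 1L), then "reduce" by summing the 1s per key.
  def count(lines: Seq[String]): Map[String, Long] =
    lines
      .map(line => (firstColumn(line), 1L))
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2).sum }
}
```

For example, `FirstColumnCount.count(Seq("apple,3,x", "pear,1,y", "apple,7,z"))` yields a map with `apple -> 2` and `pear -> 1`. In the real job the mapper would emit each `(key, 1)` pair once per line and the reducer would do the summing.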

The code for the mapper is as follows:

class WordCountMapper extends Mapper[LongWritable, Text, Text, LongWritable] with HImplicits {
  protected override def map(lnNumber: LongWritable, line: Text, context: Mapper[LongWritable, Text, Text, LongWritable]#Context): Unit = {
    line.split(",", -1)(0) foreach (context.write(_, 1)) // Splits data
  }
}

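The problem is in the 'line.split' code. When I try to compile it, I get the following error:

found: char required: org.apache.hadoop.io.Text

line.split ... should return a String, which is passed to the _ in write(_, 1), but for some reason the compiler thinks it is a char. I even added .toString to explicitly make it a String, but that didn't work either.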

Any ideas are appreciated. Let me know what additional details I can provide.

Update:

Here is the list of imports:

import org.apache.hadoop.io.{LongWritable, Text} 
import org.apache.hadoop.mapreduce.{Reducer, Job, Mapper} 
import org.apache.hadoop.conf.{Configured} 
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat 
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat 
import scala.collection.JavaConversions._ 
import org.apache.hadoop.util.{ToolRunner, Tool}

Here is the build.sbt code:

import AssemblyKeys._ // put this at the top of the file 

assemblySettings 

organization := "scala" 

name := "WordCount" 

version := "1.0" 

scalaVersion:= "2.11.2" 

scalacOptions ++= Seq("-no-specialization", "-deprecation") 

libraryDependencies ++= Seq("org.apache.hadoop" % "hadoop-client" % "1.2.1", 
         "org.apache.hadoop" % "hadoop-core" % "latest.integration" exclude ("hadoop-core", "org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.class") , 
         "org.apache.hadoop" % "hadoop-common" % "2.5.1", 
         "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.5.1", 
         "commons-configuration" % "commons-configuration" % "1.9", 
         "org.apache.hadoop" % "hadoop-hdfs" % "latest.integration") 


jarName in assembly := "WordCount.jar" 

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => 
    {case s if s.endsWith(".class") => MergeStrategy.last 
case s if s.endsWith(".xsd") => MergeStrategy.last 
case s if s.endsWith(".dtd") => MergeStrategy.last 
case s if s.endsWith(".xml") => MergeStrategy.last 
case s if s.endsWith(".properties") => MergeStrategy.last 
case x => old(x) 
    } 
}


2014-11-05 J Calbreath

Can you provide your imports and your build.sbt or list of dependencies so that I can try compiling it? – 2014-11-05 21:25:19

'line' is a Hadoop Writable 'Text'; you *need* to call 'toString' to get a Java String that supports split. You should tell us the error you get when you do that. – 2014-11-05 21:42:20

@ThomasJungblut, do you mean using "line.split(",", -1)(0).toString"? That produces the same error as above. – 2014-11-06 03:03:45

I actually solved this by not using the _ notation and specifying the value directly in context.write. So instead of:

line.split(",", -1)(0) foreach (context.write(_,1))

I used:

context.write(line.split(",", -1)(0), 1)

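I found a post online mentioning that Scala sometimes gets data types confused when _ is used, and suggesting that you just define the values explicitly in place. I'm not sure whether that's true, but in this case it solved the problem.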


2014-11-06 16:52:09

It has nothing to do with '_'; all you did was stop needlessly calling 'foreach'. – 2014-11-06 20:54:25
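The comment's point can be checked without Hadoop. The sketch below uses a hypothetical `Recorder` class as a stand-in for `context.write` and runs three call shapes: the original one, the same call with a named lambda parameter instead of `_`, and the direct call from the answer. It assumes the first field is already a `String`, which is presumably what `HImplicits` arranges in the real code.

```scala
import scala.collection.mutable.ListBuffer

object ForeachVsDirect {
  // Hypothetical stand-in for context.write(key, value): it only records the keys it receives.
  final class Recorder {
    val keys: ListBuffer[Any] = ListBuffer.empty[Any]
    def write(key: Any, value: Long): Unit = { keys += key; () }
  }

  // Assumption: the line is already a String.
  val first: String = "abc,1,2".split(",", -1)(0) // "abc"

  // Original shape: foreach walks the String, so write is called once per Char.
  def withPlaceholder(): List[Any] = {
    val ctx = new Recorder
    first foreach (ctx.write(_, 1))
    ctx.keys.toList
  }

  // Same call with a named parameter instead of `_`: still once per Char.
  def withNamedParam(): List[Any] = {
    val ctx = new Recorder
    first foreach (c => ctx.write(c, 1))
    ctx.keys.toList
  }

  // The answer's fix: no foreach, so write is called once with the whole String.
  def direct(): List[Any] = {
    val ctx = new Recorder
    ctx.write(first, 1)
    ctx.keys.toList
  }
}
```

The placeholder and named-parameter versions both record `'a'`, `'b'`, `'c'`, while the direct call records the single key `"abc"`. So the `_` syntax is not what changed the type; dropping `foreach` is.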

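That makes sense. I assumed that since it was only passing one string (item 0), it wouldn't matter and would just iterate once over that item. But I guess it iterates over each character in the string. – 2014-11-07 13:30:17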

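I guess line is implicitly converted to String here (thanks to HImplicits?). Then we have

line.split(",", -1)(0) foreach somethingOrOther

Split the string into multiple strings - .split(...)
Take the zeroth of those strings - (0)
Then iterate somethingOrOther over the characters in that string - foreach

Hence you get your char.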
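The three steps above can be spelled out with explicit type ascriptions. This is a plain-Scala sketch that assumes `line` has already become a `String` (for example through an implicit conversion like the one suspected in `HImplicits`); the object name `SplitTypeTrace` is hypothetical. The `(c: Char)` annotation in step 3 type-checks because `foreach` on a `String` passes each element as a `Char`.

```scala
object SplitTypeTrace {
  // Assumption: line is already a String (e.g. via an implicit Text => String conversion).
  val line: String = "abc,1,2"

  // Step 1: .split(...) gives an Array[String]; limit -1 keeps trailing empty fields.
  val parts: Array[String] = line.split(",", -1)

  // Step 2: (0) picks the first of those Strings.
  val first: String = parts(0)

  // Step 3: foreach on a String passes each element to the function as a Char.
  def elementsSeenByForeach: List[Char] = {
    val seen = List.newBuilder[Char]
    first.foreach((c: Char) => seen += c)
    seen.result()
  }
}
```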


2014-11-06 09:00:01
