2017-01-01 118 views
0

Where does the whitespace in this RDD come from?

The aim is to transform the integers that reside in a file:

1 2 3 
4 5 6 
7 8 9 

into three arrays, so that mathematical operations can be performed on them.

Expected

[[1, 2, 3], [4, 5, 6], [7, 8, 9]] 

Actual

[[u'1', u' ', u'2', u' ', u'3'], [u'4', u' ', u'5', u' ', u'6'], [u'7', u' ', u'8', u' ', u'9']] 

Code

txt = sc.textFile("integers.txt") 
print txt.collect() 
#[u'1 2 3', u'4 5 6', u'7 8 9'] 

pairs = txt.map(lambda x: x.split(' ')) 
print pairs.collect() 
#[[u'1', u'2', u'3'], [u'4', u'5', u'6'], [u'7', u'8', u'9']] 

pairs = txt.map(lambda x: [s for s in x]) 
print pairs.collect() 
#[[u'1', u' ', u'2', u' ', u'3'], [u'4', u' ', u'5', u' ', u'6'], [u'7', u' ', u'8', u' ', u'9']] 

Answers

2

The problem seems to be that the numbers are unicode strings rather than ints. You can fix this by converting them to int (see https://docs.python.org/2/library/functions.html#int):

>>> pairs = txt.map(lambda x: x.split(' ')) 
>>> print pairs.collect() 
[[u'1', u'2', u'3'], [u'4', u'5', u'6'], [u'7', u'8', u'9']] 

>>> pairs2 = pairs.map(lambda x: [int(s) for s in x]) 
>>> print pairs2.collect() 
[[1, 2, 3], [4, 5, 6], [7, 8, 9]] 
>>> 
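
The split and the int conversion can also be done in a single map. The following is a minimal sketch, not the asker's code: `parse_line` is a hypothetical helper name, and a plain list stands in for `txt.collect()` so that it runs without Spark. It uses `split()` with no argument, which splits on any run of whitespace, so a trailing space on a line does not produce an empty token.

```python
def parse_line(line):
    """Turn one line of text such as '1 2 3' into a list of ints."""
    # split() with no argument splits on any run of whitespace and
    # ignores leading/trailing whitespace.
    return [int(token) for token in line.split()]


# Stand-in for txt.collect(); the last line has a trailing space on purpose.
# With Spark, the equivalent would be: txt.map(parse_line)
lines = [u'1 2 3', u'4 5 6', u'7 8 9 ']

rows = [parse_line(line) for line in lines]

# Once the values are ints, arithmetic works, e.g. a sum per row.
row_sums = [sum(row) for row in rows]

print(rows)
print(row_sums)
```

With the RDD from the question, `txt.map(parse_line).collect()` should give the expected nested list of ints, assuming every token in the file is a valid integer.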
-2
pairs = txt.map(lambda x: x.split(' ')) 
# This returns the runs of characters separated by a space ' ', roughly like the following function (textFile has already split the file on newlines, so the lambda receives one line at a time) 
def AFunc(aString): 
    returnArray = [] 
    tempString = "" 
    for char in aString: 
        if char == ' ': 
            if tempString != "": 
                returnArray.append(tempString) 
                tempString = "" 
        else: 
            tempString += char 
    # append the last token, which is not followed by a space 
    if tempString != "": 
        returnArray.append(tempString) 
    return returnArray 


# .. 
pairs = txt.map(lambda x: [s for s in x]) 
# This returns every character in the string, spaces included, roughly like the following function (again applied to one line at a time) 
def BFunc(aString): 
    returnArray = [] 
    for char in aString: 
        returnArray.append(char) 
    return returnArray 
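
The difference between the two mappings can be checked on a single line in plain Python, without Spark. This is only an illustrative sketch; the variable names are made up for the example.

```python
line = u'1 2 3'

# Iterating over a string visits every character, spaces included.
chars = [s for s in line]

# split(' ') cuts the string at each space and drops the separators.
tokens = line.split(' ')

print(chars)
print(tokens)
```

Here `chars` has five elements, two of which are the spaces, while `tokens` has only the three digit strings. That is where the whitespace in the second RDD comes from.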

http://www.python-course.eu/lambda.php