2017-07-30 263 views

I have a column foo whose values are strings that need to be split:

foo 
abcdef_zh 
abcdf_grtyu_zt 
pqlmn@xl

From this I want to create two columns such that:

Part 1       Part 2
abcdef       zh
abcdf_grtyu  zt
pqlmn        xl

The code I am using for this is:

data = data.withColumn("Part 1",split(data["foo"],substring(data["foo"],-3,1))).get_item(0) 
data = data.withColumn("Part 2",split(data["foo"],substring(data["foo"],-3,1))).get_item(1) 

But I get an error saying that the Column is not iterable.

Answers

The following should work:

>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import expr
>>> df = sc.parallelize(['abcdef_zh', 'abcdfgrtyu_zt', 'pqlmn@xl']).map(lambda x: Row(x)).toDF(["col1"])
>>> df.show()
+-------------+
|         col1|
+-------------+
|    abcdef_zh|
|abcdfgrtyu_zt|
|     pqlmn@xl|
+-------------+
>>> df.withColumn('part2',df.col1.substr(-2, 3)).withColumn('part1', expr('substr(col1, 1, length(col1)-3)')).select('part1', 'part2').show()
+----------+-----+
|     part1|part2|
+----------+-----+
|    abcdef|   zh|
|abcdfgrtyu|   zt|
|     pqlmn|   xl|
+----------+-----+
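
The offsets above can be checked without a Spark session. The sketch below is plain Python, not PySpark: it mirrors the two substr expressions with slices, assuming every value ends in one delimiter character followed by exactly two characters. The helper name split_on_third_last is made up for illustration.

```python
def split_on_third_last(value):
    """Split a string whose delimiter is the third character from the end.

    Hypothetical helper; assumes the value ends in <delimiter><two characters>.
    """
    part1 = value[:-3]  # mirrors substr(col1, 1, length(col1) - 3)
    part2 = value[-2:]  # mirrors col1.substr(-2, 3): the last two characters
    return part1, part2


# Sample values taken from the question, including one with two underscores.
samples = ["abcdef_zh", "abcdf_grtyu_zt"]
results = [split_on_third_last(s) for s in samples]

for original, (part1, part2) in zip(samples, results):
    print(original, "->", part1, part2)
```

Because the split is positional rather than pattern-based, a value with more than one underscore, such as abcdf_grtyu_zt, is still cut only at the last delimiter.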

This works perfectly.