2017-02-23 23 views

I have a file of malformed JSON records: how do I correct the JSON format, including in Spark?

{"user1" : "a", 
"mobile": "b", 
"address": "c" 

{"user2" : "aa", 
"mobile": "bb", 
"address": "cc" 

This log is malformed: each record is missing its closing } at the end.

I tried:

text_file = open('abc.txt', "r", encoding='utf-8') 
read = text_file.read() 
a = read.split("\n") 
for i in a: 
    print(i+"}") 

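This gives me an extra } at the end of every line. How can I avoid that?

Also, I need to implement the same logic in Spark. Please let me know what modifications are needed, or whether there is a better approach in Spark.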


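You have to validate your strings with the built-in 'json' module and then correct them according to the errors it reports. There is no magic utility that will fix your strings for you. –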


Would an example of what your current script returns help? Does it print }} at the end of every line? – putonspectacles


{"user1":"a","mobile":"b","address":"c"} {"user2":"aa","mobile":"bb" :"cc"} } – SpaceOddity

Answers


With Python

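# read the whole file at once; the 'with' block closes it afterwards
with open('abc.txt', 'r', encoding='utf-8') as text_file:
    read = text_file.read()
a = read.split('{')  # split on '{' so that each piece is one record
del a[0]  # drop the piece before the first '{' (empty or whitespace only)
for i in a:
    # restore the '{' removed by split() and append the missing '}'
    print("{" + i.strip() + "}")  # to put each record on one line, add .replace('\n', '')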

Result:

{"user1" : "a", 
"mobile": "b", 
"address": "c"} 
{"user2" : "aa", 
"mobile": "bb", 
"address": "cc"} 

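Following the comment above about the built-in json module, the split-and-close approach can be combined with a validation step. This is only a minimal sketch: the function name repair_records and the inline sample string are made up for illustration, and it assumes that no key or value contains a literal '{' character, since that would break the split.

```python
import json


def repair_records(text):
    """Split malformed text on '{', close each record with '}', and parse it.

    Hypothetical helper; assumes no key or value contains a literal '{'.
    """
    records = []
    for piece in text.split('{'):
        body = piece.strip()
        if not body:
            continue  # skip the empty piece before the first '{'
        candidate = '{' + body + '}'
        # json.loads raises json.JSONDecodeError if the record is still invalid
        records.append(json.loads(candidate))
    return records


# Inline sample mirroring the malformed log from the question
sample = (
    '{"user1" : "a",\n"mobile": "b",\n"address": "c"\n\n'
    '{"user2" : "aa",\n"mobile": "bb",\n"address": "cc"\n'
)
for record in repair_records(sample):
    print(record)
```

Parsing each repaired record means a record that is still broken fails loudly instead of being printed as if it were valid.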
With Spark

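# assumes 'sc' is an existing SparkContext, as in the pyspark shell
# wholeTextFiles reads each file as one (path, content) pair; textFile would split the content on '\n'
text_file = sc.wholeTextFiles("abc.txt")
a = (text_file
     .map(lambda path_and_content: path_and_content[1])  # keep only the file content
     .flatMap(lambda text: text.split('{'))  # split on '{'
     .filter(lambda piece: len(piece.strip()) > 0)  # drop the empty piece before the first '{'
     .map(lambda piece: '{' + piece.strip() + '}'))  # restore '{' and append the missing '}'
for i in a.collect():
    print(i)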

Result:

{"user1" : "a", 
"mobile": "b", 
"address": "c"} 
{"user2" : "aa", 
"mobile": "bb", 
"address": "cc"}