Correcting JSON format, also with Spark

I have a JSON file whose records look like this:

{"user1" : "a", 
"mobile": "b", 
"address": "c" 

{"user2" : "aa", 
"mobile": "bb", 
"address": "cc"

This log is malformed: there is no closing } at the end of each record.

I tried:

text_file = open('abc.txt', "r", encoding='utf-8') 
read = text_file.read() 
a = read.split("\n") 
for i in a: 
    print(i+"}")

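This gives me an extra } at the end of every line. How can I avoid that?

I also need to implement the same logic in Spark. Please let me know what modifications are needed, or whether there is a better approach in Spark.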

Source

2017-02-23 SpaceOddity

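You have to validate your string with the built-in 'json' module and then correct it according to the errors it reports. There is no magic utility that will fix your string for you. –

An example of what your current script returns would help. Is it printing }} at the end of each line? – putonspectacles

{"user1": "a", "mobile": "b", "address": "c"} {"user2": "aa", "mobile": "bb": "cc"} } – SpaceOddity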

With Python

with open('abc.txt', 'r', encoding='utf-8') as text_file:
    read = text_file.read()
a = read.split('{')  # split on '{', which starts each record
del a[0]  # drop the (empty) text before the first '{'
for i in a:
    # strip() removes surrounding whitespace; add .replace('\n', '') to drop inner newlines too
    print("{" + i.strip() + "}")

Result:

{"user1" : "a", 
"mobile": "b", 
"address": "c"} 
{"user2" : "aa", 
"mobile": "bb", 
"address": "cc"}
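
As one of the comments suggests, the repaired records can be checked with the built-in json module. The following is only a minimal sketch: the helper name repair_and_parse and the inline sample text are assumptions made for illustration, and it assumes flat records with no nested objects.

```python
import json


def repair_and_parse(text):
    """Split text on '{', re-add the braces, and parse each record."""
    records = []
    for chunk in text.split('{'):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece before the first '{'
        # json.loads raises json.JSONDecodeError if a record is still malformed
        records.append(json.loads('{' + chunk + '}'))
    return records


# Hypothetical sample mirroring the malformed file from the question
sample = ('{"user1" : "a",\n"mobile": "b",\n"address": "c"\n\n'
          '{"user2" : "aa",\n"mobile": "bb",\n"address": "cc"\n')
parsed = repair_and_parse(sample)
print(parsed[0]["mobile"])
```

If a record cannot be repaired this way, the exception message points at the position of the problem, which is usually enough to correct it by hand.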

With Spark

# Use wholeTextFiles to read each file as one string;
# textFile would split the input on '\n' instead.
text_file = sc.wholeTextFiles("abc.txt")
a = (text_file
     .map(lambda path_and_text: path_and_text[1])    # keep only the file contents
     .flatMap(lambda text: text.split('{'))          # split on '{'
     .filter(lambda chunk: len(chunk.strip()) > 0)   # drop the empty piece before the first '{'
     .map(lambda chunk: '{' + chunk.strip() + '}'))
for i in a.collect():
    print(i)

Result:

{"user1" : "a", 
"mobile": "b", 
"address": "c"} 
{"user2" : "aa", 
"mobile": "bb", 
"address": "cc"}
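
If some records in the file might already end with }, adding the brace unconditionally would reproduce the extra } from the question. A possible variation, sketched here under the assumption of flat records (no nested objects), adds the brace only when it is missing. The function name split_records is hypothetical.

```python
def split_records(text):
    """Return one repaired JSON string per record found in text."""
    repaired = []
    for chunk in text.split('{'):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece before the first '{'
        if not chunk.endswith('}'):
            chunk += '}'  # add the closing brace only when it is missing
        repaired.append('{' + chunk)
    return repaired


# Hypothetical input: the first record is already closed, the second is not
mixed = '{"user1" : "a"}\n{"user2" : "aa"\n'
print(split_records(mixed))
```

Because it is a plain function from one string to a list of strings, it could be passed to flatMap on the file contents read with wholeTextFiles, in place of the separate split, filter, and map steps.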

Source

2017-02-23 16:07:51 Ahmed
