2013-04-08

REGEX reformatting

I want to reformat a JSON file and delete a large part of it. Here is the original JSON file:

 "2597401":[{"jobID":"2597401", 
       "account":"TG-CCR120014", 
       "user":"charngda", 
       "pkgT":{"pgi/7.2- 5":{"libA":["libpgc.so"], 
       "flavor":["default"]}},   
       "startEpoch":"1338497979", 
       "runTime":"1022", 
       "execType":"user:binary",    
       "exec":"ft.D.64", 
       "numNodes":"4", 
       "sha1":"5a79879235aa31b6a46e73b43879428e2a175db5", 
       "execEpoch":1336766742, 
       "execModify":"Fri May 11 15:05:42 2012", 
       "startTime":"Thu May 31 15:59:39 2012", 
       "numCores":"64", 
       "sizeT":{"bss":"1881400168","text":"239574","data":"22504"}}, 
       {"jobID":"2597401", 
       "account":"TG-CCR120014", 
       "user":"charngda", 
       "pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"], 
       "flavor":["default"]}}, 
       "startEpoch":"1338497946", 
       "runTime":"33" "execType":"user:binary", 
       "exec":"cg.C.64", 
       "numNodes":"4", 
       "sha1":"caf415e011e28b7e4e5b050fb61cbf71a62a9789", 
       "execEpoch":1336766735, 
       "execModify":"Fri May 11 15:05:35 2012", 
       "startTime":"Thu May 31 15:59:06 2012", 
       "numCores":"64", 
       "sizeT":{"bss":"29630984","text":"225749","data":"20360"}}, 
       {"jobID":"2597401", 
       "account":"TG-CCR120014", 
       "user":"charngda", 
       "pkgT":{"pgi/7.2-5": {"libA":["libpgc.so"], 
       "flavor":["default"]}}, 
       "startEpoch":"1338500447", 
       "runTime":"145", 
       "execType":"user:binary", 
       "exec":"mg.D.64", 
       "numNodes":"4", 
       "sha1":"173de32e1514ad097b1c051ec49c4eb240f2001f", 
       "execEpoch":1336766756, 
       "execModify":"Fri May 11 15:05:56 2012", 
       "startTime":"Thu May 31 16:40:47 2012", 
       "numCores":"64", 
       "sizeT":{"bss":"456954120","text":"426186","data":"22184"}},{"jobID":"2597401", 
       "account":"TG-CCR120014", 
       "user":"charngda", 
       "pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"], 
       "flavor":["default"]}}, 
       "startEpoch":"1338499002", 
       "runTime":"1444", 
       "execType":"user:binary", 
       "exec":"lu.D.64", 
       "numNodes":"4", 
       "sha1":"c6dc16d25c2f23d2a3321d4feed16ab7e10c2cc1", 
       "execEpoch":1336766748, 
       "execModify":"Fri May 11 15:05:48 2012", 
       "startTime":"Thu May 31 16:16:42 2012", 
       "numCores":"64", 
       "sizeT":{"bss":"199850984","text":"474218","data":"27064"}}], 

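For each jobID, I only want to keep the "exec" field and the jobID. How can I build a regular expression that discards the rest of the data? Ideally, I need the following: JobID exec1 exec2 exec3
Is there a way to do this?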

Thanks in advance.

Do you mean '{"2597401":[{"JobID":2597401,"exec":"ft.D.64"}]}'? – 2013-04-08 00:22:45
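A minimal sketch of producing that pruned structure with Python's standard json module instead of a regex, assuming the file parses as a JSON object keyed by job ID. The KEEP tuple and prune helper are hypothetical names; the sketch keeps the data's own key spelling "jobID" and its string values.

```python
import json

# Hypothetical list of keys to keep in every record.
KEEP = ("jobID", "exec")

def prune(data, keep=KEEP):
    """Return a copy of {job_id: [record, ...]} keeping only the `keep` keys per record."""
    return {
        job_id: [{k: rec[k] for k in keep if k in rec} for rec in records]
        for job_id, records in data.items()
    }

# Trimmed, well-formed sample (assumption: the real file parses as a JSON object).
raw = '{"2597401":[{"jobID":"2597401","account":"TG-CCR120014","exec":"ft.D.64","numNodes":"4"}]}'
print(json.dumps(prune(json.loads(raw)), separators=(",", ":")))
# {"2597401":[{"jobID":"2597401","exec":"ft.D.64"}]}
```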

Sort of. The initial number is the JobId, so ideally I want something like this: 2597401 ft.D.64 cg.C.64 mg.D.64 lu.D.64. The same job has multiple execs, so I want the jobID and the execs. – amber4478 2013-04-08 00:26:09

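Use a JSON library that reads the JSON, lets you manipulate it, and saves it again. Unlike your code, that JSON library has already been written, tested, and debugged. A regular expression is not a magic wand you wave at every problem that involves text. – 2013-04-08 00:26:28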
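Following that suggestion, here is a minimal sketch with Python's standard json module that prints the "jobID exec1 exec2 ..." line the asker wants. It assumes the file is a well-formed JSON object keyed by job ID. The excerpt in the question is only a fragment and is missing a comma after "runTime":"33", so it would need fixing before json.loads accepts it. The function name summarize_jobs and the trimmed sample are hypothetical.

```python
import json

def summarize_jobs(text):
    """Return one 'jobID exec1 exec2 ...' line per top-level job key."""
    data = json.loads(text)
    lines = []
    for job_id, records in data.items():
        # Collect every "exec" value recorded under this job, in file order.
        execs = [rec["exec"] for rec in records if "exec" in rec]
        lines.append(" ".join([job_id, *execs]))
    return lines

sample = """{"2597401": [
  {"jobID": "2597401", "exec": "ft.D.64", "runTime": "1022"},
  {"jobID": "2597401", "exec": "cg.C.64", "runTime": "33"},
  {"jobID": "2597401", "exec": "mg.D.64", "runTime": "145"},
  {"jobID": "2597401", "exec": "lu.D.64", "runTime": "1444"}
]}"""

for line in summarize_jobs(sample):
    print(line)  # 2597401 ft.D.64 cg.C.64 mg.D.64 lu.D.64
```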

Answer

Since you didn't specify your regex engine, I'll assume you are using … for my answer.

Based on the JSON format, you can use this regex to match what you don't need and replace it with nothing:

/(,\s*(*SKIP))?+("(?!jobID"|exec)[^"]+"\s*+:\s*+("[^"]*"|{(?2)?+(?>,\s*(?2))*}|\[(?3)?+(?>,\s*(?3))*\]))(?(1)|,?)/g 
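Python's standard re module supports neither recursion such as (?2) nor the (*SKIP) verb, so the pattern above can't be used there as written. If only the jobID/exec pairs are needed, one alternative is a much simpler, non-recursive extraction pattern. This is a sketch under the assumption that every record lists "jobID" before "exec"; the names PAIR and jobs_to_execs are hypothetical. It also tolerates the malformed input because it never parses the surrounding structure.

```python
import re

# Non-greedy .*? stops at the first "exec" key after each jobID; requiring the
# closing quote right after exec keeps execType/execEpoch/execModify from matching.
PAIR = re.compile(r'"jobID"\s*:\s*"([^"]*)".*?"exec"\s*:\s*"([^"]*)"', re.DOTALL)

def jobs_to_execs(text):
    """Group exec names by jobID and format them as 'jobID exec1 exec2 ...'."""
    grouped = {}
    for job_id, exe in PAIR.findall(text):
        grouped.setdefault(job_id, []).append(exe)
    return [" ".join([job_id, *execs]) for job_id, execs in grouped.items()]

# Trimmed excerpt of the question's data, including its missing comma.
sample = '''"2597401":[{"jobID":"2597401",
       "account":"TG-CCR120014",
       "exec":"ft.D.64",
       "execEpoch":1336766742},
       {"jobID":"2597401",
       "runTime":"33" "execType":"user:binary",
       "exec":"cg.C.64"}],'''

print(jobs_to_execs(sample))  # ['2597401 ft.D.64 cg.C.64']
```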

Here is what you get after the regex replacement:

 "2597401":[{"jobID":"2597401", 
       "execType":"user:binary",    
       "exec":"ft.D.64", 
       "execEpoch":1336766742, 
       "execModify":"Fri May 11 15:05:42 2012"}, 
       {"jobID":"2597401" "execType":"user:binary", 
       "exec":"cg.C.64", 
       "execEpoch":1336766735, 
       "execModify":"Fri May 11 15:05:35 2012"}, 
       {"jobID":"2597401", 
       "execType":"user:binary", 
       "exec":"mg.D.64", 
       "execEpoch":1336766756, 
       "execModify":"Fri May 11 15:05:56 2012"},{"jobID":"2597401", 
       "execType":"user:binary", 
       "exec":"lu.D.64", 
       "execEpoch":1336766748, 
       "execModify":"Fri May 11 15:05:48 2012"}], 

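As you can see, the resulting string contains invalid syntax at '"jobID":"2597401" "execType":"user:binary"'. This comes from a syntax error in the given data (the missing comma after "runTime":"33")...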
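A parser-based approach would report that problem instead of carrying it into the output. Here is a minimal sketch with Python's json module, using the malformed fragment from the question's second record. The helper name is_valid_json is hypothetical.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON; otherwise report where parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError carries the message plus the line/column of the failure.
        print(f"invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return False
    return True

bad = '{"runTime":"33" "execType":"user:binary"}'    # comma missing, as in the question
good = '{"runTime":"33", "execType":"user:binary"}'

print(is_valid_json(bad))   # prints the parser error, then False
print(is_valid_json(good))  # True
```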

And here is an explanation:

/(,\s*(*SKIP))?+
# Possessively try to match a comma plus any whitespace (no backtracking).
# If the comma is matched, the (*SKIP) verb advances the match position
# past it when the rest of the pattern fails to match.

# Key-value pairs not worth keeping:
(
    "(?!jobID"|exec)[^"]+"      # The key: reject "jobID" and keys starting with exec.
    \s*+:\s*+                   # The colon, with optional whitespace around it.
    (                           # The value, matched recursively:
        "[^"]*"                 # a string literal (the boring case),
      | {(?2)?+(?>,\s*(?2))*}   # or an object storing key-value pairs
                                # we don't care about,
      | \[(?3)?+(?>,\s*(?3))*\] # or an array storing values
                                # we don't care about.
    )
)
(?(1)|,?)
# Balance the commas: if group 1 already consumed the leading comma, match
# nothing more; otherwise consume an optional trailing comma, so the result
# string is still valid JSON.
/gx

Here is a regex demo.
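The lookahead (?!jobID"|exec) only rejects keys that *start* with exec. That is why execType, execEpoch and execModify are still present in the result shown above. Lookaheads work in Python's standard re module, so the key test on its own can be checked there. This sketch compares the answer's check with a tightened variant, (?!(?:jobID|exec)"), which is an assumption and not part of the answer. The helper removable_keys is hypothetical.

```python
import re

# The answer's key test (LOOSE) and a tightened variant (STRICT, an assumption).
LOOSE = re.compile(r'"(?!jobID"|exec)[^"]+"')
STRICT = re.compile(r'"(?!(?:jobID|exec)")[^"]+"')

def removable_keys(pattern, keys):
    """Return the keys whose quoted form the pattern matches, i.e. would remove."""
    return [k for k in keys if pattern.fullmatch(f'"{k}"')]

keys = ["jobID", "account", "exec", "execType", "execEpoch", "execModify"]
print(removable_keys(LOOSE, keys))   # ['account']
print(removable_keys(STRICT, keys))  # ['account', 'execType', 'execEpoch', 'execModify']
```

With the stricter lookahead only the jobID and exec keys would survive the replacement, which is closer to what the question asked for.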