Regex with multiple pipes on a JSON file

I have the following command to grab a JSON file on UNIX:

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json

which (obviously with different results each time) gives me output in the following format:

{ 
"kind": "...", 
"data": { 
"modhash": "", 
"whitelist_status": "...", 
"children": [ 
e1, 
e2, 
e3, 
... 
], 
"after": "...", 
"before": "..." 
} 
}

where each element of the children array is an object structured as follows:

{ 
"kind": "...", 
"data": { 
... 
} 
}

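Here is a fairly complete sample of the output of the .json get (the body is too long to post directly): https://pastebin.com/20p4kk3u

I need to print the complete data object of every element in the children array. I know I need to pipe at least twice, first to get the children [...] and then the data {...}. This is what I have so far: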

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' | grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'

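I'm new to regular expressions, so I'm not sure how to handle the brackets or braces inside the elements I'm grepping. The line above prints nothing and I don't know why. Any help is appreciated.

Source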

2017-10-21 Anthony B

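Are you open to using third-party utilities? I usually use the jq binary to parse JSON data easily. For your requirement, you just pass the JSON data to jq, which has its own query language: cat /tmp/data | jq '.data.children | .[]' (here /tmp/data contains the full JSON). With utilities like this you can get the job done with shorter queries and advanced features (raw output, querying, and so on). – akskap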
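To make the jq suggestion concrete, here is a minimal sketch run against a tiny hand-written stand-in for the Reddit response. The sample JSON, the variable names, and the extra .data step (to reach the data object the question asks for) are my own assumptions, and it only does anything if jq is installed.

```shell
# Hypothetical miniature of the Reddit listing structure described in the question.
sample='{"kind":"Listing","data":{"children":[{"kind":"t3","data":{"id":"a1"}},{"kind":"t3","data":{"id":"b2"}}]}}'

# jq is a third-party tool, so only run the query when it is available.
if command -v jq >/dev/null 2>&1; then
  # .data.children[] iterates over the array; the trailing .data picks each child's data object.
  # -c prints each result compactly on its own line.
  data_objects=$(printf '%s\n' "$sample" | jq -c '.data.children[].data')
  printf '%s\n' "$data_objects"
fi
```

Because jq parses the JSON instead of pattern-matching it, nested brackets and braces inside each child do not need any special handling.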

Well, getting the data isn't the only end goal; this time it happens to be a .json format, but I want to know how to handle any file with regex. –

Code

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' | grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'

Some notes on regex

* == zero or more times 
+ == one or more times 
? == zero or one time 
\s == any whitespace character: space, tab, carriage return, newline, vertical tab, or form feed 
\w == a word character: A to Z (upper or lower case), 0 to 9, or underscore (_) 
\d == any digit from 0 to 9 
\r == carriage return 
\n == newline character (line feed) 
\ == escapes a special character so it is read as a literal character 
[...] == character class. Example: [abc] matches a or b or c 
(?=) == positive lookahead, a type of zero-width assertion. It says that the match must be followed by whatever is within the parentheses, but that part is not included in the match. 
\K == resets the start of the reported match to this position; everything matched before it is dropped from the output.
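As a small illustration of \K and the positive lookahead from the list above, here is a sketch on a made-up string (the sample text and variable names are mine, not from the question). It assumes GNU grep built with -P support.

```shell
# Hypothetical input: two prices, only one of them in USD.
sample='price: 42 USD, price: 7 EUR'

# price:\s*  must be present, but \K drops it from the reported match
# \d+        the digits we actually want
# (?= USD)   " USD" must follow, but is not part of the match
usd=$(printf '%s\n' "$sample" | grep -oP 'price:\s*\K\d+(?= USD)')
printf '%s\n' "$usd"   # prints 42
```

The second price is skipped because the lookahead fails there: 7 is followed by " EUR", not " USD".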

Anyway, you can read more about regex here: Regex Tutorial

Now I can try to explain the code

wget downloads the source. 
tr removes all line feeds and carriage returns, so the whole output is on one line and can be handled by grep. 
grep -o prints only the matching part of each line. 
grep -P enables Perl-compatible regular expressions. 
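The effect of the tr and grep -o steps can be seen on a small multi-line input. This is only a sketch; the sample text and variable names are invented for illustration.

```shell
# Hypothetical pretty-printed input spread over several lines.
pretty=$(printf '{\n"a": [\n1,\n2\n]\n}\n')

# tr -d '\r\n' deletes every carriage return and line feed, leaving a single line.
joined=$(printf '%s' "$pretty" | tr -d '\r\n')
printf '%s\n' "$joined"   # prints {"a": [1,2]}

# grep -o prints each match on its own line instead of the whole matching line.
digits=$(printf '%s\n' "$joined" | grep -o '[0-9]')
printf '%s\n' "$digits"   # prints 1 and 2 on separate lines
```

Joining the lines matters because grep works line by line, so a pattern cannot match across a line break in the original output.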

So here 
grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' 
we are saying: 
match the literal text "children" 
zero or more whitespace characters 
: 
zero or more whitespace characters 
\[ escaped, so it is a literal character and not a special one 
zero or more whitespace characters 
\K forces the reported match to start from here 
( open the group 
{.+?} everything inside braces (the braces are included because they come after the \K marker; see greedy vs. non-greedy in the regex tutorial to understand how .+? works) 
) close the group 
(?=\s*\]) end the match where zero or more whitespace characters followed by a literal ] are found, without including them in the match.
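Here is a sketch of how that pattern behaves on two tiny hand-made inputs (both samples and all variable names are assumptions of mine, and GNU grep with -P support is assumed). The second input shows a limitation worth knowing: the non-greedy .+? stops at the first } that is followed by ], even when that ] belongs to an array nested inside a child.

```shell
pattern='"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])'

# Flat children: the first } followed by ] is the real end of the array.
flat='{"children": [ {"a": 1}, {"b": 2} ], "after": null}'
flat_match=$(printf '%s\n' "$flat" | grep -oP "$pattern")
printf '%s\n' "$flat_match"     # prints {"a": 1}, {"b": 2}

# Nested array inside a child: the match ends early, at the inner ].
nested='{"children": [ {"tags": [ {"t": 1} ], "id": "x"} ]}'
nested_match=$(printf '%s\n' "$nested" | grep -oP "$pattern")
printf '%s\n' "$nested_match"   # prints {"tags": [ {"t": 1}
```

So the pattern works as explained for flat elements, but on deeply nested JSON the result can be cut short, because a regex like this has no notion of which closing bracket pairs with which opening one.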

Source

2017-10-21 19:19:00

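Thanks for the detailed explanation, very helpful. Follow-up question: what would be the difference if I used egrep instead of the Perl regex syntax? –

Take a look here: https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions –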
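One practical difference: the extended syntax used by egrep (grep -E) has no \K and no lookahead, so the "match the context but print only part of it" trick is not available. A possible workaround, sketched here on a made-up string with hypothetical variable names, is to match the whole context first and then trim it in a second pass.

```shell
# Hypothetical input: two prices, only one of them in USD.
sample='price: 42 USD, price: 7 EUR'

# Pass 1: match the full context with extended regex (prints "price: 42 USD").
# Pass 2: keep only the digits from that match.
usd_ere=$(printf '%s\n' "$sample" | grep -oE 'price: *[0-9]+ USD' | grep -oE '[0-9]+')
printf '%s\n' "$usd_ere"   # prints 42
```

With -P the same result needs only one grep, which is why the code in the answer uses the Perl-compatible syntax.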

If you want to get the children array, try this, though I'm not sure it is what you're looking for.

wget -O - https://www.reddit.com/r/NetflixBestOf/.json | sed -n '/children/,/],/p'
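To see what that sed range does, here is a sketch on a small multi-line sample (the sample and variable names are invented for illustration). sed -n '/children/,/],/p' prints every line from the first one matching children through the next one matching ],.

```shell
# Hypothetical pretty-printed input, one JSON token group per line.
sample=$(printf '%s\n' '{' '"children": [' '"e1",' '"e2"' '],' '"after": "x"' '}')

# -n suppresses default output; p prints only the lines inside the address range.
range=$(printf '%s\n' "$sample" | sed -n '/children/,/],/p')
printf '%s\n' "$range"
```

Note that this relies on the input being spread over several lines. If the whole response arrives as one single line, the range starts and the entire line is printed.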

Source

2017-10-21 19:12:36
