2016-11-17 60 views
0

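Parsing filter needed for newline-delimited JSON format

I've run into a problem while trying to automate an API process into BigQuery.

The problem is that I need the data in newline-delimited JSON format to get it into my BigQuery database, but the data I'm pulling doesn't come that way, so I need to parse it out.

Here is a link to pastebin so you can get an idea of what the data looks like, and here it is inline as well, just because: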

{"type":"user.list","users":[{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"[email protected]","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}},{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"[email protected]","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}],"scroll_param":"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1"}

The two problem spots are the first line:

{"type":"user.list","users": 

and the last chunk at the bottom:

,"scroll_param":"24bd0rac-b2f9-46b2-944a-9zz543dcd1b1"} 

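If you eliminate those two, you are left with just the necessary data, and I know which filter is needed to parse that into newline-delimited format.

You can see for yourself by playing around with this tool: copy and paste everything from the first open bracket through the closing bracket on the last line, set it to "Compact Output", and apply the filter: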

.[] 

The results come out in a nice and neat newline-delimited format like you see here, and also right here, outside the link:

{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"[email protected]","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}} 
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"[email protected]","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}} 

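So what I need is a filter that I can apply the same way I applied .[] and that gets rid of all the text before the first open bracket (as I highlighted above) and all the text after the last closing bracket.

But here is where one last problem comes in. While I don't need that last bit of text, I do still need the string of letters and numbers in it, which is the scroll parameter. To fully capture all the data I need from the API, I have to keep calling it from the command line with each new scroll parameter it produces, until all the data is in.

The initial call looks like this: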

$ curl -s https://api.program.io/users/scroll -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json' 

But in order to get all of the information, I need a separate call with the scroll parameter that looks like this:

curl -s https://api.intercom.io/users/scroll?scroll_param=foo -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json' >scroll.json 

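So while I need to get rid of the text containing the parameter in order to get the blob into newline-delimited format, I still need to extract whatever the parameter is, so I can loop it back into another script that keeps running until the result comes back empty.

Would love to hear any suggestions on solving this!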

+0

Please provide all relevant code in the question itself as a [mcve], not on a third-party site. –

+0

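Really? Does it really make a difference? When it's conveniently sitting right there, in an easily digestible format, you're really going to make me go and format everything? – wizkids121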

+0

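And when that site goes down? What will future readers of this question do? Use their imagination to figure out what the data is supposed to look like? –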

Answers

0

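Like the others who have commented, I won't pretend to understand the details of the specific problem, but if the general question is how to use jq to emit newline-delimited JSON (i.e. ensuring that each JSON text is followed by a newline, and that no other (raw) newlines are added), the answer is simple: use jq with the -c option, and without the -r option.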
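To make the "one JSON text per line, no raw newlines" property concrete, here is a minimal Python sketch. It is not jq itself, only an approximation of the compact form that -c prints; the function name to_ndjson and the sample values are made up for illustration.

```python
import json


def to_ndjson(values):
    """Serialize each value as one compact JSON text followed by a newline."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default,
    # which approximates the compact output of jq -c.
    # Newlines inside strings are escaped as \n by json.dumps, so the only
    # raw newlines in the result are the record terminators.
    return "".join(
        json.dumps(value, separators=(",", ":"), ensure_ascii=False) + "\n"
        for value in values
    )


if __name__ == "__main__":
    # Made-up sample, far smaller than the real API response.
    sample = [{"type": "user", "id": "a1"}, {"type": "user", "id": "b2"}]
    print(to_ndjson(sample), end="")
```

The escaping of embedded newlines is the reason for avoiding -r: with -r, jq prints string values raw, so a string that contains a newline would be split across lines.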

0

From a cursory inspection of your data, the filter

.users[] 

will give you just the user data to load, and the filter

.scroll_param 

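will return just the scroll parameter. If you put your data in a file, you can invoke jq once for each filter, but if you have to stream the data, you can simply use the "," operator to return one value after the other. For example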

.scroll_param 
, .users[] 

If you use that filter together with jq's -c option, it will produce output like this

"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1" 
{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr",... 
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf",... 
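For readers more at home in Python than in jq, the effect of the combined filter can be sketched roughly as below: the scroll parameter comes out first, then one value per user. This is only an analogy under assumptions; scroll_then_users and compact_lines are hypothetical names, and the sample response is made up.

```python
import json


def scroll_then_users(response):
    """Yield the scroll parameter first, then each user in turn."""
    # Mirrors '.scroll_param': a missing key yields None (jq prints null).
    yield response.get("scroll_param")
    # Mirrors '.users[]': iterate the array; a missing key raises here,
    # much as jq reports an error when asked to iterate over null.
    yield from response["users"]


def compact_lines(values):
    """Render each value as a compact single-line JSON text."""
    return [json.dumps(value, separators=(",", ":")) for value in values]


if __name__ == "__main__":
    # Made-up response with the same overall shape as the data in the question.
    sample = {
        "type": "user.list",
        "users": [{"type": "user", "id": "a1"}, {"type": "user", "id": "b2"}],
        "scroll_param": "example-scroll-param",
    }
    for line in compact_lines(scroll_then_users(sample)):
        print(line)
```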

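Presumably a script reading jq's output could capture the first line for use in the next curl call, and write the rest of the data to the file you load.

Hope this helps.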
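One possible shape for such a script, as a hedged Python sketch rather than a finished solution. It assumes the jq output has the scroll parameter on the first line and one user per following line, and that a page with no users marks the end of the scroll. run_page is a hypothetical callable, not part of any library: it stands in for whatever runs the curl call for a given scroll parameter, pipes it through jq, and returns the output lines.

```python
import io
import json
from typing import Callable, Iterable, List, Optional, TextIO, Tuple


def split_jq_output(lines: Iterable[str]) -> Tuple[Optional[str], List[str]]:
    """Split jq output into (scroll_param, user_lines)."""
    it = iter(lines)
    first = next(it, None)
    # The first line is a JSON string such as "24ba..." (or null), so decode it.
    scroll_param = json.loads(first) if first is not None and first.strip() else None
    # Every remaining non-blank line is already one compact JSON record.
    users = [line.rstrip("\n") for line in it if line.strip()]
    return scroll_param, users


def drain(run_page: Callable[[Optional[str]], Iterable[str]], out: TextIO) -> int:
    """Fetch pages until one has no users; return the number of lines written."""
    scroll_param = None  # the initial call carries no scroll parameter
    written = 0
    while True:
        scroll_param, users = split_jq_output(run_page(scroll_param))
        if not users:
            return written  # assumption: an empty page means the scroll is done
        for line in users:
            out.write(line + "\n")
        written += len(users)
        if scroll_param is None:
            return written  # nothing to continue with


if __name__ == "__main__":
    # Canned pages stand in for real API responses; no network access here.
    canned = [['"p1"\n', '{"id":1}\n'], ['"p1"\n', '{"id":2}\n'], ['"p1"\n']]
    buffer = io.StringIO()
    print(drain(lambda _param: canned.pop(0), buffer), "records")
```

In real use, out would be the file that gets loaded into BigQuery, and run_page would wrap the curl and jq commands from the question.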