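Nested JSON objects to a relational format

I have JSON data from user profiles that I ultimately need to analyze with SPSS. At the moment I import the data into Google Refine to do some data cleaning. My problem is that the original JSON consists of nested objects: for example, there is a "professional_experience" section containing "companies", which in turn holds several child objects/arrays (see the example). Google Refine handles this by creating additional rows for that information. That does not match the "relational" (in the SQL sense) view/table structure I need in order to analyze the data with SPSS or Excel, because there are other child objects (schools, awards, and so on) that are also "dumbly" filled into the rows beneath the top-level "master" record, even though they have no direct row/column relationship to each other as far as the analysis is concerned.
As I see it, I need to extract these child-object columns and rows into their own tables and create some n:m relations, or at least flatten everything into one table (and then accept the redundancy and, of course, the other unknown attributes).
What I want to end up with is one consistent table for unified analysis/clustering on certain attributes. I don't think MapReduce is really an option.
Does anyone have an idea how to handle this, or is there perhaps a simpler way to work with the JSON data directly?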
{
  "users": [
    {
      "id": "123456_abcdef",
      "first_name": "Max",
      "last_name": "Mustermann",
      "display_name": "Max Mustermann",
      "page_name": "Max_Mustermann",
      "permalink": "https://www.xing.com/profile/Max_Mustermann",
      "employment_status": "EMPLOYEE",
      "gender": "m",
      "birth_date": {
        "day": 12,
        "month": 8,
        "year": 1963
      },
      "active_email": "ma[email protected]",
      "time_zone": {
        "name": "Europe/Copenhagen",
        "utc_offset": 2.0
      },
      "premium_services": [
        "SEARCH",
        "PRIVATEMESSAGES"
      ],
      "badges": [
        "PREMIUM",
        "MODERATOR"
      ],
      "wants": "einen neuen Job",
      "haves": "viele tolle Skills",
      "interests": "Flitzebogen schießen and so on",
      "organisation_member": "ACM, GI",
      "languages": {
        "de": "NATIVE",
        "en": "FLUENT",
        "fr": null,
        "zh": "BASIC"
      },
      "private_address": {
        "city": "Hamburg",
        "country": "DE",
        "zip_code": "20357",
        "street": "Privatstraße 1",
        "phone": "49|40|1234560",
        "fax": "||",
        "province": "Hamburg",
        "email": "[email protected]",
        "mobile_phone": "49|0155|1234567"
      },
      "business_address": {
        "city": "Hamburg",
        "country": "DE",
        "zip_code": "20357",
        "street": "Geschäftsstraße 1a",
        "phone": "49|40|1234569",
        "fax": "49|40|1234561",
        "province": "Hamburg",
        "email": "[email protected]",
        "mobile_phone": "49|160|66666661"
      },
      "web_profiles": {
        "qype": [
          "http://qype.de/users/foo"
        ],
        "google+": [
          "http://plus.google.com/foo"
        ],
        "other": [
          "http://blog.example.org"
        ],
        "homepage": [
          "http://example.org",
          "http://other-example.org"
        ]
      },
      "instant_messaging_accounts": {
        "skype": "1122334455",
        "googletalk": "max.mustermann"
      },
      "professional_experience": {
        "primary_company": {
          "id": "1_abcdef",
          "name": "XING AG",
          "title": "Softwareentwickler",
          "company_size": "201-500",
          "tag": null,
          "url": "http://www.xing.com",
          "career_level": "PROFESSIONAL_EXPERIENCED",
          "begin_date": "2010-01",
          "description": null,
          "end_date": null,
          "industry": "AEROSPACE",
          "form_of_employment": "FULL_TIME_EMPLOYEE",
          "until_now": true
        },
        "companies": [
          {
            "id": "1_abcdef",
            "name": "XING AG",
            "title": "Softwareentwickler",
            "company_size": "201-500",
            "tag": null,
            "url": "http://www.xing.com",
            "career_level": "PROFESSIONAL_EXPERIENCED",
            "begin_date": "2010-01",
            "description": null,
            "end_date": null,
            "industry": "AEROSPACE",
            "form_of_employment": "FULL_TIME_EMPLOYEE",
            "until_now": true
          },
          {
            "id": "24_abcdef",
            "name": "Ninja Ltd.",
            "title": "DevOps",
            "company_size": null,
            "tag": "NINJA",
            "url": "http://www.ninja-ltd.co.uk",
            "career_level": null,
            "begin_date": "2009-04",
            "description": null,
            "end_date": "2010-07",
            "industry": "ALTERNATIVE_MEDICINE",
            "form_of_employment": "OWNER",
            "until_now": false
          },
          {
            "id": "45_abcdef",
            "name": null,
            "title": "Wiss. Mitarbeiter",
            "company_size": null,
            "tag": "OFFIS",
            "url": "http://www.uni.de",
            "career_level": null,
            "begin_date": "2007",
            "description": null,
            "end_date": "2008",
            "industry": "APPAREL_AND_FASHION",
            "form_of_employment": "PART_TIME_EMPLOYEE",
            "until_now": false
          },
          {
            "id": "176_abcdef",
            "name": null,
            "title": "TEST NINJA",
            "company_size": "201-500",
            "tag": "TESTCOMPANY",
            "url": null,
            "career_level": "ENTRY_LEVEL",
            "begin_date": "1998-12",
            "description": null,
            "end_date": "1999-05",
            "industry": "ARTS_AND_CRAFTS",
            "form_of_employment": "INTERN",
            "until_now": false
          }
        ],
        "awards": [
          {
            "name": "Awesome Dude Of The Year",
            "date_awarded": 2007,
            "url": null
          }
        ]
      },
      "educational_background": {
        "degree": "MSc CE/CS",
        "primary_school": {
          "id": "42_abcdef",
          "name": "Carl-von-Ossietzky Universtät Schellenburg",
          "degree": "MSc CE/CS",
          "notes": null,
          "subject": null,
          "begin_date": "1998-08",
          "end_date": "2005-02"
        },
        "schools": [
          {
            "id": "42_abcdef",
            "name": "Carl-von-Ossietzky Universtät Schellenburg",
            "degree": "MSc CE/CS",
            "notes": null,
            "subject": null,
            "begin_date": "1998-08",
            "end_date": "2005-02"
          }
        ],
        "qualifications": [
          "TOEFLS",
          "PADI AOWD"
        ]
      }
    }
  ]
}
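
To make the "own tables" idea concrete, here is a minimal Python sketch of one way it could be done, using only the standard library. It is an illustration under assumptions, not a tested solution for the full export: the helper names (flatten_scalars, child_rows, to_tables, to_csv) and the user_id foreign-key column are made up for this example, the sample is a shortened copy of the structure above, and only arrays of objects (companies, awards, schools) are turned into child tables. Arrays of plain strings such as badges are simply skipped here.

```python
import csv
import io
import json

# Shortened sample in the same shape as the profile export above.
SAMPLE = """
{"users": [{
  "id": "123456_abcdef",
  "first_name": "Max",
  "gender": "m",
  "birth_date": {"day": 12, "month": 8, "year": 1963},
  "professional_experience": {
    "companies": [
      {"id": "1_abcdef", "name": "XING AG", "industry": "AEROSPACE"},
      {"id": "24_abcdef", "name": "Ninja Ltd.", "industry": "ALTERNATIVE_MEDICINE"}
    ],
    "awards": [{"name": "Awesome Dude Of The Year", "date_awarded": 2007}]
  },
  "educational_background": {
    "schools": [{"id": "42_abcdef", "degree": "MSc CE/CS"}]
  }
}]}
"""


def flatten_scalars(obj, prefix=""):
    """Flatten nested dicts into one level (birth_date.year -> birth_date_year).

    Lists are skipped on purpose: they are handled as separate child tables.
    """
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten_scalars(value, name + "_"))
        elif not isinstance(value, list):
            flat[name] = value
    return flat


def child_rows(user, path):
    """One row per object in the list found at `path`, tagged with the user's id."""
    node = user
    for key in path:
        node = (node or {}).get(key)
    return [{"user_id": user["id"], **flatten_scalars(item)} for item in node or []]


def to_tables(data):
    """Split the export into a users table plus one table per object array."""
    tables = {"users": [], "companies": [], "awards": [], "schools": []}
    for user in data["users"]:
        tables["users"].append(flatten_scalars(user))
        tables["companies"] += child_rows(user, ["professional_experience", "companies"])
        tables["awards"] += child_rows(user, ["professional_experience", "awards"])
        tables["schools"] += child_rows(user, ["educational_background", "schools"])
    return tables


def to_csv(rows):
    """Render rows as CSV text; columns appear in first-seen order."""
    fields = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tables = to_tables(json.loads(SAMPLE))
print(to_csv(tables["companies"]))
```

Each child table keeps a user_id column, so companies, awards and schools relate only to the user and not to each other. The CSV text can be written to files and imported into SPSS or Excel, and joining a child table back to users on user_id would give the single flat table with redundancy if that is what the analysis needs.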
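Thank you very much for your reply! Pentaho and Talend are certainly useful, but I can't figure out how to apply them to my problem. What I'm doing right now is writing a VBA script to "normalize" the data! Still, if there is a tool that offers an out-of-the-box conversion from JSON to a relational model, I'd be really interested. To be precise: I have those array fields, which Refine adds as new rows, but semantically that makes no sense. For example, schools and employers are put into one "extra row" even though they relate only to the user, not to each other... – kreck 2014-11-08 15:17:16
Yes, that is exactly what you can use Pentaho and Talend for. Get some training or ask on their forums. There is no free pizza. Good luck. – 2014-11-09 16:31:06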