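Converting CSV to mongoimport-friendly JSON using Python
I have a 300 MB CSV with 3 million rows of city information from Geonames.org. I am trying to convert this CSV to JSON so I can import it into MongoDB with mongoimport. The reason I want JSON is that it lets me specify the "loc" field as an array rather than a string, for use with a geospatial index. The CSV is encoded in UTF-8.
A snippet of my CSV looks like this: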
"geonameid","name","asciiname","alternatenames","loc","feature_class","feature_code","country_code","cc2","admin1_code","admin2_code","admin3_code","admin4_code"
3,"Zamīn Sūkhteh","Zamin Sukhteh","Zamin Sukhteh,Zamīn Sūkhteh","[48.91667,32.48333]","P","PPL","IR",,"15",,,
5,"Yekāhī","Yekahi","Yekahi,Yekāhī","[48.9,32.5]","P","PPL","IR",,"15",,,
7,"Tarvīḩ ‘Adāī","Tarvih `Adai","Tarvih `Adai,Tarvīḩ ‘Adāī","[48.2,32.1]","P","PPL","IR",,"15",,,
The desired JSON output (apart from the charset), which works with mongoimport, is below:
{"geonameid":3,"name":"Zamin Sukhteh","asciiname":"Zamin Sukhteh","alternatenames":"Zamin Sukhteh,Zamin Sukhteh","loc":[48.91667,32.48333] ,"feature_class":"P","feature_code":"PPL","country_code":"IR","cc2":null,"admin1_code":15,"admin2_code":null,"admin3_code":null,"admin4_code":null}
{"geonameid":5,"name":"Yekahi","asciiname":"Yekahi","alternatenames":"Yekahi,Yekahi","loc":[48.9,32.5] ,"feature_class":"P","feature_code":"PPL","country_code":"IR","cc2":null,"admin1_code":15,"admin2_code":null,"admin3_code":null,"admin4_code":null}
{"geonameid":7,"name":"Tarvi? ‘Adai","asciiname":"Tarvih `Adai","alternatenames":"Tarvih `Adai,Tarvi? ‘Adai","loc":[48.2,32.1] ,"feature_class":"P","feature_code":"PPL","country_code":"IR","cc2":null,"admin1_code":15,"admin2_code":null,"admin3_code":null,"admin4_code":null}
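I have tried every online CSV-to-JSON converter I could find, and none of them work because of the file size. The closest I got was with Mr Data Converter (pictured above), whose output imports into MongoDB after removing the opening and closing brackets and the commas between documents. Unfortunately, the tool does not work for a 300 MB file.
The JSON above is set to be encoded in UTF-8 but still has charset problems, most likely due to a conversion error? I have tried Python CSVKIT, tried all the CSV-to-JSON scripts on stackoverflow, imported the CSV into MongoDB and changed the "loc" string to an array (which unfortunately retains the quotation marks), and even tried manually copying and pasting 30,000 records at a time. A lot of reverse engineering, trial and error, and so on.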
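To illustrate the charset symptom, here is a small Python 3 sketch of three ways a name can end up in the output. It is not a diagnosis of what Mr Data Converter actually does. The ASCII round trip is only an assumed stand-in for whatever lossy conversion produced the "?" characters above.

```python
import json

name = "Tarvīḩ ‘Adāī"

# Default json.dumps escapes non-ASCII characters as \uXXXX sequences.
# The result is plain ASCII and decodes back to the original string.
escaped = json.dumps(name)

# ensure_ascii=False keeps the characters literal; the output must then
# be written to a file opened with encoding="utf-8".
literal = json.dumps(name, ensure_ascii=False)

# Assumed stand-in for a lossy step: characters that do not fit the
# target charset are replaced with "?" and cannot be recovered.
lossy = name.encode("ascii", errors="replace").decode("ascii")
```

The first two forms are lossless, so either should preserve the names. Only a step like the third one destroys information.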
Does anyone have a clue how to produce the JSON above while keeping the same encoding as in the CSV above? I am at a complete standstill.
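For reference, a minimal Python 3 sketch of the conversion I am after, using only the standard csv and json modules. It reads one row at a time, so the 300 MB file never has to fit in memory, and writes one JSON document per line, which is the layout left after stripping the brackets and commas as described above. The function names, the INT_FIELDS set and the file names are hypothetical. Treating only geonameid and admin1_code as integers is an assumption taken from the desired output.

```python
import csv
import json

# Assumption: only these columns become integers, as in the desired output.
INT_FIELDS = {"geonameid", "admin1_code"}


def convert_value(field, value):
    """Convert one CSV cell to the value wanted in the JSON document."""
    if not value:
        return None  # empty (or missing) cells become null
    if field == "loc":
        return json.loads(value)  # "[48.9,32.5]" -> [48.9, 32.5]
    if field in INT_FIELDS:
        try:
            return int(value)
        except ValueError:
            return value  # keep non-numeric codes as strings
    return value


def csv_to_json_lines(src, dst):
    """Stream rows from the text file src to dst, one JSON document per line."""
    count = 0
    for row in csv.DictReader(src):
        doc = {field: convert_value(field, value) for field, value in row.items()}
        # ensure_ascii=False keeps characters such as "ī" literal instead of \u escapes.
        dst.write(json.dumps(doc, ensure_ascii=False) + "\n")
        count += 1
    return count


def convert_file(csv_path, json_path):
    """Open both files explicitly as UTF-8 and convert."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(json_path, "w", encoding="utf-8") as dst:
        return csv_to_json_lines(src, dst)
```

Calling convert_file("cities.csv", "cities.json") would write the output file, which could then be loaded with something like mongoimport --db geonames --collection cities --file cities.json (the database, collection and file names are placeholders). One caveat: int() drops leading zeros, so a code such as "05" becomes 5. If that matters, remove admin1_code from INT_FIELDS.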
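Possible duplicate: http://stackoverflow.com/questions/1884395/csv-to-json-script – xiaoyi
My question is about the format, not about error messages. I am not getting any errors, but I am not getting the desired output either. – Karl
This question is not a duplicate: the other question mentioned above has neither the encoding problem nor the special output format requirement. – Petri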