2017-03-08 74 views
0

Python: CSV columns to JSON

I have the following CSV:

id, o1, o2, o3 
'jess', 1.0, 4, 0.3 
'jill', 0, 5, 0.123 
'jamie', -3, 0.2, 1.0 

and I want to turn it into nested JSON, keyed first by the column header names and then by id:

myjson = {
    "o1": {"jess": 1.0, "jill": 0, "jamie": -3},
    "o2": {"jess": 4, "jill": 5, "jamie": 0.2},
    "o3": {"jess": 0.3, "jill": 0.123, "jamie": 1.0},
}

I'm not sure what the best (most Pythonic) way to do this is. Here is my first attempt:

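import csv
with open(myfile, "r") as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    first = True
    for line in reader:
        if first:
            # Header row: create one inner dict per data column (skip the id column)
            header = list(line)
            myjson = {key: dict() for key in header[1:]}
            first = False
            continue
        # Data row: the first field is the id, the rest are column values
        row_id = line[0]
        for i in range(1, len(header)):
            myjson[header[i]][row_id] = line[i]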

I assume there is a better way to do this.

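Edit: I should have specified this earlier, but I don't want to use something like pandas. This needs to be super fast, with minimal package dependencies.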

+0

Please find the answer here (http://stackoverflow.com/questions/38170071/csv-to-json-convertion-with-python), hope it helps. – Bhargav

+0

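@Bhargav Sorry, that is not the answer. That answer takes each row and builds a dict keyed on the column names. I am trying to take each column and build a dict keyed on the first entry of each row. – Sal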
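The transformation described in that comment (take each column and key a dict on the first entry of each row) can be sketched with the standard library alone. This is only a minimal sketch, not the asker's or any answerer's code: the function name `columns_to_json` and the in-memory `sample` are hypothetical, and it assumes single-quoted ids, a space after each comma, and numeric cells.

```python
import csv
import io
import json


def columns_to_json(lines):
    """Build {column_name: {row_id: value}} from CSV lines, one row at a time."""
    # Assumption: ids are single-quoted and a space may follow each comma.
    reader = csv.reader(lines, skipinitialspace=True, quotechar="'")
    header = [name.strip() for name in next(reader)]
    result = {name: {} for name in header[1:]}  # skip the id column
    for row in reader:
        if not row:
            continue  # ignore blank lines
        row_id = row[0]
        for name, value in zip(header[1:], row[1:]):
            result[name][row_id] = float(value)  # assumes every cell is numeric
    return result


# Hypothetical stand-in for the CSV file from the question.
sample = io.StringIO(
    "id, o1, o2, o3\n"
    "'jess', 1.0, 4, 0.3\n"
    "'jill', 0, 5, 0.123\n"
    "'jamie', -3, 0.2, 1.0\n"
)
myjson = columns_to_json(sample)
print(json.dumps(myjson))
```

Because the rows are consumed one at a time, only the result dict is held in memory. Passing an open file object instead of `sample` should work the same way.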

Answers

3

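This may be "cheating", but it has always worked for me. And if it doesn't, it's nothing a little code can't fix. I use the pandas module; it really does take care of a lot of data-handling needs. I read the CSV into a DataFrame, then write the DataFrame out to JSON (or any other format):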

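import pandas as pd

# Read the CSV into a DataFrame
# (passing index_col=0 would make the id column the index, so each column is keyed by id)
df1 = pd.read_csv('YOUR_PATH_HERE')

# Write the DataFrame out as JSON
df1.to_json('PATH_HERE')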

This is super simple and easy to customize. You may need to fiddle with more parameters. Here are the docs: read_csv, to_json. And this is always a good read: 10 Minutes to Pandas

+0

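Sorry, I should have specified no pandas. I'm trying not to depend on extra packages, and I don't want to spend time building a DataFrame just to throw it away. This is definitely the simplest way to do it, though! – Sal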

+0

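@Sal, understood. But on the point about "building a DataFrame just to throw it away": creating a DataFrame takes a minute. Don't throw your data away, baby. – MattR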

+0

The design spec actually says "no pandas". :) – Sal

0

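I definitely think the answer below is too long, but if you still need an answer, this works. I created a test.csv from your data.

I don't know why you want to rule out pandas, but anyway: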

import csv
import itertools
import json


def read_with_header():
    """Return the total row count (header included) and the header row."""
    with open('/Users/bhargavsaidama/test.csv', newline='') as f:
        reader = csv.reader(f, delimiter=',', quotechar='|')
        row_count = 0
        keys = []
        for row in reader:
            row_count = row_count + 1
            keys.append(row)
        header = keys[0]
        return row_count, header


def reading_ignore_header():
    """Return a flat, row-by-row list of {id: value} dicts and the header."""
    _, header = read_with_header()

    with open('/Users/bhargavsaidama/test.csv', newline='') as f:
        next(f)  # skip the header line
        reader = csv.reader(f, delimiter=',', quotechar='|')
        result = []

        for row in reader:
            # The first len(row) - 1 combinations pair the id (row[0])
            # with each of the remaining values in the row
            values = tuple(itertools.combinations(row, 2))[:len(row) - 1]

            for x, y in values:
                result.append({x: y})
        return result, header

# The following function is taken from here
# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


def main():
    result, header = reading_ignore_header()
    header = header[1:]  # ignore the id column
    # One chunk per CSV row, holding one {id: value} dict per data column
    per_row = list(chunks(result, len(header)))
    # Transpose the chunks so each column collects its own {id: value} pairs
    final_values = {}
    for name, column in zip(header, zip(*per_row)):
        final_values[name] = {k: v for pair in column for k, v in pair.items()}
    data_str = json.dumps(final_values)
    data_json = json.loads(data_str)
    print(data_str, data_json)
    return data_str, data_json


if __name__ == "__main__":
    main()

Hope it helps. If you can optimize it, go ahead and do so. I will learn from it too :)

Thanks
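The chunk-and-zip steps in the answer above amount to transposing the rows into columns, which `zip(*rows)` does directly. The following is a shorter sketch of that idea, not the answerer's code: the in-memory `sample` is a hypothetical stand-in for test.csv and assumes plain comma-separated fields with no quotes or padding, and the names `rows`, `ids`, and `transposed` are made up for illustration.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for test.csv (plain comma-separated fields).
sample = io.StringIO(
    "id,o1,o2,o3\n"
    "jess,1.0,4,0.3\n"
    "jill,0,5,0.123\n"
    "jamie,-3,0.2,1.0\n"
)

rows = list(csv.reader(sample))
# zip(*rows) turns rows into columns: the first column is "id" followed by
# the ids; every other column is its header name followed by its values.
id_column, *data_columns = zip(*rows)
ids = id_column[1:]
transposed = {name: dict(zip(ids, values)) for name, *values in data_columns}

data_str = json.dumps(transposed)
print(data_str)
```

Note that this reads the whole file into memory and leaves every value as a string, because `csv.reader` does no type conversion by default.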

0

Here is a simple solution; you will need pyexcel and pyexcel-text:

>>> import pyexcel as p 
>>> sheet=p.get_sheet(file_name='test.csv') 
>>> sheet 
test.csv: 
+---------+-----+-----+-------+
| id      | o1  | o2  | o3    |
+---------+-----+-----+-------+
| 'jess'  | 1.0 | 4   | 0.3   |
+---------+-----+-----+-------+
| 'jill'  | 0   | 5   | 0.123 |
+---------+-----+-----+-------+
| 'jamie' | 3   | 0.2 | 1.0   |
+---------+-----+-----+-------+
>>> sheet.transpose() 
>>> sheet.name_columns_by_row(0) 
>>> sheet.name_rows_by_column(0) 
>>> sheet 
test.csv: 
+----+--------+--------+---------+
|    | 'jess' | 'jill' | 'jamie' |
+====+========+========+=========+
| o1 | 1.0    | 0      | 3       |
+----+--------+--------+---------+
| o2 | 4      | 5      | 0.2     |
+----+--------+--------+---------+
| o3 | 0.3    | 0.123  | 1.0     |
+----+--------+--------+---------+
>>> sheet.get_json(write_title=False) # pip install pyexcel-text 
'{"o1": {"\'jamie\'": 3, "\'jess\'": 1.0, "\'jill\'": 0}, "o2": {"\'jamie\'": "0.2", "\'jess\'": 4, "\'jill\'": 5}, "o3": {"\'jamie\'": 1.0, "\'jess\'": "0.3", "\'jill\'": "0.123"}}'
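In the JSON string above, the ids keep their single quotes and a few numbers come back as strings. If that matters, a small standard-library post-processing step can tidy it up. This is only a sketch under that assumption: `raw` is the string copied from the session above, and `clean` and `cleaned` are hypothetical names, not part of pyexcel.

```python
import json

# JSON string copied from the session output above.
raw = (
    '{"o1": {"\'jamie\'": 3, "\'jess\'": 1.0, "\'jill\'": 0}, '
    '"o2": {"\'jamie\'": "0.2", "\'jess\'": 4, "\'jill\'": 5}, '
    '"o3": {"\'jamie\'": 1.0, "\'jess\'": "0.3", "\'jill\'": "0.123"}}'
)


def clean(value):
    """Turn numeric strings into floats; leave every other value unchanged."""
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value
    return value


# Strip the single quotes around each id and coerce numeric strings.
cleaned = {
    column: {key.strip("'"): clean(value) for key, value in cells.items()}
    for column, cells in json.loads(raw).items()
}
print(json.dumps(cleaned))
```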