2016-05-26 48 views
0

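Python - Read data from a .csv file and assign it to predefined variables

I am trying to read data from a CSV file (A), extract the data, and write it to a different CSV file (B). In the new file B I want two rows. The first row should contain all the predefined variables, and the second row should hold the value that belongs to each variable in row 1.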

I hope someone can tell me the best way to achieve this. (I have added the .csv file I am using at the end of this post.)

(A) Python code

import re 
import csv 

#Call for the export file 
data = open('C:/Exports/Export 3.csv') 

#Make a list with the predefined variables 
definition = ["record_id", "abbreviation", "study_id", "step_count", 
"distance", "ambulation_time", "velocity", "cadence", "norm_velocity", 
"step_time_differential", "step_length_differential", 
"cycle_time_differential", "step_time", "step_length", "step_extremity", 
"cycle_time", "stride_length", "hh_base_support", "swing_time", 
"stance_time", "single_support_time", "double_support_time", "toe_in_out"] 

my_data = {} 

#Show data for each row without whitespace 
for line in data: 
    line = line.rstrip() 
    #print(line) 
    values = re.findall("-?[0-9].+", line) 
    print(values) 

This is part of the output the code above produces:

[] 
['3;'] 
['292,34;'] 
['1,67;'] 
['175,1;'] 
['107,8;'] 
[] 
['0,004;'] 
['1,051;'] 
['0,008;'] 
[] 
[] 
['0,558;0,554'] 
['96,746;97,797'] 
[] 
['1,116;1,108'] 
['192,159;197,122'] 
['2,988;6,32'] 
['0,466;0,466'] 
['0,65;0,642'] 
['0,466;0,466'] 
['0,184;0,176'] 
['41,8;42,1'] 
['58,2;57,9'] 
['41,8;42,1'] 
['16,5;15,9'] 
['-1,1;4'] 

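As you can see in the output, some lines contain two values, for example ['2,988;6,32']. These need to become a single value, by calculating the mean of the two values before writing it to the csv file.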
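A minimal sketch of that averaging step, assuming each extracted string holds one or two decimal-comma numbers separated by a semicolon, as in the output above. The helper name `mean_of_values` is made up for illustration and is not part of the original code.

```python
def mean_of_values(text):
    """Return the mean of the decimal-comma numbers in an 'a;b' string, or None if it has none."""
    numbers = [
        float(part.replace(',', '.'))  # '2,988' -> 2.988
        for part in text.split(';')
        if part.strip()                # skip the empty piece after a trailing ';'
    ]
    if not numbers:
        return None
    return sum(numbers) / len(numbers)


print(mean_of_values('2,988;6,32'))  # two values: their mean
print(mean_of_values('292,34;'))     # one value: returned unchanged
```

A line with a single value passes through as is, so the same call could be used for every line of the export.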

(B) Desired output

record_id  abbreviation  study_id  step_count  distance
1                                  3           292,34

If you would like to play with the export file, you can download it here: CSV export file

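Provide more information, with sample input and sample output, so that you don't get made-up answers

Thanks! I changed some of the text to make it easier to understand, and added the input .csv file I use at the end. I also added an example of the desired output. – Yak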

Answers

0

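You should open your file with the csv library, using a semicolon as the delimiter, and then compare the first column against the items in your definition. This pretty much does that: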

import csv 
from collections import defaultdict 

data = defaultdict(str) 

#Make a list with the predefined variables 
definition = ["record_id", "abbreviation", "study_id", "step_count", 
"distance", "ambulation_time", "velocity", "cadence", "norm_velocity", 
"step_time_differential", "step_length_differential", 
"cycle_time_differential", "step_time", "step_length", "step_extremity", 
"cycle_time", "stride_length", "hh_base_support", "swing_time", 
"stance_time", "single_support_time", "double_support_time", "toe_in_out"] 

with open('C:/Exports/Export 3.csv', 'r') as f, \
        open('C:/Exports/result.csv', 'w') as outfile:
    reader = csv.reader(f, delimiter=';')
    next(reader, None)  # skip the headers

    writer = csv.DictWriter(outfile, fieldnames=definition, lineterminator='\n')
    writer.writeheader()

    for row in reader:
        for item in definition:
            # compare the names with underscores and spaces removed
            h = item.replace('_', '')
            r0 = row[0].lower().replace(' ', '')
            if h in r0:
                print(h, r0)
                data[item] = row[1]

    data['record_id'] = 1  # record id does not exist in input file: Export 3.csv

    writer.writerow(data)

To get the average of the two items, you can use:

try: 
    avg = (float(row[1].replace(',', '.')) + float(row[2].replace(',', '.')))/2 
except ValueError: 
    avg = 0 # for cases with empty strings or commas 
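
Thank you very much! This helped me big time! I still have a few small issues, which I posted in the answer below. – Yak

Hi @Yak, you can resolve the mismatches by making 'definition' match the names in the input file more closely. As for the average, see my update

Yes, that is what I thought as well, Moses, but for example velocity matches velocity exactly, yet the velocity value is empty in result.csv. This seems to happen because there are more variables with velocity in their name, such as stridevelocitystddev. As for the average, where should I put the code so that it is also passed to result.csv? – Yak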

0

This is almost perfect! There seem to be a few small issues. In result.csv I am missing the values for the following variables:

step_time 
step_length 
cycle_time 
stride_length 
hh_base_support 
swing_time 
stance_time 
single_supp_time  
double_supp_time  
toe_in_out 

I used this piece of code to check the results:

print(h, r0, row[1], row[2]) 

This gave me back the following information:

stepcount stepcount 3 
distance distance 292,34 
ambulationtime ambulationtime 1,67 
velocity velocity 175,1 
cadence cadence 107,8 
velocity normalizedvelocity , 
normalizedvelocity normalizedvelocity , 
steptimedifferential steptimedifferential 0,004 
steptime steptimedifferential 0,004 
steplengthdifferential steplengthdifferential 1,051 
steplength steplengthdifferential 1,051 
cycletimedifferential cycletimedifferential 0,008 
cycletime cycletimedifferential 0,008 
steptime steptime(sec) 0,558 0,554 
steplength steplength(cm) 96,746 97,797 
stepextremity stepextremity(ratio) , , 
cycletime cycletime(sec) 1,116 1,108 
stridelength stridelength(cm) 192,159 197,122 
hhbasesupport hhbasesupport(cm) 2,988 6,32 
swingtime swingtime(sec) 0,466 0,466 
stancetime stancetime(sec) 0,65 0,642 
velocity stridevelocity 172,185 177,908 
steptime steptimestddev , 0,006 
stridelength stridelengthstddev , , 
swingtime swingtimestddev , , 
stancetime stancetimestddev , , 
velocity stridevelocitystddev , , 
singlesupptime singlesupptimestddev , , 
doublesupptime doublesupptimestddev , , 

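From the output above you can see that there are some problems: some names match multiple strings (such as velocity), and some do not match at all (such as toe_in_out). I have no clue how to solve this.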
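One possible way around the multiple matches is to compare names for equality instead of using the substring test `h in r0`, so that `velocity` no longer also matches `stridevelocitystddev`. The sketch below assumes the labels in the first column differ from the predefined names only in case, spacing, punctuation and a trailing unit in parentheses. The `normalize` helper and the sample labels are hypothetical and are not copied from the real export file.

```python
import re

definition = ["step_time", "velocity", "toe_in_out"]  # shortened for the example


def normalize(label):
    """Lowercase, drop a parenthesised unit such as '(sec)', keep only letters and digits."""
    label = re.sub(r'\(.*?\)', '', label.lower())
    return re.sub(r'[^a-z0-9]', '', label)


# Map each normalized name back to its predefined variable.
lookup = {normalize(name): name for name in definition}

# Hypothetical first-column labels, made up for illustration.
labels = ["Step Time(sec)", "Step Time Differential", "Velocity",
          "Stride Velocity Std Dev", "Toe In / Out(deg)"]

matches = {}
for label in labels:
    name = lookup.get(normalize(label))  # exact match, no substring test
    if name is not None:
        matches[label] = name

print(matches)
```

Names that are abbreviated in the file (the output above shows `singlesupptime`, for instance) would still need a matching entry in `definition`, since an exact comparison cannot guess abbreviations.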

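I also tried to calculate the mean whenever there are two values, but this gave me the error: ValueError: could not convert string to float. I think the commas are causing this. I tried to apply the code below inside the loop to calculate the mean: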

float(row[1]+float(row[2]))/2 
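The expression above has two likely problems: the decimal commas, which `float()` does not accept, and the parentheses, which add a string to a float before converting. Here is a rough sketch of a row-level helper that handles both and also tolerates empty cells. The names `to_float` and `row_mean` are made up for illustration, and the sample rows only imitate the layout seen in the printed output.

```python
def to_float(cell):
    """Convert a decimal-comma string such as '0,558' to float; return None if it is not a number."""
    try:
        return float(cell.strip().replace(',', '.'))
    except ValueError:
        return None


def row_mean(row):
    """Average the numeric cells in columns 1 and 2 of a csv row; return '' if neither is numeric."""
    numbers = [n for n in (to_float(cell) for cell in row[1:3]) if n is not None]
    if not numbers:
        return ''
    return sum(numbers) / len(numbers)


print(row_mean(['Step Time(sec)', '0,558', '0,554']))  # mean of both cells
print(row_mean(['Step Count', '3', '']))               # only one value present
print(row_mean(['Step Extremity(ratio)', '', '']))     # nothing to average
```

In the loop from the answer, the assignment `data[item] = row[1]` could then become `data[item] = row_mean(row)`, so that the averaged value is what ends up in result.csv.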