2015-06-23 51 views
1

Reorder columns by string variable

I have a CSV file that looks like this:

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 44,Reading Comprehension 40 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 104,Elementary Algebra 82 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,Elementary Algebra 33 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 96,Reading Comprehension 88 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 53,Sentence Skills 97 

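The first 5 columns are always the same; the last 5 columns are always in a different order. I need to keep the first 5 columns as they are and rearrange the last 5 so that they always appear in this order: Reading Comprehension, Sentence Skills, Arithmetic, College Level Math, Elementary Algebra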

If one of the strings is not present, add a comma (that is, leave an empty field in its place).

So the final result looks like this:

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 59,Sentence Skills 67,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 40,Sentence Skills 44,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39,,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 104,,,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 85,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,,,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 88,Sentence Skills 96,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 97,,,Elementary Algebra 53

If they were always in the same order, I could do something like this:

awk -F, -v OFS=, '!/Reading Comprehension/ { $5 = $5 "," } 1'

If they were at least always together in the same columns, I could do a

awk -F, -v OFS=, '{print $1,$2,$3,$4,$5,$7,$8,$6,$9,$10}'

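But every line being in a different order, and the varying number of fields at the end, are throwing me for a loop.

I would like to do this with AWK, but I am open to anything.

Logically, I think what I need to do is something like this: j = Reading*, i = Sentence*, k = Arithmetic*, l = College*, m = Elementary*

and then awk {print $6j, $7i, $8k, $9l, $10m}

However, my Google searches have not turned up anything useful. So even a comment saying "look here", "search for this", or "check out this answer" would be much appreciated.

Note: I did my best to make sure the input and output are correct. I posted another question similar to this one, but that was for the case where the columns are always in the same order, so this is a different requirement.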

Answers

5

Here is a simple, clean solution written in Python. You have to replace input.csv and output.csv with the names of your own CSV files.

import csv

labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

# Python 3: open CSV files in text mode with newline=''
# (the Python 2 equivalent was 'wb' / 'rb')
with open('output.csv', 'w', newline='') as outfile, \
        open('input.csv', newline='') as infile:
    writer = csv.writer(outfile)
    reader = csv.reader(infile)

    for row in reader:
        head = row[:5]  # the five fixed columns
        tail = []
        for label in labels:
            # first remaining field that starts with this label, else ""
            tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
        writer.writerow(head + tail)

Here is another, shorter solution that works with pipes:

#!/usr/bin/python
from sys import stdin, stdout

labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

for line in stdin:
    values = line.strip().split(',')
    stdout.write(','.join(values[:5]))
    for label in labels:
        stdout.write(',')
        stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
    stdout.write('\n')
stdout.flush()

If you save this code in a file called, for example, reorder, and make that file executable, you can reformat your CSV file like this:

$ cat input.csv | ./reorder 

The reformatted CSV content is then written to standard output.
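To check the result against the sample rows in the question, the same prefix-matching logic can be wrapped in a function. This is only a minimal sketch and not part of the scripts above: the name `reorder_line` and its `fixed` parameter are hypothetical, and it assumes that fields never contain quoted commas and that whitespace around a field can safely be stripped.

```python
LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]


def reorder_line(line, labels=LABELS, fixed=5):
    """Return `line` with the fields after the first `fixed` in `labels` order.

    Hypothetical helper: assumes a plain comma-separated line with no
    quoted commas, and strips whitespace around every field.
    """
    values = [v.strip() for v in line.strip().split(",")]
    head, rest = values[:fixed], values[fixed:]
    # For each label, take the first field that starts with it, else "".
    tail = [next((v for v in rest if v.startswith(label)), "") for label in labels]
    return ",".join(head + tail)


if __name__ == "__main__":
    sample = ("Last,First,A00XXXXXX,1492-12-03,2015-06-23,"
              "Elementary Algebra 41,Sentence Skills 82")
    print(reorder_line(sample))
```

For the sample row this should print the fourth line of the expected output in the question, with empty fields where Reading Comprehension, Arithmetic and College Level Math are missing.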

1

So the code that @glenn jackman posted here: Creating an AWK For Loop out of piped commands

and reproduced below:

awk -F, -v OFS=, '
{
    delete val                  # clear the previous values if any
    for (i=6; i<=NF; i++) {
        split($i, a, " ")
        val[a[1]] = $i          # a[1] is the first space-separated word
    }
    print $1,$2,$3,$4,$5, val["Reading"],   # null values are OK
          val["Sentence"],
          val["Arithmetic"],
          val["College"],
          val["Elementary"]
}
' input

does exactly what I need, works perfectly, and makes enough sense that I can adapt it.
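For comparison, the idea behind the awk script (index each trailing field by its first word, then print the words in a fixed order) can be sketched in Python. This is an illustrative rendering, not the code from the linked answer: the names `FIRST_WORDS` and `reorder_by_first_word` are made up here, and like the awk version it assumes that the first word of every category is unique.

```python
# First word of each category, in the required output order.
FIRST_WORDS = ["Reading", "Sentence", "Arithmetic", "College", "Elementary"]


def reorder_by_first_word(line, fixed=5):
    """Reorder the fields after the first `fixed` ones by their first word."""
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    val = {}  # plays the role of the awk array `val`
    for field in fields[fixed:]:
        if field:
            # Key on the first space-separated word, like a[1] in awk.
            val[field.split(" ", 1)[0]] = field
    # Missing categories become empty strings, like null values in awk.
    return ",".join(fields[:fixed] + [val.get(w, "") for w in FIRST_WORDS])


if __name__ == "__main__":
    sample = ("Last,First,A00XXXXXX,1492-12-03,2015-06-23,"
              "Reading Comprehension 70,Elementary Algebra 23,"
              "Arithmetic 42,Sentence Skills 75")
    print(reorder_by_first_word(sample))
```

Because the dictionary is rebuilt for every line, nothing carries over from the previous row, which is what `delete val` does in the awk script.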

3

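It looks like you have answered this yourself, but since I had already written everything up (and because this version does not require the first word to be unique the way the awk solution does, only that no category is a substring of any other):

In Perl, this can be solved as follows.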

use strict; 
use warnings; 

my @categories = ('Reading Comprehension', 'Sentence Skills', 'Arithmetic', 'College Level Math', 'Elementary Algebra'); 

while(<ARGV>) { 
    chomp; 
    my @columns = split(/,/); 
    print join(',', @columns[0 .. 4], map { my $c = $_; (grep { /$c/ } @columns)[0] || '' } @categories)."\n"; 
} 

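This accepts file names as arguments, or reads standard input if no arguments are given.

The join line works like this: you want the first 5 columns, followed, for each category, by the first column that matches it, or an empty string if no column matches.

map { my $c = $_; ... } @categories: do this for each category (with $c holding the category instead of $_)
grep { /$c/ } @columns: all columns that match the given category
(...)[0] || '': the first thing that matched, or an empty string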
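The caveat about categories not being substrings of one another can be seen in a small Python sketch. It uses a plain substring test (`in`) in place of the Perl regex match, the helper name `pick` is hypothetical, and the category "Algebra" is invented purely to show the collision.

```python
def pick(columns, categories):
    """For each category, return the first column containing it, else ''."""
    return [next((c for c in columns if cat in c), "") for cat in categories]


columns = ["Elementary Algebra 41", "Sentence Skills 82"]

# Distinct categories: each one finds only its own column.
print(pick(columns, ["Sentence Skills", "Elementary Algebra"]))

# Hypothetical overlapping categories: "Algebra" is a substring of
# "Elementary Algebra", so both categories pick up the same column.
print(pick(columns, ["Algebra", "Elementary Algebra"]))
```

With the five categories from the question no such overlap exists, so the substring match is safe there.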

As a one-liner, this can be expressed as follows:

perl -nalF, -e 'print join(",", @F[0 .. 4], map { my $c = $_; (grep { /$c/ } @F)[0] || "" } ("Reading Comprehension", "Sentence Skills", "Arithmetic", "College Level Math", "Elementary Algebra"));' inputfile.txt 

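-n: implicitly wraps the provided code in a while(<ARGV>){} block
-a: automatically splits each line and puts the result in @F
-l: automatically removes the newline from the input and adds it back to the output
-F,: splits on commas instead of the default whitespace.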

2

Another Perl solution.

#!/usr/bin/env perl

use warnings;
use strict;

my @column_order = (
    'Reading Comprehension',
    'Sentence Skills',
    'Arithmetic',
    'College Level Math',
    'Elementary Algebra',
);

my $csv = 'foo.csv'; # CHANGEME

# Open the file (three-argument open, read-only)
open my $fh, '<', $csv
    or die "Unable to open $csv : $!";

# Read through the file, line by line
while (<$fh>) {
    chomp; # drop the newline so it cannot end up inside a column
    my @columns = split /,/; # split each line on ','
    my $first_five = join ',', splice @columns, 0, 5; # remove the first 5 columns
    my %data = map { $_ => '' } @column_order; # default to empty for each column

    # iterate over the remaining columns
    for my $col (@columns) {
        # if we match any of our desired columns
        if (my ($match) = grep { $col =~ m|^\Q$_\E| } @column_order) {
            $col =~ s|\s*$||; # delete any trailing whitespace
            $data{$match} = $col; # store it in a hash
        }
    }
    my $remaining_columns = join ',', @data{@column_order}; # hash slice in output order
    print $first_five . ',' . $remaining_columns . "\n";
}
close $fh;