Rearrange columns by string variable

I have a CSV file like this:

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 44,Reading Comprehension 40 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 104,Elementary Algebra 82 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,Elementary Algebra 33 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 96,Reading Comprehension 88 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 53,Sentence Skills 97

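The first 5 columns are always the same, and the last 5 columns are always in a different order. I need to keep the first 5 columns as they are and rearrange the last 5 so that they always appear in the following order: Reading Comprehension, Sentence Skills, Arithmetic, College Level Math, Elementary Algebra.

If one of the strings is not present, add a comma (an empty field) in its place.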

So the final result would look like this:

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 59,Sentence Skills 67,,,Elementary Algebra 41 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 40,Sentence Skills 44,,, 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39,,,, 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 104,,,Elementary Algebra 82 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 85,,, 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,,,Elementary Algebra 33 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 88,Sentence Skills 96,,, 
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 97,,,Elementary Algebra 53

If they were always in the same order, I could do something like this:

awk -F, -v OFS=, '!/Reading Comprehension/ { $5 = $5 "," } 1'

If they were at least always in the same columns, I could do something like:

awk -F, -v OFS=, '{print $1,$2,$3,$4,$5,$7,$8,$6,$9,$10}'

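But every line being in a different order, and some of the variables being missing at the end, throws me for a loop.

I would like to do this with awk, but I am open to anything.

Logically, I think what I need to do is something like this: j = Reading*, i = Sentence*, k = Arithmetic*, l = College*, m = Elementary*

and then awk {print $6j, $7i, $8k, $9l, $10m}

However, my Google searches have not turned up anything useful. So even a comment saying "look here", "search for this", or "check out this answer" would be greatly appreciated.

Note: I did my best to make sure the input and output are correct. I posted another question similar to this one, but that was for the case where the columns are always in the same order, so this is a different requirement.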


2015-06-23 moore1emu

Here is a simple, clean solution written in Python. You have to replace input.csv and output.csv with your own CSV files.

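import csv

# Desired order of the score columns
labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

# Python 3: open in text mode with newline='' as the csv module recommends
with open('output.csv', 'w', newline='') as outfile, \
        open('input.csv', newline='') as infile:
    writer = csv.writer(outfile)
    reader = csv.reader(infile)

    for row in reader:
        head = row[:5]  # the first five columns stay as they are
        tail = []
        for label in labels:
            # first field starting with this label, or "" if there is none
            tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
        writer.writerow(head + tail)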
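
The `next(generator, default)` idiom used in the script above can be tried on a single row without touching any files. This is only a minimal sketch, assuming Python 3; the function name `reorder_row` and the `keep` parameter are made up for illustration and are not part of the answer.

```python
LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]

def reorder_row(row, labels=LABELS, keep=5):
    """Keep the first `keep` fields, then emit one field per label, in order.

    `reorder_row` is a hypothetical helper, not part of the original script.
    """
    head, rest = row[:keep], row[keep:]
    # next() yields the first field that starts with the label, or "" if none does
    tail = [next((f for f in rest if f.startswith(label)), "") for label in labels]
    return head + tail

row = ("Last,First,A00XXXXXX,1492-12-03,2015-06-23,"
       "Elementary Algebra 41,Sentence Skills 82").split(",")
print(",".join(reorder_row(row)))
# Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
```

The empty string default is what produces the bare commas for the missing categories once the fields are joined.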

Here is another, shorter solution that works with pipes:

#!/usr/bin/python
from sys import stdin, stdout

labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

for line in stdin:
    values = line.strip().split(',')
    stdout.write(','.join(values[:5]))
    for label in labels:
        stdout.write(',')
        stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
    stdout.write('\n')
stdout.flush()

If you save this code in a file, say one called reorder, and make that file executable, you can reformat your CSV file like this:

$ cat input.csv | ./reorder

The reformatted CSV content is then written to standard output.


2015-06-24 00:04:13 miindlek

So the code that @Glenn Jackson posted here: Creating an AWK For Loop out of piped commands

and which is reproduced below:

awk -F, -v OFS=, '
{
    delete val                  # clear the previous values if any
    for (i=6; i<=NF; i++) {
        split($i, a, " ")
        val[a[1]] = $i          # a[1] is the first space-separated word
    }
    print $1,$2,$3,$4,$5, val["Reading"],   # null values are OK
          val["Sentence"],
          val["Arithmetic"],
          val["College"],
          val["Elementary"]
}
' input

does exactly what I need, works perfectly, and makes enough sense that I can adapt it.
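
For anyone who wants to follow the logic outside awk, the lookup keyed on the first word can be sketched in Python as below. This is a rough equivalent under my own assumptions, not a translation checked against every awk implementation; the name `reorder_by_first_word` and its parameters are hypothetical.

```python
KEYS = ("Reading", "Sentence", "Arithmetic", "College", "Elementary")

def reorder_by_first_word(row, keys=KEYS, keep=5):
    """Index the trailing fields by their first word, then read them back in order.

    Hypothetical helper mirroring the val[a[1]] = $i idea from the awk code.
    """
    val = {}
    for field in row[keep:]:
        words = field.split()
        if words:                  # ignore empty fields
            val[words[0]] = field  # key on the first space-separated word
    # dict.get() supplies "" for a missing key, like an unset awk array element
    return row[:keep] + [val.get(k, "") for k in keys]

row = ("Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,"
       "Elementary Algebra 23,Arithmetic 42,Sentence Skills 75").split(",")
print(",".join(reorder_by_first_word(row)))
# Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
```

As with the awk version, this only works while every category starts with a different first word.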


2015-06-23 23:44:19 moore1emu

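It looks like you have answered this yourself, but since I had already written everything up (and because this does not require the first word to be unique as the awk solution does, only that no category is a substring of any other):

In Perl, this can be solved as follows.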

use strict; 
use warnings; 

my @categories = ('Reading Comprehension', 'Sentence Skills', 'Arithmetic', 'College Level Math', 'Elementary Algebra'); 

while(<ARGV>) { 
    chomp; 
    my @columns = split(/,/); 
    print join(',', @columns[0 .. 4], map { my $c = $_; (grep { /$c/ } @columns)[0] || '' } @categories)."\n"; 
}

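This accepts file names as arguments, or reads standard input if no arguments are given.

An explanation of the join line: you want the first 5 columns, followed by, for each category, the first column that matches it, or an empty string if no column matches.

map { my $c = $_; ... } @categories: do this for each category (with $c standing for the category instead of $_)
grep { /$c/ } @columns: all columns that match the given category
(...)[0] || '': the first thing that matched, or the empty string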
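
The "no category may be a substring of another" caveat is easier to see with a small experiment. Here is a sketch in Python that uses plain substring tests where the Perl code uses a regex match; the helper `pick` and the extra "Math" category are invented purely to show the failure case.

```python
def pick(columns, categories):
    """For each category, return the first column containing it, else ''.

    Hypothetical helper; `cat in col` stands in for the Perl /$c/ match.
    """
    return [next((col for col in columns if cat in col), "") for cat in categories]

cols = ["Last", "First", "College Level Math 50", "Math 12"]

# No category is a substring of another: each one finds its own column.
print(pick(cols, ["College Level Math", "Arithmetic"]))
# ['College Level Math 50', '']

# "Math" is a substring of "College Level Math", so it grabs the wrong column.
print(pick(cols, ["College Level Math", "Math"]))
# ['College Level Math 50', 'College Level Math 50']
```

With the five categories from the question this problem does not arise, since none of them contains another.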

As a one-liner, this can be expressed as follows:

perl -nalF, -e 'print join(",", @F[0 .. 4], map { my $c = $_; (grep { /$c/ } @F)[0] || "" } ("Reading Comprehension", "Sentence Skills", "Arithmetic", "College Level Math", "Elementary Algebra"));' inputfile.txt

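-n: implicitly puts a while(<ARGV>){} block around the provided code
-a: automatically splits each line and puts the result in @F
-l: automatically removes the newline from input and adds it back on output
-F,: splits on commas instead of the default whitespace.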
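
For readers more at home in Python, what those four switches do together can be approximated with a short generator. This is a loose analogy under my own assumptions rather than an exact model of Perl's behaviour (for instance, Perl's split drops trailing empty fields while Python's str.split keeps them); the name `split_lines` is made up.

```python
import io

def split_lines(stream, sep=","):
    """Rough analogue of perl -n -a -l -F, (hypothetical helper)."""
    for line in stream:            # -n: implicit loop over the input lines
        line = line.rstrip("\n")   # -l: strip the trailing newline
        yield line.split(sep)      # -a with -F,: split into fields, like @F

sample = io.StringIO("Last,First,Arithmetic 42\nLast,First\n")
fields = list(split_lines(sample))
print(fields)
# [['Last', 'First', 'Arithmetic 42'], ['Last', 'First']]
```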


2015-06-23 23:49:20

Another Perl solution.

#!/usr/bin/env perl

use warnings;
use strict;

my @column_order = (
    'Reading Comprehension',
    'Sentence Skills',
    'Arithmetic',
    'College Level Math',
    'Elementary Algebra',
);

my $csv = 'foo.csv'; # CHANGEME

# Open the file for reading
open my $fh, '<', $csv
    or die "Unable to open $csv : $!";

# Read through the file, line by line
while (<$fh>) {
    my @columns = split /,/; # Split each line by ','
    my $first_five = join ',', splice @columns, 0, 5; # Remove the first 5 columns
    my %data = map { $_ => '' } @column_order; # default to empty for each column

    # iterate over the remaining columns
    for my $col (@columns) {
        # if we match any of our desired columns
        if (my ($match) = grep { $col =~ m|^$_| } @column_order) {
            $col =~ s|\s*$||; # delete any trailing whitespace (including the newline)
            $data{$match} = $col; # store it in a hash
        }
    }
    my $remaining_columns = join ',', @data{@column_order}; # join the hash values in order
    print $first_five . ',', $remaining_columns . "\n";
}


2015-06-24 00:13:03 xxfelixxx
