Answers
0
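In case anyone is interested: I wasn't satisfied with any of the suggestions, probably because I was hoping for a one-line solution, and as far as I know no such solution exists. Anyway, I wrote a utility called ljoin (for left join, as in databases) that does exactly what I asked for (of course :D).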
#!/usr/bin/perl

=head1 NAME

ljoin.pl - Utility to left join files by specified key column(s)

=head1 SYNOPSIS

ljoin.pl [OPTIONS] <INFILE1>..<INFILEN> <OUTFILE>

To successfully join rows one must supply at least one input file and exactly one output file. Input files can be real file names or a pattern, like [ABC].txt or *.in etc.

=head1 DESCRIPTION

This utility merges multiple files into one using the specified column(s) as a key.

=head2 OPTIONS

=over

=item --field-separator=<separator>, -fs <separator>

Specifies what string should be used to separate columns in a plain file. Default value for this option is the tab symbol.

=item --no-sort-fields, -no-sf

Do not sort columns when creating a key for merging files

=item --complex-key-separator=<separator>, -ks <separator>

Specifies what string should be used to separate multiple values in a multikey column. For example "A B" in one file can be presented as "B A" in another, and the application has to treat both as the same key. Default value for this option is the space symbol.

=item --no-sort-complex-keys, -no-sk

Do not sort complex column values when creating a key for merging files

=item --include-primary-field, -i

Specifies whether the key which is used to find matching lines in multiple files should be included in the output file. The first column in the output file will be the key in any case, but in the case of a complex column the value of the first column will be sorted. Default value for this option is false.

=item --primary-field-index=<index>, -f <index>

Specifies the index of the column which should be used for matching lines. You can use multiple instances of this option to specify a key made of more than one column, like this: "-f 0 -f 1"

=item --help, -?

Get help and documentation

=back

=cut
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
my $fieldSeparator = "\t";
my $complexKeySeparator = " ";
my $includePrimaryField = 0;
my $containsTitles = 0;
my $sortFields = 1;
my $sortComplexKeys = 1;
my @primaryFieldIndexes;
GetOptions(
"field-separator|fs=s" => \$fieldSeparator,
"sort-fields|sf!" => \$sortFields,
"complex-key-separator|ks=s" => \$complexKeySeparator,
"sort-complex-keys|sk!" => \$sortComplexKeys,
"contains-titles|t!" => \$containsTitles,
"include-primary-field|i!" => \$includePrimaryField,
"primary-field-index|f=i@" => \@primaryFieldIndexes,
"help|?!" => sub { pod2usage(0) }
) or pod2usage(2);
pod2usage(0) if $#ARGV < 1;
push @primaryFieldIndexes, 0 if $#primaryFieldIndexes < 0;
my %primaryFieldIndexesHash;
for(my $i = 0; $i <= $#primaryFieldIndexes; $i++)
{
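# Key the hash by the column index itself, not by its position in the option list
$primaryFieldIndexesHash{$primaryFieldIndexes[$i]} = 1;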
}
print "fieldSeparator = $fieldSeparator\n";
print "complexKeySeparator = $complexKeySeparator \n";
print "includePrimaryField = $includePrimaryField\n";
print "containsTitles = $containsTitles\n";
print "primaryFieldIndexes = @primaryFieldIndexes\n";
print "sortFields = $sortFields\n";
print "sortComplexKeys = $sortComplexKeys\n";
my $fieldsCount = 0;
my %keys_hash =();
my %files =();
my %titles =();
# Read columns into memory
foreach my $argnum (0 .. ($#ARGV - 1))
{
# Find files with specified pattern
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
open INPUT_FILE, '<', $inputPath or die "Cannot open $inputPath: $!";
my %lines;
my $lineNumber = -1;
while (my $line = <INPUT_FILE>)
{
$lineNumber++;
# Skip the first line of each file when the files contain titles
next if $containsTitles && $lineNumber == 0;
# Don't use chomp line. It doesn't handle unix input files on windows and vice versa
$line =~ s/[\r\n]+$//g;
# Skip lines that don't have columns
next if $line !~ m/($fieldSeparator)/;
# Split fields and count them (store maximum number of columns in files for later use)
my @fields = split($fieldSeparator, $line);
$fieldsCount = $#fields+1 if $#fields+1 > $fieldsCount;
# Sort complex key
my @multipleKey;
for(my $i = 0; $i <= $#primaryFieldIndexes; $i++)
{
my @complexKey = split ($complexKeySeparator, $fields[$primaryFieldIndexes[$i]]);
@complexKey = sort(@complexKey) if $sortComplexKeys;
push @multipleKey, join($complexKeySeparator, @complexKey);
}
# sort multiple keys and create key string
@multipleKey = sort(@multipleKey) if $sortFields;
my $fullKey = join $fieldSeparator, @multipleKey;
$lines{$fullKey} = \@fields;
$keys_hash{$fullKey} = 1;
}
close INPUT_FILE;
$files{$inputPath} = \%lines;
}
}
# Open output file
my $outputPath = $ARGV[$#ARGV];
open OUTPUT_FILE, '>', $outputPath or die "Cannot open $outputPath: $!";
my @keys = sort keys(%keys_hash);
# Leave blank places for key columns
for(my $pf = 0; $pf <= $#primaryFieldIndexes; $pf++)
{
print OUTPUT_FILE $fieldSeparator;
}
# Print column headers
foreach my $argnum (0 .. ($#ARGV - 1))
{
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
print OUTPUT_FILE $inputPath;
for(my $f = 0; $f < $fieldsCount - $#primaryFieldIndexes - 1; $f++)
{
print OUTPUT_FILE $fieldSeparator;
}
}
}
# Print merged columns
print OUTPUT_FILE "\n";
foreach my $key (@keys)
{
print OUTPUT_FILE $key;
foreach my $argnum (0 .. ($#ARGV - 1))
{
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
my $lines = $files{$inputPath};
for(my $i = 0; $i < $fieldsCount; $i++)
{
next if exists $primaryFieldIndexesHash{$i} && !$includePrimaryField;
print OUTPUT_FILE $fieldSeparator;
print OUTPUT_FILE $lines->{$key}->[$i] if exists $lines->{$key}->[$i];
}
}
}
print OUTPUT_FILE "\n";
}
close OUTPUT_FILE;
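For comparison, the core left-join step can be sketched with the standard sort and join utilities. This is only a rough sketch, not a replacement for ljoin.pl: left_join is a hypothetical helper name, it assumes tab-separated files whose first column is an already normalized key, and unlike ljoin.pl it handles only two files and does not pad missing columns.

```shell
#!/bin/bash
# Hypothetical helper (not part of ljoin.pl): left join two tab-separated
# files on their first column. Assumes the keys are already normalized.
left_join() {
    tab=$(printf '\t')
    dir=$(mktemp -d) || return 1
    # join needs both inputs sorted on the join field.
    sort -t "$tab" -k1,1 "$1" > "$dir/a"
    sort -t "$tab" -k1,1 "$2" > "$dir/b"
    # -a 1 also prints lines of the first file that have no match.
    join -t "$tab" -a 1 "$dir/a" "$dir/b"
    rm -r "$dir"
}

# Demo with two small files in a temporary directory.
demo=$(mktemp -d)
printf 'A B\t1\nA C\t2\n' > "$demo/left.txt"
printf 'A B\tx\n' > "$demo/right.txt"
left_join "$demo/left.txt" "$demo/right.txt"
rm -r "$demo"
```

In the demo, the "A B" row picks up the value from the second file, while the unmatched "A C" row is kept with only its own column.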
0
This won't win any beauty contests for sorting a file, but it seems to come close:
#!/bin/bash
while read one two; do
one=`echo $one | sed -e 's/,/\n/g' | sort | sed -e '
1 {h; d}
$! {H; d}
H; g; s/\n/,/g;
'`
echo $one $two
done | sort
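A shorter variant of the same idea can be sketched with tr and paste in place of the multi-line sed script. normalize_keys is a hypothetical name, and the sketch assumes comma-separated keys in the first whitespace-separated column. One difference in behavior: a key with only one part passes through unchanged here, whereas the sed script above deletes line 1 before it can reach the last-line branch, so such a key appears to come out empty.

```shell
#!/bin/bash
# Hypothetical helper: sort the comma-separated parts of the first column
# of each input line, then sort the lines themselves.
normalize_keys() {
    while read -r key rest; do
        # Split on commas, sort the parts, join them back with commas.
        key=$(printf '%s\n' "$key" | tr ',' '\n' | sort | paste -s -d ',' -)
        printf '%s %s\n' "$key" "$rest"
    done | sort
}

printf 'A,C 1\nC,B 2\nB,A 3\n' | normalize_keys
```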
0
Change the internal field separator (IFS), then compare the first two letters using ">":
(
IFS=" ,";
while read a b n; do
if [ "$a" \> "$b" ]; then
echo "$b,$a $n";
else
echo "$a,$b $n";
fi;
done;
) <<EOF | sort
A,C 1
C,B 2
B,A 3
EOF
Try http://unix.stackexchange.com/. – 2011-02-10 16:11:45
Could a moderator migrate it? – 2011-02-10 16:26:24