2015-05-21 68 views
1

Counting letters in a file with a shell script

I need a shell script (or PowerShell) that counts how many times each letter occurs in a file.

Input:

this is the sample of this script. 
This script counts similar letters. 

Output:

t 9 
h 4 
i 8 
s 10 
e 4 
a 2 
... 

Answers

1

This one-liner should do it:

awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)if(tolower($i)~/[a-z]/)a[tolower($i)]++} 
     END{for(x in a)print x, a[x]}' file 

Output for your example:

u 1 
h 4 
i 8 
l 3 
m 2 
n 1 
a 2 
o 2 
c 3 
p 3 
r 4 
e 4 
f 1 
s 10 
t 9 
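
The one-liner above relies on 'FS=""' splitting each line into single characters, which gawk and mawk do but POSIX leaves unspecified. A minimal sketch of a variant that walks each line with 'substr' instead, assuming only a POSIX awk and sort; the function name 'count_letters' is made up for this illustration:

```shell
# count_letters: hypothetical helper; reads the named files, or stdin if none are given.
count_letters() {
    awk '{
        line = tolower($0)                    # fold case so "T" and "t" count together
        n = length(line)
        for (i = 1; i <= n; i++) {
            ch = substr(line, i, 1)           # one character at a time, no FS tricks
            if (ch ~ /^[a-z]$/) count[ch]++   # ASCII letters only
        }
    }
    END { for (ch in count) print ch, count[ch] }' "$@" | sort
}

printf '%s\n' 'this is the sample of this script.' \
              'This script counts similar letters.' | count_letters
```

The trailing sort is only there because the order of awk's 'for (x in a)' loop is unspecified; drop it if the order does not matter.
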
+0

Care to leave a comment explaining the downvote? – Kent

+0

I didn't downvote you, but this isn't PowerShell. – aphoria

+0

@aphoria I read 'shell script / powershell' as either one: save the line in a file and it is a shell script. That reading also fits the question being tagged 'shell'. – Kent

2

In PowerShell, you can do this with the Group-Object cmdlet:

function Count-Letter {
    param(
        [String]$Path,
        [Switch]$IncludeWhitespace,
        [Switch]$CaseSensitive
    )

    # Read the file, convert to char array, and pipe to Group-Object
    # Convert input string to lowercase if CaseSensitive is not specified
    $CharacterGroups = if ($CaseSensitive) {
        (Get-Content $Path -Raw).ToCharArray() | Group-Object -NoElement
    } else {
        (Get-Content $Path -Raw).ToLower().ToCharArray() | Group-Object -NoElement
    }

    # Remove any whitespace character group if IncludeWhitespace is not specified
    if (-not $IncludeWhitespace) {
        $CharacterGroups = $CharacterGroups | Where-Object { "$($_.Name)" -match "\S" }
    }

    # Return the groups, letter first and count second, in the default table format
    $CharacterGroups | Select-Object @{Name="Letter";Expression={$_.Name}},Count
}

This is what the output looks like on my machine with the sample input plus a line break: [screenshot: Count-Letter]

+0

Thanks, but how would it look as a .ps1 file? I need to call it like this: task.ps1 letters.txt –

+0

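@MolnárBence Remove the 'function Count-Letter {}' wrapper so that the first line of your 'task.ps1' file is the opening 'param('; then you can call it the way you described. –

+0

To drop the header at the top of the output, I sent it to 'Format-Table -HideTableHeaders', but that doesn't work. You can see the problem here: http://people.inf.elte.hu/bencehun93/error.jpg –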

0

A PowerShell one-liner:

"this is the sample of this script".ToCharArray() | group -NoElement | sort Count -Descending | where Name -NE ' ' 
+1

I would move the filter ahead of the sort (no need to sort anything you are going to throw away). –

-1
echo "this is the sample of this script" | \
sed -e 's/ //g' -e 's/\([A-Za-z]\)/\1|/g' | tr '|' '\n' | \
sort | grep -v "^$" | uniq -c | \
awk '{printf "%s %s\n",$2,$1}'
0
echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '.' | sort | uniq -c | sort -rg

Output, sorted with the most common characters first:

     10 s
     10  
      8 t
      8 i
      4 r
      4 h
      4 e
      3 p
      3 l
      3 c
      2 o
      2 m
      2 a
      2 .
      1 u
      1 T
      1 n
      1 f

Note: no sed or awk is needed; a simple grep -o '.' does all the heavy lifting. To leave out spaces and punctuation, replace '.' with '[[:alpha:]]':

echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '[[:alpha:]]' | sort | uniq -c | sort -rg
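
To see why 'grep -o' does the heavy lifting: with '-o', grep prints every match on a line of its own, so a one-character pattern turns the text into one letter per line, which is the shape 'sort | uniq -c' needs. A small sketch ('-o' is a GNU/BSD grep extension rather than POSIX, so its availability is an assumption about your system):

```shell
# Split the input into one letter per line; spaces and punctuation never match.
printf 'ab c.\n' | grep -o '[[:alpha:]]'
# prints:
# a
# b
# c
```
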

To count uppercase and lowercase letters together, use the ignore-case options of sort ('-f') and uniq ('-i'):

echo "this is the sample of this script. \
This script counts similar letters." | \
    grep -o '[[:alpha:]]' | sort -f | uniq -ic | sort -rg

Output:

     10 s
      9 t
      8 i
      4 r
      4 h
      4 e
      3 p
      3 l
      3 c
      2 o
      2 m
      2 a
      1 u
      1 n
      1 f
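
If the result has to be in the 'letter count' layout the question asks for, one option is to fold case up front with 'tr' and swap the two columns at the end. A sketch under those assumptions; 'letter_counts' is a hypothetical name, and 'grep -o' is again assumed to be available:

```shell
# letter_counts: hypothetical helper; reads text on stdin.
letter_counts() {
    tr '[:upper:]' '[:lower:]' |      # fold case before counting
        grep -o '[[:alpha:]]' |       # one letter per line
        sort | uniq -c |              # count identical letters
        sort -rn |                    # most frequent first
        awk '{ print $2, $1 }'        # "letter count" instead of "count letter"
}

printf '%s\n' 'this is the sample of this script.' \
              'This script counts similar letters.' | letter_counts | head -n 2
# prints:
# s 10
# t 9
```

Because the case is folded first, plain sort and uniq need no case-related options here.
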