Extract words from a file, then list the files and line numbers that contain those words

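I have a file named Strings.h that I use to localize an app I've built. I want to search all my class files, find out whether and where I use each string, and output the class and line number for each use.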

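My idea is to use Python, but maybe that's the wrong tool for the job. I also have a basic algorithm, but I'm worried it will take a long time to run. Could you write a script that does this, or even just suggest a better algorithm?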

Strings.h looks like this:

#import "NonLocalizedStrings.h" 

#pragma mark Coordinate Behavior Strings 
#define LATITUDE_WORD NSLocalizedString(@"Latitude", @"used in coordinate behaviors") 
#define LONGITUDE_WORD NSLocalizedString(@"Longitude", @"used in coordinate behaviors") 
#define DEGREES_WORD NSLocalizedString(@"Degrees", @"used in coordinate behaviors") 
#define MINUTES_WORD NSLocalizedString(@"Minutes", @"Used in coordiante behaviors") 
#define SECONDS_WORD NSLocalizedString(@"Seconds", @"Used in DMSBehavior.m") 

...

The script should take every line that starts with #define and build a list of the word that follows the #define (e.g. LATITUDE_WORD).

The pseudocode might be:

file = strings.h 
for line in file: 
    extract word after #define 
    search_words.push(word) 

print search_words 
[LATITUDE_WORD, LONGITUDE_WORD, DEGREES_WORD, MINUTES_WORD, SECONDS_WORD]
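
For reference, here is a minimal Python version of this first step. It assumes each #define line has the macro name as its second whitespace-separated token. 'extract_define_names' is a hypothetical helper name, and the sample lines come from the Strings.h excerpt above:

```python
def extract_define_names(lines):
    """Return the macro name that follows '#define' on each line, in file order."""
    search_words = []
    for line in lines:
        parts = line.split()
        # Keep lines whose first token is exactly '#define'; parts[1] is the name.
        if len(parts) >= 2 and parts[0] == "#define":
            search_words.append(parts[1])
    return search_words


sample = [
    '#import "NonLocalizedStrings.h"',
    '#pragma mark Coordinate Behavior Strings',
    '#define LATITUDE_WORD NSLocalizedString(@"Latitude", @"used in coordinate behaviors")',
    '#define LONGITUDE_WORD NSLocalizedString(@"Longitude", @"used in coordinate behaviors")',
]
print(extract_define_names(sample))  # ['LATITUDE_WORD', 'LONGITUDE_WORD']
```

A file object yields its lines, so running it on the real file as 'with open("Strings.h") as f: search_words = extract_define_names(f)' should produce the list directly.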

Once I have the list of words, my pseudocode is something like:

found_words = {}
for word in search_words:
    found_words[word] = []

for file in files:
    for line in file:
        for word in search_words:
            if line contains word:
                found_words[word].push((filename, linenumber))

print found_words

So found_words would look like:

{ 
    LATITUDE_WORD: [ 
        (foo.m, 42), 
        (bar.m, 132) 
        ], 
    LONGITUDE_WORD: [ 
        (baz.m, 22), 
        (bim.m, 112) 
        ], 

}
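
Since the worry is run time: the nested loop above tests every search word against every line, so the work grows with (lines × words). One alternative is to split each line into identifier tokens once and look each token up in a set. The sketch below assumes the macro names are plain C identifiers. With this approach, the per-line cost no longer depends on how many words you search for, and a longer name such as LATITUDE_WORDS is not mistaken for LATITUDE_WORD. The names find_usages and IDENTIFIER are illustrative, not from any library:

```python
import re

# A C/Objective-C identifier: a letter or underscore, then letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def find_usages(search_words, files):
    """Map each search word to a list of (filename, line_number) hits.

    'files' is an iterable of (filename, lines) pairs, where 'lines' is any
    iterable of strings (an open file object works).
    """
    wanted = set(search_words)
    found_words = {word: [] for word in search_words}
    for filename, lines in files:
        for line_number, line in enumerate(lines, start=1):
            # Tokenize the line once; a set lookup costs the same no matter
            # how many words are being searched for.
            for token in set(IDENTIFIER.findall(line)):
                if token in wanted:
                    found_words[token].append((filename, line_number))
    return found_words


files = [
    ("foo.m", ["label.text = LATITUDE_WORD;", "x = LATITUDE_WORDS;"]),
    ("bar.m", ["a = LONGITUDE_WORD; b = LATITUDE_WORD;"]),
]
print(find_usages(["LATITUDE_WORD", "LONGITUDE_WORD", "DEGREES_WORD"], files))
```

To run it over real files, pass pairs such as (path, open(path)). As with the grep-based answers, names that appear inside comments or string literals are still counted.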

2012-10-31 Andrew Johnson

How about this, in bash?

$ pattern="\\<($(grep '^#define ' Strings.h | cut -d' ' -f2 | tr '\n' '|' | sed 's/|$//'))\\>" 
$ find project_dir -iname '*.m' -exec egrep -Hno "${pattern}" {} + > matches

Output:

project_dir/bar.m:132:LATITUDE_WORD 
project_dir/baz.m:22:LONGITUDE_WORD 
project_dir/bim.m:112:LONGITUDE_WORD 
project_dir/foo.m:42:LATITUDE_WORD
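
For comparison, the same idea can be sketched in Python with re and pathlib: one word-anchored alternation pattern applied to every .m file under project_dir. This is an illustrative translation, not part of the answer. grep_tree is a hypothetical name, and \b stands in for grep's \< and \>. Unlike find -iname, the suffix match here is case-sensitive:

```python
import re
from pathlib import Path


def grep_tree(root, words, suffix=".m"):
    """Yield (path, line_number, word) for each occurrence of any word in the
    files under root whose names end with suffix, similar to 'egrep -Hno'."""
    if not words:
        return  # with no words the alternation would only produce empty matches
    # One compiled alternation; \b plays the role of grep's \< and \> anchors.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")
    for path in sorted(Path(root).rglob("*" + suffix)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            for line_number, line in enumerate(f, start=1):
                for match in pattern.finditer(line):
                    yield str(path), line_number, match.group(0)
```

Printing each yielded tuple reproduces the format of the bash output, e.g. for path, n, word in grep_tree('project_dir', search_words): print(f'{path}:{n}:{word}').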

Edit: I've changed the code above to redirect its output to a file called matches, so we can use it to list the words that are never found:

for word in $(grep '^#define ' Strings.h | cut -d' ' -f2) 
do 
    if ! cut -d':' -f3 matches | grep -qxF "${word}"
    then 
     echo "${word}" 
    fi 
done
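
The loop above takes the third ':'-separated field of each line in matches. Here is a Python sketch of the same check. It assumes matches holds 'path:line:WORD' lines as written by egrep -Hno, and unused_words is a hypothetical helper. Splitting from the right keeps a path that itself contains ':' in one piece, and building one set of used words replaces running grep once per name:

```python
def unused_words(all_words, match_lines):
    """Return the words from all_words that never appear in 'egrep -Hno' output."""
    used = set()
    for line in match_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # rsplit from the right so a path like "odd:dir/foo.m" stays intact.
        _path, _line_number, word = line.rsplit(":", 2)
        used.add(word)
    # Keep the order in which the words were defined in Strings.h.
    return [w for w in all_words if w not in used]


matches = [
    "project_dir/bar.m:132:LATITUDE_WORD\n",
    "project_dir/baz.m:22:LONGITUDE_WORD\n",
]
print(unused_words(["LATITUDE_WORD", "LONGITUDE_WORD", "DEGREES_WORD"], matches))  # ['DEGREES_WORD']
```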

2012-10-31 02:39:18

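Could you update it to search a directory recursively and output any words that are never found? I think this might be the best solution. Sorry for changing the requirements! –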

It already searches 'project_dir' recursively for files matching '*.m'. Given the way it works, outputting the words that are never found is trickier. –

OK, I've edited it to output the words that are never found. Took me a while to figure out... –

You should try:

grep -oP '^#define\s+\K\S+' strings.h

If your grep lacks the -P option:

perl -lne 'print $& if /^#define\s+\K\S+/' strings.h
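
Here \K drops everything matched so far, so only the name is printed. If you do this in Python instead, note that the standard re module has no \K. A lookbehind such as (?<=#define\s+) is also rejected, because re only accepts fixed-width lookbehinds. A capturing group gives the same result. A minimal sketch, using lines from the Strings.h excerpt above:

```python
import re

header = (
    '#import "NonLocalizedStrings.h"\n'
    '#define LATITUDE_WORD NSLocalizedString(@"Latitude", @"used in coordinate behaviors")\n'
    '#define LONGITUDE_WORD NSLocalizedString(@"Longitude", @"used in coordinate behaviors")\n'
)

# re has no \K, so capture the name in a group instead. re.MULTILINE lets ^
# match at the start of every line, and [ \t]+ keeps the match on one line,
# the way grep's per-line matching does.
DEFINE_NAME = re.compile(r"^#define[ \t]+(\S+)", re.MULTILINE)
print(DEFINE_NAME.findall(header))  # ['LATITUDE_WORD', 'LONGITUDE_WORD']

# A variable-width lookbehind is not a substitute: re rejects it at compile time.
try:
    re.compile(r"(?<=#define\s+)\S+")
except re.error as exc:
    print("rejected:", exc)
```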

2012-10-31 02:24:33

When I try the grep command, grep just prints its usage: usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C [num]] –

I'd say your grep lacks '-P'. I guess the 'perl' command should work –

Here is a Python program. It could be made shorter and simpler, but it works.

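import re
# Read the whole header ('Strings.h' is the file from the question)
with open('Strings.h') as f:
    filecontent = f.read()
l = filecontent.split('\n')
for item in l:
    if item.startswith("#define"):
        # the name is the first run of non-space characters after "#define"
        print(re.findall(r"#define\s+(\S+)", item)[0])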

2012-10-31 02:37:54 CoffeeRain

#!/bin/bash 
# Assuming the array files contains a list of your files
word_list=($(grep '^#define' "${files[@]}" | awk '{ print $2 }'))

2012-10-31 02:53:53

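So it looks like you have the right idea. Here are some pros and cons of your approach.

Pros:

If you use Python, your pseudocode translates almost line for line into your script.
You get to learn more about Python (which is a great tool for jobs like this).

Cons:

Python will run a bit slower than some of the bash-based solutions posted here (which matters if you have a lot of files to search).
Your Python script will be a bit longer than those solutions, but its output can also be more flexible.

Since I'm familiar with Python, and that's what you originally asked about, here's some more code you can use: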

#!/usr/bin/env python3

# List the files you want to search here
search_files = []

# Allows for sorted output later.
words = []

# Contains all found instances.
inst_dict = {}

with open('<FILE_PATH_HERE>', 'r') as word_file:
    for line in word_file:
        if line[0:7] == "#define":
            w = line[7:].split()[0]
            words.append(w)
            inst_dict[w] = []

for file_name in search_files:
    with open(file_name, 'r') as file_obj:
        # Count lines from 1 so the numbers match what an editor shows
        for line_num, line in enumerate(file_obj, start=1):
            for w in words:
                if w in line:
                    inst_dict[w].append((file_name, line_num))

# Do whatever you want with 'words' and 'inst_dict'
words.sort()
for w in words:
    string = w + ":\n"
    for inst in inst_dict[w]:
        string += "\tFile: " + inst[0] + "\n"
        # inst[1] is an int, so convert it before concatenating
        string += "\tLine: " + str(inst[1]) + "\n"
    print(string)

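I haven't tested the search part of the code, so use it as-is at your own risk. Good luck, and feel free to ask questions or extend the code as you need. Your request is simple and has many possible solutions, so I'd rather you understand how this one works.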

2012-10-31 03:38:14 strongMA

This solution uses awk and globstar (the latter requires Bash 4). I think it could be improved further, but consider it a draft.

shopt -s globstar 

awk 'NR==FNR { if ($0 ~ /^#define/) found[$2]=""; next; } 
    { 
     for (word in found){ 
     if ($0 ~ word) 
      found[word]=found[word] "\t" FILENAME ":" FNR "\n"; 
     } 
    } 
    END { for (word in found) print word ":\n" found[word]} 
    ' Strings.h **/*.m

Using the Strings.h snippet you posted, here's the output I got (with some test files I made):

LATITUDE_WORD: 
    lala1.m, 2 
    lala3.m, 1 

DEGREES_WORD: 
    lala2.m, 5 

SECONDS_WORD: 

MINUTES_WORD: 
    lala3.m, 3 

LONGITUDE_WORD: 
    lala3.m, 2

P.S.: I haven't tested this with globstar, since the bash I'm using right now is v3 (pfff!)
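
One caveat applies to $0 ~ word here, and also to if w in line in the Python answer above. Both are unanchored substring tests, so a longer identifier that merely contains a name also counts as a hit. The small Python illustration below uses a made-up identifier, OLD_LATITUDE_WORD. The word-boundary regex is a swapped-in technique, not part of this answer:

```python
import re


def contains_substring(line, word):
    # What 'w in line' (or awk's unanchored '$0 ~ word') effectively checks.
    return word in line


def contains_identifier(line, word):
    # \b treats letters, digits and "_" as word characters, so the match
    # must not be glued to a longer identifier on either side.
    return re.search(r"\b" + re.escape(word) + r"\b", line) is not None


line = "label.text = OLD_LATITUDE_WORD;"  # hypothetical identifier
print(contains_substring(line, "LATITUDE_WORD"))   # True
print(contains_identifier(line, "LATITUDE_WORD"))  # False
```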

2012-10-31 08:03:44 doubleDown
