Grab words within a boundary

Question: I need a regular expression that grabs the words between two boundaries. The code below does not work:

regexp -- {/b/{(.+)/}}/b} $outputline8 - filtered

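Goal:

Grab all pin names of the form xxx/xxx[x] that come after set_false_path and sit between { and }.
A set_false_path command may carry other options, such as -through. I still want to grab the pins that follow those options and write them to the output file as shown below.

Here is my input file, input_file.txt: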

set_false_path -from [get_ports {AAAcc/BBB/CCC[1] \ 
BBB_1/CCC[1] CCC/DDD[1] \ 
DDD/EEE EEE/FFF[1] \ 
FFF/GGG[1]}] -through\ 
[get_pins {GGG/HHH[1] HHH/III[1] \ 
XXX/YYY[1] YYY/XXX[1] \ 
AAA/ZZZ[1]}] 
set_timing_derate -cell_sdada [get_cells \ 
{NONO[1]} 
set_false_path -from [get_ports {AAA/DDD[2]}]

Here is the output file in the format I expect, output_file.txt:

AAAcc/BBB/CCC[1] 
BBB_1/CCC[1] 
CCC/DDD[1] 
DDD/EEE 
EEE/FFF[1] 
FFF/GGG[1] 
GGG/HHH[1] 
HHH/III[1] 
XXX/YYY[1] 
YYY/XXX[1] 
AAA/ZZZ[1] 
AAA/DDD[2]

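In general, these pin names follow no common pattern, so the only option is to grab everything between { and }.

As the input file above shows, the set_ commands in input_file.txt are not each written on a single line. So I wrote code that grabs only the content of the set_false_path commands and joins their lines. Here is my code: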

set inputfile [open "input_file.txt" r] 
set outputfile [open "output_file.txt" w] 

set first_word "" 
set outputline1 "" 
set filtered "" 

while { [gets $inputfile line] != 1} { 
set first_word [lindex [split $line ""] 0] 
set re2 {^set_+?} 
#match any "set_ " command 
if { [regexp $re2 $first_word matched] } { 
    #if the "set_ " command is found and the outputline1 is not empty, then it's 
    # the end of the last set_ command 
    if {$outputline1 != ""} { 
    #do the splitting here and put into the outputfile later on 
    regexp -- {/b/{(.+)/}}/b} $outputline8 - filtered 
    puts "$filtered:$filtered" 
    set outputline1 "" 
    } 

    # grab content if part of set_false_path 
    if{ [regexp "set_false_path" $first_word] } { 
    # if it's the expected command set, put "command_set" flag on which will be used on 
    # the next elseif 
    set command_set 1 
    lappend outputline1 $line 
    regsub -all {\\\[} $outputline1 "\[" outputline2 
    regsub -all {\\\]} $outputline2 "\]" outputline3 
    regsub -all {\\\{} $outputline3 "\{" outputline4 
    regsub -all {\\\}} $outputline4 "\}" outputline5 
    regsub -all {\\\\} $outputline5 "\\" outputline6 
    regsub -all {\\ +} $outputline6 " " outputline7 
    regsub -all {\s+} $outputline7 " " outputline8 
    } else { 
    set command_set 0 
    # if the line isn't started with set_false_path but it's part of set_false_path command 
    } elseif {$command_set} { 
    lappend outputline1 $line 
    regsub -all {\\\[} $outputline1 "\[" outputline2 
    regsub -all {\\\]} $outputline2 "\]" outputline3 
    regsub -all {\\\{} $outputline3 "\{" outputline4 
    regsub -all {\\\}} $outputline4 "\}" outputline5 
    regsub -all {\\\\} $outputline5 "\\" outputline6 
    regsub -all {\\ +} $outputline6 " " outputline7 
    regsub -all {\s+} $outputline7 " " outputline8 
    } else { 
    } 
} 
} 

puts "outputline:outputline8" 
#do the splitting here and put into the file later on for the last grabbed line! 

close $inputfile 
close $outputfile

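Discussion of the code:

I found that after I lappend the lines to outputline1, I get unexpected output with extra spaces and backslashes: set_false_path\ -from\ \[get_ports\ \{AAA/BBB\[1\] \ ... and so on.

This output has a backslash (\) in front of every special character ({, [, space, and so on). So I added a series of regsub calls to remove all of these unwanted additions. The final joined result ends up in $outputline8.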
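The backslashes described above are consistent with how lappend works: it treats the variable as a Tcl list and quotes each element so the list stays well-formed. A minimal sketch of the difference, assuming plain string joining is what is wanted (the variable names asList and asString are made up for illustration):

```tcl
# One physical line of a set_false_path command, stored as a plain string.
set line "set_false_path -from \[get_ports \{AAA/BBB\[1\]"

# lappend treats the variable as a list: the whole line becomes one list
# element, and its string form is quoted to keep the list well-formed.
set asList ""
lappend asList $line
puts $asList

# append does plain string concatenation and leaves the text untouched.
set asString ""
append asString $line
puts $asString
```

With append, a separator such as a single space has to be added by hand between the joined lines, but the chain of regsub clean-up calls should no longer be needed.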

The result in $outputline8:
```
set_false_path -from [get_ports {AAAcc/BBB/CCC[1] BBB_1/CCC[1] CCC/DDD[1] DDD/EEE EEE/FFF[1] FFF/GGG[1]}] -through [get_pins {GGG/HHH[1] HHH/III[1] XXX/YYY[1] YYY/XXX[1] AAA/ZZZ[1]}] 
set_false_path -from [get_ports {AAA/DDD[2]}] 
```
I want to grab what lies between { and } in $outputline8.

Reference: process multiple lines text file to print in single line

Start of the latest update:

If the input file is:

set_false_path -from [get_ports {AAAcc/BBB/CCC[1] BBB_1/CCC[1] DDD/EEE}] -through [get_pins {XXX_1[1]}]

The output file I want:

AAAcc/BBB/CCC[1] 
BBB_1/CCC[1] 
DDD/EEE 
XXX_1[1]

Thanks! End of the latest update.

Note: I am new to Tcl and to this forum, so any advice is really appreciated!

Source

2014-02-17 Andi Lee

Shouldn't '{/b/{(.+)/}}/b}' have backslashes instead of forward slashes? '{\b\{(.+)\}}\b}' – devnull

Yes, devnull.. I'm being stupid :( I tried {/b({(.+)/}}/b} but it doesn't work either

I tried 'regexp -- {\{(.+)\}} $outputline8 - filtered' but I get: 'AAAcc/BBB[1] BBB_1/CCC[1] CCC/DDD[1] DDD/EEE EEE/FFF[1] FFF/GGG[1]}] -through [get_pins {GGG/HHH[1] HHH/III[1] XXX/YYY[1] YYY/XXX[1] AAA/ZZZ[1]'. It seems to match from the first "{" to the last "}", but I want: 'AAAcc/BBB[1] BBB_1/CCC[1] CCC/DDD[1] DDD/EEE EEE/FFF[1] FFF/GGG[1] GGG/HHH[1] HHH/III[1] XXX/YYY[1] YYY/XXX[1] AAA/ZZZ[1]'. Thanks!

Please try the following script. I added explanations in the code comments:
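The behaviour reported in this comment is what a greedy (.+) does: it extends to the last "}" in the string. One common way around it is a negated character class that cannot cross a closing brace. A small sketch on a shortened, made-up input line (the names text, greedy and pins are illustrative only):

```tcl
set text {set_false_path -from [get_ports {AAA/BBB[1] CCC/DDD}] -through [get_pins {XXX/YYY[1]}]}

# Greedy: (.+) runs from the first "{" all the way to the last "}".
regexp -- {\{(.+)\}} $text -> greedy
puts $greedy

# Negated class: [^\}]+ stops at the first "}", so with -all every
# brace group is matched on its own.
set pins {}
foreach {whole group} [regexp -all -inline -- {\{([^\}]+)\}} $text] {
    # \S+ picks out the whitespace-separated names and skips empty pieces.
    foreach pin [regexp -all -inline -- {\S+} $group] {
        lappend pins $pin
    }
}
puts [join $pins \n]
```

The second loop collects each pin as a separate list element, so writing one pin per line is just a join with a newline.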

set inputfile [open "input_file.txt" r] 
set outputfile [open "output_file.txt" w] 

# This is a temp variable to store the partial lines 
set buffer "" 

while { [gets $inputfile line] != -1} {
    # Take previous line and add to current line
    set buffer "$buffer[regsub -- {\\[[:blank:]]*$} $line ""]"

    # If there is no ending \ then stop adding and process the elements to extract
    if {![regexp -- {\\[[:blank:]]*$} $line]} {
        # Skip line if not "set_false_path"
        if {[lindex [split $buffer " "] 0] ne "set_false_path"} {
            set buffer ""
            continue
        }

        # Grab each element with regexp into a list and print each to outputfile
        # m contains whole match, groups contains sub-matches
        foreach {m groups} [regexp -all -inline -- {\{([^\}]+)\}} $buffer] {
            foreach out [split $groups] {
                puts $outputfile $out
            }
        }

        # Clear the temp variable
        set buffer ""
    }
}

close $inputfile 
close $outputfile
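To try the same idea without touching any files, the logic can be wrapped in a proc that works on a string. This is only a sketch: the proc name extract_false_path_pins and the shortened sample input are made up for illustration. It assumes a Tcl version in which regsub returns its result when no variable name is given (8.5 or later, as far as I know).

```tcl
# Hypothetical helper (not part of the script above): returns the pins found
# inside the {...} groups of every set_false_path command in $text.
proc extract_false_path_pins {text} {
    set pins {}
    set buffer ""
    foreach line [split $text \n] {
        # A line continues if it ends with a backslash plus optional blanks.
        set continued [regexp -- {\\[[:blank:]]*$} $line]
        # Replace that trailing backslash with one space and join the line.
        append buffer [regsub -- {\\[[:blank:]]*$} $line " "]
        if {$continued} { continue }

        # A complete command is buffered; keep it only if it is set_false_path.
        if {[regexp -- {^\s*set_false_path\M} $buffer]} {
            foreach {whole group} [regexp -all -inline -- {\{([^\}]+)\}} $buffer] {
                foreach pin [regexp -all -inline -- {\S+} $group] {
                    lappend pins $pin
                }
            }
        }
        set buffer ""
    }
    return $pins
}

# Shortened sample input with continuation lines and one unrelated command.
set sample [join [list \
    "set_false_path -from \[get_ports \{AAA/BBB\[1\] \\" \
    "CCC/DDD\}\] -through\\" \
    "\[get_pins \{XXX/YYY\[1\]\}\]" \
    "set_timing_derate -cell_sdada \[get_cells \\" \
    "\{NONO\[1\]\}\]" \
    "set_false_path -from \[get_ports \{AAA/DDD\[2\]\}\]"] \n]

puts [join [extract_false_path_pins $sample] \n]
```

Splitting each group with \S+ rather than split avoids empty entries when two names are separated by more than one space.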

Source

2014-02-17 08:40:51 Jerry

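Hey Jerry, I get the error message: extra characters after close-quote. By the way, I think I will open a new topic, because the input file has been modified. Please help me in the new topic!

@AndiLee Oh? Which Tcl version are you using? I think the part causing the error is '"$buffer[regsub -- {\\[[:blank:]]*$} $line ""]"'. Can you try '$buffer[regsub -- {\\[[:blank:]]*$} $line ""]'? In the meantime I will check the new question. – Jerry

@AndiLee Also, I don't think it is necessary to ask another question unless the input file is completely different. But then, what happens to this question? – Jerry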
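Whatever the cause of the reported error turns out to be, one way to avoid nesting quotes around the command substitution at all is to build the buffer with append. A minimal sketch, assuming the inline form of regsub is available in the Tcl version in use (the two input lines are made up for illustration):

```tcl
set buffer ""
foreach line [list "-from \[get_ports \{AAA/BBB\[1\] \\ " "CCC/DDD\}\]"] {
    # Strip a trailing backslash (plus blanks) and concatenate onto buffer;
    # no surrounding double quotes are involved.
    append buffer [regsub -- {\\[[:blank:]]*$} $line ""]
}
puts $buffer
```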
