Ruby regular expression to match UNIX and Windows file paths

The following instance method takes a file path and returns the file's prefix (the part before the separator):

@separator = "@" 

def table_name path 
    regex = Regexp.new("\/[^\/]+#{@separator}") 
    path.match(regex)[0].gsub(/^.|.$/,'').downcase.to_sym 
end 

table_name "bla/bla/bla/Prefix@invoice.csv" 
# => :prefix

So far, this method only works on Unix. To make it work on Windows, I also need to match the backslash (\). Unfortunately, this is where I got stuck:

@separator = "@" 

def table_name path 
    regex = Regexp.new("(\/|\\)[^\/\\]+#{@separator}") 
    path.match(regex)[0].gsub(/^.|.$/,'').downcase.to_sym 
end 

table_name("bla/bla/bla/Prefix@invoice.csv") 
# RegexpError: premature end of char-class: /(\/|\)[^\/\]+@/ 

# Target result: 
table_name("bla/bla/bla/Prefix@invoice.csv") 
# => :prefix 
table_name('bla\bla\bla\Prefix@invoice.csv')  # single quotes keep the backslashes literal
# => :prefix

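I suspect that Ruby's string interpolation and escaping are what is confusing me here.

How can I change the regular expression so that it works on both Unix and Windows?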


2011-07-03 Stefan Rohlfing

I think there is a Ruby constant that handles this: it is '/' on Unix and '\' on Windows.
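For reference, the constants this comment most likely means are File::SEPARATOR and File::ALT_SEPARATOR. File::SEPARATOR is "/" on every platform, while File::ALT_SEPARATOR is "\" on Windows and nil elsewhere. A minimal sketch, assuming you only need to recognise the separators of the platform the script runs on (the names separators, escaped and pattern are made up for this example):

```ruby
# Collect the separators known to the running platform.
# File::ALT_SEPARATOR is nil outside Windows, hence the compact.
separators = [File::SEPARATOR, File::ALT_SEPARATOR].compact

# Escape each separator so it is safe inside a character class.
escaped = separators.map { |s| Regexp.escape(s) }.join

# One separator, then a prefix without separators or "@", then "@".
pattern = Regexp.new("[#{escaped}]([^@#{escaped}]+)@")

match = pattern.match("bla/bla/bla/Prefix@invoice.csv")
puts match[1] if match
```

Because the pattern is built from the running platform's constants, a script on Unix would still not recognise backslash-separated paths this way.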

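I'm not actually sure what bla/bla/bla/Prefix@invoice.csv refers to. Are bla/bla/bla all directories, with Prefix@invoice.csv as the file name?

Assuming I have understood the file name correctly, I suggest using File.split():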

irb> (path, name) = File.split("bla/bla/bla/Prefix@invoice.csv") 
=> ["bla/bla/bla", "Prefix@invoice.csv"] 
irb> (prefix, postfix) = name.split("@") 
=> ["Prefix", "invoice.csv"]

Not only is it platform-independent, it is also clearer.
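Wrapped into the method from the question, the File-based approach could look like the sketch below. It assumes that a backslash should always count as a path separator; File.basename on Unix does not treat it that way, so the sketch normalises backslashes to forward slashes first. The second parameter and its default are made up for this example.

```ruby
# Returns the part of the file name before the separator as a symbol.
# Assumption: a backslash is always a path separator, never part of a name.
def table_name(path, separator = "@")
  normalized = path.tr("\\", "/")           # Windows-style -> Unix-style
  name = File.basename(normalized)          # e.g. "Prefix@invoice.csv"
  prefix, _rest = name.split(separator, 2)  # split at the first separator only
  prefix.downcase.to_sym
end

p table_name("bla/bla/bla/Prefix@invoice.csv")   # => :prefix
p table_name('bla\bla\bla\Prefix@invoice.csv')   # => :prefix
```

If the file name contains no separator at all, this version returns the whole file name as a symbol rather than raising.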

Update

You piqued my curiosity:

>> wpath="blah\\blah\\blah\\Prefix@invoice.csv" 
=> "blah\\blah\\blah\\Prefix@invoice.csv" 
>> upath="bla/bla/bla/Prefix@invoice.csv" 
=> "bla/bla/bla/Prefix@invoice.csv" 
>> r=Regexp.new(".+[\\\\/]([^@]+)@(.+)") 
=> /.+[\\\/]([^@]+)@(.+)/ 
>> wpath.match(r) 
=> #<MatchData "blah\\blah\\blah\\Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv"> 
>> upath.match(r) 
=> #<MatchData "bla/bla/bla/Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv">

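You're right, \ has to be escaped twice for it to work in the regular expression: once to get past the interpreter, and once more to get past the regex engine. (It certainly feels awkward.) The regular expression is:

.+[\\/]([^@]+)@(.+)

The string is: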

".+[\\\\/]([^@]+)@(.+)"

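The regular expression, which is probably too fragile for real use (how would it handle a path with no / or \ separator, a path name with no @, or one with too many @?), looks for one or more characters, a single path separator, one or more non-@ characters, an @, and then one or more characters of any kind. I assume the first .+ greedily consumes as many characters as it can, which pushes the match as far to the right as possible: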

>> evil_path="/foo/[email protected]/blorp/Prefix@invoice.csv" 
=> "/foo/[email protected]/blorp/Prefix@invoice.csv" 
>> evil_path.match(r) 
=> #<MatchData "/foo/[email protected]/blorp/Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv">

But depending on how malformed the input is, it could do something very wrong.
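One way to make the pattern less fragile is to anchor it to the last path component and return nil when nothing matches. The following is only a sketch under assumptions the question does not state: a path without any separator is treated as a bare file name, and when the file name contains several @ characters the prefix ends at the first one. The name prefix_pattern is made up for this example.

```ruby
def table_name(path)
  # (?:\A|[\\/])  start of the string or one path separator
  # ([^\\/@]+)    the prefix: no separators and no "@"
  # @[^\\/]*\z    an "@", then the rest of the last path component
  prefix_pattern = %r{(?:\A|[\\/])([^\\/@]+)@[^\\/]*\z}
  match = prefix_pattern.match(path)
  match && match[1].downcase.to_sym
end

p table_name("bla/bla/bla/Prefix@invoice.csv")  # => :prefix
p table_name('bla\bla\bla\Prefix@invoice.csv')  # => :prefix
p table_name("Prefix@invoice.csv")              # => :prefix
p table_name("bla/bla/no_separator.csv")        # => nil
```

Returning nil leaves it to the caller to decide what a file name without a prefix should mean, instead of raising from inside the method.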


2011-07-03 09:23:08 sarnold

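Thank you for this elegant solution! Since I'm not yet familiar enough with Ruby's class library, a regular expression was all I could think of. Still, I'd like to know how to make my method work. I read about double escaping in Ruby, but I'm not sure whether it applies here.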
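Double escaping does apply here. In the double-quoted string passed to Regexp.new, "\\" is a single backslash, so "[^\/\\]" reaches the regex engine as [^/\], where the backslash escapes the closing bracket and the character class never ends. A sketch of the question's method with the backslashes doubled once more (the separator is passed as a parameter instead of @separator, a change made only to keep the example self-contained):

```ruby
def table_name(path, separator = "@")
  # "[\\\\/]" in a double-quoted string is the regex class [\\/]:
  # either a backslash or a forward slash.
  regex = Regexp.new("[\\\\/][^\\\\/]+#{Regexp.escape(separator)}")
  # The match looks like "/Prefix@"; strip its first and last character.
  path.match(regex)[0].gsub(/^.|.$/, '').downcase.to_sym
end

p table_name("bla/bla/bla/Prefix@invoice.csv")  # => :prefix
p table_name('bla\bla\bla\Prefix@invoice.csv')  # => :prefix
```

Like the original, this version raises NoMethodError when the path contains no separator followed by a prefix and an @.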

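I had to know, so I went and tried it. :) Not bad, but I don't think it is immediately clear. – sarnold

I'm glad you had to know :-) Even though a regular expression may not be the best solution here, I learned a lot about how Ruby handles regular expressions by trying out your code examples. Thank you very much! As a side note, the following answer adds some more information: [Backslashes and capture groups in Ruby regular expressions](http://stackoverflow.com/questions/5755505/backslash-captured-group-within-ruby-regular-expression).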
