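Custom serialize and parse methods in Ruby

I wrote this class, Directory, which simulates a directory using a hash. I'm having trouble figuring out how to implement the serialize and parse methods. The string returned from serialize should look like this: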

2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:

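Let me explain exactly what this means. This is the main directory. The leading 2 is the number of files. Next comes the file name, README, followed by the length of the file's content, 19; the content itself is the string I get from the parse method of another class in the module. The second file follows. Note that the two files are not separated by :, because we don't need a separator when we know the string length. Written a bit more readably: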

<file count><file1_data><file2_data>1:rbfs:4:0:0:, where <file1_data> consists of the name, the length, and the content.

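Now, 1:rbfs:4:0:0: means we have one subdirectory named rbfs. The 4 is the length of its content as a string, and 0:0: means it is empty: no files and no subdirectories. Here is another example: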

0:1:directory1:40:0:1:directory2:22:1:README:9:number:420:

This corresponds to:

. 
`-- directory1 
    `-- directory2 
        `-- README

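I have no problem with the file part, and I know how to get the number of directories and their names, but I don't know how to do the rest. I know recursion is the best answer, but I don't know what the base case of the recursion should be or how to implement it. Solving this would also go a long way toward working out the parse method by reverse engineering.

The code is as follows: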

module RBFS
    class File
        # ... here I have working `serialize` and `parse` methods for `File`
    end

    class Directory
        attr_accessor :content

        def initialize
            @content = {}
        end

        def add_file(name, file)
            @content[name] = file
        end

        def add_directory(name, subdirectory = nil)
            if subdirectory
                @content[name] = subdirectory
            else
                @content[name] = RBFS::Directory.new
            end
        end

        def serialize
            # ...?
        end

        def self.parse(string)
            # ...?
        end
    end
end

PS: I check the type of the values in the hash with the is_a? method.

Another example, for @Jordan:

2:file1:17:string:Test test?file2:10:number:4322:direc1:34:0:1:dir2:22:1:README:9:number:420:direc2::1:README2:9:number:33:0

...which should have this structure (if I worked it out right):

. ->file1,file2 
`-- direc1,.....................................direc2 -> README2 
    `-- dir2(subdirectory of direc1) -> README

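direc1 contains only one directory and no files, while direc2 contains only one file. You can see that the main directory does not specify its string length, while all the other directories do.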


2014-10-31 ama

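Could you give an example of the serialization of a slightly more complex directory structure, e.g. multiple non-empty sibling directories? – 2014-10-31 22:27:41

OK, let's work through this iteratively, starting from your example: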

str = "2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:" 
entries = {} # No entries yet!

The first thing we need to know is how many files there are, and we know that's the number before the first ::

num_entries, rest = str.split(':', 2) 
num_entries = Integer(num_entries) 
# num_entries is now 2 
# rest is now "README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"

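(The second argument to split says, "I only want 2 pieces," so it stops splitting after the first :.) We use Integer(n) instead of n.to_i because it's stricter. (to_i will convert "10xyz" to 10; Integer will raise an error, which is what we want here.)

Now we know we have two files. We don't know anything else yet, but what remains of our string is this: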
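To make those two points concrete, here is a small sketch. The variable names count, lenient and strict are only illustrative; the string is a shortened piece of the example above.

```ruby
# With a limit of 2, split stops after the first ":" and leaves the rest intact.
count, rest = "2:README:19:string:Hello world!".split(':', 2)
# count is "2"; rest is "README:19:string:Hello world!"

# to_i silently ignores trailing junk...
lenient = "10xyz".to_i

# ...while Integer() refuses it and raises ArgumentError.
strict = begin
  Integer("10xyz")
rescue ArgumentError => e
  e.class
end
```

Failing loudly on a malformed count is useful here, because a wrong number would make every later slice land in the wrong place.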

README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:

The next things we can get are the name and the length of the first file.

name, len, rest = rest.split(':', 3) 
len = Integer(len) 
# name = "README" 
# len = 19 
# rest = "string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"

Cool, now we have the name and length of the first file, so we can get its content:

content = rest.slice!(0, len) 
# content = "string:Hello world!" 
# rest = "spec.rb:20:string:describe RBFS1:rbfs:4:0:0:" 
entries[name] = content 
# entries = { "README" => "string:Hello world!" }

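We use rest.slice!, which modifies the string by removing the first len characters from the front and returning them, so content is exactly what we want (string:Hello world!) and rest is everything after it. Then we add it to the entries hash. One file down, one to go!

For the second file, we do the same thing: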
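As a quick standalone illustration of that behavior (a sketch, not part of the parser): String#slice! with a start and a length counts characters, so this assumes the lengths in the serialized format are character counts too. The dup call is only there so the example still works if string literals are frozen.

```ruby
# slice!(start, length) removes the characters from the receiver and returns them.
rest = "string:Hello world!spec.rb:20:string:describe RBFS".dup
content = rest.slice!(0, 19)
# content now holds the first 19 characters; rest holds whatever followed them.
```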

name, len, rest = rest.split(':', 3) 
len = Integer(len) 
# name = "spec.rb" 
# len = 20 
# rest = "string:describe RBFS1:rbfs:4:0:0:" 

content = rest.slice!(0, len) 
# content = "string:describe RBFS" 
# rest = "1:rbfs:4:0:0:" 
entries[name] = content 
# entries = { "README" => "string:Hello world!", 
#    "spec.rb" => "string:describe RBFS" }

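Since we're doing the same thing twice, obviously we should do it in a loop! But before we write that, we need to get organized. So far we have two discrete steps: first, get the number of files; second, get the contents of those files. We also know we'll need to get the number of directories and the directories themselves. Let's take a guess at how that will look: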

def parse(serialized) 
    files, rest = parse_files(serialized) 
    # `files` will be a Hash of file names and their contents and `rest` will be 
    # the part of the string we haven't parsed yet 
    directories, rest = parse_directories(rest) 
    # `directories` will be a Hash of directory names and their contents 

    files.merge(directories) 
end 

def parse_files(serialized)
    # Get the number of files from the beginning of the string
    num_entries, rest = serialized.split(':', 2)
    num_entries = Integer(num_entries)
    entries = {}

    # `rest` now starts with the first file (e.g. "README:19:...")
    num_entries.times do
        name, len, rest = rest.split(':', 3) # get the file name and length
        len = Integer(len)

        content = rest.slice!(0, len) # get the file contents from the beginning of the string
        entries[name] = content # add it to the hash
    end

    [ entries, rest ]
end

def parse_directories(serialized) 
    # TBD... 
end

This parse_files method is a bit long for my taste, though, so how about we split it up?

def parse_files(serialized)
    # Get the number of files from the beginning of the string
    num_entries, rest = serialized.split(':', 2)
    num_entries = Integer(num_entries)
    entries = {}

    # `rest` now starts with the first file (e.g. "README:19:...")
    num_entries.times do
        name, content, rest = parse_file(rest)
        entries[name] = content # add it to the hash
    end

    [ entries, rest ]
end

def parse_file(serialized) 
    name, len, rest = serialized.split(':', 3) # get the name and length of the file 
    len = Integer(len) 

    content = rest.slice!(0, len) # use the length to get its contents 
    [ name, content, rest ] 
end

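Clean!

Now, I'm going to give you a big spoiler: because the serialization format is sensibly designed, we don't actually need a parse_directories method, since it would do exactly the same thing as parse_files. The only difference is that after this line: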

name, content, rest = parse_file(rest)

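...we want to do something different if we're parsing directories rather than files. Specifically, we want to call parse(content), which does all of this over again on the directory's content. Since the method is now pulling double duty, we should probably change its name to something more generic, like parse_entries, and we'll also need to give it another argument to tell it when to do that recursion.

Rather than post more code here, I've posted my "finished" product over in this Gist.

Now, I know this doesn't help with the serialize part, but hopefully it helps you get started. serialize is the easier part, since there are lots of questions and answers about recursively iterating over a hash.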
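For illustration, here is a minimal sketch of what that generalized method might look like. It is not the code from the Gist; the recursive: keyword argument is a made-up way of passing the extra flag, and the result is a plain nested Hash rather than RBFS objects.

```ruby
# Hypothetical sketch: parse one "<count>:<name>:<len>:<content>..." group.
# When `recursive` is true, each entry's content is itself parsed as a directory.
def parse_entries(serialized, recursive: false)
  count, rest = serialized.split(':', 2)
  entries = {}

  Integer(count).times do
    name, len, rest = rest.split(':', 3)
    content = rest.slice!(0, Integer(len))
    entries[name] = recursive ? parse(content) : content
  end

  [entries, rest]
end

# A directory is a group of files followed by a group of subdirectories.
def parse(serialized)
  files, rest = parse_entries(serialized)
  directories, _rest = parse_entries(rest, recursive: true)
  files.merge(directories)
end

tree = parse("2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:")
```

The recursion bottoms out on an empty directory, 0:0:, where both groups have a count of zero and neither loop runs.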
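To give a rough idea of the shape it could take, here is a sketch under some assumptions: entries is a plain Hash in which file values are already-serialized Strings (such as "string:Hello world!") and directory values are nested Hashes. In your class the values would be RBFS::File and RBFS::Directory objects instead, and serialize_group is a hypothetical helper name.

```ruby
# Hypothetical sketch: serialize a nested Hash into the format from the question.
def serialize(entries)
  files, directories = entries.partition { |_name, value| value.is_a?(String) }

  serialize_group(files) { |content| content } +
    serialize_group(directories) { |subdir| serialize(subdir) }
end

# Emits "<count>:" followed by "<name>:<length>:<content>" for each entry.
# The block turns an entry's value into its serialized content.
def serialize_group(pairs)
  pairs.reduce("#{pairs.size}:") do |out, (name, value)|
    content = yield(value)
    out + "#{name}:#{content.length}:#{content}"
  end
end

serialized = serialize(
  "README" => "string:Hello world!",
  "spec.rb" => "string:describe RBFS",
  "rbfs" => {}
)
```

The base case of the recursion is an empty directory: both groups are empty, so it serializes to 0:0: without any further calls.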


2014-10-31 23:14:43

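Thanks, you're awesome. Quick question, though: what exactly do '[entries, rest]' and '[name, content, rest]' do? – ama 2014-10-31 23:39:07

Those are just arrays. The real trick is Ruby's destructuring assignment: when you do 'var1, var2 = some_array', it assigns the first item in 'some_array' to 'var1' and the second to 'var2'. If, instead of 'some_array', you have a method that returns an array, like 'parse_entries', it does the same thing. In other words, it's a way to return multiple values from a method. You should do it sparingly (mostly because it's not obvious and it's hard to document), but in some cases it can be very useful. – 2014-11-01 00:06:45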
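A tiny sketch of that pattern, using a made-up helper named split_head:

```ruby
# Hypothetical helper that returns two values packed into one array.
def split_head(text)
  head, tail = text.split(':', 2)
  [head, tail]
end

# Destructuring assignment unpacks the returned array into two variables.
first, remainder = split_head("README:19:string:Hello world!")
# first is "README"; remainder is "19:string:Hello world!"
```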
