2017-06-13 101 views
0

How can I count how many subfolders each folder contains in a complex folder structure?

Consider the following tree:

library(data.tree) 

acme <- Node$new("Acme Inc.") 
    accounting <- acme$AddChild("Accounting") 
     software <- accounting$AddChild("New Software") 
     standards <- accounting$AddChild("New Accounting Standards") 
    research <- acme$AddChild("Research") 
     newProductLine <- research$AddChild("New Product Line") 
     newLabs <- research$AddChild("New Labs") 
    it <- acme$AddChild("IT") 
     outsource <- it$AddChild("Outsource") 
     agile <- it$AddChild("Go agile") 
     goToR <- it$AddChild("Switch to R") 

I then want to calculate the averageBranchingFactor:

averageBranchingFactor(acme) 

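This yields 2.5.

However, for various reasons I would like to get all of the individual branching factors, not just their average. For example, I need them to compare two file structures statistically and test whether their average branching factors differ significantly.

According to the data.tree manual, the AverageBranchingFactor() function does the following: "calculate the average number of branches each non-leaf has". So my first attempt was: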

acme.df <- ToDataFrameTree(acme, "averageBranchingFactor") 
mean(acme.df$averageBranchingFactor[acme.df$averageBranchingFactor>0]) 

This yields 2.375, which led me to try a simpler version:

mean(acme.df$averageBranchingFactor) 

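This yields 0.8636364.

How do I get at all of the individual branching factors whose mean is 2.5?

Ideally, I would like to create a data.frame that lists every folder, with a variable giving the branching factor of each folder. For example, I have this very simple folder structure: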

top_level_folder
    sub_folder_1
    sub_folder_2
        sub_folder_3

An answer to this question would produce output that looks like this:

Folders           Subfolders (BranchingFactor)
top_level_folder  2
sub_folder_1      0
sub_folder_2      1
sub_folder_3      0

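I can easily generate the first column by calling list.dirs("/Users/username/Downloads/top_level/"), but I don't know how to generate the second column. Note that the second column is non-recursive, meaning that folders inside subfolders are not counted (i.e. top_level_folder contains only 2 subfolders, even though sub_folder_2 contains another folder, sub_folder_3).

If you want to see whether your solution scales, download the Rails codebase from https://github.com/rails/rails/archive/master.zip and try it on Rails' more complex file structure.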

Answers

1

You can simply loop over the folder structure and count the number of folders at each level (without recursion):

dir.create("top_level_folder/sub_folder_2/sub_folder_3", recursive = TRUE) 
dir.create("top_level_folder/sub_folder_1") 


dirs <- list.dirs()
# Preallocate one count per directory
branching_factor <- integer(length(dirs))
for (i in seq_along(dirs)) {
    # Count immediate subdirectories only (no recursion)
    branching_factor[i] <- length(list.dirs(path = dirs[i],
              full.names = FALSE, recursive = FALSE))
}

result <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor)
result[-1,]  # drop the first row, which is the starting directory "."

You can also use a shorter, more idiomatic, vectorized version of this code:

dirs <- list.dirs() 
branching_factor <- sapply(dirs, function(x) length(list.dirs(x, FALSE, FALSE))) 
result2 <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor, 
         row.names = NULL)[-1,] 

The result looks like this:

> head(result2[rev(order(result2[,2])),])
          Folders BranchingFactor
208      fixtures              24
122      fixtures              23
42       fixtures              18
440      core_ext              17
340 active_record              17
562         rails              16
+0

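When I apply your code to [https://github.com/rails/rails/archive/master.zip](https://github.com/rails/rails/archive/master.zip), 'result' is incorrect – parth

+0

The reason is that 'length(dir(path = dirs[i]))' also counts the '.yml' and '.md' files – parth

+0

You are right, thank you! See the edited version. It seems the previous code (which used 'dir' rather than 'list.dirs' in the loop) counted all files and directories. – Gilles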
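
To illustrate the difference discussed in the comments above, here is a minimal sketch. The directory and file names are made up for the example and everything is created under tempdir(). It shows that dir() (an alias of list.files()) counts files as well as folders, whereas list.dirs() with recursive = FALSE counts only the immediate subfolders.

```r
# Throwaway directory with one subfolder and two files (hypothetical names)
root <- file.path(tempdir(), "branching_demo")
unlink(root, recursive = TRUE)  # start from a clean state
dir.create(file.path(root, "sub_folder"), recursive = TRUE)
file.create(file.path(root, c("README.md", "config.yml")))

# dir() lists files and directories alike
n_entries <- length(dir(path = root))

# list.dirs(recursive = FALSE) lists only the immediate subdirectories
n_subdirs <- length(list.dirs(path = root, recursive = FALSE))

n_entries  # 3: two files plus one folder
n_subdirs  # 1: the folder only
```

Only the second count is a branching factor in the sense of the question.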

0

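I list all folders recursively, then build a table of folder/subfolder pairs, from which I can count the number of subfolders per folder.

This misses the empty folders, though, so I bring them back with a left join and then fill the NAs with zeros.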

library(magrittr)  # provides the %>% pipe used below

path <- getwd()
# List every folder under path, recursively
all_folders <- path %>%
    list.dirs(full.names=TRUE,recursive=TRUE) %>%
    data.frame(stringsAsFactors=FALSE) %>%
    setNames("Folders")
# Keep the last two path components of each folder: (parent, folder)
all_sub_folders <- all_folders$Folders %>%
    strsplit("/") %>%
    lapply(function(x){c(x[length(x)-1],x[length(x)])}) %>%
    do.call(rbind,.) %>%
    as.data.frame(stringsAsFactors=FALSE) %>%
    setNames(c("ParentFolders","Folders"))
# Count how often each folder name occurs as a parent
output <- all_sub_folders$ParentFolders %>% table %>% as.data.frame(stringsAsFactors=FALSE) %>% setNames(c("Folders","SubFolders"))
# Left join so that folders without subfolders are kept, then replace NA with 0
output <- merge(all_sub_folders,output,all.x = TRUE)[,c("Folders","SubFolders")]
output$SubFolders[is.na(output$SubFolders)] <- 0
output <- output[match(all_sub_folders$Folders,output$Folders),]

head(output)
#      Folders SubFolders
# 2160   Rhome        126
# 17   acepack          5
# 856     help          1
# 992     html          9
# 1486    libs        124
# 1130    i386          0

1

Just a correction to @Gilles's solution:

path <- "SO/rails-master/" 
dirs <- list.dirs(path) 
branching_factor <- vector(length = length(dirs)) 
for (i in 1:length(dirs)) { 
    branching_factor[i] <- length(list.dirs(path = dirs[i], recursive = FALSE)) 
} 

result <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor) 

> head(result)
       Folders BranchingFactor
1 rails-master              14
2      .github               0
3  actioncable               4
4          app               1
5       assets               1
6  javascripts               1

Hope this helps.

+0

Are you correcting the solution? – histelheim

+0

@histelheim, he has now updated his solution correctly – parth

0

You can adapt my answer to your other question, using list.dirs with recursive = FALSE instead of list.files:

library(purrr) 

files <- .libPaths()[1] %>% # omit for current directory or supply alternate path 
    list.dirs() %>% 
    map_df(~list(path = .x, 
       dirs = length(list.dirs(.x, recursive = FALSE)))) 

files 
#> # A tibble: 4,457 x 2 
#>                   path dirs 
#>                   <chr> <int> 
#> 1    /Library/Frameworks/R.framework/Versions/3.4/Resources/library 314 
#> 2  /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind  4 
#> 3 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/help  0 
#> 4 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/html  0 
#> 5 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/Meta  0 
#> 6  /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/R  0 
#> 7  /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack  5 
#> 8 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/help  0 
#> 9 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/html  0 
#> 10 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/libs  1 
#> # ... with 4,447 more rows 

mean(files$dirs[files$dirs != 0]) 
#> [1] 2.952949 

Or in base R:

files <- do.call(rbind, lapply(list.dirs(.libPaths()[1]), function(path){ 
    data.frame(path = path, 
       dirs = length(list.dirs(path, recursive = FALSE)), 
       stringsAsFactors = FALSE) 
})) 

head(files) 
#>                  path dirs 
#> 1   /Library/Frameworks/R.framework/Versions/3.4/Resources/library 314 
#> 2  /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind 4 
#> 3 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/help 0 
#> 4 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/html 0 
#> 5 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/Meta 0 
#> 6 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/R 0 

mean(files$dirs[files$dirs != 0]) 
#> [1] 2.952949 
0

The averageBranchingFactor excludes leaves. As a side note, you can use data(acme) directly:

library(data.tree) 
data(acme) 
acme$averageBranchingFactor 
acme$count 
print(acme, abf = "averageBranchingFactor", "count") 

This will print the following:

                          levelName abf count
1  Acme Inc.                        2.5     3
2   ¦--Accounting                   2.0     2
3   ¦   ¦--New Software             0.0     0
4   ¦   °--New Accounting Standards 0.0     0
5   ¦--Research                     2.0     2
6   ¦   ¦--New Product Line         0.0     0
7   ¦   °--New Labs                 0.0     0
8   °--IT                           3.0     3
9       ¦--Outsource                0.0     0
10      ¦--Go agile                 0.0     0
11      °--Switch to R              0.0     0

The implementation of averageBranchingFactor holds no secrets, so you can adapt it to your needs. Just type averageBranchingFactor into your console (without parentheses):

function (node) 
{ 
    t <- Traverse(node, filterFun = isNotLeaf) 
    if (length(t) == 0) 
     return(0) 
    cnt <- Get(t, "count") 
    if (!is.numeric(cnt)) 
     browser() 
    return(mean(cnt)) 
} 

In short, we traverse the tree (excluding the leaves) and get the count value of each node. Finally, we compute the mean.

Hope that helps.
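
As a sketch of how the traversal just described could produce the per-node table the question asks for, the snippet below collects the count field for every node of the acme example and then averages it over the non-leaf nodes. The variable names bf and abf are illustrative only; Get, count and isNotLeaf are the same data.tree pieces that appear in the function body above.

```r
library(data.tree)
data(acme)

# One row per node; count is the number of direct children of that node
bf <- data.frame(
  Folders = acme$Get("name"),
  BranchingFactor = acme$Get("count"),
  row.names = NULL
)
bf

# Averaging count over the non-leaf nodes only mirrors averageBranchingFactor
abf <- mean(acme$Get("count", filterFun = isNotLeaf))
abf
```

For the acme tree, the non-leaf counts are 3, 2, 2 and 3, so abf should match the 2.5 reported above, while bf keeps every individual value available for further comparison.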
