2017-09-25

Data frame to nested list

I have a data frame, read from a .csv file, that looks like this:

   job            name    `phone number`
   <chr>          <chr>            <int>
 1 developer      john               654
 2 developer      mike               321
 3 developer      albert             987
 4 manager        dana               741
 5 manager        guy                852
 6 manager        anna               936
 7 developer      dan                951
 8 developer      shean              841
 9 administrative rebeca             357
10 administrative krissy             984
11 administrative hilma              651
12 administrative otis               325
13 administrative piper              654
14 manager        mendy              984
15 manager        corliss            321

DT = structure(list(job = c("developer", "developer", "developer", 
"manager", "manager", "manager", "developer", "developer", "administrative", 
"administrative", "administrative", "administrative", "administrative", 
"manager", "manager"), name = c("john", "mike", "albert", "dana", 
"guy", "anna", "dan", "shean", "rebeca", "krissy", "hilma", "otis", 
"piper", "mendy", "corliss"), phone = c(654L, 321L, 987L, 741L, 
852L, 936L, 951L, 841L, 357L, 984L, 651L, 325L, 654L, 984L, 321L 
)), .Names = c("job", "name", "phone"), row.names = c(NA, -15L 
), class = "data.frame") 

I would like to turn it into a list of lists, such that, for example,

myList$developer 

would give me a list of all the developers, and then

myList$developer$john 

would give me a list of the phone numbers belonging to the developer named john. Is there a simple way to do this?

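In case you are curious why I want to do something like this: the actual data frame I work with is huge, so using filter to find a specific entry identified by 4 parameters takes too much time (in this example, a specific entry is identified by 2 parameters: job and name). I figure that a hash-table structure of nested lists might take a long time to build, but could then be searched in O(1), which would be really useful for me. If I'm wrong and you have a better way of doing this, I'd be happy to hear it.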

'lapply(split(df, df$job), function(x) split(x$phone_number, x$name))' will do it. –
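
A minimal sketch of that one-liner applied to the DT from the question. It assumes the phone column is called phone, as in the dput() output (the CSV header shows `phone number` instead); everything else is base R:

```r
# Data from the question's dput(); the phone column is named `phone` here
DT <- data.frame(
  job = c("developer", "developer", "developer", "manager", "manager",
          "manager", "developer", "developer", "administrative",
          "administrative", "administrative", "administrative",
          "administrative", "manager", "manager"),
  name = c("john", "mike", "albert", "dana", "guy", "anna", "dan", "shean",
           "rebeca", "krissy", "hilma", "otis", "piper", "mendy", "corliss"),
  phone = c(654L, 321L, 987L, 741L, 852L, 936L, 951L, 841L, 357L, 984L,
            651L, 325L, 654L, 984L, 321L),
  stringsAsFactors = FALSE
)

# Outer split: one data frame per job.
# Inner split: the phone vector of that job, grouped by name,
# so each leaf myList[[job]][[name]] is an integer vector.
myList <- lapply(split(DT, DT$job), function(x) split(x$phone, x$name))

myList$developer$john
# [1] 654
```

Because the inner split is applied to the phone column rather than to the whole data frame, the leaves are plain vectors, which also handles a name that appears more than once within a job.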

@AndrewGustar Same as my answer ;-) – Jaap

Answers

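You can use a double split together with lapply and the drop = TRUE argument. When the grouping columns are factors, drop = TRUE drops levels that do not occur in a group, which prevents empty list elements from being created.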
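
To see what drop = TRUE changes, here is a minimal sketch on hypothetical toy data in which name is a factor (as read.csv() produced by default before R 4.0.0). With plain character columns, split() only creates groups for values that actually occur, so drop makes no difference there:

```r
# Hypothetical toy data: `name` is a factor whose levels span all jobs
df <- data.frame(
  job   = c("developer", "developer", "manager"),
  name  = factor(c("john", "mike", "dana")),
  phone = c(654L, 321L, 741L),
  stringsAsFactors = FALSE
)

# Subsetting rows keeps all factor levels, including the unused "dana"
dev <- df[df$job == "developer", ]

# Without drop: one element per level, "dana" is an empty 0-row data frame
names(split(dev, dev$name))
# [1] "dana" "john" "mike"

# With drop = TRUE: only levels present in this subset
names(split(dev, dev$name, drop = TRUE))
# [1] "john" "mike"
```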

Use:

l <- split(dat, dat$job, drop = TRUE) 
nestedlist <- lapply(l, function(x) split(x, x[['name']], drop = TRUE)) 

Or in one go:

nestedlist <- lapply(split(dat, dat$job, drop = TRUE), 
        function(x) split(x, x[['name']], drop = TRUE)) 

Gives:

> nestedlist 
$administrative 
$administrative$hilma 
       job name phonenumber 
11 administrative hilma   651 

$administrative$krissy 
       job name phonenumber 
10 administrative krissy   984 

$administrative$otis 
       job name phonenumber 
12 administrative otis   325 

$administrative$piper 
       job name phonenumber 
13 administrative piper   654 

$administrative$rebeca 
      job name phonenumber 
9 administrative rebeca   357 


$developer 
$developer$albert 
     job name phonenumber 
3 developer albert   987 

$developer$dan 
     job name phonenumber 
7 developer dan   951 

$developer$john 
     job name phonenumber 
1 developer john   654 

$developer$mike 
     job name phonenumber 
2 developer mike   321 

$developer$shean 
     job name phonenumber 
8 developer shean   841 


$manager 
$manager$anna 
     job name phonenumber 
6 manager anna   936 

$manager$corliss 
     job name phonenumber 
15 manager corliss   321 

$manager$dana 
     job name phonenumber 
4 manager dana   741 

$manager$guy 
     job name phonenumber 
5 manager guy   852 

$manager$mendy 
     job name phonenumber 
14 manager mendy   984 

Data used:

dat <- structure(list(job = c("developer", "developer", "developer", "manager", "manager", "manager", "developer", "developer", "administrative", "administrative", "administrative", "administrative", "administrative", "manager", "manager"), 
         name = c("john", "mike", "albert", "dana", "guy", "anna", "dan", "shean", "rebeca", "krissy", "hilma", "otis", "piper", "mendy", "corliss"), 
         phonenumber = c(654L, 321L, 987L, 741L, 852L, 936L, 951L, 841L, 357L, 984L, 651L, 325L, 654L, 984L, 321L)), 
       .Names = c("job", "name", "phonenumber"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15")) 
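
Note that with this approach each leaf is a one-row data frame, not a phone number. A minimal sketch of reading the numbers back out, reusing the data above (rebuilt with data.frame() so the block runs on its own):

```r
# Data from the answer; the phone column is named `phonenumber` here
dat <- data.frame(
  job = c("developer", "developer", "developer", "manager", "manager",
          "manager", "developer", "developer", "administrative",
          "administrative", "administrative", "administrative",
          "administrative", "manager", "manager"),
  name = c("john", "mike", "albert", "dana", "guy", "anna", "dan", "shean",
           "rebeca", "krissy", "hilma", "otis", "piper", "mendy", "corliss"),
  phonenumber = c(654L, 321L, 987L, 741L, 852L, 936L, 951L, 841L, 357L,
                  984L, 651L, 325L, 654L, 984L, 321L),
  stringsAsFactors = FALSE
)

nestedlist <- lapply(split(dat, dat$job, drop = TRUE),
                     function(x) split(x, x[["name"]], drop = TRUE))

# Each leaf is a data frame, so pull the column to get the number
nestedlist$developer$john$phonenumber
# [1] 654
nestedlist[["manager"]][["corliss"]]$phonenumber
# [1] 321
```

If only the numbers are needed, splitting the phone column instead of the whole data frame (as in the comment under the question) gives vectors at the leaves directly.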

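"The actual data frame I work with is huge, so using filter to find a specific entry identified by 4 parameters takes too much time (in this example, a specific entry is identified by two parameters: job and name). I figure that a hash-table structure of nested lists might take a long time to build, but could then be searched in O(1), which would definitely work for me. If I'm wrong and you have a better way, I'd be happy to hear it."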

Apparently, looking up a list element by name behaves like O(n), not O(1), because the names are searched linearly.
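
If you do want hash-based lookup in base R, environments (unlike lists) are hashed. A minimal sketch that swaps the nested list for a single environment keyed on a pasted job/name string; the "/" key format, phone_env and lookup_phone are hypothetical names chosen for illustration:

```r
DT <- data.frame(
  job = c("developer", "developer", "developer", "manager", "manager",
          "manager", "developer", "developer", "administrative",
          "administrative", "administrative", "administrative",
          "administrative", "manager", "manager"),
  name = c("john", "mike", "albert", "dana", "guy", "anna", "dan", "shean",
           "rebeca", "krissy", "hilma", "otis", "piper", "mendy", "corliss"),
  phone = c(654L, 321L, 987L, 741L, 852L, 936L, 951L, 841L, 357L, 984L,
            651L, 325L, 654L, 984L, 321L),
  stringsAsFactors = FALSE
)

# Composite key "job/name"; assumes "/" never occurs inside job or name
keys <- paste(DT$job, DT$name, sep = "/")

# split() groups phones per key (duplicates keep all their numbers);
# list2env() stores each group in a hashed environment
phone_env <- list2env(split(DT$phone, keys), envir = new.env(hash = TRUE))

# Hypothetical helper: returns NULL when the key is absent
lookup_phone <- function(job, name, env = phone_env) {
  env[[paste(job, name, sep = "/")]]
}

lookup_phone("developer", "john")
# [1] 654
lookup_phone("manager", "nobody")
# NULL
```

Building the environment is a single pass over the data; each lookup afterwards is a hash probe. This layout does not support partial-key queries (for example, all phones for one job), which the keyed data.table approach handles directly.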

A possibly better approach is data.table, which uses binary search on keyed columns.

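library(data.table) 
# Convert DT to a data.table by reference and key (sort) it by job, then name
setDT(DT, key = c("job", "name")) 

# Binary-search lookup on the key: pass a job, or a job and a name
get_phones = function(..., d = DT) d[list(...), phone] 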

Example usage:

get_phones("developer", "john") 
# [1] 654 

get_phones("administrative") 
# [1] 651 984 325 654 357 

See vignette("datatable-keys-fast-subset") or the (possibly outdated) copy online.

Not marking this as the solution, since it doesn't answer the question in the title, but it suits my needs better. Thanks a lot! – shayelk