2014-04-05

Reading standard input into R

I have a file structured like this:

123 

Jhon: NewYork, Boston, gainesville 

Mike: LosAngeles 

Almudena: Baltimore, SanDiego, Austin, Memphis 

Anna: Washington, Oklahoma, Nashville, Denver, Phenix, Tucson 

... 

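and so on, up to 123 names and up to 50 cities per person. I would like to read the file into a usable table in R, for example one with 123 rows and 51 columns (the name plus up to 50 cities). Ideally the table would have blank cells where there is no city (for example, the row for a person with only two cities would have 48 blank cells).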

Another, more useful option would be a two-column table (or matrix) of the form

Name City 
Jhon NewYork 
Jhon Boston 
Jhon gainesville 
Mike LosAngeles 
... 

BTW: IMHO this has nothing to do with *stdin*. Why did you tag it *stdin*? – sgibb

Hi, because someone told me "here you have to provide the data in standard input format", that's all :) – Javier

Answer

I'm not sure whether there is a ready-made function for this, but writing an importer for this file is not hard:

ll <- readLines("input.txt") 

## keep only lines with "name: cities" 
ll <- ll[grep(":", ll)] 

## split at ":" to divide in name and cities 
s <- strsplit(ll, ":") 

## split by "," to divide cities 
s <- lapply(s, function(x) { 
    return(cbind(x[1], strsplit(x[2], ",")[[1]])) 
}) 

## bind list of matrices to one matrix 
m <- do.call(rbind, s) 

## remove whitespace before and after the cities 
m[, 2] <- gsub("^\\s+|\\s+$", "", m[, 2]) 
m 

#      [,1]       [,2]
# [1,] "Jhon"     "NewYork"
# [2,] "Jhon"     "Boston"
# [3,] "Jhon"     "gainesville"
# [4,] "Mike"     "LosAngeles"
# [5,] "Almudena" "Baltimore"
# [6,] "Almudena" "SanDiego"
# [7,] "Almudena" "Austin"
# [8,] "Almudena" "Memphis"
# [9,] "Anna"     "Washington"
#[10,] "Anna"     "Oklahoma"
#[11,] "Anna"     "Nashville"
#[12,] "Anna"     "Denver"
#[13,] "Anna"     "Phenix"
#[14,] "Anna"     "Tucson"
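
The question also mentions a wide layout with one row per person and blank cells where there is no city. The following is only a minimal sketch of one way to build it, assuming the same "name: city, city" line format. The sample lines are typed inline so the snippet runs on its own, and the column names `Name`, `City1`, `City2`, ... are illustrative choices that do not come from the original post.

```r
## Sample lines in the question's format (the first line holds the count).
ll <- c("123",
        "Jhon: NewYork, Boston, gainesville ",
        "Mike: LosAngeles ",
        "Almudena: Baltimore, SanDiego, Austin, Memphis ")

## keep only "name: cities" lines and split each one at the colon
ll <- ll[grepl(":", ll, fixed = TRUE)]
parts <- strsplit(ll, ":", fixed = TRUE)

## helper (hypothetical name): strip leading and trailing whitespace
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

## one name per line, and one character vector of cities per line
person <- trim(vapply(parts, `[`, character(1), 1))
cities <- lapply(parts, function(x) trim(strsplit(x[2], ",", fixed = TRUE)[[1]]))

## pad every city vector with "" up to the longest one, then stack the rows
n_max <- max(vapply(cities, length, integer(1)))
padded <- lapply(cities, function(x) c(x, rep("", n_max - length(x))))
wide <- cbind(person, do.call(rbind, padded))
colnames(wide) <- c("Name", paste0("City", seq_len(n_max)))
wide
```

Here `n_max` is taken from the data; setting it to 50 by hand should give the fixed 51-column layout described in the question, as long as no person has more than 50 cities. If the data really arrive on standard input under `Rscript`, `readLines(file("stdin"))` can take the place of the inline vector.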

Thank you very much, that's perfect – Javier