2013-12-12

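Parsing multiple delimiters and embedded brackets in R

I inherited a project from someone who never considered data analysis. As a result, my output data files use multiple delimiters: several types of brackets, nested to different depths to group the data, with commas separating the numbers inside the brackets. A few plain-text sentences are thrown in for good measure.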

Can anyone suggest a simple way to convert these nested, delimited structures into data frames in R?

Here is a sample:

[(3, None, 1), (1, 0.36, 1), (3, None, 1), (2, 0.41, 1), (5, 0.47, 1), (6, 0.36, 1), (2, 0.45, 1), (2, 0.36, 1), (4, 0.39, 1), (6, 0.34, 1), (1, 0.47, 1), (7, 0.44, 1), (4, 0.39, 1), (6, 0.38, 1), (9, 0.39, 1), (5, 0.37, 1), (8, 0.41, 1), (9, 0.38, 1), (1, 0.44, 1), (9, 0.38, 1), (4, 0.36, 1), (8, 0.41, 1), (7, 0.38, 1), (7, 0.41, 1), (7, 0.36, 1), (7, 0.39, 1), (9, 0.41, 1), (5, 0.36, 1), (8, 0.31, 1), (6, 0.38, 1), (1, 0.44, 1), (3, None, 1), (5, 0.59, 1), (7, 0.52, 1), (7, 0.44, 1), (7, 0.38, 1), (8, 0.34, 1), (9, 0.39, 1), (3, None, 1), (7, 0.44, 1), (7, 0.53, 1), (8, 0.36, 1), (3, 0.36, 0), (8, 0.34, 1), (5, 0.38, 1), (3, None, 1), (5, 0.52, 1), (3, None, 1), (9, 0.55, 1), (9, 0.36, 1), (4, 0.38, 1), (2, 0.73, 1), (9, 0.36, 1), (7, 0.44, 1), (4, 0.45, 1), (4, 0.62, 1), (9, 0.39, 1), (3, 0.31, 0), (1, 0.42, 1), (4, 0.34, 1), (5, 0.53, 1), (8, 0.34, 1), (3, None, 1), (8, 0.47, 1), (6, 0.39, 1), (1, 0.42, 1), (5, 0.53, 1), (1, 0.53, 1), (8, 0.62, 1), (1, 0.39, 1), (8, 0.44, 1), (8, 0.45, 1), (9, 0.38, 1), (1, 0.36, 1), (4, 0.38, 1), (6, 0.36, 1), (7, 0.36, 1), (9, 0.39, 1), (8, 0.41, 1), (8, 0.31, 1), (3, None, 1), (2, 0.36, 1), (4, 0.36, 1), (2, 0.31, 1), (9, 0.36, 1), (1, 0.31, 1), (4, 0.34, 1), (1, 0.56, 1), (7, 0.61, 1), (9, 0.38, 1), (3, None, 1), (1, 0.36, 1), (1, 0.53, 1), (5, 0.33, 1), (3, None, 1), (1, 0.39, 1), (6, 0.34, 1), (9, 0.33, 1), (4, 0.38, 1), (3, None, 1), (5, 0.44, 1), (2, 0.52, 1), (1, 0.42, 1), (6, 0.38, 1), (9, 0.33, 1), (4, 0.38, 1), (5, 0.31, 1), (6, 0.31, 1), (8, 0.31, 1), (2, 0.33, 1), (9, 0.33, 1), (1, 0.56, 1), (6, 0.38, 1), (3, None, 1), (7, 0.34, 1), (5, 0.34, 1), (2, 0.36, 1), (2, 0.47, 1), (3, None, 1), (2, 0.39, 1), (2, 0.36, 1), (6, 0.31, 1), (1, 0.53, 1), (5, 0.45, 1), (7, 0.42, 1), (5, 0.45, 1), (2, 0.39, 1), (2, 0.45, 1), (6, 0.36, 1), (2, 0.45, 1), (1, 0.39, 1), (1, 0.34, 1), (4, 0.39, 1), (2, 0.34, 1), (2, 0.31, 1), (3, 0.31, 0), (8, 0.39, 1), (6, 0.34, 1), (6, 0.31, 1), (5, 0.38, 1), (9, 0.34, 1), (7, 0.31, 1), (1, 0.33, 
1), (4, 0.38, 1), (6, 0.38, 1), (5, 0.38, 1), (9, 0.38, 1), (2, 0.5, 1), (8, 0.44, 1), (8, 0.39, 1), (4, 0.38, 1), (5, 0.5, 1), (9, 0.48, 1), (2, 0.59, 1), (8, 0.41, 1), (7, 0.41, 1), (3, None, 1), (4, 0.5, 1), (4, 0.36, 1), (7, 0.38, 1), (5, 0.44, 1), (6, 0.34, 1), (6, 0.41, 1), (3, None, 1), (7, 0.39, 1), (6, 0.34, 1), (2, 0.34, 1), (9, 0.36, 1), (4, 0.36, 1), (5, 0.38, 1), (3, None, 1), (6, 0.36, 1), (5, 0.33, 1), (4, 0.44, 1), (7, 0.34, 1), (8, 0.48, 1), (6, 0.34, 1), (8, 0.38, 1), (3, None, 1), (4, 0.31, 1), (3, 0.31, 0)] 
Percentage of correctly suppressed responses per five-target section: 
[80, 80, 100, 80] 
Average reaction time per five-target section: 
[0.4, 0.43, 0.39, 0.39] 
Percentage of correctly suppressed responses per ten-target section: 
[80, 90] 
Average reaction time per ten-target section: 
[0.41, 0.39] 

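Ideally, the first line would become a 3-column data frame, the second line would be ignored, the third line would become a vector of 4 integers, and so on.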

Answer


Use readLines to get your data in, then gsub and strsplit to sort it all out:

#txt <- readLines(textConnection("<insert your text here>")) 
#or probably more appropriately 
txt <- readLines("filename.txt") 

# remove labels 
txt <- txt[-c(2,4,6,8)] 

# strip the first and last character of each line (the enclosing [ and ])
txt <- lapply(txt,function(x) substr(x,2,nchar(x)-1)) 

# reformat element 1: drop the parentheses, recode None as 0,
# then reshape the flat numeric vector into 3 columns
txt[[1]] <- gsub("[()]","",txt[[1]]) 
txt[[1]] <- gsub("None","0",txt[[1]]) 
txt[[1]] <- as.numeric(unlist(strsplit(txt[[1]],","))) 
txt[[1]] <- data.frame(matrix(txt[[1]],ncol=3,byrow=TRUE)) 

# reformat elements 2-5 
txt[2:5] <- lapply(txt[2:5],function(x) as.numeric(unlist(strsplit(x,",")))) 

Result:

txt 

#[[1]] 
# X1 X2 X3 
#1 3 0.00 1 
#2 1 0.36 1 
#3 3 0.00 1 
#4 2 0.41 1 
#5 5 0.47 1 
#6 6 0.36 1 
# etc... etc... 
# 
#[[2]] 
#[1] 80 80 100 80 
# 
#[[3]] 
#[1] 0.40 0.43 0.39 0.39 
# 
#[[4]] 
#[1] 80 90 
# 
#[[5]] 
#[1] 0.41 0.39 
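
The steps above depend on each bracketed list sitting on exactly one line and on the labels falling on lines 2, 4, 6 and 8. As a hedged alternative, here is a minimal sketch that pulls out every `[...]` group with a regular expression instead, so wrapped lines and trailing whitespace do not matter. It assumes the label sentences never contain square brackets and that the first group always holds triples. The helper names `parse_numbers` and `parse_results` are hypothetical, and `None` is mapped to `NA` here rather than to 0 as in the code above.

```r
# Hypothetical helper: extract every number (or None) from one bracketed group.
parse_numbers <- function(group) {
  tokens <- regmatches(group, gregexpr("None|-?[0-9]+\\.?[0-9]*", group))[[1]]
  tokens[tokens == "None"] <- NA   # assumption: None means a missing value
  as.numeric(tokens)
}

# Hypothetical helper: turn the raw lines into a list of parsed groups.
parse_results <- function(lines) {
  # Join all lines so a group that was wrapped across lines is whole again.
  text <- paste(lines, collapse = " ")
  # Each [...] span is one group; label text outside brackets is ignored.
  groups <- regmatches(text, gregexpr("\\[[^]]*\\]", text))[[1]]
  parsed <- lapply(groups, parse_numbers)
  # Assumption: the first group is a flat run of (a, b, c) triples.
  parsed[[1]] <- data.frame(matrix(parsed[[1]], ncol = 3, byrow = TRUE))
  parsed
}

# Small made-up input in the same shape as the sample, with a wrapped first group.
sample_lines <- c(
  "[(3, None, 1), (1, 0.36, ",
  "1), (2, 0.41, 1)] ",
  "Percentage of correctly suppressed responses per five-target section: ",
  "[80, 80, 100, 80] "
)
res <- parse_results(sample_lines)
```

With real data, `parse_results(readLines("filename.txt"))` would be the equivalent call. Whether `NA` or 0 is the right encoding for `None` depends on how the downstream analysis treats missing reaction times.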

This works perfectly, thank you! A small edit for anyone in the future: if reading the text from a file, it should be: txt <- readLines("filename.txt"), i.e. remove textConnection() – jzadra