Reading a complex data set in R

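My data set looks like the sample below. The first number is the feature number, followed by a colon, followed by the value associated with that feature. I am not sure how to import this data set into R. Does anyone have any ideas?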

236:24 500:163 732:234 869:117 885:106 1249:103 1280:158 1889:119 2015:55 2718:126 3307:137 3578:25 3770:26 4139:128 4723:114 4957:82 5128:50 5420:124 5603:135 5897:34 5946:117 6069:154 6153:55 6347:87 6372:77 6666:109 6866:223 6984:39 7709:253 7950:87 8078:38 8945:141 9316:111 9948:103 9989:68 10276:43 10530:76 10532:55 10799:15 10802:20 10848:82 11347:16 11871:51 11883:105 12534:133 12601:13 12781:178 12798:116 12842:106 12916:7 12935:51 12968:154 13028:58 13330:105 13384:2 13568:47 13641:632 13829:18 13964:62 14385:93 14392:272 15280:140 15424:119 15492:52 15523:31 16311:23 16464:69 16478:94 16584:102 16586:107 16705:272 17138:108 17181:150 17526:280 17540:163 18007:114 18050:53 18180:2 18806:160 18943:73 19055:41 19255:88 19774:59 19889:72 19921:45
101:68 572:57 732:63 962:120 1304:61 1831:60 1889:58 1973:105 2518:161 2629:228 2990:158 3147:75 3578:11 3860:88 4011:18 4623:141 4684:411 4758:69 4820:120 6149:102 6234:134 6306:118 6866:147 6927:89 6988:51 7048:178 7193:31 7257:61 7709:229 8061:125 8202:188 8272:17 8759:165 9104:77 9325:135 9860:97 10055:684 10532:180 10735:64 10744:267 10820:120 10848:186 10923:128 10936:129 11203:160 11303:144 11668:87 11867:97 11871:207 12191:83 12238:193 12380:51 12968:164 13369:58 13929:39 14531:102 14800:130 14931:99 15314:91 15632:62 16165:7 16353:120 16584:137 17216:172 18372:31 18893:75 19133:93 19154:101 19165:133 19607:20 19784:141 19889:97 19921:60

Source

2017-09-04 Janr

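Is the whole data set on a single line? What puzzles me more, perhaps, is which machine learning method you plan to use that can handle ~10K features. –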

You essentially have a two-column data set separated by ':'. For example, with 'x <- "236:24 500:163 732:234 869:117"', the call 'read.table(text = scan(text = x, what = ""), sep = ":")' works. – thelatemail

Assuming your data is stored in input.txt:

# Read the whitespace-separated "feature:value" tokens
input <- scan('input.txt', what = 'character')

# unlist() yields feature, value, feature, value, ... so fill the matrix by row
data <- as.data.frame(matrix(as.numeric(unlist(strsplit(input, ':'))), ncol = 2, byrow = TRUE))
colnames(data) <- c('Feature', 'Value')
str(data)
# 'data.frame': 158 obs. of 2 variables:
# $ Feature: num 236 500 732 869 885 ...
# $ Value : num 24 163 234 117 106 103 158 119 55 126 ...

Alternatively, you can let read.table parse the input instead of splitting the strings manually. This is slightly slower but more readable.

data <- read.table(text = input, sep = ':')
colnames(data) <- c('Feature', 'Value')
str(data)
# 'data.frame': 158 obs. of 2 variables:
# $ Feature: int 236 500 732 869 885 1249 1280 1889 2015 2718 ...
# $ Value : int 24 163 234 117 106 103 158 119 55 126 ...
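
A quick way to check that a parsing approach pairs each feature with its own value is to run it on a short string first. The sketch below is only an illustration on the first four pairs of the sample; the names `flat`, `manual`, and `parsed` are made up for this example. Note that `matrix()` fills column by column by default, so the manual split passes `byrow = TRUE` to keep each pair on one row.

```r
x <- "236:24 500:163 732:234 869:117"
tokens <- scan(text = x, what = "character", quiet = TRUE)

# Manual split: unlist() interleaves feature, value, feature, value, ...
# so the matrix has to be filled row by row
flat <- as.numeric(unlist(strsplit(tokens, ":", fixed = TRUE)))
manual <- as.data.frame(matrix(flat, ncol = 2, byrow = TRUE))
colnames(manual) <- c("Feature", "Value")

# read.table on the same tokens, naming the columns directly
parsed <- read.table(text = tokens, sep = ":", col.names = c("Feature", "Value"))

# Both approaches should give the same pairs
# (manual holds doubles, parsed holds integers, so compare values only)
stopifnot(all(manual == parsed))
print(manual)
```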

Edit: adapted to your data set. This reads your feature/value pairs into a data frame.

url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/dexter/DEXTER/dexter_test.data'
input <- scan(url, what = 'character')
# byrow = TRUE keeps each feature next to its own value
data <- as.data.frame(matrix(as.numeric(unlist(strsplit(input, ':'))), ncol = 2, byrow = TRUE))
colnames(data) <- c('Feature', 'Value')
str(data)
# 'data.frame': 192449 obs. of 2 variables:
# $ Feature: num 236 500 732 869 885 ...
# $ Value : num 24 163 234 117 106 103 158 119 55 126 ...
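
Reading with scan pools every pair in the file, so the resulting data frame no longer records which line a pair came from. If the line (sample) index matters, something along these lines may help. It is a minimal sketch, not part of the original answer: the two input lines are shortened from the sample in the question, and the names `txt`, `row_id`, and `long` are illustrative.

```r
# Two short example lines in the same "feature:value" layout
txt <- c("236:24 500:163 732:234",
         "101:68 572:57")

# Split each line into its "feature:value" tokens
tokens <- strsplit(trimws(txt), "[[:space:]]+")

# Record which line each token came from
row_id <- rep(seq_along(tokens), lengths(tokens))

# Split every token on the colon and fill the matrix row by row
pairs <- matrix(as.numeric(unlist(strsplit(unlist(tokens), ":", fixed = TRUE))),
                ncol = 2, byrow = TRUE)

# Long format: one row per (line, feature, value) triple
long <- data.frame(Row = row_id, Feature = pairs[, 1], Value = pairs[, 2])
print(long)
```

For a real file, `txt` would come from `readLines()` on the file path or URL instead of the inline vector.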

Source

2017-09-04 02:09:28

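I should have been more specific. I am using the Dexter data set available at this link: https://archive.ics.uci.edu/ml/machine-learning-databases/dexter/DEXTER/. The number before the colon denotes the feature, and the number after the colon is the value associated with that feature. – Janr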
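
Since each line lists only the features that are present, one possible representation for modelling is a sparse matrix with one row per line and one column per feature. The following is a rough sketch under assumptions, not something from the thread: it uses `sparseMatrix()` from the Matrix package (a recommended package bundled with standard R installations), the two input lines are shortened from the sample, and `n_features <- 20000` is an assumed feature count that should be checked against the data set documentation.

```r
library(Matrix)

# Two short example lines in the same "feature:value" layout
txt <- c("236:24 500:163 732:234",
         "101:68 572:57")

tokens <- strsplit(trimws(txt), "[[:space:]]+")
i <- rep(seq_along(tokens), lengths(tokens))   # line index of each pair

# Column 1 holds the feature numbers, column 2 the values
kv <- matrix(as.numeric(unlist(strsplit(unlist(tokens), ":", fixed = TRUE))),
             ncol = 2, byrow = TRUE)

n_features <- 20000  # assumption: total number of features in the data set

# Feature numbers are used directly as 1-based column indices
X <- sparseMatrix(i = i, j = kv[, 1], x = kv[, 2],
                  dims = c(length(txt), n_features))

dim(X)
X[1, 236]
```

Entries that do not appear in a line are stored implicitly as zero, which keeps memory use proportional to the number of pairs rather than to rows times features.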

See my edit, which should make things clearer. –
