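I want to convert raw weather station data from a data logger into an easy-to-use csv file. The data looks like the sample below. It arrives in this space-delimited format, where the first row of data has 47 columns, with the first column equal to 111 and the 47th column equal to 329.1. The second row's first value is also 111, and its 47th column is 354.2. Each row is wrapped across several lines of the file, not all rows have the same number of columns, and a "-" sign in front of any number indicates a negative value. How can I convert this delimited file to csv using R and regular expressions?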
01+0111. 02+0262. 03+1900. 04-15.68 05+64.50 06+08.82 07+1.013 08+0.943
09+342.1 10+21.26 11+0.000 12+31.76 13+18.46 14+16.50 15+1800. 16+5.250
17+69.46 18+1859. 19+55.25 20+27.61 21+1808. 22+50.25 23+2.367 24+1806.
25+15.25 26+14.78 27+1859. 28+55.25 29+60.11 30+1800. 31-5.250 32+0.000
33+1854. 34+5.250 35+0.447 36+1819. 37+50.25 38+14.80 39+69.40 40+0.073
41+275.3 42+0.447 43+18.29 44+22.30 45+22.43 46+2.367 47+329.1
01+0111. 02+0262. 03+2000. 04-14.28 05+070.7 06+0.012 07+0.755 08+0.694
09+337.5 10+22.90 11+0.000 12+0.044 13+18.13 14+14.78 15+1900. 16+15.25
17+072.6 18+1908. 19+15.25 20+0.146 21+1946. 22+10.25 23+1.567 24+1948.
25+25.25 26+14.02 27+1959. 28+20.25 29+69.21 30+1936. 31-25.25 32+0.000
33+1900. 34+20.25 35+0.447 36+1900. 37+5.250 38+14.02 39+69.95 40+0.000
41+343.6 42+0.607 43+17.97 44+21.97 45+22.13 46+1.567 47+354.2
01+0111. 02+0262. 03+2100. 04-13.01 05+075.7 06+0.007 07+0.617 08+0.528
09+20.10 10+30.68 11+0.000 12+0.026 13+17.79 14+14.02 15+2000. 16+5.250
17+082.7 18+2050. 19+55.25 20+0.146 21+2028. 22+30.25 23+1.407 24+2001.
25+25.25 26+11.78 27+2051. 28+40.25 29+69.68 30+2001. 31-25.25 32+0.000
33+2000. 34+5.250 35+0.447 36+2002. 37+25.25 38+12.00 39+081.0 40+0.000
41+39.42 42+0.447 43+17.61 44+21.68 45+21.82 46+1.407 47+349.4
01+0111. 02+0262. 03+1900. 04-15.68 05+64.50 06+08.82
01+0111. 02+0262. 03+2100. 04-13.01 05+075.7 06+0.007 07+0.617 08+0.528
09+20.10 10+30.68 11+0.000
I read in the data like this:
test <- readLines(data)
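This gives me an object in which each observation is a single line of the file, in character format, which is not useful as a data frame. There may be a way to work with that, but I have tried many approaches with no luck. I'm sure there is a way to read the data above into 5 rows, each with the appropriate number of columns, comma-separated, and with each value stripped of its leading column number (as shown below). However, I don't know how to do this, especially with regular expressions. If anyone can help, I would greatly appreciate it. Thank you.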
111,262,1900,-15.68,64.50,8.82,1.013,0.943,342.1,21.26,0,31.76,18.46,16.50,1800,5.250,69.46,1859,55.25,27.61,1808,50.25,2.367,1806,15.25,14.78,1859,55.25,60.11,1800,-5.250,0,1854,5.250,0.447,1819,50.25,14.80,69.40,0.073,275.3,0.447,18.29,22.30,22.43,2.367,329.1
111,262,2000,-14.28,70.7,0.012,0.755,0.694,337.5,22.90,0,0.044,18.13,14.78,1900,15.25,072.6,1908,15.25,0.146,1946,10.25,1.567,1948,25.25,14.02,1959,20.25,69.21,1936,-25.25,0,1900,20.25,0.447,1900,5.250,14.02,69.95,0,343.6,0.607,17.97,21.97,22.13,1.567,354.2
111,262,2100,-13.01,75.7,0.007,0.617,0.528,20.10,30.68,0,0.026,17.79,14.02,2000,5.250,082.7,2050,55.25,0.146,2028,30.25,1.407,2001,25.25,11.78,2051,40.25,69.68,2001,-25.25,0,2000,5.250,0.447,2002,25.25,12.00,081.0,0,39.42,0.447,17.61,21.68,21.82,1.407,349.4
111,262,1900,-15.68,64.50,8.82
111,262,2100,-13.01,75.7,0.007,0.617,0.528,20.10,30.68,0
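One way to get from the wrapped logger lines to output like the above is to ignore the physical line breaks, split everything into tokens, and start a new record whenever a token carries the column number 01. The following is a minimal sketch, not a definitive solution. It assumes every record begins with a field numbered 01; `raw`, `record_id`, and `csv_lines` are illustrative names, and the three sample lines are an abbreviated stand-in for the real file.

```r
# Abbreviated stand-in for the logger file (two records, the second shorter).
# In practice this would come from readLines() on the .dat file.
raw <- c(
  "01+0111. 02+0262. 03+1900. 04-15.68 05+64.50 06+08.82 07+1.013 08+0.943",
  "09+342.1 10+21.26 11+0.000",
  "01+0111. 02+0262. 03+2000. 04-14.28 05+070.7 06+0.012"
)

# Split every line on whitespace to get tokens such as "04-15.68".
tokens <- unlist(strsplit(trimws(raw), "\\s+"))
tokens <- tokens[nzchar(tokens)]

# Assumption: a token numbered 01 marks the start of a new record.
record_id <- cumsum(grepl("^01[+-]", tokens))

# Drop the two-digit column number and a leading "+"; keep "-" for negatives.
values <- as.numeric(sub("^[0-9]{2}\\+?", "", tokens))

# Build one comma-separated line per record.
csv_lines <- vapply(
  split(values, record_id),
  function(v) paste(v, collapse = ","),
  character(1)
)

# Print to the console; pass a file path as the second argument to save it.
writeLines(csv_lines)
```

Because the values are converted with `as.numeric()`, padding such as the trailing zero in `64.50` is not preserved. Skipping the conversion would keep the text exactly as the logger wrote it, including leading zeros.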
This is what my data looks like after I read it in:
c("01+0111. 02+0262. 03+1900. 04-15.68 05+64.50 06+08.82 07+1.013 08+0.943",
"09+342.1 10+21.26 11+0.000 12+31.76 13+18.46 14+16.50 15+1800. 16+5.250",
"17+69.46 18+1859. 19+55.25 20+27.61 21+1808. 22+50.25 23+2.367 24+1806.",
"25+15.25 26+14.78 27+1859. 28+55.25 29+60.11 30+1800. 31-5.250 32+0.000",
"33+1854. 34+5.250 35+0.447 36+1819. 37+50.25 38+14.80 39+69.40 40+0.073",
"41+275.3 42+0.447 43+18.29 44+22.30 45+22.43 46+2.367 47+329.1",
"01+0111. 02+0262. 03+2000. 04-14.28 05+070.7 06+0.012 07+0.755 08+0.694",
"09+337.5 10+22.90 11+0.000 12+0.044 13+18.13 14+14.78 15+1900. 16+15.25",
"17+072.6 18+1908. 19+15.25 20+0.146 21+1946. 22+10.25 23+1.567 24+1948.",
"25+25.25 26+14.02 27+1959. 28+20.25 29+69.21 30+1936. 31-25.25 32+0.000",
"33+1900. 34+20.25 35+0.447 36+1900. 37+5.250 38+14.02 39+69.95 40+0.000",
"41+343.6 42+0.607 43+17.97 44+21.97 45+22.13 46+1.567 47+354.2",
"01+0111. 02+0262. 03+2100. 04-13.01 05+075.7 06+0.007 07+0.617 08+0.528",
"09+20.10 10+30.68 11+0.000 12+0.026 13+17.79 14+14.02 15+2000. 16+5.250",
"17+082.7 18+2050. 19+55.25 20+0.146 21+2028. 22+30.25 23+1.407 24+2001.",
"25+25.25 26+11.78 27+2051. 28+40.25 29+69.68 30+2001. 31-25.25 32+0.000",
"33+2000. 34+5.250 35+0.447 36+2002. 37+25.25 38+12.00 39+081.0 40+0.000",
"41+39.42 42+0.447 43+17.61 44+21.68 45+21.82 46+1.407 47+349.4",
"01+0111. 02+0262. 03+1900. 04-15.68 05+64.50 06+08.82",
"01+0111. 02+0262. 03+2100. 04-13.01 05+075.7 06+0.007 07+0.617 08+0.528",
"09+20.10 10+30.68 11+0.000")
This is the result of dput(text) after running the code:
structure(list(X01 = c(111, 342.1, 69.46, 15.25, 1854, 275.3,
111, 337.5, 72.6, 25.25, 1900, 343.6, 111, 20.1, 82.7, 25.25,
2000, 39.42, 111, 111, 20.1), X02 = c(262, 21.26, 1859, 14.78,
5.25, 0.447, 262, 22.9, 1908, 14.02, 20.25, 0.607, 262, 30.68,
2050, 11.78, 5.25, 0.447, 262, 262, 30.68), X03 = c(1900, 0,
55.25, 1859, 0.447, 18.29, 2000, 0, 15.25, 1959, 0.447, 17.97,
2100, 0, 55.25, 2051, 0.447, 17.61, 1900, 2100, 0), X04 = c(-15.68,
31.76, 27.61, 55.25, 1819, 22.3, -14.28, 0.044, 0.146, 20.25,
1900, 21.97, -13.01, 0.026, 0.146, 40.25, 2002, 21.68, -15.68,
-13.01, NA), X05 = c(64.5, 18.46, 1808, 60.11, 50.25, 22.43,
70.7, 18.13, 1946, 69.21, 5.25, 22.13, 75.7, 17.79, 2028, 69.68,
25.25, 21.82, 64.5, 75.7, NA), X06 = c(8.82, 16.5, 50.25, 1800,
14.8, 2.367, 0.012, 14.78, 10.25, 1936, 14.02, 1.567, 0.007,
14.02, 30.25, 2001, 12, 1.407, 8.82, 0.007, NA), X07 = c(1.013,
1800, 2.367, -5.25, 69.4, 329.1, 0.755, 1900, 1.567, -25.25,
69.95, 354.2, 0.617, 2000, 1.407, -25.25, 81, 349.4, NA, 0.617,
NA), X08 = c(0.943, 5.25, 1806, 0, 0.073, NA, 0.694, 15.25, 1948,
0, 0, NA, 0.528, 5.25, 2001, 0, 0, NA, NA, 0.528, NA), X09 = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
X10 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), X11 = c(NA_real_,
.....)
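The structure above has 21 rows, one per physical line of the file, so each record is still spread over several rows. A possible alternative, sketched here under the assumption that the two-digit prefix is always the column number and that a prefix of 01 starts a new record, is to place each value directly at its (record, column) position in a matrix. Missing fields then become `NA` automatically. The names `idx`, `rec`, `out`, and `df` are illustrative, and the sample lines are an abbreviated stand-in for the real data.

```r
# Abbreviated stand-in for the lines returned by readLines().
raw <- c(
  "01+0111. 02+0262. 03+1900. 04-15.68 05+64.50 06+08.82 07+1.013 08+0.943",
  "09+342.1 10+21.26 11+0.000",
  "01+0111. 02+0262. 03+2000. 04-14.28 05+070.7 06+0.012"
)

# Tokenise on whitespace, ignoring the line breaks.
tokens <- unlist(strsplit(trimws(raw), "\\s+"))
tokens <- tokens[nzchar(tokens)]

idx <- as.integer(substr(tokens, 1, 2))   # column number from the prefix
val <- as.numeric(substring(tokens, 3))   # "+0111." -> 111, "-15.68" -> -15.68
rec <- cumsum(idx == 1)                   # assumption: column 01 starts a record

# Fill a records-by-columns matrix; unfilled cells stay NA.
out <- matrix(NA_real_, nrow = max(rec), ncol = max(idx))
out[cbind(rec, idx)] <- val
colnames(out) <- sprintf("X%02d", seq_len(ncol(out)))
df <- as.data.frame(out)

# Print as csv to the console; supply file = "..." to write to disk instead.
write.csv(df, row.names = FALSE, na = "")
```

Unlike building each line by pasting values together, this yields a rectangular data frame, so short records are padded with empty cells in the csv output.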
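Add the output of `dput(test)` to your question to show what your data looks like after you read it in. – Tunn
What is the original file type you are reading the data from? –
Richard, it is a .dat file. – user8229029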