2017-05-26 61 views
1

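How to handle poor data quality in a SQL query

The code below is a sample of grouped temperature data from our source system (bear in mind that these are temperatures taken from people in a hospital).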

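The data is clearly awful, but I would like to know whether it is possible to convert it to an INT somehow. We already have a unit of measure (UOM) field, so we only need the number.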

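Data issues:

88 is obviously Fahrenheit, not Celsius
3635 would be 36.35
0.368 would be 36.8
37.3. would be 37.3
.37.7 would be 37.7
377 would be 37.7
.3.8 would be 38

I think it is fair to exclude any other variation as invalid data, since no sensible assumption can be made about it with any accuracy.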

DECLARE @Test TABLE (
    [Temperature] VARCHAR(500),
    [Count] VARCHAR(50)
);
INSERT INTO @Test ([Temperature], [Count])
VALUES

('34.4    oC',' 9 '), 
('36.02    oC',' 1 '), 
('36.36    oC',' 3 '), 
('36.5    oC',' 5593 '), 
('36.5.    oC',' 1 '), 
('36.6.    oC',' 2 '), 
('36.74    oC',' 2 '), 
('36.82    oC',' 2 '), 
('37.36    oC',' 2 '), 
('37.49    oC',' 4 '), 
('40     oC',' 1 '), 
('88     oC',' 1 '), 
('  3635     oC',' 1 '), 
(' .368    oC',' 1 '), 
('33.5    oC',' 1 '), 
('35.2    oC',' 84 '), 
('35.20    oC',' 1 '), 
('35.99    oC',' 1 '), 
('36.35    oC',' 2 '), 
('37.3.    oC',' 1 '), 
('39.5    oC',' 5 '), 
('86     oC',' 1 '), 
('   356     oC',' 12 '), 
('   364     oC',' 72 '), 
('   379     oC',' 9 '), 
('   385     oC',' 2 '), 
('  3535     oC',' 1 '), 
(' .37.7    oC',' 1 '), 
('35.5    oC',' 290 '), 
('35.87    oC',' 1 '), 
('36..6    oC',' 1 '), 
('36.25    oC',' 2 '), 
('36.45    oC',' 2 '), 
('36.62    oC',' 2 '), 
('36.68    oC',' 5 '), 
('36.8.    oC',' 2 '), 
('37.03    oC',' 5 '), 
('37.1    oC',' 3610 '), 
('37.16    oC',' 3 '), 
('37.2  oCC000715799',' 1 '), 
('37.27    oC',' 2 '), 
('37.91    oC',' 1 '), 
('38.9    oC',' 28 '), 
('63.5    oC',' 1 '), 
('71     oC',' 1 '), 
('   377     oC',' 8 '), 
('    36.5 oC',' 1 '), 
(' 3.4    oC',' 3 '), 
(' 3.7    oC',' 3 '), 
('36.59    oC',' 1 '), 
('36.67    oC',' 5 '), 
('37.13    oC',' 1 '), 
('37.18    oC',' 1 '), 
('37.24    oC',' 1 '), 
('39.7    oC',' 5 '), 
('76     oC',' 2 '), 
('80     oC',' 2 '), 
('   347     oC',' 1 '), 
('   352     oC',' 2 '), 
('   368     oC',' 64 '), 
('  3602     oC',' 1 '), 
('  3688     oC',' 1 '), 
(' .36.4    oC',' 1 '), 
(' .8    oC',' 1 '), 
(' 3.2    oC',' 2 '), 
('34.3    oC',' 5 '), 
('34.9    oC',' 20 '), 
('35     oC',' 124 '), 
('35.81    oC',' 1 '), 
('36.17    oC',' 2 '), 
('36.23    oC',' 1 '), 
('36.37    oC',' 2 '), 
('36.38    oC',' 4 '), 
('36.42    oC',' 1 '), 
('36.76    oC',' 2 '), 
('37..2    oC',' 1 '), 
('37.00    oC',' 4 '), 
('37.07    oC',' 6 '), 
('37.12    oC',' 2 '), 
('37.2    oC',' 3151 '), 
('37.48    oC',' 2 '), 
('39.     oC',' 1 '), 
('39.2    oC',' 9 '), 
('39.9    oC',' 2 '), 
('   370     oC',' 1 '), 
('30.1    oC',' 1 '), 
('34.1    oC',' 2 '), 
('34.8    oC',' 17 '), 
('35.43    oC',' 1 '), 
('36..8    oC',' 2 '), 
('36.05    oC',' 1 '), 
('36.21    oC',' 4 '), 
('36.31    oC',' 2 '), 
('36.41    oC',' 1 '), 
('36.58    oC',' 8 '), 
('36.8    oC',' 8134 '), 
('36.81    oC',' 3 '), 
('36.88    oC',' 2 '), 
('36.89    oC',' 2 '), 
('36.99    oC',' 4 '), 
('37.01    oC',' 6 '), 
('37.14    oC',' 3 '), 
('37.33    oC',' 1 '), 
('37.37    oC',' 6 '), 
('37.44    oC',' 1 '), 
('37.59    oC',' 2 '), 
('38.5    oC',' 85 '), 
('39.4    oC',' 9 '), 
('78     oC',' 2 '), 
('92     oC',' 1 '), 
('   361     oC',' 19 '), 
('   383     oC',' 1 '), 
('   391     oC',' 1 '), 
('  3642     oC',' 1 '), 
('  3699     oC',' 2 '), 
('    37.6 oC',' 1 '), 
('35.59    oC',' 1 '), 
('35.69    oC',' 1 '), 
('35.90    oC',' 1 '), 
('36..9    oC',' 1 '), 
('36.08    oC',' 2 '), 
('36.27    oC',' 1 '), 
('36.365    oC',' 1 '), 
('36.51    oC',' 1 '), 
('36.78    oC',' 4 '), 
('36.84    oC',' 1 '), 
('36.85    oC',' 3 '), 
('36.97    oC',' 2 '), 
('37.29    oC',' 1 '), 
('37.3    oC',' 2306 '), 
('37.8    oC',' 730 '), 
('38.08    oC',' 1 '), 
('38.4    oC',' 113 '), 
('38.49    oC',' 1 '), 
('38.7    oC',' 53 '), 
('39.3    oC',' 10 '), 
('70     oC',' 2 '), 
('   357     oC',' 5 '), 
('   362     oC',' 49 '), 
('   396.8    oC',' 1 '), 
('  3700     oC',' 1 '), 
('  3752     oC',' 1 '), 
(' .381    oC',' 1 '), 
(' 0.37    oC',' 1 '), 
(' 3.1    oC',' 1 '), 
('14     oC',' 1 '), 
('27     oC',' 1 '), 
('34.2    oC',' 5 '), 
('34.5    oC',' 22 '), 
('35.9    oC',' 633 '), 
('36.44    oC',' 2 '), 
('36.57    oC',' 1 '), 
('36.65    oC',' 1 '), 
('36.66    oC',' 3 '), 
('37.04    oC',' 7 '), 
('65.9    oC',' 1 '), 
('82     oC',' 2 '), 
('   118     oC',' 1 '), 
('   358     oC',' 6 '), 
('   381     oC',' 2 '), 
('   396.6    oC',' 1 '), 
('  3704     oC',' 1 '), 
('  3801     oC',' 1 '), 
('  ',' 195340 '), 
('    362 oC',' 1 '), 
(' .374    oC',' 1 '), 
(' 3.6    oC',' 3 '), 
('26.5    oC',' 1 '), 
('35.0    oC',' 28 '), 
('35.79    oC',' 1 '), 
('36..7    oC',' 1 '), 
('36.00    oC',' 2 '), 
('36.18    oC',' 1 '), 
('36.48    oC',' 4 '), 
('36.49    oC',' 3 '), 
('37.19    oC',' 2 '), 
('37.46    oC',' 1 '), 
('37.9    oC',' 465 '), 
('38.12    oC',' 1 '), 
('39     oC',' 25 '), 
('   351     oC',' 2 '), 
('   369.     oC',' 1 '), 
('   389     oC',' 1 '), 
('  3736     oC',' 1 '), 
(' NULL ',' 7 '), 
('35.98    oC',' 1 '), 
('36     oC',' 2948 '), 
('36.28    oC',' 1 '), 
('36.69    oC',' 1 '), 
('36.72    oC',' 2 '), 
('36.77    oC',' 4 '), 
('36.98    oC',' 7 '), 
('37.05    oC',' 3 '), 
('37.06    oC',' 2 '), 
('37.15    oC',' 3 '), 
('37.25    oC',' 5 '), 
('37.26    oC',' 3 '), 
('37.39    oC',' 3 '), 
('37.42    oC',' 1 '), 
('37.68    oC',' 3 '), 
('38.3    oC',' 160 '), 
('38.6.    oC',' 1 '), 
('   376     oC',' 18 '), 
('  3617     oC',' 1 '), 
('  3703     oC',' 1 '), 
(' 3.8    oC',' 2 '), 
(' 7.6    oC',' 1 '), 
('30.6    oC',' 1 '), 
('34     oC',' 3 '), 
('34.7    oC',' 9 '), 
('35.06    oC',' 1 '), 
('35.7    oC',' 324 '), 
('35.74    oC',' 1 '), 
('36.01    oC',' 2 '), 
('36.1    oC',' 1517 '), 
('36.12    oC',' 1 '), 
('36.4    oC',' 5001 '), 
('36.6    oC',' 7044 '), 
('36.79    oC',' 5 '), 
('36.86    oC',' 1 '), 
('36.90    oC',' 1 '), 
('36.93    oC',' 1 '), 
('37.30    oC',' 1 '), 
('37.92    oC',' 1 '), 
('38.     oC',' 5 '), 
('38.6    oC',' 65 '), 
('38.8    oC',' 46 '), 
('97     oC',' 1 '), 
('   354     oC',' 4 '), 
('   355     oC',' 5 '), 
('   365     oC',' 107 '), 
('  3654     oC',' 1 '), 
('35.8    oC',' 495 '), 
('36.09    oC',' 6 '), 
('36.2    oC',' 2526 '), 
('36.3.    oC',' 1 '), 
('36.47    oC',' 1 '), 
('36.53    oC',' 2 '), 
('36.9    oC',' 5449 '), 
('37.0    oC',' 1209 '), 
('37.1.    oC',' 1 '), 
('37.32    oC',' 2 '), 
('37.38    oC',' 5 '), 
('37.45    oC',' 1 '), 
('37.5    oC',' 1477 '), 
('37.6    oC',' 1101 '), 
('37.80    oC',' 1 '), 
('38.1    oC',' 215 '), 
('40.2    oC',' 1 '), 
('62     oC',' 1 '), 
('   366     oC',' 61 '), 
('   375     oC',' 28 '), 
('16     oC',' 1 '), 
('34.0    oC',' 1 '), 
('35.     oC',' 3 '), 
('35.1    oC',' 61 '), 
('35.23    oC',' 1 '), 
('35.58    oC',' 2 '), 
('36.     oC',' 59 '), 
('36.03    oC',' 1 '), 
('36.16    oC',' 2 '), 
('36.94    oC',' 2 '), 
('37.08    oC',' 7 '), 
('37.21    oC',' 1 '), 
('37.47    oC',' 1 '), 
('39.8    oC',' 3 '), 
('   346     oC',' 1 '), 
('   353     oC',' 2 '), 
('   369     oC',' 57 '), 
('   374     oC',' 28 '), 
('  3677     oC',' 1 '), 
('    37.4 oC',' 1 '), 
('34.6    oC',' 15 '), 
('35.3    oC',' 74 '), 
('35.4    oC',' 120 '), 
('35.6    oC',' 320 '), 
('36.06    oC',' 1 '), 
('36.07    oC',' 2 '), 
('36.14    oC',' 1 '), 
('36.19    oC',' 1 '), 
('36.54    oC',' 1 '), 
('36.71    oC',' 1 '), 
('36.92    oC',' 1 '), 
('37.50    oC',' 1 '), 
('37.54    oC',' 1 '), 
('37.7    oC',' 836 '), 
('39.0    oC',' 8 '), 
('39.6    oC',' 3 '), 
('60     oC',' 1 '), 
('   127     oC',' 1 '), 
('   336.8    oC',' 1 '), 
('  1500     oC',' 1 '), 
('    36.4 oC',' 1 '), 
('36.0    oC',' 829 '), 
('36.3    oC',' 3192 '), 
('36.56    oC',' 3 '), 
('36.63    oC',' 2 '), 
('36.7    oC',' 6348 '), 
('36.73    oC',' 3 '), 
('36.96    oC',' 4 '), 
('37.     oC',' 64 '), 
('37.4    oC',' 1861 '), 
('37.69    oC',' 1 '), 
('38.01    oC',' 1 '), 
('93     oC',' 1 '), 
('   351.     oC',' 1 '), 
('   371     oC',' 24 '), 
('   372     oC',' 45 '), 
('   373     oC',' 30 '), 
('  3722     oC',' 1 '), 
(' .3.8    oC',' 1 '), 
('26.1    oC',' 1 '), 
('35.97    oC',' 4 '), 
('36.61    oC',' 3 '), 
('37     oC',' 4890 '), 
('37.02    oC',' 3 '), 
('37.66    oC',' 1 '), 
('38     oC',' 367 '), 
('38.0    oC',' 72 '), 
('38.2    oC',' 225 '), 
('39.1    oC',' 22 '), 
('   359     oC',' 14 '), 
('   360     oC',' 3 '), 
('   363     oC',' 49 '), 
('   367     oC',' 112 '), 
('   378     oC',' 8 ') 


SELECT *
FROM @Test;
+1

You already know what your data looks like and what you would like it to look like. What is stopping you from changing it? –

+0

Nothing. It is just that this really is the data in the table, and I need to get a valid INT value into the DW. – Simon

+0

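This boils down to implementing the rules you already have in T-SQL, which breaks down into a lot of subproblems involving 'CASE', 'LIKE', 'PATINDEX' and so on. What is "obvious" to you will have to be coded painstakingly for the benefit of the computer. For more complex cases you may want client-side cleanup with regular expressions, because T-SQL's string manipulation is not very advanced. –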
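As a rough illustration of that comment, the rules listed in the question could be coded with 'CASE' and 'LIKE' along the following lines. This is only a sketch under stated assumptions, not a vetted cleansing routine: the table variable @Sample, the CTE and column names, the "first two significant digits are whole degrees" rule and the 25 to 45 plausibility range are all introduced here for illustration. TRY_CAST needs SQL Server 2012 or later.

```sql
-- Hypothetical sample rows copied from the question's data.
DECLARE @Sample TABLE ([Temperature] VARCHAR(500));

INSERT INTO @Sample ([Temperature])
VALUES ('36.5    oC'), ('37.3.    oC'), (' .37.7    oC'), ('   377     oC'),
       ('  3635     oC'), (' .368    oC'), (' .3.8    oC'), ('88     oC'),
       ('37.2  oCC000715799'), (' NULL '), ('  ');

WITH Digits AS (
    -- Drop the unit, the spaces and every dot so that only digits should remain.
    SELECT [Temperature],
           REPLACE(REPLACE(REPLACE([Temperature], 'oC', ''), ' ', ''), '.', '') AS DigitText
    FROM @Sample
),
Numbers AS (
    -- Reject empty strings and anything that still contains a non-digit.
    SELECT [Temperature],
           CASE WHEN DigitText <> '' AND DigitText NOT LIKE '%[^0-9]%'
                THEN TRY_CAST(DigitText AS INT)
           END AS N
    FROM Digits
),
Scaled AS (
    -- Assumption: the first two significant digits are the whole degrees,
    -- so 377 is read as 37.7 and 3635 as 36.35.
    SELECT [Temperature],
           CASE WHEN N BETWEEN 10 AND 99 THEN CAST(N AS DECIMAL(6,2))
                WHEN N BETWEEN 100 AND 999 THEN CAST(N / 10.0 AS DECIMAL(6,2))
                WHEN N BETWEEN 1000 AND 9999 THEN CAST(N / 100.0 AS DECIMAL(6,2))
           END AS Candidate
    FROM Numbers
)
SELECT [Temperature],
       -- Assumption: only 25 to 45 degrees Celsius counts as a plausible reading.
       CASE WHEN Candidate BETWEEN 25 AND 45 THEN Candidate END AS TemperatureC
FROM Scaled;
```

With these assumptions ' .37.7 oC' and '377 oC' both come out as 37.70, '3635 oC' as 36.35, and '88 oC' as NULL because it falls outside the plausible range. Every rule is a judgment call about the source system, so the thresholds should be reviewed with someone who knows how the values were captured.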

Answers

1

I believe this will get you close.

-- The unit, spaces and dots are all stripped, so '36.5 oC' is returned as 365.
SELECT *,
       CASE
           WHEN ISNUMERIC(REPLACE(REPLACE(REPLACE(Temperature, 'oC', ''), ' ', ''), '.', '')) = 1
               THEN CONVERT(INT, REPLACE(REPLACE(REPLACE(Temperature, 'oC', ''), ' ', ''), '.', ''))
           ELSE NULL
       END AS TemperatureValue
FROM @Test

Hope this helps.

1

This works well, but it does not take into account which records we keep, given the DQ issues:

TRY_CAST(REPLACE(Temperature, 'oC', '') AS DECIMAL(19,8)) 
+0

I could not get the conversion to INT to work, so I took the easy way out! – Simon
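To see which records the TRY_CAST approach keeps, the parsed value can be shown next to a simple keep/reject flag. The sketch below is an assumption-laden illustration: @Sample, ParsedValue, Readings, Verdict and the 25 to 45 range are hypothetical names and thresholds, not part of the original answer.

```sql
-- Hypothetical sample rows copied from the question's data.
DECLARE @Sample TABLE ([Temperature] VARCHAR(500), [Count] VARCHAR(50));

INSERT INTO @Sample ([Temperature], [Count])
VALUES ('36.5    oC', ' 5593 '),
       ('36.5.    oC', ' 1 '),
       ('   365     oC', ' 107 '),
       ('88     oC', ' 1 '),
       ('  ', ' 195340 ');

WITH Parsed AS (
    -- TRY_CAST returns NULL instead of raising an error when a string cannot be converted.
    SELECT [Temperature],
           TRY_CAST([Count] AS INT) AS Readings,
           TRY_CAST(REPLACE([Temperature], 'oC', '') AS DECIMAL(19,8)) AS ParsedValue
    FROM @Sample
)
SELECT [Temperature],
       Readings,
       ParsedValue,
       -- Assumption: values outside 25 to 45 degrees Celsius, or unparsable ones, are rejected.
       CASE WHEN ParsedValue BETWEEN 25 AND 45 THEN 'keep' ELSE 'reject' END AS Verdict
FROM Parsed;
```

Summing Readings per Verdict would then show how many measurements are lost to the data quality issues, which may help decide whether extra repair rules are worth the effort.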
