2016-08-12 29 views

Identify similar string values in two columns

For example, I have the following two columns, Address1 and refAddr.

Some sample data from the table is shown below.

[Image: sample rows of Address1 and refAddr]

I want to compare the two columns to find matches. In this table, 5235 JFK BLVD & 5235 John F Kennedy are clearly a pair, and so are 424 N 2ND ST & 424 NORTH SECOND.

Is there anything in SQL or SSIS I can use to get rid of the non-matching rows and keep the pairs?
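Both pairs differ only in abbreviations (JFK vs John F Kennedy, N vs NORTH, 2ND vs SECOND). A minimal Python sketch of that observation, assuming a small hand-written abbreviation table and that the house number is always the first token; `ABBREVIATIONS`, `normalize` and `is_pair` are hypothetical names, not part of SQL Server or SSIS. It expands abbreviations and treats two addresses as a pair when the shorter token list is a prefix of the longer one.

```python
import re

# Hypothetical abbreviation table covering only the sample rows;
# real data would need a much larger list (e.g. all street suffixes).
ABBREVIATIONS = {
    "N": "NORTH", "S": "SOUTH", "E": "EAST", "W": "WEST",
    "ST": "STREET", "BLVD": "BOULEVARD", "AVE": "AVENUE",
    "1ST": "FIRST", "2ND": "SECOND", "3RD": "THIRD",
    "JFK": "JOHN F KENNEDY",
}


def normalize(address: str) -> list:
    """Upper-case, drop punctuation, and expand known abbreviations into tokens."""
    tokens = re.findall(r"[A-Z0-9]+", address.upper())
    expanded = []
    for tok in tokens:
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    return expanded


def is_pair(a: str, b: str) -> bool:
    """Pair if the house numbers (assumed first token) match and the shorter
    normalized token list is a prefix of the longer one."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb or ta[0] != tb[0]:
        return False
    shorter, longer = sorted((ta, tb), key=len)
    return longer[:len(shorter)] == shorter


rows = [
    ("5235 JFK BLVD", "5235 John F Kennedy"),
    ("424 N 2ND ST", "424 NORTH SECOND"),
    ("5235 JFK BLVD", "424 NORTH SECOND"),
]
pairs = [row for row in rows if is_pair(*row)]
print(pairs)
```

This is a heuristic rather than address validation: the prefix rule is what lets a trailing BLVD or ST be missing on one side, and the table has to grow with the data.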


Address matching and correction is a specialty that is usually not included in general-purpose database software. –


Buy master data management software to do this. – dfundako


Use a Script Component with regular expressions in SSIS, flag the matching rows in an additional column, and then filter on that column. –
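The comment's approach, sketched in Python as a rough stand-in for what an SSIS Script Component (which would be written in C# or VB.NET) might do: rewrite each address with an ordered list of regular-expression rules, add a flag column, then keep only the flagged rows. The `RULES` list, the `IsMatch` column name and the helper functions are assumptions that cover only the sample data.

```python
import re

# Hypothetical rewrite rules, applied in order to an upper-cased address.
RULES = [
    (r"[^A-Z0-9 ]", ""),                       # drop punctuation ("J.F.K." -> "JFK")
    (r"\bJFK\b", "JOHN F KENNEDY"),
    (r"\bN\b", "NORTH"),
    (r"\b2ND\b", "SECOND"),
    (r"\b(ST|STREET|BLVD|BOULEVARD)\b", ""),   # ignore street-type suffixes
    (r"\s+", " "),                             # collapse whitespace
]


def canonical(address: str) -> str:
    """Apply every rule in order and return the rewritten address."""
    s = address.upper()
    for pattern, replacement in RULES:
        s = re.sub(pattern, replacement, s)
    return s.strip()


def add_match_flag(rows):
    """Return copies of the rows with an extra boolean IsMatch column."""
    return [
        dict(row, IsMatch=canonical(row["Address1"]) == canonical(row["refAddr"]))
        for row in rows
    ]


rows = [
    {"Address1": "5235 JFK BLVD", "refAddr": "5235 John F Kennedy"},
    {"Address1": "424 N 2ND ST", "refAddr": "424 NORTH SECOND"},
    {"Address1": "5235 JFK BLVD", "refAddr": "424 NORTH SECOND"},
]
flagged = add_match_flag(rows)
matched = [row for row in flagged if row["IsMatch"]]  # the "filter" step
print(matched)
```

In SSIS the filter step would typically be a Conditional Split on the flag column rather than a list comprehension.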

Answer


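One option is to geocode the addresses with the Google Geocoding API and parse the JSON result to get a more standardized form of each address. This may be time-consuming, but you will have more confidence in your data.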

The API allows (I believe) 2,500 requests per day, but you can buy more.

For example, I picked 5232 JFK Blvd and added the ZIP code 72116 to narrow the search. Without the ZIP code, it returned multiple addresses (NY, NJ, AR, etc.).

https://maps.googleapis.com/maps/api/geocode/json?address=5232%20JFK%20Blvd%2072116&sensor=false
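Hand-encoding the query string makes it easy for the ZIP code to slip out of the `address` parameter. A small Python sketch that lets `urllib.parse.urlencode` build the URL instead; `geocode_url` is a hypothetical helper, and the optional `key` argument is an assumption about current Google Maps Platform requirements (supply your own key).

```python
from typing import Optional
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(street: str, postal_code: str, api_key: Optional[str] = None) -> str:
    """Build a Geocoding API URL with the ZIP code inside the address parameter."""
    params = {"address": f"{street} {postal_code}"}
    if api_key:
        # Placeholder: pass your own API key here.
        params["key"] = api_key
    # urlencode percent-encodes each value (spaces become '+'),
    # so the whole address, ZIP included, stays in one parameter.
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


print(geocode_url("5232 JFK Blvd", "72116"))
```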

The key elements could be:

formatted_address: "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA"
or
long_name: "John F. Kennedy Boulevard"
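Pulling those two elements out of a parsed response might look like the Python sketch below, assuming the JSON has already been loaded into a dict; `key_elements` is a hypothetical helper, and the sample is a trimmed copy of the first result in the response that follows.

```python
import json


def key_elements(response: dict) -> list:
    """Return formatted_address and the route's long_name for each result."""
    if response.get("status") != "OK":
        return []
    elements = []
    for result in response.get("results", []):
        # The street name is the component whose types include "route".
        route = next(
            (c["long_name"] for c in result.get("address_components", [])
             if "route" in c.get("types", [])),
            None,
        )
        elements.append({"formatted_address": result.get("formatted_address"), "route": route})
    return elements


sample = json.loads("""
{
  "status": "OK",
  "results": [{
    "formatted_address": "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA",
    "address_components": [
      {"long_name": "5232", "short_name": "5232", "types": ["street_number"]},
      {"long_name": "J.F.K. Boulevard", "short_name": "J.F.K. Blvd", "types": ["route"]}
    ]
  }]
}
""")
print(key_elements(sample))
```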

The full response:

{
  "results": [
    {
      "address_components": [
        { "long_name": "5232", "short_name": "5232", "types": ["street_number"] },
        { "long_name": "J.F.K. Boulevard", "short_name": "J.F.K. Blvd", "types": ["route"] },
        { "long_name": "North Little Rock", "short_name": "North Little Rock", "types": ["locality", "political"] },
        { "long_name": "Hill Township", "short_name": "Hill Township", "types": ["administrative_area_level_3", "political"] },
        { "long_name": "Pulaski County", "short_name": "Pulaski County", "types": ["administrative_area_level_2", "political"] },
        { "long_name": "Arkansas", "short_name": "AR", "types": ["administrative_area_level_1", "political"] },
        { "long_name": "United States", "short_name": "US", "types": ["country", "political"] },
        { "long_name": "72116", "short_name": "72116", "types": ["postal_code"] }
      ],
      "formatted_address": "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA",
      "geometry": {
        "bounds": {
          "northeast": { "lat": 34.8032656, "lng": -92.2538364 },
          "southwest": { "lat": 34.8032599, "lng": -92.2538538 }
        },
        "location": { "lat": 34.8032599, "lng": -92.2538364 },
        "location_type": "RANGE_INTERPOLATED",
        "viewport": {
          "northeast": { "lat": 34.8046117302915, "lng": -92.2524961197085 },
          "southwest": { "lat": 34.8019137697085, "lng": -92.2551940802915 }
        }
      },
      "place_id": "EjI1MjMyIEouRi5LLiBCbHZkLCBOb3J0aCBMaXR0bGUgUm9jaywgQVIgNzIxMTYsIFVTQQ",
      "types": ["route", "street_address"]
    },
    {
      "address_components": [
        { "long_name": "5232", "short_name": "5232", "types": ["street_number"] },
        { "long_name": "John F. Kennedy Boulevard", "short_name": "John F. Kennedy Blvd", "types": ["route"] },
        { "long_name": "West New York", "short_name": "West New York", "types": ["locality", "political"] },
        { "long_name": "Hudson County", "short_name": "Hudson County", "types": ["administrative_area_level_2", "political"] },
        { "long_name": "New Jersey", "short_name": "NJ", "types": ["administrative_area_level_1", "political"] },
        { "long_name": "United States", "short_name": "US", "types": ["country", "political"] },
        { "long_name": "07093", "short_name": "07093", "types": ["postal_code"] }
      ],
      "formatted_address": "5232 John F. Kennedy Blvd, West New York, NJ 07093, USA",
      "geometry": {
        "bounds": {
          "northeast": { "lat": 40.78574, "lng": -74.0231416 },
          "southwest": { "lat": 40.7857366, "lng": -74.0231598 }
        },
        "location": { "lat": 40.78574, "lng": -74.0231416 },
        "location_type": "RANGE_INTERPOLATED",
        "viewport": {
          "northeast": { "lat": 40.78708728029149, "lng": -74.02180171970849 },
          "southwest": { "lat": 40.7843893197085, "lng": -74.0244996802915 }
        }
      },
      "place_id": "Ejc1MjMyIEpvaG4gRi4gS2VubmVkeSBCbHZkLCBXZXN0IE5ldyBZb3JrLCBOSiAwNzA5MywgVVNB",
      "types": ["route", "street_address"]
    }
  ],
  "status": "OK"
}
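To connect this back to the question: geocode both columns and keep a row only when both addresses resolve to the same standardized components. The Python sketch below assumes a caller-supplied `geocode` function that returns parsed Geocoding JSON (HTTP fetching, caching and the daily quota are left out). `standard_key` and `keep_pairs` are hypothetical helpers, and a response with more than one result, like the AR/NJ response above, is treated as ambiguous rather than guessed.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: address string in, parsed Geocoding JSON out.
Geocoder = Callable[[str], dict]


def standard_key(response: dict) -> Optional[Tuple[str, str, str]]:
    """Return (street_number, route, postal_code) for an unambiguous response, else None."""
    results = response.get("results", [])
    if response.get("status") != "OK" or len(results) != 1:
        return None  # no result, or several candidates
    parts = {}
    for comp in results[0].get("address_components", []):
        for wanted in ("street_number", "route", "postal_code"):
            if wanted in comp.get("types", []):
                parts[wanted] = comp["long_name"]
    key = (parts.get("street_number"), parts.get("route"), parts.get("postal_code"))
    return None if None in key else key  # require all three components


def keep_pairs(rows: List[Tuple[str, str]], geocode: Geocoder) -> List[Tuple[str, str]]:
    """Keep (Address1, refAddr) rows whose geocoded keys are identical."""
    pairs = []
    for address1, ref_addr in rows:
        k1 = standard_key(geocode(address1))
        if k1 is not None and k1 == standard_key(geocode(ref_addr)):
            pairs.append((address1, ref_addr))
    return pairs
```

Comparing components rather than `formatted_address` strings is a design choice: it ignores differences in punctuation or ordering, but it still relies on the geocoder returning exactly one confident result per address.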