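Identifying similar string values in two columns
For example, I have the following two columns, Address1 and refAddr. Some sample data from the table is shown below.
I want to compare the two columns to find matches. Clearly, in this table, 5235 JFK BLVD & 5235 John F Kennedy are a pair, and 424 N 2ND ST & 424 NORTH SECOND are a pair.
Is there any way, in SQL or SSIS, to discard the non-matching rows and keep the pairs?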
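One option is to geocode the addresses with the Google Geocoding API and parse the JSON results, which gives you more standardized values to compare. This can be time-consuming, but you will have more confidence in the data.
The API allows (I believe) 2,500 requests per day, but you can purchase more.
For example, I took 5232 JFK Blvd and added the ZIP code 72116 to narrow the search. Without the ZIP code it returned multiple addresses (NY, NJ, AR, etc.).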
https://maps.googleapis.com/maps/api/geocode/json?address=5232%20JFK%20Blvd&72116sensor=false
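A request like the one above could also be assembled in code rather than by hand, which avoids encoding mistakes in the query string. The following is only a sketch: `build_geocode_url` and its `api_key` argument are hypothetical names introduced for illustration, the ZIP code is simply appended to the free-text address, and the `sensor` parameter is left out for brevity. Current versions of the API typically expect a `key` parameter, so treat that part as an assumption to check against your own account.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(street, postal_code=None, api_key=None):
    """Return a geocoding request URL for one address (hypothetical helper)."""
    # Appending the ZIP code to the address text narrows the search.
    address = f"{street} {postal_code}" if postal_code else street
    params = {"address": address}
    if api_key:
        # Assumption: the key is passed as a plain query parameter.
        params["key"] = api_key
    # urlencode escapes spaces and other reserved characters for us.
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


url = build_geocode_url("5232 JFK Blvd", "72116")
print(url)
```

The same helper can be called once for Address1 and once for refAddr, so that both columns go through identical encoding before they are sent.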
The key elements of the response could be:
formatted_address: "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA",
or
long_name: "John F. Kennedy Boulevard",
The full response:
{
results: [
{
address_components: [
{
long_name: "5232",
short_name: "5232",
types: [
"street_number"
]
},
{
long_name: "J.F.K. Boulevard",
short_name: "J.F.K. Blvd",
types: [
"route"
]
},
{
long_name: "North Little Rock",
short_name: "North Little Rock",
types: [
"locality",
"political"
]
},
{
long_name: "Hill Township",
short_name: "Hill Township",
types: [
"administrative_area_level_3",
"political"
]
},
{
long_name: "Pulaski County",
short_name: "Pulaski County",
types: [
"administrative_area_level_2",
"political"
]
},
{
long_name: "Arkansas",
short_name: "AR",
types: [
"administrative_area_level_1",
"political"
]
},
{
long_name: "United States",
short_name: "US",
types: [
"country",
"political"
]
},
{
long_name: "72116",
short_name: "72116",
types: [
"postal_code"
]
}
],
formatted_address: "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA",
geometry: {
bounds: {
northeast: {
lat: 34.8032656,
lng: -92.2538364
},
southwest: {
lat: 34.8032599,
lng: -92.2538538
}
},
location: {
lat: 34.8032599,
lng: -92.2538364
},
location_type: "RANGE_INTERPOLATED",
viewport: {
northeast: {
lat: 34.8046117302915,
lng: -92.2524961197085
},
southwest: {
lat: 34.8019137697085,
lng: -92.2551940802915
}
}
},
place_id: "EjI1MjMyIEouRi5LLiBCbHZkLCBOb3J0aCBMaXR0bGUgUm9jaywgQVIgNzIxMTYsIFVTQQ",
types: [
"route",
"street_address"
]
},
{
address_components: [
{
long_name: "5232",
short_name: "5232",
types: [
"street_number"
]
},
{
long_name: "John F. Kennedy Boulevard",
short_name: "John F. Kennedy Blvd",
types: [
"route"
]
},
{
long_name: "West New York",
short_name: "West New York",
types: [
"locality",
"political"
]
},
{
long_name: "Hudson County",
short_name: "Hudson County",
types: [
"administrative_area_level_2",
"political"
]
},
{
long_name: "New Jersey",
short_name: "NJ",
types: [
"administrative_area_level_1",
"political"
]
},
{
long_name: "United States",
short_name: "US",
types: [
"country",
"political"
]
},
{
long_name: "07093",
short_name: "07093",
types: [
"postal_code"
]
}
],
formatted_address: "5232 John F. Kennedy Blvd, West New York, NJ 07093, USA",
geometry: {
bounds: {
northeast: {
lat: 40.78574,
lng: -74.0231416
},
southwest: {
lat: 40.7857366,
lng: -74.0231598
}
},
location: {
lat: 40.78574,
lng: -74.0231416
},
location_type: "RANGE_INTERPOLATED",
viewport: {
northeast: {
lat: 40.78708728029149,
lng: -74.02180171970849
},
southwest: {
lat: 40.7843893197085,
lng: -74.0244996802915
}
}
},
place_id: "Ejc1MjMyIEpvaG4gRi4gS2VubmVkeSBCbHZkLCBXZXN0IE5ldyBZb3JrLCBOSiAwNzA5MywgVVNB",
types: [
"route",
"street_address"
]
}
],
status: "OK"
}
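To use a response like the one above for matching, the key fields have to be pulled out of each result. Here is a minimal sketch of that step, assuming the response has already been parsed into Python dictionaries (for example with `json.loads`). The helper names `component`, `summarize`, and `pick_by_postal_code` are made up for illustration, and the sample data is a trimmed copy of the two results shown above.

```python
def component(result, wanted_type, field="long_name"):
    """Return one field of the first address component with the given type."""
    for part in result.get("address_components", []):
        if wanted_type in part.get("types", []):
            return part.get(field)
    return None


def summarize(result):
    """Collect the fields that are useful when comparing two addresses."""
    return {
        "formatted_address": result.get("formatted_address"),
        "street_number": component(result, "street_number"),
        "route": component(result, "route"),
        "postal_code": component(result, "postal_code"),
    }


def pick_by_postal_code(response, postal_code):
    """Return the summary of the first result in the given ZIP code, or None."""
    if response.get("status") != "OK":
        return None
    for result in response.get("results", []):
        if component(result, "postal_code") == postal_code:
            return summarize(result)
    return None


# Trimmed copy of the response shown above (only the fields used here).
response = {
    "results": [
        {
            "address_components": [
                {"long_name": "5232", "short_name": "5232", "types": ["street_number"]},
                {"long_name": "J.F.K. Boulevard", "short_name": "J.F.K. Blvd", "types": ["route"]},
                {"long_name": "72116", "short_name": "72116", "types": ["postal_code"]},
            ],
            "formatted_address": "5232 J.F.K. Blvd, North Little Rock, AR 72116, USA",
        },
        {
            "address_components": [
                {"long_name": "5232", "short_name": "5232", "types": ["street_number"]},
                {"long_name": "John F. Kennedy Boulevard", "short_name": "John F. Kennedy Blvd", "types": ["route"]},
                {"long_name": "07093", "short_name": "07093", "types": ["postal_code"]},
            ],
            "formatted_address": "5232 John F. Kennedy Blvd, West New York, NJ 07093, USA",
        },
    ],
    "status": "OK",
}

match = pick_by_postal_code(response, "72116")
print(match["formatted_address"])
```

One way to apply this to the original question would be to geocode Address1 and refAddr separately and treat a row as a pair when both produce the same `formatted_address`. That comparison rule is an assumption of this sketch, not something the API guarantees.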
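Address matching and cleansing is a specialty of its own, and it is usually not something general-purpose database software includes. –
Buy master data management software to do this. – dfundako
In SSIS, use a Script Component with regular expressions to flag the matching rows in an additional column; you can then filter on that column. –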
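The flagging logic from the last comment might look roughly like the sketch below. An SSIS Script Component would be written in C# or VB.NET, so this Python version only illustrates the idea: tokenize each address with a regular expression, expand known abbreviations, and compare the results. The `ABBREVIATIONS` table, the `STREET_TYPES` set, and the function names are hypothetical and deliberately tiny; real address data needs a much larger lookup and will still produce misses.

```python
import re

# Hypothetical lookup table covering only the examples from the question.
ABBREVIATIONS = {
    "N": "NORTH", "S": "SOUTH", "E": "EAST", "W": "WEST",
    "ST": "STREET", "BLVD": "BOULEVARD", "AVE": "AVENUE",
    "2ND": "SECOND", "JFK": "JOHN F KENNEDY",
}
# Street-type words are dropped because refAddr often omits them.
STREET_TYPES = {"STREET", "BOULEVARD", "AVENUE"}


def normalize(address):
    """Upper-case, tokenize, expand abbreviations, and drop street types."""
    tokens = re.findall(r"[A-Z0-9]+", address.upper())
    # An abbreviation may expand to several words, so re-split afterwards.
    expanded = " ".join(ABBREVIATIONS.get(t, t) for t in tokens).split()
    return " ".join(t for t in expanded if t not in STREET_TYPES)


def flag_pairs(rows):
    """Return (address1, ref_addr, is_match) for each input row."""
    return [(a, b, normalize(a) == normalize(b)) for a, b in rows]


rows = [
    ("5235 JFK BLVD", "5235 John F Kennedy"),
    ("424 N 2ND ST", "424 NORTH SECOND"),
    ("5235 JFK BLVD", "424 NORTH SECOND"),
]
for address1, ref_addr, is_match in flag_pairs(rows):
    print(address1, "|", ref_addr, "|", is_match)
```

The boolean in the third position plays the role of the additional column: rows where it is false can be filtered out downstream.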