How do I remove duplicates from my file, with certain fields taking priority?

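I have SQL that adds a code to a field when it detects a duplicate. There is another field called DS.

DS can be either 'Yes' or 'No'.

How can I make it so that, when a duplicate is found, the 'Yes' record is not coded and the 'No' record is?

Basically, 'Yes' gets priority.

My SQL: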

WITH cte 
    AS (SELECT *, 
       Row_Number() OVER(partition BY fips_county_code, last, suffix, first, birthdate Order by (select null)) AS Rn 
     FROM [PULLED REC]) 
UPDATE cte 
SET BAD_CODES = Isnull(BAD_CODES, '') + 'D' 
WHERE RN > 1;

Source

2017-01-23 rohanharrison

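Why can't you set DS to 'No'? – Anand

I don't understand what you are trying to do here. Do you want to update all of the duplicate rows? Are you ignoring the DS column when deciding whether a row is a duplicate? Without some details about the table and what you want to do, this is really hard to answer. –

@SeanLange I want to update all of the duplicate rows, but say we have two records, Name: Sam, DS: Yes and Name: Sam, DS: No; then we only code the one with DS = No. – rohanharrison

To update only the rows where ds = 'No', you can add that condition to the where clause.

To make sure that the rn > 1 filter does not skip a duplicate that you need to update, you can use exists() or count() over().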

with cte as (
    select 
     * 
    , rn = row_number() over (
      partition by fips_county_code, last, suffix, first, birthdate 
      order by (case when DS = 'yes' then 0 else 1 end) asc 
      ) 
    from [pulled rec] 
) 
/* -- check with select first -- */ 
select * from cte 

/* 
update cte set 
    bad_codes = isnull(bad_codes, '') + 'D' 
--*/ 
/* -- Update all records that have a duplicate 
    -- except the First row, ordered by ds='Yes' first */ 
/* 
    where cte.ds = 'No' 
    and cte.rn > 1 
--*/ 
-- Update all records that have a duplicate and ds='No' -- 
--/* 
    where cte.ds = 'No' 
    and exists (
     select 1 
     from cte as i 
     where i.rn > 1 
      and i.fips_county_code = cte.fips_county_code 
      and i.last = cte.last 
      and i.suffix = cte.suffix 
      and i.first = cte.first 
      and i.birthdate = cte.birthdate 
    ); 
--*/
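As a rough illustration of the exists() variant above with the comment toggles removed, here is a minimal sketch that runs against a throwaway temp table. The table name #pulled_rec, the id column, the column types, and the sample rows are all assumptions made for this example; they are not taken from the original table.

```sql
-- Hypothetical temp table standing in for [pulled rec]; column types are assumed.
CREATE TABLE #pulled_rec (
    id               int IDENTITY(1, 1) PRIMARY KEY,
    fips_county_code varchar(5),
    [last]           varchar(50),
    suffix           varchar(10),
    [first]          varchar(50),
    birthdate        date,
    DS               varchar(3),
    bad_codes        varchar(20) NULL
);

-- Made-up rows: one duplicate pair (Yes/No) and one unique record.
INSERT INTO #pulled_rec (fips_county_code, [last], suffix, [first], birthdate, DS)
VALUES ('001', 'Smith', '', 'Sam', '1980-01-01', 'Yes'),
       ('001', 'Smith', '', 'Sam', '1980-01-01', 'No'),
       ('001', 'Jones', '', 'Ann', '1975-05-05', 'No');

WITH cte AS (
    SELECT *,
           -- 'Yes' rows sort first, so they take rn = 1 within each group
           rn = ROW_NUMBER() OVER (
                    PARTITION BY fips_county_code, [last], suffix, [first], birthdate
                    ORDER BY CASE WHEN DS = 'Yes' THEN 0 ELSE 1 END)
    FROM #pulled_rec
)
UPDATE cte
SET bad_codes = ISNULL(bad_codes, '') + 'D'
WHERE cte.DS = 'No'
  -- the group has more than one row if any row in it has rn > 1
  AND EXISTS (
        SELECT 1
        FROM cte AS i
        WHERE i.rn > 1
          AND i.fips_county_code = cte.fips_county_code
          AND i.[last]    = cte.[last]
          AND i.suffix    = cte.suffix
          AND i.[first]   = cte.[first]
          AND i.birthdate = cte.birthdate
      );

-- Inspect the result, then clean up.
SELECT id, [first], DS, bad_codes FROM #pulled_rec ORDER BY id;

DROP TABLE #pulled_rec;
```

With these sample rows, only the Sam record with DS = 'No' should end up with 'D' in bad_codes. One caveat: the equality tests inside exists() never match NULL to NULL, whereas partition by puts NULLs in the same group, so rows with a NULL suffix (or any other NULL key column) would be numbered as duplicates but skipped by this exists() check. The sample data uses an empty string for suffix to stay clear of that.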

Using count(*) over()

Alternate version:

with cte as (
    select 
     * 
    , CountOver = count(*) over (
      partition by fips_county_code, last, suffix, first, birthdate 
      ) 
    from [pulled rec] 
) 
/* -- check with select first -- */ 
select * from cte 

/* 
update cte set 
    bad_codes = isnull(bad_codes, '') + 'D' 
--*/ 

    where cte.ds = 'No' 
    and cte.CountOver > 1
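To preview which rows the count-based filter would pick without touching any table, a read-only sketch like the following may help. The inline sample rows and the would_be_coded column are hypothetical and exist only for this illustration.

```sql
-- Hypothetical sample rows; no real table is read or modified.
WITH sample_rows AS (
    SELECT *
    FROM (VALUES
        ('001', 'Smith', '', 'Sam', '1980-01-01', 'Yes'),
        ('001', 'Smith', '', 'Sam', '1980-01-01', 'No'),
        ('001', 'Jones', '', 'Ann', '1975-05-05', 'No')
    ) AS v (fips_county_code, [last], suffix, [first], birthdate, DS)
),
counted AS (
    SELECT *,
           -- number of rows sharing the same duplicate key
           CountOver = COUNT(*) OVER (
               PARTITION BY fips_county_code, [last], suffix, [first], birthdate)
    FROM sample_rows
)
SELECT [first], DS, CountOver,
       -- 1 marks the rows the update would code with 'D'
       would_be_coded = CASE WHEN DS = 'No' AND CountOver > 1 THEN 1 ELSE 0 END
FROM counted
ORDER BY [last], DS DESC;
```

Here only the Sam row with DS = 'No' gets would_be_coded = 1: it is the only 'No' row whose key appears more than once. Note that in a group made up entirely of 'No' rows, this filter would code every row in the group, which may or may not be what is wanted.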

Source

2017-01-23 16:10:38 SqlZim

I think this should point you in the right direction.

WITH cte 
    AS (SELECT *, 
       Row_Number() OVER(partition BY fips_county_code, last, suffix, first, birthdate Order by (select null)) AS Rn 
       -- windowed count instead of GROUP BY, so SELECT * stays valid and the CTE stays updatable 
       , COUNT(*) OVER(partition BY fips_county_code, last, suffix, first, birthdate) AS DupeCount 
     FROM [PULLED REC] 
) 
UPDATE cte 
SET BAD_CODES = case RN when 1 then BAD_CODES else Isnull(BAD_CODES, '') + 'D' end 
    , DS = Case DupeCount when 1 then 'no' else 'yes' end

Source

2017-01-23 16:06:27
