2013-04-12

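Shredding XML Unicode characters in SQL Server

In my previous question, SQL Server XML String Manipulation, I got the following answer (thanks to Mikael Eriksson) for shredding an XML document and stripping unwanted words out of its character data. I now need to go a step further and remove Unicode characters above 255. When I include these characters in the XML, they end up stored as question marks in the @T table variable (in the code below). How can I get them through as actual Unicode characters so that I can strip them out?

I have a function that works well for removing unwanted characters, but because the Unicode characters come in as question marks, it never touches them.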

-- A table to hold the bad words 
declare @BadWords table 
(
    ID int identity, 
    Value nvarchar(10) 
) 

-- These are the bad ones. 
insert into @BadWords values 
('one'), 
('three'), 
('five'), 
('hold') 

-- XML that needs cleaning 
declare @XML xml = ' 
<root> 
    <itemone ID="1one1">1one1</itemone> 
    <itemtwo>2two2</itemtwo> 
    <items> 
    <item>1one1</item> 
    <item>2two2</item> 
    <item>onetwothreefourfive</item> 
    </items> 
    <hold>We hold these truths to be self evident</hold> 
</root> 
' 

-- A helper table to hold the values to modify 
declare @T table 
(
    ID int identity, 
    Pos int, 
    OldValue nvarchar(max), 
    NewValue nvarchar(max), 
    Attribute bit 
) 

-- Get all attributes from the XML 
insert into @T(Pos, OldValue, NewValue, Attribute) 
select row_number() over(order by T.N), 
     T.N.value('.', 'nvarchar(max)'), 
     T.N.value('.', 'nvarchar(max)'), 
     1 
from @XML.nodes('//@*') as T(N) 

-- Get all values from the XML 
insert into @T(Pos, OldValue, NewValue, Attribute) 
select row_number() over(order by T.N), 
     T.N.value('text()[1]', 'nvarchar(max)'), 
     T.N.value('text()[1]', 'nvarchar(max)'), 
     0 
from @XML.nodes('//*') as T(N) 

declare @ID int 
declare @Pos int 
declare @Value nvarchar(max) 
declare @Attribute bit 

-- Remove the bad words from @T, one bad word at a time 
select @ID = max(ID) from @BadWords 
while @ID > 0 
begin 
    select @Value = Value 
    from @BadWords 
    where ID = @ID 

    update @T 
    set NewValue = replace(NewValue, @Value, '') 

    set @ID -= 1 
end 

-- Write the cleaned values back to the XML 
select @ID = max(ID) from @T 
while @ID > 0 
begin 
    select @Value = nullif(NewValue, OldValue), 
     @Attribute = Attribute, 
     @Pos = Pos 
    from @T 
    where ID = @ID 

    print @Attribute 

    if @Value is not null
        if @Attribute = 1
            set @XML.modify('replace value of ((//@*)[sql:variable("@Pos")])[1]
                with sql:variable("@Value")')
        else
            set @XML.modify('replace value of ((//*)[sql:variable("@Pos")]/text())[1]
                with sql:variable("@Value")')
    set @ID -= 1 
end 

select @XML 
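The question refers to an existing function that strips unwanted characters, but that function isn't shown. As a minimal, hypothetical sketch (not the asker's function), removing every character above code point 255 from an nvarchar value could look like this. It only helps once the text has arrived as real Unicode, which is the crux of the problem described above.

```sql
-- Hypothetical sketch: keep only characters whose UTF-16 code unit is <= 255.
-- Assumes the input is already genuine Unicode (e.g. written as an N'...' literal).
declare @Input  nvarchar(max) = N'abc中文déf';
declare @Output nvarchar(max) = N'';
declare @i int = 1;
declare @Ch nchar(1);

-- datalength() / 2 counts code units; len() would ignore trailing spaces
while @i <= datalength(@Input) / 2
begin
    set @Ch = substring(@Input, @i, 1);
    -- unicode() returns the numeric value of the character
    if unicode(@Ch) <= 255
        set @Output += @Ch;
    set @i += 1;
end

select @Output as Cleaned;  -- 'abcdéf'
```

A character-by-character loop is easy to follow but slow on large values; for whole tables a set-based approach over a numbers table would probably scale better. Characters outside the Basic Multilingual Plane are dropped as well, since they are stored as surrogate code units above 255.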

Answer

This part looks off:

insert into @BadWords values 
('one'), 
('three'), 
('five'), 
('hold') 

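You need the N prefix on Unicode string literals. Without the N, your code treats them as VARCHAR, and any character that doesn't exist in the database's code page comes out as a question mark. There are other places where you'll have to use Unicode-friendly strings as well. XML is usually UTF-8, so it should be able to handle Unicode characters, although the standard discourages some of these. Your code should look like: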

insert into @BadWords values 
(N'one'), 
(N'three'), 
(N'five'), 
(N'hold') 
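To see what the prefix changes, here is a small sketch comparing the same literal with and without N. It assumes the database's default collation uses code page 1252 (for example SQL_Latin1_General_CP1_CI_AS); under a Chinese or UTF-8 collation the un-prefixed literal may come through intact, so the commented results are only what you'd expect on a typical Latin1 database. The words in @BadWords are plain ASCII, so the prefix starts to matter once a value contains characters outside the code page.

```sql
-- Sketch: the same text written as a varchar literal and as an nvarchar literal.
declare @WithoutN nvarchar(10) = '中文';   -- varchar literal: converted to the code page first
declare @WithN    nvarchar(10) = N'中文';  -- nvarchar literal: kept as Unicode

select @WithoutN          as WithoutN,               -- '??' on a code page 1252 database
       @WithN             as WithN,                  -- '中文'
       unicode(@WithoutN) as FirstCodePointWithoutN,  -- 63, the code point of '?'
       unicode(@WithN)    as FirstCodePointWithN;     -- 20013 (U+4E2D)
```

Declaring the target as nvarchar is not enough on its own: the characters are lost while the literal is parsed, before the assignment happens.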

Got it. Thanks a lot! I actually went and added the "N" to the XML fragment, and that fixed my problem. – user1873604
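That matches the answer: the XML string literal needs the N prefix too, otherwise the characters are replaced before the text is ever parsed as XML. A minimal sketch, using a shortened, made-up document rather than the one from the question:

```sql
-- Run as its own batch (it redeclares @XML).
-- The N'...' literal keeps the Unicode character, so nodes()/value() return it intact.
declare @XML xml = N'<root><item>1one1中</item></root>';

select T.N.value('text()[1]', 'nvarchar(max)')                    as ItemText,      -- '1one1中'
       unicode(right(T.N.value('text()[1]', 'nvarchar(max)'), 1)) as LastCodePoint  -- 20013
from @XML.nodes('/root/item') as T(N);
```

Dropping the N from the literal on a code page 1252 database should make ItemText come back as '1one1?' and LastCodePoint as 63.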