cancel
Showing results for 
Search instead for 
Did you mean: 

Remove illegal char from xml column

Former Member
8,764

So I've inherited this database with some invalid xml data in one table.

I know for a fact that the character is a control char and I can find it in the column (which is of type XML):

SELECT CHARINDEX(char(26), BLA_COLUMN) FROM BLA_TABLE where ID_RECORD = 1234 Returns a valid index. Now, when I try to replace that char with something else:

UPDATE BLA_TABLE SET BLA_COLUMN = REPLACE(BLA_COLUMN, char(26), char(32)) WHERE ID_RECORD = 1234

The statement fails with message: "XML parser error: character: 603, line:1, column:603 Illegal control character SQLCODE=-888, ODBC 3 State="HY000"

Any ideas how to get rid of that character? (Running SQLA 12)

Thank you!

EDIT

Looks like Volker's suggestion to first update the BLA_COLUMN to a Null value then to update to a valid XML, replacing char(26) with char (32), provides a workaround. Thanks Volker!

Accepted Solutions (1)

Accepted Solutions (1)

VolkerBarth
Contributor

Is the same true if you try to set the column to null (which would look like the previous contents is parsed, too)...?


EDIT: As this seems to have done the trick, it shows that

  • the parser does not validate the old contents (preventing some kind of "correction deadlock")
  • and NULL is not that bad now and then(Breck, do you hear me?:)
justin_willey
Participant
0 Kudos

If that is happening it's hard to see how you would ever be able to fix it - it's odd that it could get in there in the first place unless the behaviour has been changed at some time and it is now stricter. If this is the case, it's beginning to look more like a bug than a feature 🙂

Former Member

Setting the column to Null first, then updating it with a valid XML (ie char 26 replaced with char 32) seems to do the trick.

justin_willey
Participant

I had a go reproducing the problem (on 10.0.1 and 16.0) but couldn't - I could select and update the bad XML without an issue - so I was wondering what was causing the problem. Anyway, you have a work around so that's great.

create table TestXML(PK int default autoincrement, XMLStuff xml, primary key(PK));

insert into TestXML(XMLStuff) values ('<Bad XML>');
insert into  TestXML(XMLStuff) values ('<GoodXML/>');
insert into  TestXML(XMLStuff) values (string('<ReallyBadXML',char(26),'>'));
insert into  TestXML(XMLStuff) values (string('<EvenWorseXML',char(26)));

select * from TestXML where PK = 1;
-- works OK

update TestXML set XMLStuff = '<BetterXML/>' where  PK = 1;
-- works OK

select * from TestXML where PK = 3;
-- works OK

update TestXML set XMLStuff = '<BetterXML/>' where  PK = 3;
-- works OK

select * from TestXML where PK = 4;
-- works OK

update TestXML set XMLStuff = '<BetterXML/>' where  PK = 4;
-- works OK

drop table TestXML;
VolkerBarth
Contributor

12.0.1.4085 shows the same success.

Also trying to turn a valid XML into an invalid one doesn't raise an error, such as:

update TestXML set XMLStuff =
  replace(XMLStuff, char(47), char(26)) where  PK = 2;
-- works OK
Former Member

SQL Anywhere only validates XML when parsing it (for example, using OPENXML). It is not validated when inserting or updating a value, or casting to the XML type. I wonder if the original database has a trigger, computed column, or check constraint that ends up parsing the XML.

Former Member

You guys are right. Having looked more carefully at the table, I've notice the presence of a trigger that indeed does some parsing on update (disabling the trigger allowed the updates of fixing the xml just fine). Can't believe that didn't cross my mind! So there you go, case closed. Thank you!

Answers (1)

Answers (1)

justin_willey
Participant

I think you need to convert the xml data to something that SQLA doesn't try to interpret.

eg (untested)

create variable mytext long varchar;
set mytext = (SELECT (BLA_COLUMN) FROM BLA_TABLE where ID_RECORD = 1234);
set mytext= REPLACE(mytext, char(26), char(32));
update BLA_TABLE set BLA_COLUMN = mytext where ID_RECORD = 1234;
commit;

If you have to do this a lot you could write a user defined function. Also if you have characters which have problems because of collation tables eg

update person set notes=replace(notes,'ú','£')
and every u in the field will also be changed as the collation sequence helpfully reckons that 'ú' is the same as 'u', you can use this approach:

create function BinaryReplace(in x long varchar,in targetascii smallint,
in replacementascii smallint) returns long varchar deterministic begin
  declare rv long varchar;
  declare l integer;
  declare i integer;
  declare c char(1);
  set l=length(x);
  set i=0;
  set rv='';
  while i < l loop
    set i=i+1;
    set c=substr(x,i,1);
    if ascii(c) = targetascii then
      set c=char(replacementascii)
    end if;
    set rv=rv+c
  end loop;
  return rv
end;

not fast, but quicker than typing!

Former Member
0 Kudos

Weird! I can't even run a statement like: UPDATE BLA_TABLE set BLA_COLUMN = '<bla/>' where ID_RECORD=1234; without the XML parser complaining as above

justin_willey
Participant
0 Kudos

Is Breck's method allowed to work without the parser complaining?