cancel
Showing results for 
Search instead for 
Did you mean: 

Binary Replace / Byte Replace

justin_willey
Participant
2,382

Introduction

At the recent excellent Technical Summit held in Waterloo, I was challenged to make the case for a BinaryReplace() built-in function.

This would be a function that replaced a character string with another character string without reference to collation semantics etc (unlike the existing Replace() function - see this question for details of how Replace() interprets data. ByteReplace() might be a better name or it could perhaps be implemented as an extension to Replace().

What is is the problem?

Imagine you have a database containing data imported from another system, which uses a different character set or is from a different locale. You find that a particular character is incorrectly represented - a frequent example in the UK is the Pound Sterling sign - £. This often appears as ú. No problem you think - write a little routine that runs through all the affected tables and columns doing something like

update person set notes=replace(notes,'ú','£')

This is q£ite fast, b£t oh dear, yo£ soon find yo£ have q£ite a problem on yo£r hands. ALL yo£r £s (letter after t) have been changed to £s.

So what you (I've rolled back) have to do is write a letter by letter comparison user defined function (not very elegant example below) which now takes hours and hours to run. The same issue arises with trying to replace capital letters with lower case or vice versa (assuming a case insensitive database) and any operation involving accented characters that are treated alike by a collation.

There may be a very obvious solution I'm missing - possibly involving casting everything to binary beforehand (which can work for making case sensitive comparisons) - but it has so far evaded me. When I have had to deal with very large scale problems it's been faster to dump the data out, use a grep tool and read it all in again.

create function BinaryReplace(in x long varchar,in targetascii smallint,
in replacementascii smallint) returns long varchar deterministic begin
  declare rv long varchar;
  declare l integer;
  declare i integer;
  declare c char(1);
  set l=length(x);
  set i=0;
  set rv='';
  while i < l loop
    set i=i+1;
    set c=substr(x,i,1);
    if ascii(c) = targetascii then
      set c=char(replacementascii)
    end if;
    set rv=rv+c
  end loop;
  return rv
end;
VolkerBarth
Contributor

I'm quite sure the enhancement request will already take this into account:

In contrast to Justin's sample, the replacement should not work on a single character/byte but should work - like REPLACE() - with a "search-string" and a "replace-string", allowing both strings to have different length.

E.g. with German locale, it's often necessary to replace umlauts with their "extended" form (and vice versa, though that's more difficult as it's not a bijective function...), say like

select byte_replace('Müller', 'ü', 'ue');
VolkerBarth
Contributor
0 Kudos

Just another thought:

For replacing (in contrast to finding), I think it is usually helpful to distinguish case and/or accent, even in a case-insensitive database. In the sample above, I would certainly not want to replace upper-case letters with lower-case and vice versa, even with character semantics. So instead of a separate "binary replace", it might be more comfortable to enhance REPLACE() with a "collation-name/collation-tailoring-option", just like COMPARE().

Accepted Solutions (1)

Accepted Solutions (1)

johnsmirnios
Participant

Your request was noted at the summit and added into our set of enhancement requests. I can't make promises but it would be nice if it made it into Zermatt. I also noticed that there is no binary version of LOCATE either.

justin_willey
Participant
0 Kudos

I hadn't realised I was that persuasive! LOCATE would be great too. Many thanks.

johnsmirnios
Participant

Did you think the conference was for you to learn cool things about SQLAnywhere? It's real purpose is so that we can find out what you need. 😉

VolkerBarth
Contributor

v17? Zermatt? Guys, what are you talking about? Where can us poor non-participants learn about those cool things?


OK, for Zermatt, I just have to make it a some hundred miles southward:)

MarkCulp
Participant
0 Kudos
VolkerBarth
Contributor
0 Kudos

...and the according changes/enhancements can be found here (under the "Improved support for BYTE strings" section):

Changes to SQL statements, functions, procedures, and data types

Answers (0)