on 2010 Jan 19 8:24 PM
Using SA 11.0.1.2299, I want to strip some characters from a string - independent of their position.
An example: I want to get rid of several punction characters like hyphen, slash and the like in formatted numbers to get the "raw numbers" - say for telephone numbers.
Of course, I can do that with several calls of the REPLACE function like
DECLARE strNr varchar(30);
SET strNr = '+49-2345/67 89';
SELECT REPLACE(REPLACE(REPLACE(REPLACE(strNr, ' ', ''), '-', ''), '/', ''), '+', '');
But this is not very readable (though I could put it in a function).
Question: Are there better ways - e.g. by means of REGEXP_SUBSTR()?
Request clarification before answering.
I think Jon is on the right track. Depending on your input, there could be some performance gains by filtering out all bad characters which are similar as it is found in the string by using the replace() function. Doing so limits the number of times through the loop to the number of distinct characters in your character set (as opposed to the length of your string).
The other optimization that I would do is to strip off the initial characters from the input string that are known to be valid so that they are not looked at again. Note that the database server has optimizations so that long strings are not actually copied when such operations are performed and hence this operation is not as expensive as you might think.
Here is a revised version which also generalizes the set of valid characters that are desired:
CREATE FUNCTION FilterChars( in @sInput long varchar,
in @valid char(256) default '0-9' )
RETURNS long varchar
DETERMINISTIC
BEGIN
declare @pattern long varchar;
declare @iNotValid integer;
declare @sOutput long varchar;
set @pattern = '%[^' || @valid || ']%';
set @sOutput = '';
set @iNotValid = patindex( @pattern, @sInput );
while @iNotValid > 0 loop
// pick up everything to the left of the non numeric
set @sOutput = @sOutput || left( @sInput, @iNotValid - 1 );
// strip off the invalid character and any similar chars in the string
set @sInput = replace( substr( @sInput, @iNotValid+1 ),
substr( @sInput, @iNotValid, 1 ), '' );
// find the next invalid character
set @iNotValid = patindex( @pattern, @sInput );
end loop;
// return our string plus any remaining chars
return @sOutput || @sInput;
END;
Example usage:
select FilterChars( 'abc123def' ) // returns '123'
select FilterChars( 'abc123def', 'a-z' ) // returns 'abcdef'
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I personally like the use of regex for string matching. But ASA is missing a full blown REGEXP_REPLACE function.
At least im not able to come up with a regular expression that would do it with REGEXP_SUBSTRING in one run.
But in a loop it is possbile to strip the string. I have prepared a iSQL script for copy an paste. I think it should be easy to build a Function out of it.
BEGIN
declare cSearch varchar(1024);
declare cPattern varchar(512);
declare cResult varchar(1024);
declare cFound varchar(1024);
declare nFound integer;
declare nOccurrence integer;
set cSearch = '+49-2345/67 89';
set cPattern = '(?<=(^|[^[:digit:]+]))[[:digit:]+]+';
set nOccurrence = 1;
set cFound = REGEXP_SUBSTR( cSearch, cPattern, 1, nOccurrence );
message 'Loop ', nOccurrence, ' ', cFound type info to client ;
while cFound is not null and length(cFound) > 0 loop
set cResult = string( cResult, cFound );
set nOccurrence = nOccurrence + 1;
set cFound = REGEXP_SUBSTR( cSearch, cPattern, 1, nOccurrence );
message 'in Loop ', nOccurrence, ' ', cFound type info to client ;
end loop;
message 'Final ', cResult, '<!' type info to client;
END
Output of this script is
Loop 1 +49
Loop 2 2345
Loop 3 67
Loop 4 89
Loop 5
Final +4923456789<!
Execution time: 0.016 seconds
I think as always it depends on the string and the pattern which solution is faster.
One remark you have to avoid that the normal pattern '[[:digit:]+]+'
to match to fast. So you have to add a assertion that only makes sure you are at the begin of the string '^'
or after a not matched character '[^[:digit:]+]'
.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Thanks for that nice proposal - it underlines my assumption that REGEXP_SUBSTR() is generally suited for this task but needs to be looped in order to replace/strip more than one occurrence of the chars to filter out. - For the quite simple task discussed here, Mark's suggestion seems easier to understand. But for different tasks (like formatting/checking e-mail addresses), the RegExp search seems a lot more flexible.
@TDuemesnil: please post a new "question" entitled "Product Suggestion: Please implement REGEXP_REPLACE". Include a business case for why you want it. FWIW here's a description of the Oracle version: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions130.htm#SQLRF06302
Hi, see this question!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Is it only stripping out non-numerics from a number, if so then the solution we worked on here would probably work. Here is my version from that thread.
ALTER FUNCTION onlyNumber(in val long varchar)
RETURNS long varchar
DETERMINISTIC
BEGIN
declare sOut long varchar;
declare iCurPos integer;
declare iNonNum integer;
set sOut = '';
set iCurPos = 1;
set iNonNum = patindex('%[^0-9]%', val);
while iNonNum <> 0 loop
// pick up everything to the left of the non numeric
set sOut = sOut || substr(val, iCurPos, iNonNum - 1);
// check starting at next numeric
set iCurPos = iCurPos + iNonNum + patindex('%[0-9]%', substr(val, iNonNum + 1)) - 1;
set iNonNum = patindex('%[^0-9]%', substr(val, iCurPos));
end loop;
// return our string plus any remaining chars
return sOut || substr(val, iCurPos);
END;
This might change based upon whether we expected lots of non numerics and whether non numerics might show up grouped together or not which is what I designed for above. If we're just looking at something like SSN 999-99-9999 then the set iCurPos should just be set iCurPos = iCurPos + iNonNum;
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
74 | |
30 | |
8 | |
8 | |
7 | |
6 | |
6 | |
4 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.