cancel
Showing results for 
Search instead for 
Did you mean: 

How can I generate a sequence of 4 random digits/upper letters? - Problem with rand()

VolkerBarth
Contributor
8,943

[This is a follow-up form the question "Find random free alphanumeric value" in the general newsgroup]:

If one wants to generate a sequence of 4 random digits/upper letters in SQL, how can this be done?

My intention is to use

  1. rand() to generate random values
  2. cast(truncnum(rand() * 36, 0) as int) to generate integer values in the range 0..35 that can be used as some kind of char index
  3. add an according constant to build ASCII values in the desired range and use CHAR() with that value.

However, as the range of the ASCII values for digits (48-57) and upper letters (65-90) is not adjacent, a simple usage of *CHAR(cast(truncnum(rand() * 36, 0) as int) + 48)* won't do.

Instead, based on the generated integer value, one would have to add 48 (for char indexes < 10) or 55. As a consequence, it's necessary to call CHAR() with an if expression as argument. In order to evaluate the expression, it must be used twice, i.e. for comparison and then to add the according constant.

Therefore, I have used a derived table to build the rand() values, as in the sample below:

select
   rndInd1, rndInd1 + if rndInd1 < 10 then 48 else 55 end if as rndCharInd1, char(rndCharInd1) as chr1,
   rndInd2, rndInd2 + if rndInd2 < 10 then 48 else 55 end if as rndCharInd2, char(rndCharInd2) as chr2,
   rndInd3, rndInd3 + if rndInd3 < 10 then 48 else 55 end if as rndCharInd3, char(rndCharInd3) as chr3,
   rndInd4, rndInd4 + if rndInd4 < 10 then 48 else 55 end if as rndCharInd4, char(rndCharInd4) as chr4
from (select
   cast(truncnum(rand() * 36, 0) as int) as rndInd1,
   cast(truncnum(rand() * 36, 0) as int) as rndInd2,
   cast(truncnum(rand() * 36, 0) as int) as rndInd3,
   cast(truncnum(rand() * 36, 0) as int) as rndInd4) S

But my attempts to use an IF EXPRESSION for that seem to fail. The syntax seems o.k. but I have the impression that the RAND values (as used in the aliased columns) are re-evaluated for each usage of the alias, and as such, are not deterministic during the evaluation of the expression. That's wrong, IMHO.

An exemplary output is:

34,53,'>',1,67,'5',20,72,'L',19,67,':'

Obviously, the 2nd column is not based on the value 34 that is used in the 1st column (it should be 34 + 55 = 89), and so on. As a consequence, the columns chr1 to chr4 with the desired character may return characters that are - in contrast to the requirement - neither digits nor upper letters, e.g. a colon.


I can bring this down to the simple query:

select rndInd1, rndInd1 + 0
from (select
   cast(truncnum(rand() * 36, 0) as int) as rndInd1) S

IMHO, this should return a row with two identical values, with different results for each call.

But with SA 12.0.0.2566, it returns different columns, such as

16, 26 or 34, 17

Only when the second expression is simplified to a repetition of the same expression, then two identical values get returned:

select rndInd1, rndInd1
from (select
   cast(truncnum(rand() * 36, 0) as int) as rndInd1) S

Resume:

Basically this seems a wrong treatment of rand()-based expressions in derived tables.

VolkerBarth
Contributor
0 Kudos

Just to add: The same behaviour appears with SA 11.0.1.2427, whereas older versions (e.g. 8.0.3.5574) return correct results. Some kind of over-optimization in newer versions?

VolkerBarth
Contributor
0 Kudos

I just stumbled over that again with 12.0.1.3484 - the behaviour is still the same.

Is this behaviour of rand() by design? - Anymore insights are still appreciated:)

MCMartin
Participant
0 Kudos

It seems to be more by design of how the "as" keyword works: rndInd1 ist obviously just a placeholder for the casted value and so probably the select statement is rewritten by SQL Anywhere to:

select cast(truncnum(rand() * 36, 0) as int), cast(truncnum(rand() * 36, 0) as int)+0

For this statement the observed behavior is correct. The "as" defines just an alias not a variable.

VolkerBarth
Contributor
0 Kudos

If this is true, then

select rndInd1, rndInd1 from (select cast(truncnum(rand() * 36, 0) as int) as rndInd1) S

would be rewritten as well, and would give different results for both (same-named) columns as well - but it does not in that case. So I'm not sure if the aliasing is the source of the problem here.

Accepted Solutions (0)

Answers (2)

Answers (2)

Breck_Carter
Participant

Maybe you want RAND() to be deterministic, but you will be lonely 🙂

The RAND function is treated as a non-deterministic function. The query optimizer does not cache the results of the RAND function.

http://dcx.sybase.com/index.html#1200en/dbreference/rand-function.html

Once upon a time (I think) RAND was deterministic, but that bug got fixed.

FWIW here's the code I use...

--------------------------------------------------------------------- 
BEGIN
   DROP FUNCTION generate_random_string;
   EXCEPTION WHEN OTHERS THEN
END;

CREATE FUNCTION generate_random_string (
   IN @return_length INTEGER,
   IN @character_set VARCHAR ( 100 ) DEFAULT '' )
   RETURNS LONG BINARY
   NOT DETERMINISTIC
BEGIN
   DECLARE @characters_to_use   LONG BINARY;
   DECLARE @character_count     DOUBLE;
   DECLARE @random_number       DOUBLE;
   DECLARE @return_position     INTEGER;
   DECLARE @character_position  INTEGER;
   DECLARE @return_value        LONG BINARY;

CASE @character_set

WHEN 'caps' THEN
         SET @characters_to_use = STRING ( 
            'QWERTYUIOPASDFGHJKLZXCVBNM' );

WHEN 'letters' THEN
         SET @characters_to_use = STRING ( 
            'QWERTYUIOPASDFGHJKLZXCVBNM',
            'qwertyuiopasdfghjklzxcvbnm' );

WHEN 'numbers' THEN
         SET @characters_to_use = STRING ( 
            '1234567890' );

WHEN 'alphanumeric' THEN
         SET @characters_to_use = STRING ( 
            'QWERTYUIOPASDFGHJKLZXCVBNM',
            'qwertyuiopasdfghjklzxcvbnm',
            '1234567890' );

WHEN 'loweralphanumeric' THEN
         SET @characters_to_use = STRING ( 
            'qwertyuiopasdfghjklzxcvbnm',
            '1234567890' );

WHEN 'upperalphanumeric' THEN
         SET @characters_to_use = STRING ( 
            'QWERTYUIOPASDFGHJKLZXCVBNM',
            '1234567890' );

ELSE
         SET @characters_to_use = STRING ( -- omitted: single quote, back slash
            'QWERTYUIOPASDFGHJKLZXCVBNM',
            'qwertyuiopasdfghjklzxcvbnm',
            '1234567890',
            '`~-=!@#$%^&*()_+{}|[]:"<>?;,./' );

END CASE;

SET @character_count = LENGTH ( @characters_to_use );

SET @random_number = RAND();

SET @return_position = 1;
   WHILE @return_position <= @return_length LOOP
      SET @character_position = CAST ( ROUND ( @random_number * @character_count, 0 ) AS INTEGER );
      SET @return_value = STRING ( @return_value, SUBSTR ( @characters_to_use, @character_position, 1 ) );
      -- MESSAGE STRING ( @return_position, ', ', @random_number, ', ', @character_position, ', ', @return_value ) TO CLIENT;
      SET @random_number = RAND();
      SET @return_position = @return_position + 1;
   END LOOP;

RETURN @return_value;

END; -- generate_random_string

--------------------------------------------------------------------- 
-- Testing...

/*

BEGIN
   DECLARE @seed        INTEGER;
   DECLARE @dummy       DOUBLE;

-- Note: DATEDIFF returns INTEGER so the range of @seed is limited.
   --       However, this logic will work well past the year 2055 since
   --       a negative seed is OK.

-- Note: The extra "seeding" crap may no longer be necessary in V12.

SET @seed = DATEDIFF ( SECOND, CURRENT TIMESTAMP, '2055 10 27 00:00:00' );
   SET @dummy = RAND ( @seed );
   SET @dummy = RAND();
   SET @dummy = RAND();

SELECT CAST ( generate_random_string ( 5 ) AS VARCHAR )                       AS "1",
          CAST ( generate_random_string ( 5, 'caps' ) AS VARCHAR )               AS "2",
          CAST ( generate_random_string ( 16, 'alphanumeric' ) AS VARCHAR )      AS "3",
          CAST ( generate_random_string ( 15, 'loweralphanumeric' ) AS VARCHAR ) AS "4",
          CAST ( generate_random_string ( 30, 'upperalphanumeric' ) AS VARCHAR ) AS "5";

END;

1,2,3,4,5
'yk2{j','XGBCL','SKJhKeOmxhAhGJWF','dxw4oef9rkocygh','FDQGCWZ6RIZQMJE447Y7BP84N0QOK8'

*/
VolkerBarth
Contributor
0 Kudos

No, that's a misunderstanding. I do want RAND() to be not deterministic, of course. Everything else would be nonsense. But I don't think that an expression based on RAND() should be re-evaluated each time is it referenced. And I remember that Glenn has pointed out in the general newsgroup a while ago that this kind of re-evaluation is not desired. (Haven't got the actual news thread, however). Sidenote: As stated in my comment, v8 treats that kind of non-determinism correctly: Different calls of RAND() within the same query are not deterministic, computations based on their results are.

VolkerBarth
Contributor
0 Kudos

General newsgroup reference: "rand() vs order by" from 2010-09-04 - here there are other problems with RAND() but possibly the same reason applies.

Breck_Carter
Participant
0 Kudos

Too deep for me... I wouldn't intentionally code multiple calls to RAND() inside a query to save my life, I have absolutely no idea how the query optimizer rearranges things for execution. I leave that for you and Glenn to discuss 🙂

MCMartin
Participant

You could create a mapping table with two columns "index","char" then you could select based on rand() from the mapping table, without the need to do any "if" or "calculation"

VolkerBarth
Contributor
0 Kudos

Sure, and I could also write a stored proc that would select rand() into some kind of local temporary table and then use that for further processing. But the general (though not verbalized!) approach is to use just a query, i.e. not using any kind of DDL. Besides that, I'm mainly posting a bug report here, methinks:)