cancel
Showing results for 
Search instead for 
Did you mean: 

Upper case umlauts not recognised

huber1
Participant
3,926

Hi

I found that upper case umlauts in a SELECT does not lead to a result (set). Following SELECT works as expected, i. e. results in some rows displayed.

SELECT name
FROM persons
WHERE name LIKE 'Müller%'

Changing only the lower case ü to an upper case Ü, and no rows are displayed any more. Any idea why this behaviour?

Regards, Robert

SQL Anywhere 17.0.4.2129 macOS 10.12.5 Java 1.8.0_131

Accepted Solutions (0)

Answers (3)

Answers (3)

johnsmirnios
Advisor
Advisor

The SQL Anywhere collation "UTF8" (not to be confused with the UTF8 character set which is used to encode the characters) is very similar to UTF8BIN and is a SQL Anywhere legacy collation. It can only do case equivalency on single-byte characters. To sort UTF-8 encoded characters correctly, you must use UCA.

If you don't need Unicode / UTF-8 then you might use a collation such as windows-1252 but that, of course, limits what characters you can store to just the 256 characters in windows-1252.

Breck_Carter
Participant
0 Kudos

Try waving a dead chicken over the keyboard... in this case, LCASE ( ... )

Alas, I can't reproduce this error on a new V17 database on Windows; all the result sets are the same:

CREATE TABLE persons ( name VARCHAR ( 10 ) );
INSERT persons VALUES ( 'Müller' );
INSERT persons VALUES ( 'MÜLLER' );
COMMIT;
SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%'
SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜLLER%'
SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜller%'
SELECT @@VERSION, name FROM persons WHERE name LIKE LCASE ( 'MÜller%' );
SELECT @@VERSION, name FROM persons WHERE LCASE ( name ) LIKE LCASE ( 'MÜller%' );

@@VERSION,name
'17.0.4.2053','Müller'
'17.0.4.2053','MÜLLER'
johnsmirnios
Advisor
Advisor
0 Kudos

LCASE may also not work -- depending on the collation being used. UTF8BIN cannot do case conversion beyond the ASCII range.

huber1
Participant
0 Kudos

Hi Breck

With

SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%'

I expected to get the 2 records with 'Müller' and 'MÜLLER' but only get 'Müller'.

VolkerBarth
Contributor
0 Kudos

I can't tell about that particular umlaut/case issue, however, you can use a fitting collation independent of your global database setting with the help of the builtin COMPARE function and the "collation tailoring" feature:

create table test (ch varchar(5));

insert into test values ('a');
insert into test values ('A');
insert into test values ('ä'); -- add accented char (umlaut)
insert into test values ('Á');

-- lists all four chars in a case-insensitive database
select * from test
where ch = 'A';

-- Test for equality based on UCA collation - lists just 'A'
select * from test
where compare(ch, 'A', 'UCA(locale=en;case=respect;accent=respect)') = 0;

-- Test for equality based on default 1252LATIN1 with case respect
-- - list 'A' and 'Ä' here (i.e. does ignore accents)
select * from test
where compare(ch, 'A', '1252LATIN1(case=respect)') = 0;
huber1
Participant
0 Kudos

Hi Volker

Thanks for your answer. I recognise I must be more specific. What I mean is that if I have names written as 'MÜLLER' as well as 'Müller' in the database, I do not get both of them with the WHERE clause like 'Müller'. If I use 'MÜLLER', I get all names written in capitals, but not the ones written as CamelCase. And vice à versa. The database settings are (N)CHAR case sensitivity: Ignore (see screenshot).

I tried with your suggestion (your example works of course), but I can't figure out how to apply it on my SELECT, giving all 'Müller' as result, however there are stored in the database.

But what bothers me is that although the database is set to case sensitivity ignore, an upper case Umlaut does not "translate" in a lower case Umlaut.

johnsmirnios
Advisor
Advisor
0 Kudos

Please answer the comments to your question. If you are using, for example, UTF8BIN collation rather than UCA, the behaviour is expected.

See http://dcx.sybase.com/index.html#sqla170/en/html/3bccdcfe6c5f10148f50801e84c733a2.html

VolkerBarth
Contributor
0 Kudos

John, still on sybase.com? 🙂


So you basically say, one needs to use the Unicode Collation Algorithm (UCA) to get proper case conversion for characters beyond the 7-bit English characters, like the umlauts, right?

huber1
Participant
0 Kudos

Hi John

I hopefully did answer it with the above?

Datatype correction of my answer: The datatype is VARCHAR(255).

Thanks and regards, Robert

johnsmirnios
Advisor
Advisor
0 Kudos

The "legacy" collations for SQL Anywhere can do case conversion for single-byte characters. That includes all characters for collations such as windows-1252. In UTF8, accented characters are multibyte characters and require ICU to do case conversion.

VolkerBarth
Contributor
0 Kudos

Ah yes, I should've known that, since I have never ever had problems converting umlauts with v5.5 and above using those legacy collations and (VAR)CHAR data types, and of course Germans use those a lot...:)

So my statement would only be true for UTF8.