
SQL Anywhere: Problem with JSON data containing German umlauts, wrongly decoded depending on the following characters

VolkerBarth
Contributor

While consuming JSON data from a web service, I've stumbled upon problems with nvarchar data containing German umlauts. In some cases, these umlauts (and the German sharp s) are decoded incorrectly.

IMHO, these characters are correctly encoded in JSON with \u and 4 hex digits, say a "ü" as \u00fc. However, depending on the following character (usually a regular Latin character, encoded as a single byte without a \u prefix), the sequence is either decoded correctly or treated as a totally different Unicode codepoint with a much higher number.

Here's a small sample with the umlaut "ü", followed by a space, the text "ge", and the text "be". In the last case, the umlaut is wrongly decoded as codepoint 64702 (0xFCBE), which is "ARABIC LIGATURE FEH WITH JEEM INITIAL FORM" - apparently because "b" and "e" are valid hex digits, so the decoder consumes "00fcbe" as a single six-digit escape.

This is run on a 17.0.11.7908 database with CharSet = windows-1252 / collation = 1252LATIN1, and NCharSet = UTF-8 / NCHAR collation = UCA.

select '\u00fc' as umlaut,
       unistr('\u00fc ') as UmlautWithSpace,
       unistr('\u00fcge') as UmlautWithGe,
       unistr('\u00fcbe') as UmlautWithBe,
       unicode('\u00fc'),
       unicode(UmlautWithSpace),
       unicode(UmlautWithGe),
       unicode(UmlautWithBe)

returns 

[Screenshot of the result set: UmlautWithSpace and UmlautWithGe decode correctly, while UmlautWithBe comes back as codepoint 64702]

Is there a problem in the UTF-8 decoding? And is there a workaround to fix this behaviour?

To clarify: I'm aware that the UNISTR function supports both \uXXXX and \uXXXXXX, so the value \u00fcbe can be treated as an escape sequence with six digits. However, I'm only using UNISTR here to show the effect; I'm generally relying on the sp_parse_json() function, and AFAIK the JSON standard only allows Unicode escape sequences with 4 digits (\uXXXX).
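For reference, a minimal repro via sp_parse_json() itself; the variable name jdata is arbitrary, and the dot syntax for accessing the parsed field is my assumption based on typical sp_parse_json usage:

begin
    -- "\u00fcbe" should decode to the four characters "übe"
    call sp_parse_json( 'jdata', '{ "word": "\u00fcbe" }' );
    -- with the behaviour described above, unicode() returns 64702 (U+FCBE)
    -- for the first character instead of 252 ('ü')
    select jdata.word, unicode( jdata.word );
end;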

 

johnsmirnios
Product and Topic Expert

I looked at our JSON parser and it does accept 6-character hex sequences. It should not be doing so and it's a quick fix.

However, I would also recommend that you consider using a CHAR collation based on UTF8 rather than windows-1252. If I recall correctly, sp_parse_json returns its string values as CHAR datatype; however, although our parser internally converts the bare text in strings from NCHAR to CHAR, it explicitly generates UTF8 encoding for escape sequences. The result can be strings that are in a mix of CHAR and UTF8 encoding. It's an old bug but changing our JSON support to use NCHAR everywhere had backward compatibility issues and some other complications that meant it never got implemented. Using a UTF8-based CHAR collation is an easy workaround. Unfortunately, it means a database unload/reload would be required.
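As a quick way to check the current settings before planning an unload/reload, the relevant database properties can be queried; a small sketch using the db_property() function:

select db_property( 'CharSet' )        as char_charset,
       db_property( 'Collation' )      as char_collation,
       db_property( 'NcharCharSet' )   as nchar_charset,
       db_property( 'NcharCollation' ) as nchar_collation;

A database that already uses a UTF8-based CHAR collation reports UTF-8 for CharSet.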

VolkerBarth
Contributor

Thanks for the quick response, John - and for the hint about a different collation. I cannot currently change the database collation, but I will be careful in that respect.
As a workaround, I will replace the possibly misinterpreted \uxxxx escape sequences with the actual characters (typically umlauts) when they are followed by two hex digits - and do that before sp_parse_json() is called, as shown in the sketch below. This should work as long as the characters are available in our charset.
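A minimal sketch of such a pre-processing step; the function name FixUmlautEscapes and the covered character set are my own illustration, and it replaces the escapes unconditionally, which is safe because the replacement is the correct decoding regardless of what follows:

-- A minimal sketch, assuming a windows-1252 CHAR charset that contains
-- all of these characters; names are illustrative, not an official API.
create or replace function FixUmlautEscapes( in @json long varchar )
returns long varchar
begin
    declare @result long varchar;
    set @result = @json;
    -- Replacing unconditionally is safe: the replacement text is the correct
    -- decoding of the escape sequence whether or not hex digits follow it.
    set @result = replace( @result, '\u00e4', 'ä' );
    set @result = replace( @result, '\u00f6', 'ö' );
    set @result = replace( @result, '\u00fc', 'ü' );
    set @result = replace( @result, '\u00c4', 'Ä' );
    set @result = replace( @result, '\u00d6', 'Ö' );
    set @result = replace( @result, '\u00dc', 'Ü' );
    set @result = replace( @result, '\u00df', 'ß' );
    return @result;
end;

-- usage: call sp_parse_json( 'jdata', FixUmlautEscapes( @raw_json ) );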