cancel
Showing results for 
Search instead for 
Did you mean: 

How to insert utf-8 string into nvarchar from php?

3,397

I'm trying to insert the multibyte string 'test 人造景' into an NVARCHAR field from PHP.

The platform is I'm running SA 17.0.9.4935 and php 5.5.9.

I have created the following test code to illustrate the issue:

$cstring = "UID=DBA;PWD=sql;Server=myserver;DBN=mydb;ASTART=No;host=localhost;CS=UTF-8";
$conn = sasql_connect( $cstring );

$sql = "CREATE OR REPLACE TABLE UNICODE_TEST(ID INTEGER DEFAULT AUTOINCREMENT, NFIELD LONG NVARCHAR)"; $pstmt = sasql_prepare( $conn, $sql ); sasql_stmt_execute( $pstmt );

$sql = "INSERT INTO UNICODE_TEST(NFIELD) VALUES (?)"; $s = "tis is a test 人造景 again and again"; $pstmt = sasql_prepare( $conn, $sql ); sasql_stmt_bind_param_ex( $pstmt, 0, $s, "s", false ); sasql_stmt_execute( $pstmt );

$sql = "INSERT INTO UNICODE_TEST (NFIELD) VALUES (N'unitest1 / 人造景 again')"; $pstmt = sasql_prepare( $conn, $sql ); sasql_stmt_execute( $pstmt );

$sql = "INSERT INTO UNICODE_TEST (NFIELD) VALUES (UNISTR(N'unitest1 / \\u4eba\\u9020\\u666f'))"; $pstmt = sasql_prepare( $conn, $sql ); sasql_stmt_execute( $pstmt );

sasql_commit( $conn );

$sql = "SELECT * FROM UNICODE_TEST"; $rs = sasql_query($conn,$sql); while($obj = sasql_fetch_object($rs)){ var_dump($obj); } print "php default_charset: " . ini_get("default_charset")."\\n"; print "DONE!\\n";

If I run this on a database with the following settings, it works for all three cases:

Number,PropNum,PropName,PropDescription,Value
0,255,'CharSet','Character set used for CHAR data',UTF-8
0,256,'NcharCharSet','Character set used for NCHAR data',UTF-8
0,257,'MultiByteCharSet','Multi Byte Character Set ( on/off )',On
0,260,'NcharCollation','Name of collation used for NCHAR data',UCA
0,425,'HasNCHARLegacyCollationFix','Fixed NCHAR legacy collation implementation',On

and I get the expected output

$ php sasql_test06.php
object(stdClass)#1 (2) {
  ["ID"]=>
  int(1)
  ["NFIELD"]=>
  string(29) "tis is a test 人造景 again"
}
object(stdClass)#2 (2) {
  ["ID"]=>
  int(2)
  ["NFIELD"]=>
  string(26) "unitest1 / 人造景 again"
}
object(stdClass)#1 (2) {
  ["ID"]=>
  int(3)
  ["NFIELD"]=>
  string(20) "unitest1 / 人造景"
}
php default_charset: UTF-8
DONE!

However, if I run it on a database with the following settings, it fails with the first 2 inserts but works with the third (where there are no real unicode characters, just their escaped representations):

Number,PropNum,PropName,PropDescription,Value
0,255,'CharSet','Character set used for CHAR data',windows-1252
0,256,'NcharCharSet','Character set used for NCHAR data',UTF-8
0,257,'MultiByteCharSet','Multi Byte Character Set ( on/off )',Off
0,260,'NcharCollation','Name of collation used for NCHAR data',UCA
0,423,'HasNCHARLegacyCollationFix','Fixed NCHAR legacy collation implementation',On
I get the following output with incorrect multibyte characters
$ php sasql_test06.php
object(stdClass)#1 (2) {
  ["ID"]=>
  int(1)
  ["NFIELD"]=>
  string(23) "tis is a test  again"
}
object(stdClass)#2 (2) {
  ["ID"]=>
  int(2)
  ["NFIELD"]=>
  string(20) "unitest1 /  again"
}
object(stdClass)#1 (2) {
  ["ID"]=>
  int(3)
  ["NFIELD"]=>
  string(20) "unitest1 / 人造景"
}
php default_charset: UTF-8
DONE!
I've also tried it without the CS=UTF-8 connection parameter - different output but still incorrect.

Documentation (here) seems to say that if "Use a host variable to specify the Unicode character values" (which I am), this should work in the case where the default CharSet is windows-1252, given that the field I'm inserting into is defined as NVARCHAR which is UTF-8.

However, there is other documentation (here) that seems to indicate PHP uses the C API and "these are not available from the DBTools or C API interfaces".

I find the documentation confusing, but I fear that this may not supported in PHP!

It looks to me like the only way I can make this work is to unload my database into a new one that has both CHAR and NCHAR data set to UTF-8 (which will be very time-consuming in my case). Is that true? Or am I missing something here?

Terry

VolkerBarth
Contributor
0 Kudos

I can't answer that, however, does this work when not using a host parameter but supplying the value as string literal within the SQL statement and using the UNISTR function to supply that value?

VolkerBarth
Contributor
0 Kudos

To add possibly further doc confusion w.r.t. to your second quote: v17 hat introduced NCHAR support for the SQL Anywhere C API, see here:

SQL Anywhere C API: Support added for national character types
Support for national character (NCHAR, NVARCHAR, LONG NVARCHAR, NTEXT) types has been added to version 4 of the SQL Anywhere C API. This functionality is available when _SACAPI_VERSION is defined as 4. The native_type field of the a_sqlany_column_info structure includes DT_NVARCHAR, DT_NFIXCHAR, DT_NSTRING, and DT_LONGNVARCHAR as possible values.

However, as stated, I neither do know whether this affects the conversion for PHP host parameters, nor do I know whether you can specify the API version from PHP...

0 Kudos

I just added 2 more INSERT examples to the script. The only way it seems to work (other than rebuilding the database) is to not use the actual unicode characters but replaced them with their escaped versions and then apply the UNISTR() function.

VolkerBarth
Contributor
0 Kudos

Well, the failing of the second sample (i.e. with NCHAR literal) is expected because string literals in SQL statements must be specified in the database charset, which is windows-1252 in your case and as such cannot recognize Chinese characters. That is also the reason why using UNISTR with the according code points does work. Note, that the code points nevertheless are the actual unicode characters.

0 Kudos

Agreed. And I should have included the following case, which also fails:

$sql = "INSERT INTO UNICODE_TEST (NFIELD) VALUES (UNISTR(N'unitest1 / 人造景'))";
$pstmt = sasql_prepare( $conn, $sql );
sasql_stmt_execute( $pstmt );

Accepted Solutions (0)

Answers (0)