cancel
Showing results for 
Search instead for 
Did you mean: 

How to insert utf-8 string into nvarchar from php?

3,535

I'm trying to insert the multibyte string 'test 人造景' into an NVARCHAR field from PHP.

The platform is I'm running SA 17.0.9.4935 and php 5.5.9.

I have created the following test code to illustrate the issue:

$cstring = "UID=DBA;PWD=sql;Server=myserver;DBN=mydb;ASTART=No;host=localhost;CS=UTF-8";
$conn = sasql_connect( $cstring );

$sql = "CREATE OR REPLACE TABLE UNICODE_TEST(ID INTEGER DEFAULT AUTOINCREMENT, NFIELD LONG NVARCHAR)"; $pstmt = sasql_prepare( $conn, $sql ); sasql_stmt_execute( $pstmt );

$sql = "INSERT INTO UNICODE_TEST(NFIELD) VALUES (?)"; $s = "tis is a test 人造景 again and again"; $pstmt = sasql_prepare( $conn, $sql ); sasql_stmt_bind_param_ex( $pstmt, 0, $s, "s", false ); sasql_stmt_execute( $pstmt );

$sql = "INSERT INTO UNICODE_TEST (NFIELD) VALUES (N'unitest1 / 人造景 again')"; $pstmt = sasql_prepare( $conn, $sql ); sasql_stmt_execute( $pstmt );

$sql = "INSERT INTO UNICODE_TEST (NFIELD) VALUES (UNISTR(N'unitest1 / \\u4eba\\u9020\\u666f'))"; $pstmt = sasql_prepare( $conn, $sql ); sasql_stmt_execute( $pstmt );

sasql_commit( $conn );

$sql = "SELECT * FROM UNICODE_TEST"; $rs = sasql_query($conn,$sql); while($obj = sasql_fetch_object($rs)){ var_dump($obj); } print "php default_charset: " . ini_get("default_charset")."\\n"; print "DONE!\\n";

If I run this on a database with the following settings, it works for all three cases:

Number,PropNum,PropName,PropDescription,Value
0,255,'CharSet','Character set used for CHAR data',UTF-8
0,256,'NcharCharSet','Character set used for NCHAR data',UTF-8
0,257,'MultiByteCharSet','Multi Byte Character Set ( on/off )',On
0,260,'NcharCollation','Name of collation used for NCHAR data',UCA
0,425,'HasNCHARLegacyCollationFix','Fixed NCHAR legacy collation implementation',On

and I get the expected output

$ php sasql_test06.php
object(stdClass)#1 (2) {
  ["ID"]=>
  int(1)
  ["NFIELD"]=>
  string(29) "tis is a test 人造景 again"
}
object(stdClass)#2 (2) {
  ["ID"]=>
  int(2)
  ["NFIELD"]=>
  string(26) "unitest1 / 人造景 again"
}
object(stdClass)#1 (2) {
  ["ID"]=>
  int(3)
  ["NFIELD"]=>
  string(20) "unitest1 / 人造景"
}
php default_charset: UTF-8
DONE!

However, if I run it on a database with the following settings, it fails with the first 2 inserts but works with the third (where there are no real unicode characters, just their escaped representations):

Number,PropNum,PropName,PropDescription,Value
0,255,'CharSet','Character set used for CHAR data',windows-1252
0,256,'NcharCharSet','Character set used for NCHAR data',UTF-8
0,257,'MultiByteCharSet','Multi Byte Character Set ( on/off )',Off
0,260,'NcharCollation','Name of collation used for NCHAR data',UCA
0,423,'HasNCHARLegacyCollationFix','Fixed NCHAR legacy collation implementation',On
I get the following output with incorrect multibyte characters
$ php sasql_test06.php
object(stdClass)#1 (2) {
  ["ID"]=>
  int(1)
  ["NFIELD"]=>
  string(23) "tis is a test  again"
}
object(stdClass)#2 (2) {
  ["ID"]=>
  int(2)
  ["NFIELD"]=>
  string(20) "unitest1 /  again"
}
object(stdClass)#1 (2) {
  ["ID"]=>
  int(3)
  ["NFIELD"]=>
  string(20) "unitest1 / 人造景"
}
php default_charset: UTF-8
DONE!
I've also tried it without the CS=UTF-8 connection parameter - different output but still incorrect.

Documentation (here) seems to say that if "Use a host variable to specify the Unicode character values" (which I am), this should work in the case where the default CharSet is windows-1252, given that the field I'm inserting into is defined as NVARCHAR which is UTF-8.

However, there is other documentation (here) that seems to indicate PHP uses the C API and "these are not available from the DBTools or C API interfaces".

I find the documentation confusing, but I fear that this may not supported in PHP!

It looks to me like the only way I can make this work is to unload my database into a new one that has both CHAR and NCHAR data set to UTF-8 (which will be very time-consuming in my case). Is that true? Or am I missing something here?

Terry

Accepted Solutions (0)

Answers (0)