Special characters in java mapping

ranjit_deshmukh
Active Participant
0 Kudos

Hi,

I am using a Java mapping in my interface mapping, and it contains a condition based on field length,

like:

pad(Field1) with a padding character, say '_'

I am writing a fixed-length file at the receiver end.

In my source I have Japanese characters, and it looks like:

<DATA>

<Field1>&#12354;</Field1>

<Field2>abc</Field2>

</DATA>

In the Java mapping, after parsing this, what I see (when I print the document element using the trace parameter) is:

<DATA>

<Field1>&#x3042;</Field1>

<Field2>abc</Field2>

</DATA>

If I apply string functions to the Japanese character,

e.g. Field1.length(), it gives me '1' as the answer.

But as the Japanese characters use 2 bytes for storage, the call

'Field1.getBytes().length' should give me '2' (i.e. the number of bytes),

but this doesn't happen in the Java code.

If I test the same thing in a UDF, to my surprise 'Field1.getBytes().length' gives me '2' as the value!

All I need is for the Java mapping to behave the way the UDF does, but this is not happening.

Any kind of help will be welcome.

Ranjit

Accepted Solutions (0)

Answers (2)

stefan_grube
Active Contributor
0 Kudos

Is the incoming XML document encoded in UTF-8?

The Java mapping operates on the incoming encoding, whereas the graphical mapping transforms the incoming document to UTF-8 before processing starts.

Regards

Stefan

ranjit_deshmukh
Active Participant
0 Kudos

Hi Stefan,

Yes, my incoming message is UTF-8 encoded.

When I use the trace object to trace the InputStream parameter, I see the XML message the way a browser would display it, with the special characters interpreted; this is fine.

<DATA>

<Field1>&#12354;</Field1>

<Field2>abc</Field2>

</DATA>

will be seen in the trace as:

<DATA>

<Field1>&#x3042;</Field1>

<Field2>abc</Field2>

</DATA>

As you can see, the value in Field1 is the 2-byte representation of a Japanese character.

Now if I parse this XML in Java and print, say,

Node.getValue().getBytes().length (where Node is 'Field1'), then the answer should be 2 and not 1,

as the Japanese character is stored using 2 bytes.

But unfortunately the answer I get is '1', which is the same as Node.getValue().length() (which normally gives the number of characters, not the number of bytes).

This seems a little weird.
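
One possible explanation for the '1', offered as an assumption and not a confirmed diagnosis of this system: getBytes() with no argument encodes with the JVM's default charset, and a character that charset cannot represent is typically replaced by a single '?' byte. The sketch below uses a hypothetical class name (DefaultCharsetCheck) and helper (byteCount), and needs Java 7 or later for StandardCharsets. It prints the default charset and compares byte counts for the character from the sample message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    // Number of bytes the value occupies when encoded in the given charset.
    // getBytes(Charset) replaces unmappable characters with the charset's
    // replacement bytes instead of failing.
    static int byteCount(String value, Charset charset) {
        return value.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String field1 = "\u3042"; // the character &#12354; / &#x3042; from the sample

        // This result depends on the JVM the code runs in.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + field1.getBytes().length);

        // These results are the same on every JVM.
        System.out.println("ISO-8859-1: " + byteCount(field1, StandardCharsets.ISO_8859_1)); // 1, a single '?'
        System.out.println("UTF-16BE:   " + byteCount(field1, StandardCharsets.UTF_16BE));   // 2
        System.out.println("UTF-8:      " + byteCount(field1, StandardCharsets.UTF_8));      // 3
    }
}
```

If the first line reports a single-byte charset, a byte count of 1 from getBytes() would be expected for this character.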

Ranjit

stefan_grube
Active Contributor
0 Kudos

Could you send me the file before mapping to my email address:

Regards

Stefan

**********

Please read the Forum's Rules of Engagement,

i.e.,

/thread/117188 [original link is broken];

and refrain from using email correspondence. The main objective of the Forums is to share knowledge.

SDN PI/XI Forum Moderator

ranjit_deshmukh
Active Participant
0 Kudos

Hi Stefan,

I have sent you an email but have had no reply.

Ranjit

ranjit_deshmukh
Active Participant
0 Kudos

Thanks, all, for your help.

I am now using getBytes(encoding).length, which solved my problem.
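
A minimal sketch of how this fix could feed into the fixed-length padding described at the top of the thread. The class and method names (FixedWidthPad, padToBytes) are hypothetical, and UTF-8 in the usage lines is only an assumed receiver encoding; the real mapping should pass whatever encoding the receiver file actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FixedWidthPad {

    // Pads value on the right with padChar until its encoded size reaches
    // widthInBytes, measuring bytes in the target charset, not characters.
    static String padToBytes(String value, int widthInBytes, char padChar, Charset charset) {
        int padBytes = String.valueOf(padChar).getBytes(charset).length;
        int used = value.getBytes(charset).length;
        if (used > widthInBytes) {
            throw new IllegalArgumentException(
                    "value needs " + used + " bytes, field width is " + widthInBytes);
        }
        StringBuilder padded = new StringBuilder(value);
        while (used + padBytes <= widthInBytes) {
            padded.append(padChar);
            used += padBytes;
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // In UTF-8, U+3042 takes 3 bytes, so only 2 padding characters fit in 5 bytes.
        System.out.println(padToBytes("\u3042", 5, '_', StandardCharsets.UTF_8));
        // "abc" takes 3 bytes in UTF-8 and is padded the same way: abc__
        System.out.println(padToBytes("abc", 5, '_', StandardCharsets.UTF_8));
    }
}
```

Padding by value.length() instead would produce a field that is too wide in bytes whenever the value contains multi-byte characters.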

Ranjit

aashish_sinha
Active Contributor
0 Kudos

Hi,

Please refer to this link:

http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

It will help you solve this issue.

Regards

Aashish Sinha

PS : reward points if helpful

ranjit_deshmukh
Active Participant
0 Kudos

Thanks for the link.

Let me make it clearer:

For Field1 = one Japanese character occupying 2 bytes in XI,

if the code Field1.getBytes().length

in a UDF gives '2' as output,

why does it give '1' in the Java mapping after parsing?

Ranjit

aashish_sinha
Active Contributor
0 Kudos

Hi,

I guess this is happening because of the character encoding you are using.

Each XI system is based on Unicode and these messages are usually XML-based.

When connecting XI with other systems it might first be necessary to convert the XI

message into other codepages like Latin, Hebrew, Chinese, Japanese, Arabic and so on.

However messages can also be converted from XML to plain text format, or they can be

either zipped or encoded. It is therefore important to know how the transformation from

the Unicode to another codepage is to be done.

Each codepage has an identifier, for example ISO-8859-1 is West European Latin, and

GB18030 is a standard for Chinese characters. For each language there are several

codepages available which are not always compatible, therefore you must check which

codepage is used for each system.

UTF-8 and all ISO-8859 codepages are based on ASCII. Therefore they are compatible

when only printable characters from ASCII are used. Other characters like European ä,

ê, ñ, š and letters from other alphabets are represented differently or are only available in

specific codepages. These characters cause errors in interpreting XML messages if the

wrong codepage is used.

There are two places where the codepage of a message has to be declared:

• The codepage for a text message is taken from the HTTP header Content-Type

with the attribute charset. For example:

Content-Type = text/plain; charset="UTF-8"

• The codepage for an XML message is taken from the attribute “encoding” of the

XML declaration. For example:

<?xml version="1.0" encoding="UTF-8"?>
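
To illustrate the second point, here is a small sketch that hands raw bytes to the standard JAXP parser, so that the parser itself reads the encoding from the XML declaration. The class and method names (DeclaredEncodingDemo, field1Value) are hypothetical, and the message is the sample from this thread, not a real XI payload.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingDemo {

    // Parses the raw bytes and returns the text content of the first Field1 element.
    // The parser decodes the bytes according to the XML declaration.
    static String field1Value(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getElementsByTagName("Field1").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<DATA><Field1>&#12354;</Field1><Field2>abc</Field2></DATA>";
        String value = field1Value(xml.getBytes(StandardCharsets.UTF_8));

        System.out.println(value.length());                       // 1
        System.out.println(Integer.toHexString(value.charAt(0))); // 3042
    }
}
```

After parsing, the value is a Java String of one character, whatever the encoding of the incoming bytes was. The byte count only becomes meaningful again when the string is encoded for output.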

Hope this clears things up a bit.

Regards

Aashish Sinha

PS : reward points if helpful.

ranjit_deshmukh
Active Participant
0 Kudos

I have seen this SAP document;

that doesn't help either.

Basically, I don't have a problem with my encoding.

I am able to see the messages properly and everything is fine.

The only thing is that the UDF and the Java mapping behave differently from each other.

That is my whole concern.

Ranjit