on 2008 Jan 23 7:07 AM
Hi,
I am using a Java mapping in my interface mapping, and it contains a condition based on field length,
like:
pad(Field1) with a padding character, say '_'
I am using a fixed-length file at the receiver end.
My source contains Japanese characters and looks like this:
<DATA>
<Field1>あ</Field1>
<Field2>abc</Field2>
</DATA>
In the Java mapping, after parsing this, what I see (when I print the document element using the trace parameter) is:
<DATA>
<Field1>あ</Field1>
<Field2>abc</Field2>
</DATA>
If I apply string functions to the Japanese character,
e.g. Field1.length(), I get '1' as the answer.
But since Japanese characters use 2 bytes for storage, the call
'Field1.getBytes().length' should give me '2' (i.e. the number of bytes),
but this doesn't happen in the Java code.
If I test the same thing in a UDF, to my surprise 'Field1.getBytes().length' gives me '2' as the value!
All I need is for the Java mapping to behave the way the UDF does, but this is not happening.
Any kind of help is welcome.
Ranjit
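For a fixed-length receiver file, the padding described above has to be counted in bytes of the target encoding rather than in characters. A minimal sketch of such a pad step follows. The class and method names (`FixedWidthPad`, `padToBytes`) are hypothetical and not part of any XI API, and it assumes the receiver encoding is known and passed in explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the XI mapping API.
final class FixedWidthPad {

    // Appends `pad` until the encoded value fills `widthInBytes` in `target`.
    // If the pad character needs several bytes and the remaining space is not
    // a multiple of that size, the result stays slightly shorter than the width.
    static String padToBytes(String value, int widthInBytes, char pad, Charset target) {
        int used = value.getBytes(target).length;                  // bytes, not characters
        int padSize = String.valueOf(pad).getBytes(target).length; // bytes per pad character
        if (used > widthInBytes) {
            throw new IllegalArgumentException(
                    "value needs " + used + " bytes, field has " + widthInBytes);
        }
        StringBuilder out = new StringBuilder(value);
        while (used + padSize <= widthInBytes) {
            out.append(pad);
            used += padSize;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String field1 = "\u3042"; // the Japanese character from Field1
        System.out.println(padToBytes(field1, 5, '_', StandardCharsets.UTF_8));
        System.out.println(padToBytes("abc", 5, '_', StandardCharsets.UTF_8));
    }
}
```

In UTF-8 the character in Field1 takes 3 bytes, so a 5-byte field gets two pad characters; in a 2-byte Japanese codepage the same field would get three.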
Is the incoming XML document encoded in UTF-8?
The Java mapping operates on the document in its incoming encoding, whereas the graphical mapping converts the incoming document to UTF-8 before processing starts.
Regards
Stefan
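The effect of the encoding can be checked in isolation. `String.length()` counts UTF-16 code units, while `getBytes()` without an argument encodes with the JVM's default charset, so its result may differ from one runtime to another. The sketch below compares a few explicit charsets; the class name `ByteLengthDemo` is hypothetical, and whether a different default charset is what separates the Java mapping from the UDF here is only an assumption.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; shows that the byte count depends on the charset.
final class ByteLengthDemo {

    // Number of bytes `text` occupies when encoded in the named charset.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String field1 = "\u3042"; // HIRAGANA LETTER A, the character in Field1

        System.out.println("chars:      " + field1.length());
        System.out.println("UTF-8:      " + byteLength(field1, "UTF-8"));
        System.out.println("UTF-16BE:   " + byteLength(field1, "UTF-16BE"));
        // Unmappable characters are replaced by a single '?' byte here.
        System.out.println("ISO-8859-1: " + byteLength(field1, "ISO-8859-1"));
        // Shift_JIS is an extended charset and may be missing on a minimal JRE.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS:  " + byteLength(field1, "Shift_JIS"));
        }
        // No argument: uses the JVM default charset, which varies by runtime.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + field1.getBytes().length);
    }
}
```

Passing the charset explicitly, e.g. `getBytes("UTF-8")`, removes the dependency on the runtime's default.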
Hi Stefan,
Yes, my incoming message is UTF-8 encoded.
When I use the trace object to trace the InputStream parameter, what I see is the XML message the way a browser would render it, with special characters interpreted; this is fine.
<DATA>
<Field1>あ</Field1>
<Field2>abc</Field2>
</DATA>
will be seen in the trace as:
<DATA>
<Field1>あ</Field1>
<Field2>abc</Field2>
</DATA>
As you can see, the value in Field1 is the 2-byte representation of a Japanese character.
Now if I parse this XML in Java and print, say,
Node.getValue().getBytes().length (where Node is 'Field1'), then the answer should be 2 and not 1,
as the Japanese character is stored using 2 bytes.
But unfortunately the answer I get is '1', which is the same as Node.getValue().length() (which normally gives the number of characters, not the number of bytes).
This seems a little weird.
Ranjit
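The check described in this post can be reproduced outside XI with the standard DOM parser. The following is a sketch only: `FieldByteCount` and `fieldByteLength` are hypothetical names, it reads the element text with the DOM call `getTextContent()`, and it passes the charset to `getBytes` explicitly instead of relying on the default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper; parses the payload and measures one field in bytes.
final class FieldByteCount {

    // Byte length of the first <tag> element's text when encoded in `target`.
    static int fieldByteLength(byte[] xml, String tag, Charset target) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            String value = doc.getElementsByTagName(tag).item(0).getTextContent();
            return value.getBytes(target).length;
        } catch (Exception e) {
            throw new IllegalStateException("could not read field " + tag, e);
        }
    }

    public static void main(String[] args) {
        String payload = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<DATA><Field1>\u3042</Field1><Field2>abc</Field2></DATA>";
        byte[] xml = payload.getBytes(StandardCharsets.UTF_8);

        System.out.println(fieldByteLength(xml, "Field1", StandardCharsets.UTF_8));
        System.out.println(fieldByteLength(xml, "Field1", StandardCharsets.UTF_16BE));
        System.out.println(fieldByteLength(xml, "Field2", StandardCharsets.UTF_8));
    }
}
```

After parsing, the value is an ordinary Java `String`; how many bytes it takes is decided only when it is encoded again, which is why the target charset is a parameter here.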
Could you send me the file before mapping to my email address:
Regards
Stefan
**********
Please read the Forum's Rules of Engagement, i.e.
/thread/117188 [original link is broken],
and refrain from using email correspondence. The main objective of the Forums is to share knowledge.
SDN PI/XI Forum Moderator
Hi,
Please refer to this link:
http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
It should help you solve this issue.
Regards
Aashish Sinha
PS : reward points if helpful
Hi,
I guess this is happening because of the character encoding you are using.
Each XI system is based on Unicode and these messages are usually XML-based.
When connecting XI with other systems it might first be necessary to convert the XI
message into other codepages like Latin, Hebrew, Chinese, Japanese, Arabic and so on.
However messages can also be converted from XML to plain text format, or they can be
either zipped or encoded. It is therefore important to know how the transformation from
the Unicode to another codepage is to be done.
Each codepage has an identifier, for example ISO-8859-1 is West European Latin, and
GB18030 is a standard for Chinese characters. For each language there are several
codepages available which are not always compatible, therefore you must check which
codepage is used for each system.
UTF-8 and all ISO-8859 codepages are based on ASCII. Therefore they are compatible
when only printable characters from ASCII are used. Other characters like European ä,
ê, ñ, and letters from other alphabets are represented differently or are only available in
specific codepages. These characters cause errors in interpreting XML messages if the
wrong codepage is used.
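The ASCII compatibility described above, and what happens once a character falls outside it, can be sketched as follows. The class name `CodepageCompat` and its methods are hypothetical; only standard JDK charset calls are used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo of UTF-8 versus ISO-8859-1 for ASCII and non-ASCII text.
final class CodepageCompat {

    // True if the text encodes to identical bytes in UTF-8 and ISO-8859-1.
    static boolean sameBytes(String text) {
        return Arrays.equals(
                text.getBytes(StandardCharsets.UTF_8),
                text.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Encodes as UTF-8, then decodes with the wrong codepage (ISO-8859-1).
    static String misread(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("abc"));     // printable ASCII only
        System.out.println(sameBytes("\u00E4"));  // a-umlaut: 1 byte vs 2 bytes
        System.out.println(misread("\u00E4").length());
    }
}
```

Text containing only printable ASCII survives either interpretation, while the a-umlaut read with the wrong codepage turns into two unrelated characters.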
There are two places where the codepage of a message has to be declared:
The codepage for a text message is taken from the HTTP header Content-Type
with the attribute charset. For example:
Content-Type = text/plain; charset=UTF-8
The codepage for an XML message is taken from the attribute encoding of the
XML declaration. For example:
<?xml version="1.0" encoding="UTF-8"?>
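For the XML case, a parser picks the codepage up from the declaration on its own when it is given the raw bytes. A small sketch with the standard DOM API follows; `DeclaredEncoding` is a hypothetical class name, and `getXmlEncoding()` is the DOM Level 3 method that reports the encoding named in the declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo; lets the parser decode bytes using the declared encoding.
final class DeclaredEncoding {

    static Document parse(byte[] xml) {
        try {
            // Parsing from bytes (not from a String or Reader) makes the
            // parser honour the encoding attribute of the XML declaration.
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String text = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00E4</a>";
        // The bytes must really be in the codepage the declaration names.
        Document doc = parse(text.getBytes(StandardCharsets.ISO_8859_1));

        System.out.println(doc.getXmlEncoding());
        System.out.println("\u00E4".equals(doc.getDocumentElement().getTextContent()));
    }
}
```

If the bytes and the declaration disagree, the parser either fails or produces wrong characters, which is the kind of interpretation error the text above describes.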
Hope this clears things up a bit.
Regards
Aashish Sinha
PS : reward points if helpful.