As middleware, SAP PI/PO integrates SAP and non-SAP systems that represent data in different formats: text (XML, CSV, ...) or binary. Sometimes these systems also encode text differently or use different code pages. This document helps you understand and handle those situations.
A code page is a table that assigns a number to each character. For example, 'A' is 65, 'a' is 97, 'b' is 98, and so on.
The screenshots below show the code-page tables for ASCII, ISO 8859-1, CP-1252 and Unicode. HTML versions of the screenshots are attached (rename .txt to .html).
'A' is 65. 65 = 100 0001 (64*1 + 32*0 + 16*0 + 8*0 + 4*0 + 2*0 + 1*1). Representing the code-page number in 0s and 1s is called encoding.
100 0001 is 65. Looking up 65 in the code page gives 'A'. Turning the number back into a character is called decoding.
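A minimal Java sketch of this round trip (character to number to bits, and back). It assumes Java's own character numbering (Unicode), which agrees with ASCII for 'A'; the class name EncodeDecodeDemo is hypothetical.

```java
// Encoding: character -> code-page number -> bits. Decoding: bits -> number -> character.
class EncodeDecodeDemo {

    // Returns the code-page number of a character as a binary string.
    static String encodeToBits(char c) {
        int number = c;                        // 'A' -> 65
        return Integer.toBinaryString(number); // 65 -> "1000001"
    }

    // Parses a binary string back into a number and looks up the character.
    static char decodeFromBits(String bits) {
        int number = Integer.parseInt(bits, 2); // "1000001" -> 65
        return (char) number;                   // 65 -> 'A'
    }

    public static void main(String[] args) {
        String bits = encodeToBits('A');
        System.out.println("'A' -> " + (int) 'A' + " -> " + bits); // 'A' -> 65 -> 1000001
        System.out.println(bits + " -> " + decodeFromBits(bits));   // 1000001 -> A
    }
}
```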
Some encodings are fixed-length, for example ASCII, ISO 8859-1, cp1252 and UTF-32. ISO 8859-1 and cp1252 always use 1 byte per character. ASCII also uses 1 byte, but only 7 bits of it; the highest bit is unused. UTF-32 always uses 4 bytes.
Some encodings are variable-length, for example UTF-8 and UTF-16. UTF-8 uses 1 byte for numbers 0-127 (the ASCII range); larger numbers take 2, 3 or 4 bytes. UTF-16 uses 2 bytes, and 4 bytes when needed (i.e., always 2 or 4 bytes).
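The byte counts can be checked with a small Java sketch. It is illustrative only: the class EncodingLengths is hypothetical, and it uses UTF-16BE and UTF-32BE (rather than UTF-16/UTF-32) so that Java does not add a BOM to the count. UTF-32BE is not among the charsets every Java platform is required to support, though common JVMs provide it.

```java
import java.nio.charset.Charset;

// Counts how many bytes one character needs in fixed- and variable-length encodings.
class EncodingLengths {

    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // 'A' (U+0041), e-acute (U+00E9), euro sign (U+20AC), emoji U+1F600 (a surrogate pair in Java)
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        // BE variants are used so that no BOM is prepended to the output bytes.
        String[] charsets = {"UTF-8", "UTF-16BE", "UTF-32BE"};
        for (String s : samples) {
            StringBuilder line = new StringBuilder(String.format("U+%04X:", s.codePointAt(0)));
            for (String cs : charsets) {
                line.append(' ').append(cs).append('=').append(byteLength(s, cs));
            }
            System.out.println(line);
        }
    }
}
```

Expected counts: 'A' takes 1/2/4 bytes (UTF-8/UTF-16BE/UTF-32BE), e-acute 2/2/4, the euro sign 3/2/4 and the emoji 4/4/4.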
UTF-8: UTF-8 is the preferred encoding on the internet. HTML, XML and JSON are usually encoded in UTF-8, and it is the default encoding for XML and JSON.
To understand UTF-8, BOM and endianness, see these YouTube videos: "Characters, Symbols and the Unicode Miracle" (Computerphile) and "Characters in a computer - Unicode Tutorial UTF-8".
Byte Order Mark (BOM): a BOM is a heads-up to the target system about the encoding. Some Microsoft Windows applications require a BOM to decode UTF text properly. This is how it works. When sending UTF-8 text, we prefix the text stream with the bytes EF BB BF (hex). The target system reads these bytes and concludes: "This text stream starts with EF BB BF, so it must be UTF-8 and I should use UTF-8 decoding." It does not display EF BB BF. When sending UTF-16 big-endian, we prefix the text stream with FE FF (hex), and the target system concludes: "This text stream starts with FE FF, so it must be UTF-16 BE." For UTF-16 little-endian, the prefix is FF FE.
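A BOM-aware reader can be sketched in Java as follows. This is an illustration, not how any particular Windows application or PI adapter implements it; BomSniffer and its methods are hypothetical names, and UTF-32 BOMs are deliberately ignored (the UTF-32LE BOM FF FE 00 00 starts with the same two bytes as the UTF-16LE one).

```java
import java.nio.charset.Charset;

// Sketch: detect a BOM, pick the charset it announces, and decode the text without it.
class BomSniffer {

    // Returns the charset announced by a leading BOM, or null if there is none.
    static String detectBom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // UTF-32LE is not distinguished in this sketch
        }
        return null;
    }

    // Decodes with the BOM-announced charset and skips the BOM itself, so it is never displayed.
    // Without a BOM, falls back to the caller-supplied default charset.
    static String decode(byte[] data, String defaultCharset) {
        String bom = detectBom(data);
        if (bom == null) {
            return new String(data, Charset.forName(defaultCharset));
        }
        int bomLength = bom.equals("UTF-8") ? 3 : 2;
        return new String(data, bomLength, data.length - bomLength, Charset.forName(bom));
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'H', 'i'};
        byte[] utf16beWithBom = {(byte) 0xFE, (byte) 0xFF, 0x00, 'H', 0x00, 'i'};
        System.out.println(detectBom(utf8WithBom) + ": " + decode(utf8WithBom, "UTF-8"));       // UTF-8: Hi
        System.out.println(detectBom(utf16beWithBom) + ": " + decode(utf16beWithBom, "UTF-8")); // UTF-16BE: Hi
    }
}
```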
If the target program does not understand the BOM, i.e., it sees EF BB BF (hex) at the start of the text stream and is not programmed to handle it, it may interpret those bytes as cp1252 characters. If you see an error or a display that starts with ï»¿, þÿ or ÿþ, the target program is not decoding the data properly.
To test whether the source system, PI/PO and the target system use the proper encoding, ask the source system to send the Euro sign € in one of the data elements. If the target system does not display € correctly, there is a code-page/encoding issue.
Why is the Euro sign € displayed as €?
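€ -> U+20AC (hex) -> 0010 0000 1010 1100 -> fitted into the UTF-8 three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx: 11100010 10000010 10101100 -> E2 82 AC -> each byte decoded separately as cp1252 (E2 = â, 82 = ‚, AC = ¬) -> €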
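The same chain can be reproduced in Java to confirm the diagnosis. A minimal sketch, assuming the target decodes with cp1252 (Java charset name windows-1252); MojibakeDemo and its methods are hypothetical names. misdecode("\u20AC") returns the three characters â‚¬, and misdecode applied to the BOM character U+FEFF returns ï»¿, the same garbage that BOM-unaware targets show.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reproduces what a target sees when it decodes UTF-8 bytes as cp1252.
class MojibakeDemo {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Encodes text as UTF-8, then (wrongly) decodes the same bytes as cp1252.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Formats bytes as space-separated upper-case hex, e.g. "E2 82 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println(String.format("U+%04X", euro.codePointAt(0))); // U+20AC
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8)));  // E2 82 AC
        // Three characters U+00E2 U+201A U+00AC; how they look depends on the console encoding.
        System.out.println(misdecode(euro));
        // The UTF-8 BOM (U+FEFF) misread as cp1252: U+00EF U+00BB U+00BF.
        System.out.println(misdecode("\uFEFF"));
    }
}
```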
For more detail, please read How to Work with Character Encodings in Process Integration.
Some points to note from that document:
When reading XML, SAP recommends setting "File Type" to 'Binary', because the XML prolog already declares the encoding, e.g. <?xml version="1.0" encoding="utf-8"?>, and the XML parser uses that declaration. See SAP Note 821267.
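To see why binary handling works, here is a small Java sketch using the JDK's built-in DOM parser (javax.xml.parsers). It illustrates standard XML parser behaviour, not the PI adapter internals; XmlPrologDemo is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: hand the XML parser raw bytes and let it honour the prolog's encoding declaration.
class XmlPrologDemo {

    // Parses XML bytes and returns the text content of the root element.
    static String rootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes)); // bytes in; the parser reads encoding="..."
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The prolog says ISO-8859-1 and the bytes really are ISO-8859-1 (e-acute is the single byte E9).
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Ren\u00E9</name>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        // Binary handling: the parser decodes E9 correctly because it trusts the prolog.
        System.out.println(rootText(latin1Bytes).equals("Ren\u00E9")); // true

        // Text handling with the wrong charset: decoding the same bytes as UTF-8 first
        // turns the lone E9 byte into the replacement character U+FFFD.
        String wronglyDecoded = new String(latin1Bytes, StandardCharsets.UTF_8);
        System.out.println(wronglyDecoded.contains("\uFFFD")); // true
    }
}
```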
You can use the following adapter modules to change the encoding:
MessageTransformationBean: Transform.ContentType = text/xml;charset="cp1252"
TextCodepageConversionBean: Conversion.charset = "utf-8"
XMLAnonymizerBean: anonymizer.encoding = "utf-8"
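FYI: cp1252 is a superset of ASCII and of the printable characters of ISO 8859-1 (the two differ only in the range 80-9F hex, where ISO 8859-1 has control codes and cp1252 has characters such as €). UTF-8 can represent every cp1252 character, but only ASCII characters keep the same single byte; all others take 2 or 3 bytes in UTF-8.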
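A quick Java sketch to compare the bytes each charset produces for the same character. SupersetDemo is a hypothetical class name. String.getBytes replaces characters that a charset cannot represent with '?' (3F), which is how a missing character typically shows up in output.

```java
import java.nio.charset.Charset;

// Compares how the same character is encoded in ASCII, ISO 8859-1, cp1252 and UTF-8.
class SupersetDemo {

    // Encodes text with the named charset and returns the bytes as hex, e.g. "C3 A9".
    static String hexBytes(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] charsets = {"US-ASCII", "ISO-8859-1", "windows-1252", "UTF-8"};
        // 'A', e-acute (U+00E9) and the euro sign (U+20AC)
        String[] samples = {"A", "\u00E9", "\u20AC"};
        for (String s : samples) {
            for (String cs : charsets) {
                // Unmappable characters come out as 3F ('?')
                System.out.println(String.format("U+%04X", s.codePointAt(0)) + " " + cs + ": " + hexBytes(s, cs));
            }
        }
    }
}
```

Expected bytes: 'A' is 41 in all four; e-acute is 3F (unmappable) in ASCII, E9 in ISO 8859-1 and cp1252, and C3 A9 in UTF-8; € is 3F in ASCII and ISO 8859-1, 80 in cp1252, and E2 82 AC in UTF-8.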
Let's handle the issues mentioned in sections 5 and 6 of How to Work with Character Encodings in Process Integration.
1) Java mapping to change the code page/encoding. For valid charset names, see Supported Encodings.
package com.map;
import com.sap.aii.mapping.api.*;
import java.io.*;
public class ChangeEncoding_JavaMapping extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        try {
            InputStream inputStream = transformationInput.getInputPayload().getInputStream();
            OutputStream outputStream = transformationOutput.getOutputPayload().getOutputStream();
            //Read the whole input. available() is only an estimate, so loop until end of stream.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = inputStream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            //Decode input as cp1252 and write output as UTF-8.
            //If the payload is XML, its prolog still declares the old encoding; adjust it if needed.
            String inS = new String(buffer.toByteArray(), "Cp1252");
            outputStream.write(inS.getBytes("UTF-8"));
        } catch (Exception ex) {
            getTrace().addDebugMessage(ex.getMessage());
            throw new StreamTransformationException(ex.toString());
        }
    }
}
2) Java mapping to handle Quoted-Printable input.
package com.map;
import com.sap.aii.mapping.api.*;
import java.io.*;
public class QuotedPrintable_JavaMapping extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        try {
            InputStream inputStream = transformationInput.getInputPayload().getInputStream();
            OutputStream outputStream = transformationOutput.getOutputPayload().getOutputStream();
            //Decode quoted-printable input back to its original bytes.
            //MimeUtility is part of JavaMail (javax.mail); add that library when compiling.
            InputStream decoded = javax.mail.internet.MimeUtility.decode(inputStream, "quoted-printable");
            //Copy decoded content to output. available() is only an estimate, so loop until end of stream.
            byte[] chunk = new byte[8192];
            int n;
            while ((n = decoded.read(chunk)) != -1) {
                outputStream.write(chunk, 0, n);
            }
        } catch (Exception ex) {
            getTrace().addDebugMessage(ex.getMessage());
            throw new StreamTransformationException(ex.toString());
        }
    }
}
3) Java mapping to handle Base64 input.
package com.map;
import com.sap.aii.mapping.api.*;
import java.io.*;
public class Base64_JavaMapping extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        try {
            InputStream inputStream = transformationInput.getInputPayload().getInputStream();
            OutputStream outputStream = transformationOutput.getOutputPayload().getOutputStream();
            //Read the whole Base64 input. available() is only an estimate, so loop until end of stream.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = inputStream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            String base64 = new String(buffer.toByteArray(), "US-ASCII");
            //Decode Base64. sun.misc.BASE64Decoder is an internal JDK class and is not available in newer JDKs.
            byte[] b = new sun.misc.BASE64Decoder().decodeBuffer(base64);
            //Alternatives, whichever your JVM supports:
            //Java 6 to 8: byte[] b = javax.xml.bind.DatatypeConverter.parseBase64Binary(base64);
            //Java 8+:     byte[] b = java.util.Base64.getMimeDecoder().decode(base64);
            outputStream.write(b);
        } catch (Exception ex) {
            getTrace().addDebugMessage(ex.getMessage());
            throw new StreamTransformationException(ex.toString());
        }
    }
}
4) Java mapping to add BOM.
package com.map;
import com.sap.aii.mapping.api.*;
import java.io.*;
public class BOM_JavaMapping extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        try {
            InputStream inputStream = transformationInput.getInputPayload().getInputStream();
            OutputStream outputStream = transformationOutput.getOutputPayload().getOutputStream();
            //Prefix BOM. For UTF-8 use 0xEF,0xBB,0xBF. For UTF-16BE use 0xFE,0xFF. For UTF-16LE use 0xFF,0xFE.
            //The BOM must match the encoding the payload is actually in (here: UTF-8).
            outputStream.write(0xEF);
            outputStream.write(0xBB);
            outputStream.write(0xBF);
            //Copy input content to output. available() is only an estimate, so loop until end of stream.
            byte[] chunk = new byte[8192];
            int n;
            while ((n = inputStream.read(chunk)) != -1) {
                outputStream.write(chunk, 0, n);
            }
        } catch (Exception ex) {
            getTrace().addDebugMessage(ex.getMessage());
            throw new StreamTransformationException(ex.toString());
        }
    }
}
Result: a BOM-aware target reads the BOM characters and does not display them.
5) Java mapping to handle XML Escape Sequence.
FYI: How to create a Java mapping.