on 2016 Feb 10 2:02 AM
AFAIK, in older versions, the XML data type was just a "built-in user-defined" data type based on LONG VARCHAR, i.e. it was listed in SYS.SYSUSERTYPE.
In current versions, it seems to be a distinct data type listed in SYS.SYSDOMAIN.
Questions:
Request clarification before answering.
Some good questions and a few possible challenges too Volker.
. . . The rest of this is just a few observations that may help. . . .
While there is a distinct XML Data Type, such columns ... storage-wise, described-as, and encoding-wise ... are very much the same as a Long Varchar; even being stored (and if required converted) to the charset encoding of the database. This much you seem very aware of.
And there-in lies one challenge, which charset encoding you should use for a database when the the standard itself indicates such a wide variety of valid, possible encodings
My first instinct would be similar to yours (with your suggestion of LONG NVARCHAR) of storing XML as UTF-8 ... but the fact that a full XML document can specify it's own encoding such conversion could be problematic if you do not control that encoding at the source. Parsing the content and normalizing the Encoding attribute to match the stored encoding may be required to make that truly robust.
So yes you could store you xml content as LONG NVARCHAR instead though the easier solution may be to initialize the database to be UTF-8 for both CHAR-type and NCHAR-types. {assuming all your content is UTF-8). There may even be a reason to go with long binary in some situations as well, but the good new may be that the OpenXML( ) operator can deal with any of those (if correctly managed).
-- Note none of this should be considered a recommendation without first adding in some -- practical considerations and application requirements into a proper analysis.
So that may only leave the question of "What you give up when you do not use the XML Data Type, explicitly? ... Again I'm not the expert. There may be some benefit to certain ADO.Net functions but I suspect the biggest advantage would be when using the various XML*( ) functions and not needing to cast other datatypes to XML. The need to cast and possibly needing to convert data encoding can be a bit challenging.
Do share your thoughts and experiences though.
HTH
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Many thanks, Nick, that answers my questions - at least for the moment:). And particular thanks to include an answer for the inherent question "What you give up when you do not use the XML Data Type, explicitly?". As I'm using the XML_% functions, I'll prefer to stick with the XML data type, too.
So the most important point seems to be aware of the endoding issue at all, methinks...
Parsing the content and normalizing the Encoding attribute to match the stored encoding may be required to make that truly robust.
FWIW, is there a builtin function that would help with this task? (AFAIK, DBISQL itself tries to determine a script's encoding based on several rules.)
I'm asking for the general case - in my current task, the encodings are known beforehand.
I hadn't found a single function, builtin or otherwise for the most critical part of this. The 'encoding="..."' attribute (if explicitly specified) exists only in the "<?xml ...?>" header and is not an xml parseable <tag>. So that only leaves string functions such as this query
` create or replace variable read_file long varchar;
set read_file = xp_read_file( 'F:\\\\...\\\\Samples\\\\UltraLite\\\\sample.xml');
select substring(
read_file, //from an xp_read_file() in my mockup
locate(read_file,'encoding="' )+10,
/*say*/ 6) // or another locate to find the position for the close '"'`
Of course where that becomes important is how, when and where you are accessing and how much charset conversion has already happened.
Good luck
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I agree it is a bit awkward but then again I suspect much of the XML manipulation that goes on probably happens in applications and applicaton servers and not as much in the back-end store technology. And I suspect we are not very much different from other RDBMS in that regard.
But I do suspect much of the XML/SOAP/DISH/WDSL-type support happens inside of a CLR (external environment in our case) or WEB/Content-serving type environment so you might want to see what benefits that approach brings; 'outside of the SQL DB server context' pèr sé.
Maybe a little too suspicious (or skeptical) but maybe that opens up your investigations a bit wider; and hopefully a more complete answer can be had that way.
I'm pretty sure you will find lots of C# and VB.Net code examples (as well as Java+DOM/SAX examples, and C++ XML code samplets too) that can shed some light on many requirements.
User | Count |
---|---|
82 | |
29 | |
9 | |
8 | |
7 | |
7 | |
7 | |
6 | |
6 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.