MSSqlServer 2005 - UCS-2 => UTF-8?
Options
Environment[indent]TS 7.2 - W2K3 - MS SqlServer2005[/indent]
Background[indent]
Background[indent]
- Hoping someone (Boris or someone else) can help shed some light along this path or me....
0
Comments
-
Microsoft being what it is your answers may differ somewhat.
- Formally speaking your encoding can be anything, UTF-8 including (if for
example you are using UTF-8 encoded XML with the MS SQL "XML" Data Type, etc)
- That said, it's 99% chances you'll have USC-2 or rather surrogate-neutral *version*
of USC-2. It means an ability to store (if not to work with) characters outside of so
called "Basic Multilingual Plane" or BMP.
Unless you plan to support Kanji or Korean these details should not bother you.
And if you do, you better use Oracle with UTF-8 instead of MS SQL!
- I'm not aware of any magical printing or Stream settings to minimize coding.
This is not to say it does not exist, I just do not know if it does. Within BMP
Encode's from_to() with proper error handling was always sufficient for me in the past0 -
I'm not sure about Kanji, but I do believe that the site is intended to support Korean......
Unless you plan to support Kanji or Korean these details should not bother you.
And if you do, you better use Oracle with UTF-8 instead of MS SQL!
I don't know how much sway I have over the DB decisions, but I'll attempt to find out.
Okay - but if I just retrieve the value from the DB query into a variable, and want to make sure it's output as UTF-8 - what would the arguments for from_to() be? Obviously (I hope) the 'to' would be "utf8" - but what do I use for 'from'?- I'm not aware of any magical printing or Stream settings to minimize coding.
This is not to say it does not exist, I just do not know if it does. Within BMP
Encode's from_to() with proper error handling was always sufficient for me in the past
Can I assume the 'from' is "ucs-2"? can I specifyundef
and have it do something magical internally?
This is part of what I find so confusing - not really knowing the attributes of the data already stored within a variable and not (as far as I know) having anyway to control that at the "source" when the db query is performed.0 -
To check encoding Modules included into your Perl Version get to Encode::Config.pm, see %ExtModule Hash. Look at the Code, not perldocI'm not sure about Kanji, but I do believe that the site is intended to support Korean...
I don't know how much sway I have over the DB decisions, but I'll attempt to find out.Okay - but if I just retrieve the value from the DB query into a variable, and want to make sure it's output as UTF-8 - what would the arguments for from_to() be? Obviously (I hope) the 'to' would be "utf8" - but what do I use for 'from'?
Can I assume the 'from' is "ucs-2"? can I specifyundef
and have it do something magical internally?
This is part of what I find so confusing - not really knowing the attributes of the data already stored within a variable and not (as far as I know) having anyway to control that at the "source" when the db query is performed.
I *believe* your "from" parameter should be something like "UCS-2LE". Just in case, treat it as case sensitive
Something like this:
eval { Encode::from_to($string, 'UCS-2LE', 'utf8', Encode::FB_CROAK); };
if ($@) {...}
Yes, utf8 is lower case, does not have hyphen, etc0 -
FYI - from what I understand - the choice of using MSSQLServer instead of Oracle is because [one of] the consumer of the data in the database is one (or more) .Net application - and they prefer to work with MS...0
-
As of Friday afternoon, we got in touch with one of the DBAs, and they believe they can create a function or stored procedure on the DB that we can then utilize in our queries to get back UTF-8 versions of the particular fields in question.
Not sure when they'll actually have this done - but I think it's probably a more optimal (and overall, faster) method of getting what is needed...0 -
Okay - here's the latest on this.
First, some general info:[indent]\>iwperl -MDBD:DBC -le "DBI->instal
[/indent]The DBA did put a function on the SQL Server.
led_versions;"
Perl : 5.008002 (MSWin32-x86-multi)
OS : MSWin32 (4.0)
DBI : 1.38
DBD:ponge : 11.09
DBD:roxy : 0.2004
DBD:DBC : 1.06
DBD::ExampleP : 11.10
If I run a query via "toad" on that server, like:[indent]select id, dbo.fn_****(title, 65001) from myTable;
[/indent](65001 is the UTF-8 "code page" for microsoft)
then I get the results, though initially the column for title just shows "(Excluded)
" but if I click on it, it says it will re-execute the query, and then the actual UTF-8 results show up encoded (e.g.:0xE79BB4E98AB7E5...
) with a drop-down that lets me look at it in: hex, ascii, utf-16, and utf-8.
So, the function is there, and it appears to work - great.
Now I try to use it within my perl code:...
However, when I try to run this, I get the following error:[indent]
my $query = qq(select id, dbo.fn_****(title, 65001) from myTable;
my $st = $dbh->prepare($query); # both with, and without 2nd parameter: { odbc_execdirect => 1}
$st->execute();
...DBD:DBC::st execute failed: [Microsoft][ODBC SQL Server Driver][SQL Server]Can not find either column "dbo" or the user-defined function or aggregate "dbo.fn_****", or the name is ambiguous. (SQL-42000)
[/indent]If I leave off the 'dbo.' prefix (I was trying all sorts of things) - the error changes to:[indent]DBD:DBC::st execute failed: [Microsoft][ODBC SQL Server Driver][SQL Server]'fn_****' is not a recognized function name. (SQL-42000)
[/indent](the error code 'SQL-42000' is not useful - it's a generic error message)
Anyone here have experience calling database functions through Perl DBI / DBD:DBC ?
(I googled, but could not find anything that seemed to help)0 -
SIGH - It's amazing the things one can goof-on, especially when starting from someone else's code ;-)
- Once I got those two things out of the way, I now can perform the query, get the results (which look right in my debugging log), and put them into my desired output (generated DCR).This last part isn't quite working right now - but I'm pretty sure there's just a small glitch in my code to account for that - the big thing was getting the connection made to the correct database server and database ;-)
0 -
The last part was because I still had utf8 directives on my open statements for writing the new file:...
This last part isn't quite working right now - but I'm pretty sure there's just a small glitch in my code to account for that - the big thing was getting the connection made to the correct database server and database ;-)open(OUT, '>:utf8', $file) || die "...";
I had to remove that directive to get the output to look correct now that the input (from the DB) was already in UTF-8 format.0
Categories
- All Categories
- 111 Developer Announcements
- 49 Articles
- 103 General Questions
- 123 IM Services
- 40 OpenText Hackathon
- 31 Developer Tools
- 20.6K Analytics
- 4.1K AppWorks
- 8.9K Extended ECM
- 897 Cloud Fax and Notifications
- 77 Digital Asset Management
- 9.3K Documentum
- 29 eDOCS
- 123 Exstream
- 39.8K TeamSite
- 1.7K Web Experience Management
- 1 XM Fax
TeamSite Developer Resources
If you are interested in gaining full access to the content, you can register for a My Support account here.
- Docker Automation
- LiveSite Content Services (LSCS) REST API
- Single Page Application (SPA) Modules
- TeamSite Add-ons
If you are interested in gaining full access to the content, you can register for a My Support account here.