MSSqlServer 2005 - UCS-2 => UTF-8?

Environment[indent]TS 7.2 - W2K3 - MS SqlServer2005[/indent]
Background[indent]

Hoping someone (Boris or someone else) can help shed some light along this path or me....

Find more posts tagged with

Comments

Microsoft being what it is your answers may differ somewhat.

- Formally speaking your encoding can be anything, UTF-8 including (if for
example you are using UTF-8 encoded XML with the MS SQL "XML" Data Type, etc)

- That said, it's 99% chances you'll have USC-2 or rather surrogate-neutral *version*
of USC-2. It means an ability to store (if not to work with) characters outside of so
called "Basic Multilingual Plane" or BMP.

Unless you plan to support Kanji or Korean these details should not bother you.
And if you do, you better use Oracle with UTF-8 instead of MS SQL!

- I'm not aware of any magical printing or Stream settings to minimize coding.
This is not to say it does not exist, I just do not know if it does. Within BMP
Encode's from_to() with proper error handling was always sufficient for me in the past

Adam Stoller

...
Unless you plan to support Kanji or Korean these details should not bother you.
And if you do, you better use Oracle with UTF-8 instead of MS SQL!

I'm not sure about Kanji, but I do believe that the site is intended to support Korean...

I don't know how much sway I have over the DB decisions, but I'll attempt to find out.

- I'm not aware of any magical printing or Stream settings to minimize coding.
This is not to say it does not exist, I just do not know if it does. Within BMP
Encode's from_to() with proper error handling was always sufficient for me in the past

Okay - but if I just retrieve the value from the DB query into a variable, and want to make sure it's output as UTF-8 - what would the arguments for from_to() be? Obviously (I hope) the 'to' would be "utf8" - but what do I use for 'from'?
Can I assume the 'from' is "ucs-2"? can I specify

undef

and have it do something magical internally?

This is part of what I find so confusing - not really knowing the attributes of the data already stored within a variable and not (as far as I know) having anyway to control that at the "source" when the db query is performed.

ISCBorisB

I'm not sure about Kanji, but I do believe that the site is intended to support Korean...

I don't know how much sway I have over the DB decisions, but I'll attempt to find out.Okay - but if I just retrieve the value from the DB query into a variable, and want to make sure it's output as UTF-8 - what would the arguments for from_to() be? Obviously (I hope) the 'to' would be "utf8" - but what do I use for 'from'?
Can I assume the 'from' is "ucs-2"? can I specify
undef
and have it do something magical internally?

This is part of what I find so confusing - not really knowing the attributes of the data already stored within a variable and not (as far as I know) having anyway to control that at the "source" when the db query is performed.

To check encoding Modules included into your Perl Version get to Encode::Config.pm, see %ExtModule Hash. Look at the Code, not perldoc
I *believe* your "from" parameter should be something like "UCS-2LE". Just in case, treat it as case sensitive

Something like this:
eval { Encode::from_to($string, 'UCS-2LE', 'utf8', Encode::FB_CROAK); };
if ($@) {...}

Yes, utf8 is lower case, does not have hyphen, etc

Adam Stoller

FYI - from what I understand - the choice of using MSSQLServer instead of Oracle is because [one of] the consumer of the data in the database is one (or more) .Net application - and they prefer to work with MS...

ISCBorisB

FYI - from what I understand - the choice of using MSSQLServer instead of Oracle is because [one of] the consumer of the data in the database is one (or more) .Net application - and they prefer to work with MS...

Note mine and yours posting time above Smiley Happy

Adam Stoller

As of Friday afternoon, we got in touch with one of the DBAs, and they believe they can create a function or stored procedure on the DB that we can then utilize in our queries to get back UTF-8 versions of the particular fields in question.

Not sure when they'll actually have this done - but I think it's probably a more optimal (and overall, faster) method of getting what is needed...

Adam Stoller

Okay - here's the latest on this.
First, some general info:[indent]

\>iwperl -MDBD:DBC -le "DBI->instal
led_versions;"
  Perl            : 5.008002    (MSWin32-x86-multi)
  OS              : MSWin32     (4.0)
  DBI             : 1.38
  DBD:ponge     : 11.09
  DBD:roxy      : 0.2004
  DBD:DBC       : 1.06
  DBD::ExampleP   : 11.10

[/indent]The DBA did put a function on the SQL Server.
If I run a query via "toad" on that server, like:[indent]

select id, dbo.fn_****(title, 65001) from myTable;

[/indent](65001 is the UTF-8 "code page" for microsoft)
then I get the results, though initially the column for title just shows "

(Excluded)

" but if I click on it, it says it will re-execute the query, and then the actual UTF-8 results show up encoded (e.g.:

0xE79BB4E98AB7E5...

) with a drop-down that lets me look at it in: hex, ascii, utf-16, and utf-8.
So, the function is there, and it appears to work - great.

Now I try to use it within my perl code:

...
my $query = qq(select id, dbo.fn_****(title, 65001) from myTable;
my $st = $dbh->prepare($query);  # both with, and without 2nd parameter: { odbc_execdirect => 1}
$st->execute();
...

However, when I try to run this, I get the following error:[indent]

DBD: Smiley Surprised DBC::st execute failed: [Microsoft][ODBC SQL Server Driver][SQL Server]Can not find either column "dbo" or the user-defined function or aggregate "dbo.fn_****", or the name is ambiguous. (SQL-42000)

[/indent]If I leave off the 'dbo.' prefix (I was trying all sorts of things) - the error changes to:[indent]

DBD: Smiley Surprised DBC::st execute failed: [Microsoft][ODBC SQL Server Driver][SQL Server]'fn_****' is not a recognized function name. (SQL-42000)

[/indent](the error code 'SQL-42000' is not useful - it's a generic error message)

Anyone here have experience calling database functions through Perl DBI / DBD: Smiley Surprised

DBC ?

(I googled, but could not find anything that seemed to help)

Adam Stoller

SIGH - It's amazing the things one can goof-on, especially when starting from someone else's code ;-)

Once I got those two things out of the way, I now can perform the query, get the results (which look right in my debugging log), and put them into my desired output (generated DCR).This last part isn't quite working right now - but I'm pretty sure there's just a small glitch in my code to account for that - the big thing was getting the connection made to the correct database server and database ;-)

Adam Stoller

...
This last part isn't quite working right now - but I'm pretty sure there's just a small glitch in my code to account for that - the big thing was getting the connection made to the correct database server and database ;-)

The last part was because I still had utf8 directives on my open statements for writing the new file:

open(OUT, '>:utf8', $file) || die "...";

I had to remove that directive to get the output to look correct now that the input (from the DB) was already in UTF-8 format.