MSSqlServer 2005 - UCS-2 => UTF-8?

Options
Environment[indent]TS 7.2 - W2K3 - MS SqlServer2005[/indent]
Background[indent]
    Hoping someone (Boris or someone else) can help shed some light along this path or me....

Comments

  • Microsoft being what it is your answers may differ somewhat.

    - Formally speaking your encoding can be anything, UTF-8 including (if for
    example you are using UTF-8 encoded XML with the MS SQL "XML" Data Type, etc)

    - That said, it's 99% chances you'll have USC-2 or rather surrogate-neutral *version*
    of USC-2. It means an ability to store (if not to work with) characters outside of so
    called "Basic Multilingual Plane" or BMP.

    Unless you plan to support Kanji or Korean these details should not bother you.
    And if you do, you better use Oracle with UTF-8 instead of MS SQL!

    - I'm not aware of any magical printing or Stream settings to minimize coding.
    This is not to say it does not exist, I just do not know if it does. Within BMP
    Encode's from_to() with proper error handling was always sufficient for me in the past
  • ...
    Unless you plan to support Kanji or Korean these details should not bother you.
    And if you do, you better use Oracle with UTF-8 instead of MS SQL!
    I'm not sure about Kanji, but I do believe that the site is intended to support Korean...

    I don't know how much sway I have over the DB decisions, but I'll attempt to find out.
    - I'm not aware of any magical printing or Stream settings to minimize coding.
    This is not to say it does not exist, I just do not know if it does. Within BMP
    Encode's from_to() with proper error handling was always sufficient for me in the past
    Okay - but if I just retrieve the value from the DB query into a variable, and want to make sure it's output as UTF-8 - what would the arguments for from_to() be? Obviously (I hope) the 'to' would be "utf8" - but what do I use for 'from'?
    Can I assume the 'from' is "ucs-2"? can I specify

    undef

    and have it do something magical internally?

    This is part of what I find so confusing - not really knowing the attributes of the data already stored within a variable and not (as far as I know) having anyway to control that at the "source" when the db query is performed.
  • I'm not sure about Kanji, but I do believe that the site is intended to support Korean...

    I don't know how much sway I have over the DB decisions, but I'll attempt to find out.Okay - but if I just retrieve the value from the DB query into a variable, and want to make sure it's output as UTF-8 - what would the arguments for from_to() be? Obviously (I hope) the 'to' would be "utf8" - but what do I use for 'from'?
    Can I assume the 'from' is "ucs-2"? can I specify

    undef

    and have it do something magical internally?

    This is part of what I find so confusing - not really knowing the attributes of the data already stored within a variable and not (as far as I know) having anyway to control that at the "source" when the db query is performed.
    To check encoding Modules included into your Perl Version get to Encode::Config.pm, see %ExtModule Hash. Look at the Code, not perldoc
    I *believe* your "from" parameter should be something like "UCS-2LE". Just in case, treat it as case sensitive

    Something like this:
    eval { Encode::from_to($string, 'UCS-2LE', 'utf8', Encode::FB_CROAK); };
    if ($@) {...}

    Yes, utf8 is lower case, does not have hyphen, etc
  • FYI - from what I understand - the choice of using MSSQLServer instead of Oracle is because [one of] the consumer of the data in the database is one (or more) .Net application - and they prefer to work with MS...
  • FYI - from what I understand - the choice of using MSSQLServer instead of Oracle is because [one of] the consumer of the data in the database is one (or more) .Net application - and they prefer to work with MS...
    Note mine and yours posting time above Smiley Happy
  • As of Friday afternoon, we got in touch with one of the DBAs, and they believe they can create a function or stored procedure on the DB that we can then utilize in our queries to get back UTF-8 versions of the particular fields in question.

    Not sure when they'll actually have this done - but I think it's probably a more optimal (and overall, faster) method of getting what is needed...
  • Okay - here's the latest on this.
    First, some general info:[indent]
    D:\>iwperl -MDBD:Smiley SurprisedDBC -le "DBI->instal
    led_versions;"
    Perl : 5.008002 (MSWin32-x86-multi)
    OS : MSWin32 (4.0)
    DBI : 1.38
    DBD:Smiley Frustratedponge : 11.09
    DBD:Smiley Tongueroxy : 0.2004
    DBD:Smiley SurprisedDBC : 1.06
    DBD::ExampleP : 11.10
    [/indent]The DBA did put a function on the SQL Server.
    If I run a query via "toad" on that server, like:[indent]

    select id, dbo.fn_****(title, 65001) from myTable;

    [/indent](65001 is the UTF-8 "code page" for microsoft)
    then I get the results, though initially the column for title just shows "

    (Excluded)

    " but if I click on it, it says it will re-execute the query, and then the actual UTF-8 results show up encoded (e.g.:

    0xE79BB4E98AB7E5...

    ) with a drop-down that lets me look at it in: hex, ascii, utf-16, and utf-8.
    So, the function is there, and it appears to work - great.

    Now I try to use it within my perl code:
    ...
    my $query = qq(select id, dbo.fn_****(title, 65001) from myTableSmiley Wink;
    my $st = $dbh->prepare($query); # both with, and without 2nd parameter: { odbc_execdirect => 1}
    $st->execute();
    ...
    However, when I try to run this, I get the following error:[indent]

    DBD:Smiley SurprisedDBC::st execute failed: [Microsoft][ODBC SQL Server Driver][SQL Server]Can not find either column "dbo" or the user-defined function or aggregate "dbo.fn_****", or the name is ambiguous. (SQL-42000)

    [/indent]If I leave off the 'dbo.' prefix (I was trying all sorts of things) - the error changes to:[indent]

    DBD:Smiley SurprisedDBC::st execute failed: [Microsoft][ODBC SQL Server Driver][SQL Server]'fn_****' is not a recognized function name. (SQL-42000)

    [/indent](the error code 'SQL-42000' is not useful - it's a generic error message)


    Anyone here have experience calling database functions through Perl DBI / DBD:Smiley SurprisedDBC ?

    (I googled, but could not find anything that seemed to help)
  • SIGH - It's amazing the things one can goof-on, especially when starting from someone else's code ;-)
      Once I got those two things out of the way, I now can perform the query, get the results (which look right in my debugging log), and put them into my desired output (generated DCR).This last part isn't quite working right now - but I'm pretty sure there's just a small glitch in my code to account for that - the big thing was getting the connection made to the correct database server and database ;-)
  • ...
    This last part isn't quite working right now - but I'm pretty sure there's just a small glitch in my code to account for that - the big thing was getting the connection made to the correct database server and database ;-)
    The last part was because I still had utf8 directives on my open statements for writing the new file:

    open(OUT, '>:utf8', $file) || die "...";

    I had to remove that directive to get the output to look correct now that the input (from the DB) was already in UTF-8 format.
TeamSite Developer Resources

  • Docker Automation

  • LiveSite Content Services (LSCS) REST API

  • Single Page Application (SPA) Modules

  • TeamSite Add-ons

If you are interested in gaining full access to the content, you can register for a My Support account here.
image
OpenText CE Products
TeamSite
APIs