russian or polish content
meikel
Hello,
Does anybody have experience with Russian or Polish content?
Especially with the special characters and the ISO codes.
Regards
meikel
Comments
Renata
Look up the specs for the UTF-8 encoding and make sure it is used for both the DCR and the TPL. If you're using MetaTagger you may have to enforce the encoding, as it only deals with the Latin-1 character set.
gzevin
I tried typing Russian into a DCR and did not have any issues. What was your problem?
Greg Zevin
Independent Interwoven Consultant/Architect
Sydney, AU
Migrateduser
Hi, here is an extract of a document I wrote:
Please note that on the XML and HTML example lines the first 'less than' sign has been replaced on purpose with a 'greater than' sign so that the code can be shown in the browser. Read a leading '>' on those lines as '<'.
The process when generating a file from TeamSite Templating is:
· Enter data in the Data Capture Template (DCT)
· Save the data as a Data Content Record (DCR) to TeamSite’s Backing Store
· Preview/Generate the resulting file by choosing which Presentation Template (TPL) should be used.
Each of these steps can affect the encoding of its output.
1) Data Capture Template
You define which character encoding your Data Capture Template uses by declaring it on the first line of datacapture.cfg:
>?xml version="1.0" encoding="UTF-8" standalone="no"?>
(The encoding clause could be ISO-8859-1, ISO-8859-2, Big5 or GB2312 for example, if you choose not to go with UTF-8)
This lets you enter data and define labels and descriptions in UTF-8, which means you could type in (given your keyboard supports it…) Polish characters, ideographs in traditional or simplified Chinese, or even a mix of all of these in the same DCT.
Of course, the editor you use must be able to save the datacapture.cfg file in UTF-8 format; otherwise all your UTF-8 labels and descriptions will be converted to something else at save time.
When you save the data you have entered in the DCT, it is stored in UTF-8 format in the Backing Store as a Data Content Record. This is ALWAYS true; it does not depend on what you type in the first line of datacapture.cfg. So, if you were coding in Big5, your DCR would be converted into UTF-8 internally when you save it. It is later converted back into the encoding of your choice when you reach the final step in the generation process.
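Whether an editor really saved the file in the declared encoding can be checked outside TeamSite. The following is a minimal Python sketch, not a TeamSite tool: the function names are hypothetical, and it only tests that the bytes decode cleanly in the encoding named in the XML declaration.

```python
import codecs
import re

# Matches the encoding name inside an XML declaration at the start of the file.
DECL_RE = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def declared_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # skip a UTF-8 byte order mark
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii") if match else default

def saved_as_declared(data: bytes) -> bool:
    """True if the bytes decode without error in the declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        return False
    return True

# Hypothetical file content: declared as UTF-8, saved two different ways.
text = '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<label>Łódź</label>'
good = text.encode("utf-8")
bad = text.encode("iso-8859-2")  # editor silently saved as Latin-2
```

Here saved_as_declared(good) is true and saved_as_declared(bad) is false. The check is only meaningful for encodings that can reject input: a single-byte encoding such as ISO-8859-1 accepts any byte sequence, so a pass there proves nothing.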
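The round trip described here (form encoding to UTF-8 for storage, then UTF-8 to the output encoding at generation time) can be illustrated with Python's built-in codecs. This is only a sketch of the idea: it does not call TeamSite, the function names are hypothetical, and the sample strings are made up.

```python
# Illustration with Python's built-in codecs; this does not call TeamSite.

def to_storage(raw: bytes, form_encoding: str) -> bytes:
    """Convert bytes submitted in the form's encoding to UTF-8 for storage."""
    return raw.decode(form_encoding).encode("utf-8")

def from_storage(stored: bytes, output_encoding: str) -> bytes:
    """Convert stored UTF-8 bytes to the encoding chosen for the generated file."""
    return stored.decode("utf-8").encode(output_encoding)

# Hypothetical sample values, not taken from a real DCR.
big5_input = "中文".encode("big5")
stored = to_storage(big5_input, "big5")        # UTF-8 bytes in the store
regenerated = from_storage(stored, "big5")     # identical to the input bytes

polish_stored = to_storage("Łódź".encode("iso-8859-2"), "iso-8859-2")
```

The conversion back only works if the output encoding can represent every stored character. For example, from_storage(polish_stored, "iso-8859-1") raises UnicodeEncodeError because Ł does not exist in ISO-8859-1, which is one reason Polish content is usually generated as ISO-8859-2 or UTF-8.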
2) Presentation Template
You define which character encoding your Presentation Template uses by declaring it on the first line of xxxx.tpl:
>?xml version="1.0" encoding="UTF-8"?>
(It could be something other than UTF-8 if you choose not to go with UTF-8, for example CP950 for Big5 or CP936 for GB2312; check the TST manual for more details.)
If your Presentation Template outputs HTML (for example when generating .html, .jsp or .asp files), you need to tell the browser which encoding the page uses by declaring this meta tag in the generated HTML:
>meta http-equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
(charset can be ISO-8859-1, ISO-8859-2, Big5 or GB2312 for example if you choose not to go with UTF-8)
If your Presentation Template generates a file that will be included in some other file, for example a .inc file, you have to make sure that the file has the right start tag for inclusion in JSP or ASP. If your .inc file is to be included in a JSP file, the first line of the generated file should look like this:
>%@ page contentType="text/html;charset=UTF-8" %>
If your generated file will be included in an ASP file it should look like:
>%@ LANGUAGE="VBSCRIPT" CODEPAGE = 65001 %>
(Codepage 65001 means UTF-8, 950 means Big5 and 936 means GB2312. Check the Microsoft documentation for information on codepages.)
You can of course enter text in your Presentation Template; just remember to save the file as UTF-8 if you have entered UTF-8 encoded characters, otherwise they will be converted to something else at save time.
When creating Presentation Templates you often reuse code from a previous ‘Pre-TeamSite’ version of the page, or from some other page that has something you want to reuse or repurpose.
Therefore, you should always check the code for references to charsets or codepages, so you don’t end up with generated code that is inconsistent in its use of encodings.
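The check for stray charset and codepage references can be partly automated. The sketch below is a plain Python helper, not part of TeamSite: the function names and the sample template are hypothetical, and the alias table only covers the names and codepages mentioned in this post.

```python
import re

# Codepage numbers and CPxxx names from this post, mapped to one canonical name.
ALIASES = {
    "65001": "UTF-8",
    "950": "BIG5", "CP950": "BIG5",
    "936": "GB2312", "CP936": "GB2312",
}

# The three kinds of declaration discussed above.
PATTERNS = [
    re.compile(r'encoding\s*=\s*["\']([\w.-]+)["\']', re.IGNORECASE),  # XML declaration
    re.compile(r'charset\s*=\s*([\w.-]+)', re.IGNORECASE),             # meta tag / JSP
    re.compile(r'CODEPAGE\s*=\s*(\d+)', re.IGNORECASE),                # ASP directive
]

def normalise(name: str) -> str:
    """Upper-case the name and fold known aliases to one spelling."""
    key = name.upper()
    return ALIASES.get(key, key)

def encoding_references(text: str) -> set:
    """Collect every encoding named in the template text."""
    found = set()
    for pattern in PATTERNS:
        for value in pattern.findall(text):
            found.add(normalise(value))
    return found

def is_consistent(text: str) -> bool:
    """True if the text names at most one encoding."""
    return len(encoding_references(text)) <= 1

# Hypothetical template with a meta tag left over from an older page.
template = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<meta http-equiv="Content-Type" CONTENT="text/html; charset=ISO-8859-1">\n'
    '<%@ LANGUAGE="VBSCRIPT" CODEPAGE = 65001 %>'
)
```

For this sample, encoding_references(template) yields UTF-8 and ISO-8859-1, so is_consistent(template) is false and the leftover meta tag stands out. A regex scan like this will miss encodings that are set indirectly (for example through variables), so it complements a manual review rather than replacing it.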
3) The page generation
The final step in producing the generated file is the internal generation process in TeamSite Templating. This occurs when you press the ‘Generate’ or ‘Preview’ button in the Templating window, or when you use commands like ‘iwgen’, ‘iwregen’ or ‘iwpt_compile’ from the command line.
The generation process can be told in several ways which encoding to use for the output. The command line tools take parameters that specify the output encoding; if no parameter is given, the settings in the file iwpt_encoding.ipl are used. When generation is started from the Templating window, the settings in iwpt_encoding.ipl are used.
This file’s editable section consists mainly of an if-statement in which you define, using Perl regular expressions, which encoding should be used for which part of a site when generating output from templates. An example of such an if-statement from iwpt_encoding.ipl looks like:
if (!defined $output_file) { return 'ISO-8859-1'; }
elsif ($output_file =~ /branch1\\corporate\\www_mycompany_se\\WORKAREA\\admin/) { return 'ISO-8859-1'; }
elsif ($output_file =~ /branch1\\corporate\\wwwi_mycompany_pl/) { return 'UTF-8'; }
elsif ($output_file =~ /branch1\\corporate\\www_mycompany_pl/) { return 'UTF-8'; }
elsif ($output_file =~ /branch2\\test_www_mycompany_cn/) { return 'UTF-8'; }
#elsif ($output_file =~ /branch2\\test_www_mycompany_cn/) { return 'CP950'; }
#elsif ($output_file =~ /branch2\\test_www_mycompany_cn/) { return 'CP936'; }
else { return 'ISO-8859-1'; }
The easiest way is of course to code everything, always, in UTF-8. Then the only thing needed instead of the above if-statement is: return 'UTF-8';
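To experiment with such rules outside TeamSite, the first-match-wins logic of the if-statement can be paraphrased in Python. This is an illustrative port, not the real iwpt_encoding.ipl (which is Perl): the function name and the test paths are hypothetical, while the patterns are copied from the active branches of the example.

```python
import re

# (pattern, encoding) pairs, checked in order; the first match wins.
# As in the Perl example, the doubled backslash matches one literal backslash.
RULES = [
    (re.compile(r"branch1\\corporate\\www_mycompany_se\\WORKAREA\\admin"), "ISO-8859-1"),
    (re.compile(r"branch1\\corporate\\wwwi_mycompany_pl"), "UTF-8"),
    (re.compile(r"branch1\\corporate\\www_mycompany_pl"), "UTF-8"),
    (re.compile(r"branch2\\test_www_mycompany_cn"), "UTF-8"),
]
DEFAULT_ENCODING = "ISO-8859-1"

def output_encoding(output_file):
    """Return the encoding for an output path, or the default if none matches."""
    if output_file is None:  # mirrors the `!defined $output_file` branch
        return DEFAULT_ENCODING
    for pattern, encoding in RULES:
        if pattern.search(output_file):
            return encoding
    return DEFAULT_ENCODING

# Hypothetical paths for trying the rules out.
polish_page = r"D:\store\main\branch1\corporate\www_mycompany_pl\WORKAREA\dev\index.html"
other_page = r"D:\store\main\branch3\somewhere\index.html"
```

With these rules, output_encoding(polish_page) gives UTF-8, while output_encoding(other_page) and output_encoding(None) both fall through to ISO-8859-1. Because the first match wins, more specific paths need to be listed before more general ones.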
Hakan