When you open your HTML page in the browser and change the encoding to UTF-8, does it still display garbage characters?
OK, so I am really confused here. We are translating some pages, and the HTML everywhere says UTF-8. These are all generated from templates, on Windows 2003 with TS 6.7.1. When the third party sent the DCRs, they contained the literal characters (ü) rather than encoded references like &#1234;. No big deal, the page should support that. So I generate the page, and I get garbage. If I open the HTML in Notepad and save it explicitly as UTF-8, it works. The funny thing is that, according to iwpt_encoding.ipl, UTF-8 is the only option. This is the same for regen, preview, or workflow (iwpt_compile). Nowhere do I override the default. Any ideas? I have no freaking idea what is going on.
Andy
The DCRs are all set to UTF-8.
The TPL generates code for UTF-8.
My browser is set to display UTF-8.
I get garbage.

If I run my text through:

    use HTML::Entities;
    $newText = encode_entities($node->value('Text'));

it works. I should not have to do that.

Finally, if I open the generated page in Notepad and use Save As, explicitly specifying UTF-8, it works fine. I changed my iwpt_compile call to add -oenc UTF-8, and that did not help. Obviously I could open and re-save every file as UTF-8, but I should not need to do that. I am pulling my hair out.
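The encode_entities workaround succeeds because it turns every non-ASCII character into a character reference, so the output is plain ASCII and reads the same whatever encoding the browser or compiler assumes. Here is a rough Python analogue (`to_ascii_html` is a hypothetical name). One difference: HTML::Entities emits named entities such as &uuml; where they exist, while this sketch emits numeric ones like &#252;.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape HTML specials, then replace every non-ASCII char with &#NNN;."""
    escaped = html.escape(text, quote=True)  # &, <, >, " and ' become references
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("Grüße <b>"))  # Gr&#252;&#223;e &lt;b&gt;
```

This hides the symptom rather than fixing it: the template file itself is still being read with the wrong encoding.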
The key is for the TPL to be saved with the proper encoding and a signature (also known as the byte order mark, or BOM). Many Windows applications rely on this byte order mark to recognize and properly interpret encoded characters (see unicode.org for more on the BOM). Notepad adds the BOM if you save the TPL as UTF-8, and Visual Studio adds it if you save the file "with signature".
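In UTF-8 the BOM is the three bytes EF BB BF (U+FEFF encoded as UTF-8). If you have many TPLs, a small script can add it instead of re-saving each one in Notepad. This is a minimal Python sketch that assumes the files are already valid UTF-8. It refuses to touch anything that isn't, because prepending a UTF-8 BOM to a cp1252 file would only make things worse. `add_utf8_bom` and `ensure_utf8_bom` are hypothetical names.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless data already starts with one."""
    if data.startswith(UTF8_BOM):
        return data
    data.decode("utf-8")  # raises UnicodeDecodeError if data is not valid UTF-8
    return UTF8_BOM + data


def ensure_utf8_bom(path: Path) -> bool:
    """Add the BOM to the file in place; return True if the file changed."""
    original = path.read_bytes()
    updated = add_utf8_bom(original)
    if updated != original:
        path.write_bytes(updated)
        return True
    return False
```

For example, `ensure_utf8_bom(Path("main.tpl"))` (hypothetical file name) on the main TPL only. Python's own "utf-8-sig" codec does the same job when reading or writing text: it writes a BOM on encode and drops a leading one on decode.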
UltraEdit gives you the BOM option too, so there is no need for Notepad. The funny thing is that they call it a byte order mark, but in UTF-8 it simply identifies the data as UTF-8. UTF-8 does NOT have byte order issues the way other encodings (UTF-16, UTF-32) do, so here it isn't really marking byte order at all...

On a side note, be wary if you use multiple TPLs (includes) to generate a page. The BOM has to be the very first character for the encoded data to be interpreted properly. If you add the BOM to the includes as well, each one lands in the middle of the output, where it is treated as a character and gives you unwanted spacing in your page. So only apply it to the "main" TPL.
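To follow that advice across a set of include TPLs, here is a minimal Python sketch (`strip_utf8_boms` and `strip_bom_in_place` are hypothetical names). It removes any leading BOMs, including two in a row, and leaves the rest of each file byte-for-byte intact.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_boms(data: bytes) -> tuple[bytes, int]:
    """Remove every leading UTF-8 BOM; return (cleaned data, number removed)."""
    count = 0
    while data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
        count += 1
    return data, count


def strip_bom_in_place(path: Path) -> int:
    """Rewrite an include file without its leading BOM(s); return how many were removed."""
    cleaned, count = strip_utf8_boms(path.read_bytes())
    if count:
        path.write_bytes(cleaned)
    return count
```

Run it on the include TPLs only. The main TPL keeps its single BOM.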
Let me know if it worked... I'm pretty sure the implementation I'm working on has more language requirements than most (closing in on 50 now... ugh!), and I've heard of so many encoding nightmares from other developers that could have been solved simply by using the proper encoding rather than complex scripting/code...
Sweet. Glad to hear things worked out. I had two BOMs in a row at the beginning of my document, and it caused unwanted spacing.
Yeah, I was seeing that too. I took your advice and saved only the primary TPL as UTF-8, and I also removed the first <iw_pt. It is working better, but I still have one page with a funky line (from the BOM, I think) and I cannot find it.
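One way to track down a stray BOM like that is to scan the generated page for U+FEFF anywhere other than the very first character. This is a minimal Python sketch that assumes the generated file is UTF-8. The function names and the file name in the usage note are hypothetical.

```python
from pathlib import Path

BOM = "\ufeff"


def find_stray_boms(text: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) of each U+FEFF except one at the very start."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        col = line.find(BOM)
        while col != -1:
            if (lineno, col) != (1, 0):  # a single leading BOM is expected
                hits.append((lineno, col + 1))
            col = line.find(BOM, col + 1)
    return hits


def report_stray_boms(path: Path) -> None:
    # Decode as plain "utf-8" (not "utf-8-sig") so a leading BOM stays visible as U+FEFF.
    text = path.read_text(encoding="utf-8")
    for line, col in find_stray_boms(text):
        print(f"{path}: stray BOM at line {line}, column {col}")
```

For example, `report_stray_boms(Path("generated.html"))`. Each reported position is roughly where an include's output begins, which points back to the TPL that still carries a BOM.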