Japanese will render correctly in UTF-8, but there are a few details to make certain of.

First, you list the encoding as UTF-8 with no BOM - that is bad. You need a BOM; it is a 3-byte mark at the very beginning of the file. I use this in my TPL to generate the BOM:

[html]
my $UTF8_BOM = "\xEF\xBB\xBF";
iwpt_output($UTF8_BOM);
[/html]

You can modify that for your needs.

Second, make certain that your browser has the full Japanese character set. From my experience, Mozilla 2.0 and IE 7 do, while IE 6 and Safari don't - that being from the base install. Find a Japanese site and see if it renders correctly.

HTH
Andy
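For illustration, here is a minimal plain-Perl sketch of the same idea, for the case where you write the XML to a file yourself rather than through iwpt_output(); the file name and content are just examples:

[html]
# Sketch: emit the 3-byte UTF-8 BOM, then the XML, in :raw mode so
# Perl does not re-encode the bytes. output.xml is a hypothetical name.
open my $fh, '>:raw', 'output.xml' or die "Cannot open output.xml: $!";
print {$fh} "\xEF\xBB\xBF";                                # UTF-8 BOM
print {$fh} qq{<?xml version="1.0" encoding="UTF-8"?>\n};  # XML declaration
print {$fh} "<greeting>\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF</greeting>\n";  # Japanese text as raw UTF-8 bytes
close $fh or die "Cannot close output.xml: $!";
[/html]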
Thanks Andy, I did try that also (from your earlier post), but the encoding of the XML file still remains 'UTF-8 without BOM'. This is what shows up in my Notepad++ editor. Is there any way to force the encoding of the actual physical XML file to UTF-8?
All that BOM/no-BOM rigmarole really depends on the requirements of your generated XML's *consumer*, for lack of a better term. For what it's worth, from my personal experience it is almost always sufficient to generate UTF-8 XML without a BOM marker (that's right, no BOM!) but with the XML encoding declaration. Like this:

[html]
<?xml version="1.0" encoding="UTF-8"?>
[/html]

Note that it's a bad idea to lie to the XML parser in this manner. If you declare the XML file as UTF-8 encoded, it had better BE UTF-8 encoded, i.e. contain only valid UTF-8 (multi-byte) characters.

Note also that the two flavors of UTF-16, LE and BE, *seem* to be a totally different story. There the correct BOM (one of the two) is almost always required.
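A minimal sketch of that approach in Perl, assuming you control the file-writing code (the file name and content are hypothetical); the :encoding(UTF-8) layer guarantees the bytes really are UTF-8, so the declaration is not lying to the parser:

[html]
use strict;
use warnings;

# Sketch: UTF-8 XML with no BOM, but with the encoding declaration.
open my $fh, '>:encoding(UTF-8)', 'no_bom.xml' or die "Cannot open no_bom.xml: $!";
print {$fh} qq{<?xml version="1.0" encoding="UTF-8"?>\n};
print {$fh} "<word>\x{65E5}\x{672C}\x{8A9E}</word>\n";  # Japanese text as Perl character data
close $fh;
[/html]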
The *consumer* here is a ColdFusion (.cfm) file which parses the XML. Somehow it still treats the XML as 'UTF-8 without BOM' even if I put that XML declaration on the first line. Not sure how to make the Perl script (.ipl) force the encoding of the XML.
Look at the output in a binary editor; it should start something like this:

FF FE 0A 00 3C

which looks like this: ÿþ
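If you don't have a binary editor handy, a quick Perl sketch can show the same thing (the file name is an example):

[html]
# Sketch: print the first 8 bytes of the generated file in hex -
# enough to see whether a BOM (EF BB BF, or FF FE / FE FF) is present.
open my $fh, '<:raw', 'output.xml' or die "Cannot open output.xml: $!";
read $fh, my $head, 8;
printf '%02X ', $_ for unpack 'C*', $head;
print "\n";
close $fh;
[/html]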
Gosh... that makes it completely unparseable by the CFM. Surprisingly, if I view just the XML in a browser it renders properly, but not when it gets parsed by the CFM. Still wondering...
You need to find out from whoever sells ColdFusion. That is a standard way of putting out UTF-8 (including the BOM). If they cannot process the BOM, then you may have to encode the data before you put it in the XML.
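One sketch of that workaround: escape everything non-ASCII as XML numeric character references before it goes into the document, so the file is pure ASCII and the BOM question disappears (the sample string is hypothetical):

[html]
use strict;
use warnings;

# Sketch: replace every non-ASCII character with a numeric character
# reference, e.g. &#x65E5; - the resulting XML is 7-bit ASCII.
my $text = "\x{65E5}\x{672C}\x{8A9E}";   # Japanese sample as Perl characters
(my $ascii = $text) =~ s/([^\x00-\x7F])/sprintf '&#x%X;', ord $1/ge;
print "<word>$ascii</word>\n";           # <word>&#x65E5;&#x672C;&#x8A9E;</word>
[/html]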
Yeah... I am researching that... but going back to the original issue: is there any way to have the Perl script (.ipl) generate the XML in plain vanilla UTF-8 instead of 'UTF-8 without BOM'?
It's a pretty safe bet that your problem has nothing to do with the presence or absence of the BOM marker; it's optional. Attach your XML (do not copy/paste it into the post, attach it). Chances are, you have some invalid (non-UTF-8) symbols in there.
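One way to check for that yourself, as a sketch (the file name is hypothetical): the core Encode module will die on the first malformed byte when asked for a strict decode.

[html]
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Sketch: slurp the file as raw bytes and try a strict UTF-8 decode.
open my $fh, '<:raw', 'suspect.xml' or die "Cannot open suspect.xml: $!";
my $bytes = do { local $/; <$fh> };
close $fh;

if ( eval { decode('UTF-8', $bytes, FB_CROAK); 1 } ) {
    print "Valid UTF-8\n";
} else {
    print "Invalid UTF-8: $@";
}
[/html]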
Please find the XML attached, saved as .txt. Please note the encoding is 'UTF-8 without BOM'.
Your UTF-8 encoding seems to be correct! I've attached a copy of your file with the UTF-8 BOM marker. Copy it AS IS and try it, see if that changes anything. Unless you are absolutely sure how exactly your editors treat Unicode and the BOM, do not edit the file.
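For reference, this is roughly how one might add the BOM programmatically rather than through an editor; a sketch with hypothetical file names:

[html]
use strict;
use warnings;

# Sketch: write a BOM-prefixed copy of an existing UTF-8 file,
# leaving the original untouched.
open my $in,  '<:raw', 'original.xml' or die "Cannot open original.xml: $!";
open my $out, '>:raw', 'with_bom.xml' or die "Cannot open with_bom.xml: $!";
print {$out} "\xEF\xBB\xBF";          # the 3 BOM bytes
local $/;                             # slurp mode
print {$out} scalar <$in>;
close $in;
close $out;
[/html]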
That didn't work either... as I mentioned earlier in the post, the CFM is not even able to parse this new XML now. Also, I now see the encoding of the file you provided as 'UTF-8' and not 'UTF-8 without BOM'. Seems more fishy on the ColdFusion side to me...
OK. Let me repeat what I've already said, one last time: your problem has nothing whatever to do with the presence or absence of the BOM marker! Something fails somewhere in your application. Why do you think it's encoding-related? Sure, if your software expects (for example) UTF-16BE or UTF-32LE or Windows-1252 or whatever, you can feed it flawless UTF-8 and it'll still fail. For all we know, your problem may not even be related to encoding.
Once I save the same XML in the editor as plain 'UTF-8' and deploy it, everything works fine.
Make a copy of the file "before". Save it as 'UTF-8' in whatever editor you are using. Compare the files "before"/"after"; do it in hexadecimal mode if needed. What's the difference?
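A small sketch for that comparison, assuming hypothetical file names; if only the BOM differs, the "after" line should simply start with EF BB BF:

[html]
use strict;
use warnings;

# Sketch: print the first 16 bytes of both files in hex, side by side.
for my $file ('before.xml', 'after.xml') {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    read $fh, my $head, 16;
    printf "%-12s %s\n", $file, join ' ', map { sprintf '%02X', $_ } unpack 'C*', $head;
    close $fh;
}
[/html]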