Converting UTF-8 characters to ISO-8859-1
CyrusTheVirus
Our web pages are encoded in UTF-8, but unfortunately the search engine software that we use can't handle UTF-8. This means that special characters like ü show up as Ã¼.
What I want to do now is substitute the characters in the presentation template with Perl (which is fairly new to me) so that the search engine can show them correctly.
It works fine with German umlauts: $text =~ s/Ã„/Ä/gm;
But it doesn't work with special characters like „ “ ’ or ‚.
Any ideas how to do this?
Comments
Valentine
We have the same problem.
But we modified the search engine, not TeamSite.
Have a look at the Encode Perl module.
use Encode qw/encode/;
$str = encode('iso-8859-1', $str);
We have a feature that launches a script after crawling and right before indexing, which converts pages from ISO to UTF-8.
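For the characters from the original question („ “ ’ ‚), Encode alone may not be enough, because ISO-8859-1 has no code points for them. Below is a minimal sketch of one way to combine the Encode suggestion with a fallback table. It assumes the input arrives as raw UTF-8 bytes; the helper name utf8_to_latin1 and the chosen replacement characters are illustrative assumptions, not something from TeamSite or from the posts above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 encoded bytes, returns ISO-8859-1 bytes.
sub utf8_to_latin1 {
    my ($bytes) = @_;

    # Turn the raw UTF-8 bytes into a Perl character string first.
    my $text = decode('UTF-8', $bytes);

    # Typographic characters that ISO-8859-1 lacks, mapped to rough stand-ins.
    # The replacements are an assumption; adjust them to your needs.
    my %fallback = (
        "\x{2018}" => "'",      # left single quotation mark
        "\x{2019}" => "'",      # right single quotation mark
        "\x{201A}" => "'",      # single low-9 quotation mark
        "\x{201C}" => '"',      # left double quotation mark
        "\x{201D}" => '"',      # right double quotation mark
        "\x{201E}" => '"',      # double low-9 quotation mark
        "\x{20AC}" => 'EUR',    # euro sign
    );
    $text =~ s/([\x{2018}\x{2019}\x{201A}\x{201C}\x{201D}\x{201E}\x{20AC}])/$fallback{$1}/g;

    # Anything still outside ISO-8859-1 is replaced by Encode's
    # substitution character ('?') under the default CHECK setting.
    return encode('iso-8859-1', $text);
}
```

Decoding first matters here: encode() expects a character string, so passing it undecoded UTF-8 bytes would treat each byte as its own character and keep the garbling.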
CyrusTheVirus
Well, I solved the problem the other way around.
I didn't really change the output, but I restricted the input with a regex. From now on, only characters that you can type on a regular keyboard are allowed. The only character that still causes trouble is the euro sign (€), but I can live with that.
The problem I was talking about only occurred when you copied some special characters from MS Word into an input field...
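The exact regex was not posted, so the following is only a sketch of what such an input restriction could look like. It assumes the field value is already a decoded character string and accepts printable ASCII, common whitespace, and the ISO-8859-1 range U+00A0 to U+00FF; the function name is_latin1_safe is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical check: returns 1 if every character is printable ASCII,
# tab/newline/carriage return, or in the Latin-1 range U+00A0..U+00FF.
# Expects a decoded character string, not raw UTF-8 bytes.
sub is_latin1_safe {
    my ($text) = @_;
    return $text =~ /\A[\t\n\r\x20-\x7E\x{A0}-\x{FF}]*\z/ ? 1 : 0;
}
```

With this character class, German umlauts pass, while the typographic quotes that Word inserts and the euro sign (U+20AC) are rejected, which matches the behaviour described above.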