Hi Folks,
I'm in the process of writing a routine that will automatically replace any "known" Windows-1252 characters with an equivalent HTML-encoded character (as I have specified myself). I thought I had it nailed, until I parsed all of our existing HTML pages (thousands, spanning 10 years of development). Then I came across this weird phenomenon where I have this character (it "seems" like it is an e acute, but I don't really know what it is!).
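A minimal sketch of the kind of routine described above, written in Python purely for illustration. The mapping table, the function name `replace_known_chars`, and the choice of entities are all assumptions; the actual routine is not shown in this post.

```python
# Hypothetical mapping from "known" Windows-1252 characters to HTML entities.
# These entries are examples only, not the real table.
CP1252_TO_HTML = {
    "\u2018": "&lsquo;",   # left single quote  (byte 0x91 in Windows-1252)
    "\u2019": "&rsquo;",   # right single quote (byte 0x92)
    "\u201c": "&ldquo;",   # left double quote  (byte 0x93)
    "\u201d": "&rdquo;",   # right double quote (byte 0x94)
    "\u2013": "&ndash;",   # en dash            (byte 0x96)
    "\u2014": "&mdash;",   # em dash            (byte 0x97)
    "\u00e9": "&eacute;",  # e acute            (byte 0xE9)
}


def replace_known_chars(raw: bytes) -> str:
    """Decode raw bytes as Windows-1252 and replace the known characters."""
    # Bytes with no Windows-1252 meaning become U+FFFD instead of raising.
    text = raw.decode("cp1252", errors="replace")
    return "".join(CP1252_TO_HTML.get(ch, ch) for ch in text)
```

With this sketch, `replace_known_chars(b"Communiqu\xe9")` gives `Communiqu&eacute;`, because a Windows-1252 e acute is the single byte 0xE9. A routine built this way only matches single bytes, so any character stored as more than one byte passes through unreplaced.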
On our Sun box, it shows up in a putty terminal as (using cat):
Communiqué
If I "vi" it, it shows up like this:
Communiqu\303\251
In my windows text editor (textpad, file encoding is utf-8), it shows up like this:
Communiqué
And if I run it through a Perl script using Devel::Peek, I get this information for "two" characters:
SV = PVIV(0x238efc) at 0x18ebab0
REFCNT = 2
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 195
PV = 0x1909694 "195"\0
CUR = 3
LEN = 4
SV = PVIV(0x238f0c) at 0x18ebabc
REFCNT = 2
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 169
PV = 0x190a7b4 "169"\0
CUR = 3
LEN = 4
That's it! Two characters for what I thought was one Windows-1252 e acute.
(Oh... and I also wrote a basic C program to count the characters, and it counts the last e acute as two characters as well.)
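The two values can also be inspected directly. The following is a sketch in Python (not the Perl or C code mentioned above) that takes 195 and 169 from the Devel::Peek output, prints them in octal for comparison with the vi display, and decodes them under two candidate encodings. That the file might be UTF-8 is an assumption being tested here, not something already established.

```python
# The two values reported by Devel::Peek for the final "character".
raw = bytes([195, 169])

# Octal form, for comparison with what vi shows.
octal = "".join("\\%03o" % b for b in raw)

# Decode the same two bytes under two candidate encodings.
as_utf8 = raw.decode("utf-8")      # read as one UTF-8 sequence
as_cp1252 = raw.decode("cp1252")   # read as two Windows-1252 bytes

# The whole word as bytes: nine ASCII letters plus the two bytes above.
word = b"Communiqu" + raw

print(octal)                                  # \303\251
print(len(word), len(word.decode("utf-8")))   # 11 10
print(len(as_utf8), len(as_cp1252))           # 1 2
```

Under these assumptions the octal form matches the vi display, a byte-by-byte count sees 11 units, and a UTF-8 decoder sees 10 characters ending in a single U+00E9 (e acute). Read as Windows-1252, the same two bytes come out as two separate characters.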
I admit that my character encoding knowledge is rudimentary, but I don't understand this at all. Is it possible for one character to be represented by two bytes? What am I missing?
Any pointers are gratefully appreciated.