[Advanced-java] getting unicode characters to work with JSPs

Sheer El-Showk sheer at saraf.com
Wed May 21 15:32:27 2003


Hi,

I have an application that I'm trying to internationalize (more in the
character-set sense than in the full Locale sense).  I'm using JSDK 1.4
and Tomcat 3.3 as my servlet container.  I configured a Windows 2000 box
to use the Arabic character set and then entered Arabic data into a form.
The application stores the data in Postgres using JDBC.  When I later
retrieve the same data from the database and display it on the website,
it comes out looking very different.  I am trying to determine where an
inappropriate conversion is being done.  I am using a very simple test
string of three Arabic characters.  I log them at four points: as soon as
they are received by my application (from the HTTP request), when I put
them in the database, when I retrieve them from the database, and finally
before I print them back to the JSP.  Within my application the three
characters are displayed in the logs as <D3><E5><DE>, which I presume is
UTF-8 encoded.  I know the database is not the problem because when I
retrieve them from the database they look exactly the same as when I put
them in (I log both operations).
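The UTF-8 presumption can be checked mechanically. Arabic letters live in U+0600 to U+06FF, and every code point from U+0080 to U+07FF takes two bytes in UTF-8, so three Arabic letters would be six bytes, not three. Also, 0xD3 starts a two-byte UTF-8 sequence and must be followed by a continuation byte in 0x80 to 0xBF, which 0xE5 is not. Three single bytes point instead to a single-byte code page such as windows-1256 (the Windows Arabic code page). The class and method names below are hypothetical, and the sketch assumes <D3><E5><DE> are three raw byte values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // A decoder set to REPORT throws on bad input instead of silently
        // substituting U+FFFD, which is what new String(bytes, UTF_8) does.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The logged values, assuming <D3><E5><DE> are three raw bytes.
        byte[] logged = {(byte) 0xD3, (byte) 0xE5, (byte) 0xDE};
        System.out.println("logged bytes valid UTF-8? " + isValidUtf8(logged)); // false

        // Three Arabic letters genuinely encoded as UTF-8 take six bytes.
        byte[] real = "\u0633\u0647\u0642".getBytes(StandardCharsets.UTF_8);
        System.out.println(real.length + " bytes, valid UTF-8? " + isValidUtf8(real)); // 6 bytes, valid UTF-8? true
    }
}
```

StandardCharsets needs Java 7 or later; on J2SDK 1.4 the same decoder calls exist via Charset.forName("UTF-8").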

I am considering several possible sources of the problem:

- The Windows 2000 character set is being misinterpreted by Java when
the data is read in from IE.

- The characters are being mangled in the HTTP request because they are
not properly encoded.

- The characters are being mangled by the JSP print writer.
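The second and third hypotheses can both be reproduced outside the container. This sketch assumes, as is common when a browser does not declare a charset on a form post, that the container decoded the parameter bytes as ISO-8859-1, and that the form page was served as UTF-8. The class and the redecode helper are hypothetical names, not part of any servlet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestCharsetDemo {
    // Hypothetical helper: undo a decode that wrongly used ISO-8859-1.
    // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char U+0000-U+00FF,
    // so getBytes(ISO_8859_1) recovers the original request bytes losslessly;
    // they are then decoded with the charset the browser actually used.
    static String redecode(String latin1Decoded, Charset actualCharset) {
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actualCharset);
    }

    public static void main(String[] args) {
        String arabic = "\u0633\u0647\u0642"; // sample three-letter Arabic string

        // Input side: browser sends UTF-8 bytes, container decodes them as ISO-8859-1.
        String mangled = new String(arabic.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println("mangled length: " + mangled.length()); // 6, not 3

        String repaired = redecode(mangled, StandardCharsets.UTF_8);
        System.out.println("repaired matches: " + repaired.equals(arabic)); // true

        // Output side: an ISO-8859-1 writer cannot represent Arabic, and
        // getBytes(Charset) substitutes the replacement byte '?' for each letter.
        byte[] latin1Out = arabic.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("as ISO-8859-1: " + new String(latin1Out, StandardCharsets.ISO_8859_1)); // ???
    }
}
```

If the form page was served as windows-1256 instead, Charset.forName("windows-1256") would be the charset to pass; it is normally present in full JDKs but is not one of the charsets every Java implementation must support. In a Servlet 2.3 container the cleaner input fix is request.setCharacterEncoding(...) before reading any parameter, but that method was added in Servlet 2.3 and Tomcat 3.3 implements Servlet 2.2, so it may not be available there. On the output side, a page directive such as <%@ page contentType="text/html; charset=UTF-8" %> makes the JSP writer encode in UTF-8 rather than the ISO-8859-1 default.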

One concern I have is that when I open my log files in a Unicode-aware
editor (vim), the characters (which I see as <D3><E5><DE> in a
non-Unicode-aware program) don't appear as valid Arabic characters but
rather as the same garbage that I see output in IE.  This suggests that
my Java application never even receives the appropriate characters
because they are being mangled on input by the HTTP stream.
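The bytes in a log file depend on the encoding the logger writes with, so <D3><E5><DE> could be either correctly received Arabic written out in a single-byte code page or characters that were mis-decoded on input. Logging each char as a U+XXXX code point removes that ambiguity, because the result no longer depends on the file's encoding or the viewer. A minimal sketch with hypothetical names: if the log shows U+00D3 U+00E5 U+00DE (Ó, å, Þ), Java received the form bytes decoded as ISO-8859-1; if it shows values in the Arabic block U+0600 to U+06FF, the input is fine and the problem lies later, on output.

```java
public class CodePointDump {
    // Formats each UTF-16 char of s as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the log would show if the bytes D3 E5 DE were decoded as ISO-8859-1.
        System.out.println(dump("\u00D3\u00E5\u00DE")); // U+00D3 U+00E5 U+00DE
        // What correctly decoded Arabic looks like (sample letters, assumed for illustration).
        System.out.println(dump("\u0633\u0647\u0642")); // U+0633 U+0647 U+0642
    }
}
```

Calling dump(...) on the request parameter at each of the four logging points shows exactly where the code points first change.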

Can anyone shed any light on this problem?  Even if you don't know a full
solution, suggestions or clarifications regarding just part of it would
help.

Thank you in advance,
Sheer El-Showk