[Advanced-java] getting unicode characters to work with JSPs
Sheer El-Showk
sheer at saraf.com
Wed May 21 16:41:00 2003
I think I have resolved my issue (though caching problems meant it took
me longer to realize that than I expected). I believe the problem was
with the encoding of the HTTP parameters (though I can't say for sure),
and the way to fix it is to include this meta tag in the <head></head>
element of the JSP:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Incidentally, for anyone in a similar situation, the other thing to make
sure you do is create your Postgres database with Unicode encoding.
Make sure you use createdb -E UNICODE.
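
For anyone wondering why the page's declared charset would affect the
HTTP parameters: browsers generally submit form data in the encoding of
the page that holds the form, so the receiving side has to decode the
percent-encoded bytes with that same charset. The sketch below is only
an illustration in plain Java (10 or later, for the Charset overloads of
URLEncoder/URLDecoder), not the code from my application; the class and
method names are made up, and the three Arabic letters are an arbitrary
sample written as escapes.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParamDecodingDemo {
    /** Percent-encodes a value as a browser would when submitting from a UTF-8 page. */
    static String encodeAsUtf8Form(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Decodes a percent-encoded form value with the given charset. */
    static String decodeParam(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) {
        // Hypothetical sample input: three Arabic letters (seen, meem, qaf).
        String original = "\u0633\u0645\u0642";
        String wire = encodeAsUtf8Form(original);

        // Decoding with the charset the browser used round-trips the text.
        String good = decodeParam(wire, StandardCharsets.UTF_8);
        // Decoding with a different charset turns each byte into its own character.
        String bad = decodeParam(wire, StandardCharsets.ISO_8859_1);

        System.out.println("on the wire:      " + wire);
        System.out.println("UTF-8 round trip: " + good.equals(original));
        System.out.println("ISO-8859-1 chars: " + bad.length());
    }
}
```

The mismatch case is the interesting one: nothing throws, the text is
just silently wrong, which is why logging the raw bytes at each stage
helps.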
Cheers,
Sheer
On Wed, 21 May 2003, Sheer El-Showk wrote:
> Hi,
>
> I have an application that I'm trying to internationalize (more in the
> character set sense than in the whole Locale sense). I'm using JSDK 1.4
> and Tomcat 3.3 as my servlet container. I configured a Windows 2000 box
> to use the Arabic character set and then entered data in Arabic into a
> form. The application stores the data to Postgres using JDBC. I later
> try to retrieve the same data from the database and display it on the
> website, but it comes out looking very different. I am trying to
> ascertain where an inappropriate conversion is being done. I am using a
> very simple test string of three Arabic characters. I log them as soon
> as they are received by my application (from the HTTP request), then
> when I put them in the database, when I retrieve them from the database,
> and finally before I print them back to the JSP. Within my application
> the three characters are displayed in the logs as <D3><E5><DE>, which I
> presume is UTF-8 encoded. I know the database is not the problem because
> when I retrieve them from the database they look exactly the same as
> when I put them in there (since I log both operations).
>
> I am considering several possible sources for my problem.
>
> - The Windows 2000 character set is being interpreted strangely by Java
> when it reads it in from IE.
>
> - The characters are being mangled in the HTTP request because they are
> not properly encoded.
>
> - The characters are being mangled by the JSP print writer.
>
> One concern I have is that when I open my log files in a Unicode-aware
> editor (vim), the characters (which I see as <D3><E5><DE> in a
> non-Unicode-aware program) don't appear as valid Arabic characters but
> rather as the same random garbage that I see output in IE. This suggests
> that my Java application is never even receiving the appropriate
> characters because they are being mangled on input by the HTTP stream.
>
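
One way to test the presumption that <D3><E5><DE> is UTF-8 is a strict
decode. In well-formed UTF-8 a lead byte such as 0xD3 must be followed
by a continuation byte in the range 0x80-0xBF, and 0xE5 is not one, so
a strict decoder should reject the sequence. The sketch below uses
modern Java (7 or later, for StandardCharsets); the class and method
names are made up for illustration. It can only say that the bytes are
not UTF-8, not which single-byte encoding they actually are.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The three bytes as they appear in the log files.
        byte[] logged = {(byte) 0xD3, (byte) 0xE5, (byte) 0xDE};
        System.out.println("logged bytes valid UTF-8: " + isValidUtf8(logged));

        // For comparison: three Arabic letters actually encoded as UTF-8.
        byte[] utf8 = "\u0633\u0645\u0642".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 sample length in bytes: " + utf8.length);
        System.out.println("UTF-8 sample valid: " + isValidUtf8(utf8));
    }
}
```

Three Arabic letters take six bytes in UTF-8, so seeing exactly three
bytes in the log is itself a hint that the text arrived in some
single-byte encoding.
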
> Can anyone shed any light on this problem? Even if you don't know a full
> solution, suggestions or clarifications regarding just a part of it would
> help.
>
> Thank you in advance,
> Sheer El-Showk