View Ticket
Ticket UUID: cc6557cfc5763fa80bb04eecea7f713b0751efc4
Title: Unicode support
Status: Open
Type: Feature_Request
Severity: Important
Priority:
Subsystem:
Resolution: Open
Last Modified: 2009-04-07 15:03:25
Version Found In: all
Description & Comments:
Can't use Unicode characters in many places. SQLite is a database system with full Unicode support, so please consider adding Unicode support and making the default charset of the web pages "utf-8".

drh added on 2008-11-14 14:11:41:
Please provide more detail.

You say fossil "can't use unicode character in many places" but you do not specify what places. Petr Struc got fossil working for Czech as part of ticket 30f7206b2. Are you saying that his changes missed some places and that additional changes are needed? What exactly is not working?

Recognize that the main developers of fossil run on unix (Linux and Mac OSX) where everything is always UTF8 all the time. We never have to deal with goofy windows character sets and have no way of discovering or testing for character set problems on windows. Everything just works for us. If you want us to work on this issue, you need to give us more hints.


kkinnell added on 2009-01-29 18:42:02:
I think the issue here and the wikiname issue in 0a0f00d434 probably relate.


anonymous added on 2009-02-01 08:53:37:
What he means is that the web pages are not served as UTF-8, so characters outside US-ASCII (which is what Firefox tells me this page is encoded in) will not get through to fossil when a form posts UTF-8 characters.

Also, it seems these pages are served with this HTTP header:

Content-type: text/html; charset=ANSI_X3.4-1968

Setting the charset to utf-8 might help (or adding a meta tag).
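
To make the header above concrete, here is a minimal Python sketch that pulls the charset out of a Content-Type value and resolves it to a canonical codec name. It only uses the standard library; the helper name `declared_codec` is made up for illustration and has nothing to do with fossil's own code. It shows that ANSI_X3.4-1968 is just another name for plain ASCII.

```python
import codecs
from email.message import Message


def declared_codec(content_type: str) -> str:
    """Return the canonical codec name declared in a Content-Type value.

    Hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased charset, or None
    if charset is None:
        return "(none declared)"
    # codecs.lookup() resolves aliases to Python's canonical codec name.
    return codecs.lookup(charset).name


if __name__ == "__main__":
    print(declared_codec("text/html; charset=ANSI_X3.4-1968"))
    print(declared_codec("text/html; charset=utf-8"))
```

So a page served with that header is declared as ASCII, and anything outside ASCII has no direct representation in it.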


kkinnell added on 2009-04-07 15:03:25:
I have Firefox set to use UTF-8, and when I display the page info and the response headers, the encoding in use is UTF-8. This is what I get whether I'm looking at the online repository or any of my local repositories.

You can usually force a browser to use whatever encoding you want as long as the server will supply it, and in Fossil's case it will supply UTF-8.

I use UTF-8 code points in comments all the time—including this comment. For me, it displays correctly. I'm sure for some it displays as &#8212; or perhaps a different number (or two in the worst case).

The only place I've found where Fossil is at all "ill-mannered" about encodings is its insistence on using "entified" versions of multi-byte characters in the titles of wiki pages. It's not even clear if that is a bug.
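
For anyone unsure what "entified" means here, the following Python sketch shows the general idea: every character that does not fit in ASCII is replaced by a numeric character reference. This is not Fossil's code, and I am not claiming Fossil does it this way; the function name `entify` and the Czech sample word are assumptions chosen for illustration.

```python
import html


def entify(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Illustrative only; not taken from Fossil.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    title = "P\u0159\u00edli\u0161"  # a Czech word with three non-ASCII letters
    entified = entify(title)
    print(entified)
    # html.unescape() turns the references back into the original characters.
    print(html.unescape(entified) == title)
```

The entified form is pure ASCII, so it survives any ASCII-compatible encoding, at the cost of being unreadable wherever it is shown without being decoded again.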

99.9% of the problems with encodings stem from a mismatch between the encodings two disparate systems are using. If the web server will serve multiple encodings, but your browser is allowing a default that isn't what's wanted, then the encoding will appear to be wrong.
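
A small Python sketch of that kind of mismatch, assuming one side writes UTF-8 and the other reads the same bytes as Windows-1252 (the function names are made up for illustration):

```python
def misread(text: str, sent_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Encode text the way the sender does, then decode it the way the receiver does."""
    return text.encode(sent_as).decode(read_as)


def repair(garbled: str, sent_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Undo misread(), which works only while no bytes were lost along the way."""
    return garbled.encode(read_as).decode(sent_as)


if __name__ == "__main__":
    dash = "\u2014"  # one character, three bytes in UTF-8
    garbled = misread(dash)
    print(len(dash), len(garbled))  # one character becomes three
    print(repair(garbled) == dash)
```

Nothing is wrong with the bytes themselves; the two sides simply disagree about what they mean, which is why fixing the declared encoding on either end makes the problem disappear.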

The fast and loose way that some OSes and other software switch encodings to be "friendly" doesn't help. The best bet is to pick one Unicode encoding, set everything on your system to use it, ask for it, and complain when it isn't available. Even if you generally hit sites where Latin-1 would work, you should probably set everything to UTF-8.

Encodings are thorny. If something isn't displaying correctly, make sure you check every setting (and there are a LOT of places to set an encoding). If you're using a system that is not unix or unix-like, you will need to do a lot more checking.