Ticket Change Details

Changes to ticket 838bde7990

By kkinnell on 2008-12-04 03:00:21. See also: artifact content, and ticket history

    1. Change comment to "@rem issue demonstration<br> del .\_fossil_<br> del .\test.fossil<br> rmdir /Q /S .\testdir<br> mkdir .\testdir<br> del manifest.uuid<br> del manifest<br> cd .\testdir<br> echo 345678>"1r&#345;&#345;&#345;&#345; R .test"<br> cd ..<br> fossil new test.fossil<br> fossil open test.fossil<br> fossil add .\testdir\\"1r&#345;&#345;&#345;&#345; R .test"<br> fossil ls<br> fossil commit -m "check in a file with accented char/s and spaces in its name" --nosign<br> fossil ui<br> ----<br> On Windows XP<br> I create a file with some accented characters in its name, commit it to the fossil repository, and save the repository as a zip file. When I extract that file from the fossil-generated zip file with Windows Explorer or Total Commander, I get a file with a different name.<br> Extracting with the unzip program from the Info-ZIP web site yields a correctly named file. In the above example I create a file with "r-caron" characters (U+0159) in its name; after unzipping I get a file with "DEGREE SIGN" (U+00B0) in it instead. Many other characters get mangled in a similar way. I wonder whether something should be marked in the zip file header to say how the stored file name is to be interpreted. I tried to check the differences between Explorer-generated and fossil-generated zip files with the help of:<br> unzip -Zv byZIP.zip >byZIP.lst<br> unzip -Zv byFossil.zip >byFossil.lst<br> so far to no avail. <hr><i>anonymous claiming to be kkinnell added on 2008-12-03 02:56:00:</i><br> If I'm reading this correctly, you stored a file containing UTF-8 or UTF-16 encoded characters in a repository, and then got the zip for the repository by downloading it from the server. Then, when you unzipped it with one Windows program, the file was encoded incorrectly, but when you unzipped it with a different Windows program, the encoding was correct. If that is what happened, then the problem is how the programs you are using for unzipping the files are interpreting them. 
The one that gives you the correct version is using the same encoding that you used when you created the file; the other one is using something else. <b>fossil</b> itself uses SQLite BLOBs to store its artifacts. The storage doesn't encode the data in any way; it treats it as binary data. <hr><i>anonymous added on 2008-12-03 04:53:53:</i><br> The content of the file(s) is correct. What causes trouble is a file name or path containing accented characters.<br> If I modify the test and try to add a file on a path that already has a mix of accented and unaccented characters and spaces, <br> the file is not even listed in the File List menu, which I would consider a bug.<br> (I can save the zip archive from fossil and view the file content (diff), but I do not see the file listed in the File List menu.)<br> The produced zip archive can be extracted correctly by unzip from the Info-ZIP web site. Windows Explorer shows only a badly transcoded file name on a badly transcoded file path.<br> The test path uses "r-caron" (U+0159), "s-caron" (U+0161) and "LATIN SMALL LETTER Y WITH ACUTE" (U+00FD) accented characters.<br> del .\_fossil_<br> del .\test.fossil<br> rmdir /Q /S .\\"testS&#353;&#353;&#353;SR&#345;&#345;&#345;R Y&#253;&#253;&#253;Y"<br> mkdir .\\"testS&#353;&#353;&#353;SR&#345;&#345;&#345;R Y&#253;&#253;&#253;Y"<br> del manifest.uuid<br> del manifest<br> cd .\\"testS&#353;&#353;&#353;SR&#345;&#345;&#345;R Y&#253;&#253;&#253;Y"<br> echo content of file>"1r&#345;&#345;&#345;&#345; R .test"<br> cd ..<br> fossil new test.fossil<br> fossil open test.fossil<br> fossil add .\\"testS&#353;&#353;&#353;SR&#345;&#345;&#345;R Y&#253;&#253;&#253;Y"\\"1r&#345;&#345;&#345;&#345; R .test"<br> fossil ls<br> fossil commit -m "check in a file with accented char(s) and spaces in its name on a path with accented characters and spaces" --nosign<br> fossil ui<br> ---<BR> fossil ls<br> ADDED testS&#220;&#220;&#220;SR&#176;&#176;&#176;R Y&#345;&#345;&#345;Y/1r&#176;&#176;&#176;&#176; R .test<br> <hr><i>kkinnell added on 
2008-12-03 16:20:32:</i><br> I have tested this in the Linux version, via copy and paste using the exact string you are using (testS&#353;&#353;&#353;SR&#345;&#345;&#345;R Y&#253;&#253;&#253;Y). I can confirm that the encoding used to display the filename and contents is not the same as the encoding used for the wiki: wiki strings are "entified" as &amp;#ddd; decimals, whereas the display of file path strings depends on the browser's default encoding. When I set my browser to use the default Windows encoding (Western, or Latin-1, which is ISO-8859-1), I get behavior similar to that which you describe. I think it is possible you <code><b>add</b></code>ed the file that is not showing up in the file list, but did not <code><b>commit</b></code> the change afterward. This is the only way I have been able to replicate this behavior. Please verify that all of your encodings (system, browser and zip programs) match the one used by your input system and see if you still get bad behavior. I can duplicate most of the problems you are having by mismatching the encodings, but I am on a GNU/Linux system and I can't be <i>sure</i> there is not something peculiar to the Windows version causing part of your problem."
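
One reading that fits the `fossil ls` output quoted above is a mismatch between two Windows code pages. The Python sketch below assumes, without confirmation from the ticket, that the reporter's machine uses CP1250 as its "ANSI" code page and CP852 as its console ("OEM") code page. Under that assumption, name bytes written as CP1250 and read back as CP852 turn r-caron into DEGREE SIGN, s-caron into U-diaeresis, and y-acute into r-caron. The second helper inspects general-purpose bit 11 of a zip entry, which the ZIP format uses to mark a stored name as UTF-8. The helper names `misdecode` and `names_flagged_utf8` are illustrative and are not part of fossil.

```python
import io
import zipfile

# Assumption (not stated in the ticket): Central European Windows code pages,
# CP1250 for "ANSI" APIs and CP852 for the console ("OEM") code page.
ANSI_CODEPAGE = "cp1250"
OEM_CODEPAGE = "cp852"


def misdecode(name: str) -> str:
    """Encode a name with the ANSI code page, then read the bytes back as OEM."""
    return name.encode(ANSI_CODEPAGE).decode(OEM_CODEPAGE)


def names_flagged_utf8(zip_bytes: bytes) -> dict:
    """Map each member name to whether general-purpose bit 11 (UTF-8 name) is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: bool(info.flag_bits & 0x800)
            for info in archive.infolist()
        }


if __name__ == "__main__":
    # The path from the second test script, written with Unicode escapes.
    path = (
        "testS\u0161\u0161\u0161SR\u0159\u0159\u0159R "
        "Y\u00fd\u00fd\u00fdY/1r\u0159\u0159\u0159\u0159 R .test"
    )
    print(misdecode(path))

    # Python's own zipfile writer sets bit 11 for non-ASCII names; running the
    # same check on a fossil-generated archive shows whether it does too.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("1r\u0159 R .test", "345678")
    print(names_flagged_utf8(buffer.getvalue()))
```

Passing the bytes of a fossil-generated archive to `names_flagged_utf8` would show whether its entries carry the UTF-8 flag. An extractor that sees no flag is free to fall back on a local code page, which could account for the two unzip programs disagreeing.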