- Change status to "Review"
- Appended to comment:
kkinnell added on 2008-12-09 15:58:07:
Summary:
1. File paths within a repository use the bytes supplied by the file system, but the encoding is not recorded. The browser's encoding determines how they are displayed.
2. Wiki display entifies characters, and so the display of a file path from within a wikified string may be different than the display in the file browser.
3. Is this a wart, or is this correct behavior? (Mailing list discussion?)
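Points 1 and 2 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the file system supplied the name as Windows-1250 bytes; the ticket does not state which encoding was actually stored.

```python
# Assumption for illustration: the file system handed over the path as
# Windows-1250 bytes. The repository keeps the bytes but not the encoding.
stored = "řšý".encode("cp1250")

# The browser's charset decides what the same bytes look like.
as_cp1250 = stored.decode("cp1250")   # Central European: the intended text
as_latin1 = stored.decode("latin-1")  # Western ISO 8859-1: different characters

# Entified text carries code points, so no charset guess is involved.
entified = "řšý".encode("ascii", "xmlcharrefreplace").decode("ascii")

print(as_cp1250 == "řšý", as_cp1250 != as_latin1)
print(entified)
```

The entified form is plain ASCII, which is why a wikified string can display the same way under any browser charset while a raw path does not.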
- Change status to "Deferred"
- Change type to "Incident"
- Appended to comment:
anonymous claiming to be Ilia Frenkel added on 2008-12-09 09:49:26:
You can add the following line to the header (Setup->Header) between <head> and </head> tags: <meta http-equiv="Content-Type" content="text/html; charset=windows-1250"> it will tell your browser what encoding you want to be the default one.
- Appended to comment:
anonymous added on 2008-12-04 07:06:01:
I test it on Win XP by running the above script (from the del .\_fossil_ line to the fossil ui line) in one go.
At that point I get the fossil web interface, where I see the freshly created repo.
If I go to the Timeline menu, I see one entry, titled "1 most recent events". If I check the submenu "Checkins only",
I still see only 1 item, now titled "1 most recent checkins".
It says "Leaf" and the text of my commit comment, as you can check in the above script.
If I click that link, I see a page titled "Baseline ...."
There I have (among other things) a "Commands:" line with "diff | ZIP archive | manifest | edit" options/links.
If I use diff, I see the "diffed" content of the file (because I start afresh, it is just the content of the file with diff marks). In my case:
@@ -1,1 +1,1 @@
-
+345678
If I click the ZIP link, I get a zipped archive of the repository.
If I now go to the "Files" top menu, I get a page titled "File List" with the heading "Files in the top-level directory",
and there I see my committed directory (if I switch the browser's character encoding to Central European Windows-1250, I see the directory name
correctly). If I click on it, .... Now I noticed something I did not notice before.
If I keep the browser's default character encoding (Western ISO 8859-1) and click on the directory link, I get a page stating:
"Files in directory testSA!A!A!SRA¸A¸A¸R YýýýY" and showing no file there.
If I change the browser's character encoding to Central European Windows-1250 (on the "Files in the top-level directory" page)
and click on the correctly named directory, I get a page with the heading "Files in directory testSšššSRoooR YýýýY" and I see there
a link to my file (of course with its name in western encoding, " 1roooo R .test", and I can click on it to see its content).
So it seems to me that, depending on the encoding of the generated page, fossil can sometimes fail to show the true content of a directory.
- Change comment to "@rem issue demostration<br> del .\_fossil_<br> del .\test.fossil<br> rmdir /Q /S .\testdir<br> mkdir .\testdir<br> del manifest.uuid<br> del manifest<br> cd .\testdir<br> echo 345678>"1rřřřř R .test"<br> cd ..<br> fossil new test.fossil<br> fossil open test.fossil<br> fossil add .\testdir\\"1rřřřř R .test"<br> fossil ls<br> fossil commit -m "check in a file with accended char/s and spaces in its name" --nosign<br> fossil ui<br> ----<br> on win XP<br> I create file with some accending characters in its name, commit it to the fossil repository and save repository as a zip file. When I extract such file from fossil generated file by windows Explorer/Total Commander I get file with different name.<br> Extracting by unzip program from Info-ZIP web site yields correctly named file. In above example I create file with "r-caron" chars in its name(U+0159), after unzipping I get file with "DEGREE SIGN"(U+00B0) in it instead. Many other characters get twisted in similar way. I wonder, if there should be something marked in zip file header, as how to interpret stored file name. I tried to check differences with explorer generated zip files/fossil generated ones with help of:<br> unzip -Zv byZIP.zip >byZIP.lst<br> unzip -Zv byFossil.zip >byFossil.lst<br> so far to no avail. <hr><i>anonymous claiming to be kkinnell added on 2008-12-03 02:56:00:</i><br> If I'm reading this correctly, you stored a file containing utf8 or uc16 encoded characters in a repository, and then got the zip for the repository by downloading it from the server. Then, when you unzipped it with one windows program, the file was encoded incorrectly, but when you unzipped it with a different windows program, the encoding was correct. If that is what happened, then the problem is how the programs you are using for unzipping the files are interpreting them. 
The one that gives you the correct version is using the same encoding that you used when you created the file, the other one is using something else. <b>fossil</b> itself uses SQLite BLOBs to store its artifacts. The storage doesn't encode the data in any way, it treats it as binary data. <hr><i>anonymous added on 2008-12-03 04:53:53:</i><br> The content of file(s) is correct. What causes trouble is the name/path of file having accented characters.<br> If I modify the test and try to add file on path already having mix of accended/not accended characters and spaces, <br> it is even not listing such file in File List menu which I would consider a bug.<br> (I can save zip archive from fossil, view file content(diff) but will not see such file listed in File List menu)<br> Produced zip archive can be correctly extracted by unzip from Info-ZIP web site. Windows Explorer will show just badly transcoded file name on badly transcoded file path.<br> Used test path uses "r-caron" chars (U+0159),"s-caron"(U+0161) and "LATIN SMALL LETTER Y WITH ACUTE" (U+00FD) accented characters.<br> del .\_fossil_<br> del .\test.fossil<br> rmdir /Q /S .\\"testSšššSRřřřR YýýýY"<br> mkdir .\\"testSšššSRřřřR YýýýY"<br> del manifest.uuid<br> del manifest<br> cd .\\"testSšššSRřřřR YýýýY"<br> echo content of file>"1rřřřř R .test"<br> cd ..<br> fossil new test.fossil<br> fossil open test.fossil<br> fossil add .\\"testSšššSRřřřR YýýýY"\\"1rřřřř R .test"<br> fossil ls<br> fossil commit -m "check in a file with accended char(s) and spaces in its name on a path with accedended characters and spaces" --nosign<br> fossil ui<br> ---<BR> fossil ls<br> ADDED testSÜÜÜSR°°°R YřřřY/1r°°°° R .test<br> <hr><i>kkinnell added on 2008-12-03 16:20:32:</i><br> I have tested this in the Linux version, via copy and paste using the exact string you are using (testSšššSRřřřR YýýýY). 
I can confirm that the encoding used for display of the filename and contents are not the same as the encoding used for wiki, the wiki strings are "entified" as &#ddd; decimals whereas display of the file path strings are dependent on the browser's default encoding. When I set my browser to use the default windows encoding (Western, or Latin-1 which is ISO-8859-1) I get behavior similar to that which you describe. I think it is possible you <code><b>add</b></code>ed the file that is not showing up in the file list, but did not <code><b>commit</b></code> the change afterward. This is the only way I have been able to replicate this behavior. Please verify that all of your encodings—system, browser and zip programs, are the same as your input system and see if you still get bad behavior. I can duplicate most of the problems you are having by mis-matching the encodings, but I am on a GNU/Linux system and I can't be <i>sure</i> there is not something peculiar to the windows version causing part of your problem."
- Appended to comment:
anonymous claiming to be kkinnell added on 2008-12-03 16:20:32:
I have tested this in the Linux version, via copy and paste using the exact string you are using (testSšššSRřřřR YýýýY). I can confirm that the encoding used for display of the filename and contents is not the same as the encoding used for wiki.
- Appended to comment:
anonymous added on 2008-12-03 04:53:53:
The content of the file(s) is correct. What causes trouble is the name/path of a file that has accented characters.
If I modify the test and try to add a file on a path that already has a mix of accented and unaccented characters and spaces,
the file is not even listed in the File List menu, which I would consider a bug.
(I can save the zip archive from fossil and view the file content (diff), but I do not see the file listed in the File List menu.)
The produced zip archive can be correctly extracted by unzip from the Info-ZIP web site. Windows Explorer shows only a badly transcoded file name on a badly transcoded file path.
The test path uses the accented characters "r-caron" (U+0159), "s-caron" (U+0161) and "LATIN SMALL LETTER Y WITH ACUTE" (U+00FD).
del .\_fossil_
del .\test.fossil
rmdir /Q /S .\\"testSšššSRřřřR YýýýY"
mkdir .\\"testSšššSRřřřR YýýýY"
del manifest.uuid
del manifest
cd .\\"testSšššSRřřřR YýýýY"
echo content of file>"1rřřřř R .test"
cd ..
fossil new test.fossil
fossil open test.fossil
fossil add .\\"testSšššSRřřřR YýýýY"\\"1rřřřř R .test"
fossil ls
fossil commit -m "check in a file with accended char(s) and spaces in its name on a path with accedended characters and spaces" --nosign
fossil ui
---
fossil ls
ADDED testSÜÜÜSR°°°R YřřřY/1r°°°° R .test
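The garbled `fossil ls` output above is consistent with Windows-1250 bytes being shown in a console that uses code page 852. That pairing is an assumption inferred from the characters involved, not something the ticket confirms. A small Python sketch:

```python
# Assumption: the name reaches fossil as Windows-1250 (ANSI) bytes and the
# console renders output using code page 852 (OEM, Central European).
name = "testSšššSRřřřR YýýýY/1rřřřř R .test"

raw = name.encode("cp1250")   # bytes as an ANSI-codepage program would see them
shown = raw.decode("cp852")   # the same bytes interpreted by an OEM console

print(shown)
```

Under that assumption the result matches the ADDED line, and it also matches the earlier report of "r-caron" (U+0159) turning into DEGREE SIGN (U+00B0).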
- Appended to comment:
anonymous claiming to be kkinnell added on 2008-12-03 02:56:00:
If I'm reading this correctly, you stored a file containing utf8 or uc16 encoded characters in a repository, and then got the zip for the repository by downloading it from the server. Then, when you unzipped it with one Windows program, the file was encoded incorrectly, but when you unzipped it with a different Windows program, the encoding was correct. If that is what happened, then the problem is how the programs you are using for unzipping the files are interpreting them. The one that gives you the correct version is using the same encoding that you used when you created the file; the other one is using something else.
fossil itself uses SQLite BLOBs to store its artifacts. The storage doesn't encode the data in any way; it treats it as binary data.
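The point about BLOB storage can be sketched with Python's built-in sqlite3 module. The table and column names here are hypothetical and are not Fossil's actual schema.

```python
import sqlite3

# Hypothetical one-table schema, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artifact(id INTEGER PRIMARY KEY, content BLOB)")

# Bytes that are not valid UTF-8 (Windows-1250 for "ř", "š", "ý").
original = b"\xf8\x9a\xfd"
con.execute("INSERT INTO artifact(content) VALUES (?)", (original,))

# A BLOB comes back byte for byte; SQLite applies no text encoding to it.
(fetched,) = con.execute("SELECT content FROM artifact").fetchone()
print(fetched == original)
con.close()
```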
- Change resolution to "Open"
- Change comment to "@rem issue demostration<br> del .\_fossil_<br> del .\test.fossil<br> rmdir /Q /S .\testdir<br> mkdir .\testdir<br> del manifest.uuid<br> del manifest<br> cd .\testdir<br> echo 345678>"1rřřřř R .test"<br> cd ..<br> fossil new test.fossil<br> fossil open test.fossil<br> fossil add .\testdir\\"1rřřřř R .test"<br> fossil ls<br> fossil commit -m "check in a file with accended char/s and spaces in its name" --nosign<br> fossil ui<br> ----<br> on win XP<br> I create file with some accending characters in its name, commit it to the fossil repository and save repository as a zip file. When I extract such file from fossil generated file by windows Explorer/Total Commander I get file with different name.<br> Extracting by unzip program from Info-ZIP web site yields correctly named file. In above example I create file with "r-caron" chars in its name(U+0159), after unzipping I get file with "DEGREE SIGN"(U+00B0) in it instead. Many other characters get twisted in similar way. I wonder, if there should be something marked in zip file header, as how to interpret stored file name. I tried to check differences with explorer generated zip files/fossil generated ones with help of:<br> unzip -Zv byZIP.zip >byZIP.lst<br> unzip -Zv byFossil.zip >byFossil.lst<br> so far to no avail."
- Change foundin to "Fossil version [f84bfc31bf] 2008-11-27 02:30:29"
- Change private_contact to "24b67375dd2ec6c7381a5ad34cfcf006f0b9c260"
- Change severity to "Important"
- Change status to "Open"
- Change title to "file extracted from Fossil zip archive could have different name"
- Change type to "Code_Defect"
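On the question in the original report of whether the zip header can say how a stored file name should be interpreted: the ZIP format reserves general purpose bit 11 (0x800) to mark a name as UTF-8. The sketch below uses Python's zipfile module only to show that flag on an archive Python itself writes; it does not describe what Fossil writes.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11 in the ZIP format

# Build a small archive in memory with a non-ASCII member name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("1rřřřř R .test", b"345678\n")

# Read it back and inspect the flags recorded in the central directory.
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    flagged = bool(info.flag_bits & UTF8_NAME_FLAG)

print(info.filename, flagged)
```

An extractor that honors the flag does not have to guess a code page for the name; one that ignores it, or an archive that lacks it, leaves the interpretation to the extracting program.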