Overview
SHA1 Hash: b807acf62eecfbf884fa10244ea59d78f39ac0a9
Date: 2007-07-24 12:52:32
User: drh
Comment: Documentation updates
Tags And Properties
- branch=trunk inherited from [a28c83647d]
- sym-trunk inherited from [a28c83647d]
Changes
Modified www/fileformat.html from [85a1fd60fb] to [8a7ce84d60].
@@ -1,95 +1,179 @@ <html> <head> -<title>Fossil File Formats</title> +<title>Fossil File Format</title> </head> <body bgcolor="white"> <h1 align="center"> Fossil File Formats </h1> <p> The global state of a fossil repository is determined by an unordered -set of content files. Each of these files has a format which is defined -by this document. -</p> - -<h2>1.0 General Formatting Rules</h2> +set of files. Some files used to represent wiki pages, trouble tickets, +and the special "manifest" file has a specific and well-defined format. +Other files are just the content of the files. Files can be text or +binary. +</p> + +<p> +Each file in the repository is named by its SHA1 hash. +Some files have a particular format which qualifies them +as "manifests". A manifest assigns filenames to a subset +of the files in the repository, in order to provide a +snapshot of the state of the project at a point in time. +Each manifest file corresponds to a version or baseline +of the project. +</p> + +<h2>1.0 The Manifest File</h2> + +<p> +Any file in the repository that follows the syntactic rules +of a manifest is a manifest. Note that a manifest can +be both a real manifest and also a content file, though this +is rare. +</p> <p> -Fossil content files consist of a header, a blank line, and optional -content. +A manifest is a line-oriented text file. Newline characters +(ASCII 0x0a) separate lines. Each line begins with a single +character "line type". Zero or more arguments may follow +the line type. All arguments are separated from each other +and from the line-type character by a single space +character. There is no surplus white space between arguments +and no leading or trailing whitespace except for the newline +character that acts as the line separator. </p> <p> -The header is divided into "properties" by newline ('\n', 0x0a) -characters. Each header property is divided into tokens by space (' ', 0x20) -characters. The first token of each property is the property name. 
-Subsequent tokens (if any) are arguments to the property. +All lines of the manifest occur in strict sorted lexigraphical order. +No line may be duplicated. +The entire manifest file may be PGP clear-signed, but otherwise it +may contain no additional text or data beyond what is described here. </p> <p> -The blank line that separates the header from the content can be -thought of as a property line that contains no tokens. Everything -that follows the newline character that terminates the blank line -is content. The blank line is always present but the content is -optional. -</p> - -<p> -All tokens in a property line are encoded to escape special characters. -The encoding is as follows: +Allowed lines in the manifest are as follows: </p> <blockquote> -<table border="1"> -<tr><th>Input Character</th><th>Encoded As</th></tr> -<tr><td align="center"> space (0x20) </td><td align="center"> \s </td></tr> -<tr><td align="center"> newline (0x0A) </td><td align="center"> \n </td></tr> -<tr><td align="center"> carriage return (0x0D) </td><td align="center"> \r </td></tr> -<tr><td align="center"> tab (0x09) </td><td align="center"> \t </td></tr> -<tr><td align="center"> vertical tab (0x0B) </td><td align="center"> \v </td></tr> -<tr><td align="center"> formfeed (0x0C) </td><td align="center"> \f </td></tr> -<tr><td align="center"> nul (0x00) </td><td align="center"> \0 </td></tr> -<tr><td align="center"> backslash (0x5C) </td><td align="center"> \\ </td></tr> -</table> +<b>C</b> <i>checkin-comment</i><br> +<b>D</b> <i>time-and-date-stamp</i><br> +<b>F</b> <i>filename</i> <i>SHA1-hash</i><br> +<b>P</b> <i>SHA1-hash</i>+<br> +<b>R</b> <i>repository-checksum</i><br> +<b>U</b> <i>user-login</i><br> +<b>Z</b> <i>manifest-checksum</i> </blockquote> <p> -Characters other than the ones shown in the table above are passed through -the encoder without change. +A manifest must have exactly one C-line. 
The sole argument to +the C-line is a check-in comment that describes the baseline that +the manifest defines. The check-in comment is text. The following +escape sequences are applied to the text: +A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73). A +newline (ASCII 0x0a) is "\n" (ASCII 0x6C, x6E). A backslash +(ASCII 0x5C) is represented as two backslashes "\\". Apart from +space and newline, no other whitespace characters are allowed in +the check-in comment. Nor are any unprintable characters allowed +in the comment. </p> <p> -All properties names are unpunctuated lower-case ASCII strings. -The properties appear in the header in sorted order (using -memcpy() as the comparision function) except for the "signature" -property which always occurs first. -</p> - -<h2>2.0 Common Properties</h2> - -<p> -Every content file has a "time" property. The argument to the -time property is an integer which is the number of seconds since -1970 UTC when the content file was created. For example: +A manifest must have exactly one D-line. The sole argument to +the D-line is a date-time stamp in the ISO8601 format. The +date and time should be in coordinated universal time (UTC). +The format is: </p> <blockquote> -time 1181404746 +<i>YYYY</i><b>-</b><i>MM</i><b>-</b><i>DD</i><b>T</b><i>HH</i><b>:</b><i>MM</i><b>:</b><i>SS</i> </blockquote> <p> -Every content file has a "type" property. The argument to the -type property defines the purpose of the content file. The -argument can be strings like "version", "folder", "file", or "user". +A manifest has zero or more F-lines. Each F-line defines a file +(other than the manifest itself) which is part of the baseline that +the manifest defines. There are two arguments. The first argment +is the pathname of the file in the baseline relative to the root +of the project file hierarchy. No ".." or "." directories are allowed +within the filename. Space characters are escaped as in C-line +comment text. 
Backslash characters and newlines are not allowed +within filenames. The directory separator character is a forward +slash (ASCII 0x2F). The second argument to the F-line is the +full 40-character hexadecimal SHA1 hash of the file content. +Upper-case letters ABCDEF are used for the higher digits of the +hexadecimal. +</p> + +<p> +A manifest has zero or one P-lines. Most manifests have one P-line. +The P-line has a varying number of arguments that +defines other manifests from which the current manifest +is derived. Each argument is an 40-character uppercase +hexadecimal SHA1 of the predecessor manifest. All arguments +to the P-line must be unique to that line. +The first predecessor is the manifests direct ancestor. +Other arguments define manifests with which the first was +merged to yield the current manifest. Most manifests have +a P-line with a single argument. The first manifest in the +project has no ancestors and thus has no P-line. +</p> + +<p> +A manifest may optionally have a single R-line. The R-line has +a single argument which is the MD5 checksum of all files in +the baseline except the manifest itself. The checksum is expressed +as 32-characters of uppercase hexadecimal. The checksum is +computed as follows: For each file in the baseline (except for +the manifest itself) in strict sorted lexigraphical order, +take the pathname of the file relative to the root of the +repository, append a single space (ASCII 0x20), the +size of the file in ASCII decimal, a single newline +character (ASCII 0x0A), and the complete text of the file. +Compute the MD5 checksum of the the result. +</p> + +<p> +Each manifest has a single U-line. The argument to the U-line is +the login of the user who created the manifest. The login name +is encoded using the same character escapes as is used for the +check-in comment argument to the C-line. +</p> + +<p> +A manifest has an option Z-line as its last line. 
The argument +to the Z-line is a 32-character uppercase hexadecimal MD5 hash +of all prior lines of the manifest up to and including the newline +character that immediately preceeds the "Z". The Z-line is just +a sanity check to prove that the manifest is well-formed and +consistent. +</p> + +<h2>2.0 Trouble Tickets</h2> + +<p> +Each trouble ticket is a file in the repository and appears in +a manifest for every baseline in which the ticket exists. +Trouble tickets occur in a specific subdirectory of the file +heirarchy. The name of the subdirectory that contains tickets +is part of the local state of each repository. The filename +of each trouble ticket has a ".tkt" suffix. The trouble ticket +has a particular file format defined below. </p> + +<i>To be continued...</i> + +<h2>3.0 Wiki Pages</h2> <p> -The first property of a content file is the digital signature. The -name of the signature property is "signature". There are two arguments. -The first argument is the SHA256 hash of the content file that defines -the user who signed this file. User records themselves are self-signed -and so the first argument is simply "*" for user records. The second -argument is the digital signature of an SHA256 hash of the entire -file (header and content) except for the signature line itself. +Each wiki is a file in the repository and appears in +a manifest for every baseline in which that wiki page exists. +Wiki pages occur in a specific subdirectory of the file +heirarchy. The name of the subdirectory that contains wiki pages +is part of the local state of each repository. The filename +of each wiki page has a ".wiki" suffix. The base name of +the file is the name of the wiki page. The wiki pages +have a particular file format defined below. </p> + +<i>To be continued...</i>
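The manifest rules introduced in the fileformat.html change above (one-character line type, single-space separators, strict sorted order, backslash escapes, optional trailing Z-line) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not Fossil's implementation: the function names `unescape`, `parse_manifest`, and `z_checksum` are hypothetical, the text is assumed to be UTF-8, sorting is assumed to be plain code-point order, and PGP clear-signed manifests are not handled. Note that the new text gives "\n" as ASCII 0x6C, x6E; a backslash is 0x5C, so the sketch treats the sequence as 0x5C, 0x6E.

```python
import hashlib

# Escapes named in the C-line description: \s = space, \n = newline, \\ = backslash.
_ESCAPES = {"s": " ", "n": "\n", "\\": "\\"}


def unescape(token):
    # Decode an escaped argument such as a check-in comment or login name.
    out = []
    i = 0
    while i < len(token):
        if token[i] == "\\" and i + 1 < len(token) and token[i + 1] in _ESCAPES:
            out.append(_ESCAPES[token[i + 1]])
            i += 2
        else:
            out.append(token[i])
            i += 1
    return "".join(out)


def parse_manifest(text):
    # Split on newline; tolerate (but do not require) a final newline.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()
    # Lines must be strictly increasing, which also rules out duplicates.
    for prev, cur in zip(lines, lines[1:]):
        if prev >= cur:
            raise ValueError("manifest lines are not in strict sorted order")
    # First field is the one-character line type; the rest are arguments.
    cards = []
    for line in lines:
        fields = line.split(" ")
        if len(fields[0]) != 1:
            raise ValueError("bad line type in %r" % line)
        cards.append((fields[0], fields[1:]))
    return cards


def z_checksum(text):
    # MD5 over everything up to and including the newline before the Z-line.
    # Assumption: if no Z-line is present, the whole text is hashed.
    cut = text.rfind("\nZ ")
    covered = text[: cut + 1] if cut >= 0 else text
    return hashlib.md5(covered.encode("utf-8")).hexdigest().upper()
```

A caller would compare `z_checksum(text)` against the argument of the parsed Z-line, and apply `unescape` only to the C-line and U-line arguments (and to spaces in F-line filenames).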
Modified www/index.html from [68895a10cd] to [30f3834664].
@@ -7,25 +7,26 @@ <p> This is a preliminary homepage for a new software configuration management system called "Fossil". The code is currently under development, and has been for about -a year. Nothing is available for download or inspection -as of this writing (2007-06-09). +two years. (We have iterated the design multiple times.) +Nothing is available for download or inspection +as of this writing (2007-07-24). But the system is self-hosting now. Hopefully something will be available soon. </p> -<p>Distinctive features of Fossil:</p> +<p>Design Goals For Fossil:</p> <ul> <li>Supports disconnected, distributed development (like <a href="http://kerneltrap.org/node/4982">git</a>, <a href="http://www.venge.net/monotone/">monotone</a>, <a href="http://www.selenic.com/mercurial/wiki/index.cgi">mercurial</a>, or <a href="http://www.bitkeeper.com/">bitkeeper</a>) -or tightly coupled client/server operation (like +or client/server operation (like <a href="http://www.nongnu.org/cvs/">CVS</a> or <a href="http://subversion.tigris.org/">subversion</a>) or both at the same time</li> <li>Integrated bug tracking and wiki, along the lines of <a href="http://www.cvstrac.org/">CVSTrac</a> and @@ -38,29 +39,25 @@ trivial to install</li> <li>Server runs as <a href="http://www.w3.org/CGI/">CGI</a>, using <a href="http://en.wikipedia.org/wiki/inetd">inetd</a> or <a href="http://www.xinetd.org/">xinetd</a> or using its own built-in, standalone web server.</li> -<li>The entire project contained in single disk file (which also -happens to be an <a href="http://www.sqlite.org/">SQLite</a> database.)</li> -<li>Self sign-up (at the administrators discretion) including the -ability to support secure anonymous check-ins (also optional).</li> -<li>Digital signatures on all files, versions, -<a href="http://wiki.org/wiki.cgi?WhatIsWiki">wiki</a> pages, -trouble tickets, etc. 
Everything is digitally signed.</li> +<li>An entire project contained in single disk file (which also +happens to be an <a href="http://www.sqlite.org/">SQLite</a> database.)</li> <li>Trivial to setup and administer</li> <li>Files and versions identified by their -<a href="http://en.wikipedia.org/wiki/SHA-1">SHA-256</a> signature expressed -in <a href="base32.html">base-32 notation</a>. +<a href="http://en.wikipedia.org/wiki/SHA-1">SHA1</a> signature.</a> Any unique prefix is sufficient to identify a file or version - usually the first 4 or 5 characters suffice.</li> +<li>The file format is trival and requires nothing more complex +than a text editor and the "sha1sum" command-line utility to decode.</li> <li>Automatic <a href="selfcheck.html">self-check</a> on repository changes makes it exceedingly unlikely that data will ever be lost because of a software bug.</li> </ul> -<p>Goals of fossil:</p> +<p>Objectives Of Fossil:</p> <ul> <li>Fossil should be ridiculously easy to install and operate.</li> <li>With fossil, it should be possible (and easy) to set up a project on an inexpensive shared-hosting ISP @@ -78,13 +75,13 @@ <p>Links:</p> <ul> <li><a href="pop.html">Principals Of Operation</a></li> -<li>The <a href="base32.html">base-32 encoding</a> mechanism used -by Fossil.</li> +<li>The <a href="selfcheck.html">automatic self-check</a> mechanism +helps insure project integrity.</li> <li>The <a href="fileformat.html">file format</a> used by every content file stored in the repository.</li> </ul> </body> </html>
Modified www/pop.html from [bf75283289] to [13539df59a].
@@ -27,18 +27,18 @@ for each repository is private to that repository. The global state represents the content of the project. The local state identifies the authorized users and access policies for a particular repository.</p></li> -<li><p>The global state of a repository is an mostly unordered +<li><p>The global state of a repository is an unordered collection of files. Each file is named by -its SHA256 hash. The name is encoded as a 52-digit -base-32 number. In many contexts, the name can be +its SHA1 hash encoded in hexadecimal. +In many contexts, the name can be abbreviated to a unique prefix. A five- or six-character prefix usually suffices to uniquely identify a file.</p></li> -<li><p>Because files are named by their SHA256 hash, all files +<li><p>Because files are named by their SHA1 hash, all files are immutable. Any change to the content of a file also changes the hash that forms the files name, thus creating a new file. Both the old original version of the file and the new change are preserved under different names.</p></li> @@ -45,52 +45,37 @@ <li><p>It is theoretically possible for two files with different content to share the same hash. But finding two such files is so incredibly difficult and unlikely that we consider it to be an impossibility.</p></li> -<li><p>The files that comprise the global state of a repository -consist of a header followed by optional content. Every -file contains an RSA signature in the header. And every -file contains a "file type" designator in the header. -Additional information is also found in the header depending -on the file type.</p></li> +<li><p>The signature of a file is the SHA1 hash of the +file itself, exactly as it appears on disk. No prefix +or meta-information about the file is added before computing +the hash. 
So you can +always find the SHA1 signature of a file by using the +"sha1sum" command-line utility.</p></li> -<li><p>The file that comprise the global state of a repository +<li><p>The files that comprise the global state of a repository are the complete global state of that repository. The SQLite database that holds the repository contains additional information about linkages between files, but all of that added information -can be discarded and reconstructed by scanning the content +can be discarded and reconstructed by rescanning the content files.</p></li> <li><p>Two repositories for the same project can synchronize their global states simply by sharing files. The local state of repositories is not normally synchronized or shared.</p></li> -<li><p>The name of a file is its SHA256 hash in a base-32 -encoding. The digits of the base-32 encode are as -follows: - -<blockquote><b> - 0123456789abcdefghjkmnpqrstuvwxy -</b></blockquote> +<li><p>Every repository has a special file at the top-level +named "manifest" which is an index of all other files in +the system. The manifest is automatically created and +maintained by the system.</p></li> -<p>The letters "o", "i", and "l" are omitted from the -encoding character set to avoid confusion with the -digits "0" and "1". On input, upper and lower case -letters are treated the same, the letter "o" is -interpreted as a zero ("0") and the letters "i" and -"l" are interpreted as a one ("1"). The full name of -a file is 52 characters long. The first 4 bits of the -SHA256 has are repeated onto the end of the hash so that -the last digit in the base-32 encoding will contain a -full 5 bits. -For convenience, files -may often be abbreviated to a unique prefix and the -repository will automatically expand the name to -its full 52 characters. 
In practice, 5 or 6 -characters are usually sufficient to give a unique -name prefix to files even in the largest of projects.</p></li> -</ul> +<li><p>The <a href="fileformat.html">file format</a> +is very simple so that with access +to the original content files, one can easily reconstruct +the content of a baseline without the need for any +special tools or software.</p></li> </body> </html>
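The naming rule described in the pop.html change above (a file's name is the SHA1 hash of its bytes exactly as they appear on disk, and a unique prefix is enough to identify it) can be illustrated with a short Python sketch. The helper names `artifact_name` and `resolve_prefix` are hypothetical, and treating prefixes case-insensitively is an assumption, made because the documentation shows hashes in both lowercase and uppercase hexadecimal.

```python
import hashlib


def artifact_name(content):
    # Hash the raw bytes with no prefix or metadata, as "sha1sum" would.
    return hashlib.sha1(content).hexdigest()


def resolve_prefix(prefix, names):
    # Expand an abbreviated name; it must match exactly one known name.
    wanted = prefix.lower()
    matches = [n for n in names if n.lower().startswith(wanted)]
    if len(matches) != 1:
        raise LookupError("prefix %r matches %d names" % (prefix, len(matches)))
    return matches[0]
```

Because the name depends only on the content, two files with the same bytes get the same name, and any edit to a file produces a new name rather than overwriting the old one.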
Modified www/selfcheck.html from [0532e60f88] to [849df9860d].
@@ -17,12 +17,12 @@ <h2>Atomic Check-ins With Rollback</h2> <p> The fossil repository is an -<a href="http://www.sqlite.org/">SQLite</a> database file. SQLite -is very mature and stable and has been in wide-spread use for many +<a href="http://www.sqlite.org/">SQLite version 3</a> database file. +SQLite is very mature and stable and has been in wide-spread use for many years, so we have little worries that it might cause repository corruption. SQLite databases do not corrupt even if a program or system crash or power failure occurs in the middle of the update. If some kind of crash does occur in the middle of a change, then all the changes are rolled @@ -56,14 +56,14 @@ of every content file that it changes just prior to transaction commit. So during the course of check-in, many different files in the repository might be modified. Some files are simply compressed. Other files are delta encoded and then compressed. While all this is going on, fossil makes a record of every file -that is encoded and the MD5 hash of the original content of that +that is encoded and the SHA1 hash of the original content of that file. Then just before transaction commit, fossil re-extracts the original content of all files that were written, computes -the MD5 checksum again, and verifies that the checksums match. +the SHA1 checksum again, and verifies that the checksums match. If anything does not match up, an error message is printed and the transaction rolls back. </p> <p> @@ -72,40 +72,19 @@ Hence bugs in fossil are unlikely to corrupt the repository in a way that prevents us from extracting historical versions of files. </p> -<h2>Checksums on all files and versions</h2> +<h2>Checksum Over All Files In A Baseline</h2> <p> -Repository records of type "file" (records that hold the content -of project files) contain a "cksum" property which records the -MD5 checksum of the content of that file. So if something goes -wrong in the file extraction process we will at least know about -it. 
This checksum is in addition to the digital signature that -is over the entire header and content of the record. -</p> - -<p> -Repository records of type "version" contain a "cksum" -property that holds the MD5 checksum of the concatenation of -every file in the entire project. During a check-in, after -fossil has inserted all changes into the repository, it goes -back and rereads every file out of the repository and recomputes -this global checksum based on the respository content. It then -computes an MD5 checksum over the files on disk. If these two -checksums do not match, the check-in files and rolls back. -Thus if a check-in transaction is successful, we have high -confidence that the content in the repository exactly matches -the content on disk. -</p> - -<p> -Every project files is verified by three separate checksums. -There is an SHA256 checksum used as part of the digital signature -on the file. There is an MD5 checksum on the content of each -individual file. And there is a global MD5 checksum over the -entire project source tree. If any of these cross-checks do not -match then the operation fails and an error is displayed. Taken -together, these cross-checks give us high confidence that the -files you checked out are identical to the files you checked in. +Manifest files that define a baseline have two fields (the +R-line and Z-line) that record MD5 hashs of the manifest itself +and of all other files in the manifest. Prior to any check-in +commit, these checksums are verified to ensure that the baseline +checked in agrees exactly with what is on disk. Similarly, +the repository checksum is verified after a checkout to make +sure that the entire repository was checked out correctly. +Note that these added checks use a different hash (MD5 instead +of SHA1) in order to avoid common-mode failures in the hash +algorithm implementation. </p>
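The R-line checksum that this self-check relies on is described in the fileformat.html change: for each file in sorted order, the pathname, a space, the size in decimal, a newline, and the file content are fed to MD5. A minimal Python sketch of that procedure follows. It assumes a single MD5 digest over the concatenation of all per-file records, byte-wise sorting of UTF-8 pathnames, and unescaped pathnames; none of these details are confirmed by the text, and `repository_checksum` is a hypothetical name.

```python
import hashlib


def repository_checksum(files):
    # files maps pathname (relative to the project root) to content bytes;
    # the manifest itself must not be included.
    md5 = hashlib.md5()
    for path in sorted(files, key=lambda p: p.encode("utf-8")):
        content = files[path]
        # Record header: "<pathname> <size>\n", followed by the content.
        header = path.encode("utf-8") + b" " + str(len(content)).encode("ascii") + b"\n"
        md5.update(header)
        md5.update(content)
    # The R-line argument is 32 characters of uppercase hexadecimal.
    return md5.hexdigest().upper()
```

Computing this once from the repository content and once from the files on disk, then comparing the two results, is the kind of cross-check the paragraph above describes.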