Artifact b4104959a67175f02d6b415480be22a239f1f077

File www/fileformat.html part of check-in [0cd202a86e] - Website updates. Change the message for unrecognized commands to refer the user to "help". by drh on 2007-08-23 23:10:56. Also file www/fileformat.html part of check-in [424d47e453] - Attempting the same merge that aku tried and got empty files with. by drh on 2007-08-25 18:58:16.

<html>
<head>
<title>Fossil File Format</title>
</head>
<body bgcolor="white">
<h1 align="center">
Fossil File Formats
</h1>

<p>
The global state of a fossil repository is determined by an unordered
set of files.  Some files have a specific and well-defined format:
those that represent wiki pages and trouble tickets, and the special
"manifest" file.  Other files are just data.  Files can be text or binary.
</p>

<p>
Each file in the repository is named by its SHA1 hash.
No prefix or meta-information is added to a file before
its hash is computed.  The name of a file in the repository
is exactly the same SHA1 hash that is computed by sha1sum 
on the file as it exists in your source tree.</p>
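<p>
As a minimal sketch of the naming rule above, the following Python
computes the name a file would have in the repository.  The function
name artifact_name is hypothetical and is not part of fossil; the
sketch assumes only that the hash covers the raw bytes of the file.
</p>

```python
import hashlib


def artifact_name(content: bytes) -> str:
    """Return the repository name of a file: the SHA1 of its raw bytes."""
    # Only the file content is hashed; no prefix or header is added.
    return hashlib.sha1(content).hexdigest()


if __name__ == "__main__":
    # Prints a 40-character hexadecimal string, as sha1sum would.
    print(artifact_name(b"hello\n"))
```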

<p>
Some files have a particular format which qualifies them
as "manifests".  A manifest assigns filenames to a subset
of the files in the repository, in order to provide a
snapshot of the state of the project at a point in time.
Each manifest file corresponds to a version or baseline
of the project.
</p>

<h2>1.0 The Manifest File</h2>

<p>
Any file in the repository that follows the syntactic rules
of a manifest is a manifest.  Note that a file can
be both a real manifest and also a content file, though this
is rare.
</p>

<p>
A manifest is a line-oriented text file.  Newline characters
(ASCII 0x0a) separate lines.  Each line begins with a single
character "line type".  Zero or more arguments may follow
the line type.  All arguments are separated from each other
and from the line-type character by a single space
character.  There is no surplus white space between arguments
and no leading or trailing whitespace except for the newline 
character that acts as the line separator.
</p>

<p>
All lines of the manifest occur in strict sorted lexicographical order.
No line may be duplicated.
The entire manifest file may be PGP clear-signed, but otherwise it
may contain no additional text or data beyond what is described here.
</p>
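<p>
The line structure and ordering rules above could be checked roughly as
follows.  This is a sketch under stated assumptions, not fossil's own
parser: the name parse_manifest_lines is hypothetical, the input is
assumed to be unsigned (any PGP clear-signature already removed), and
"lexicographical order" is assumed to mean byte-wise comparison of
whole lines.
</p>

```python
def parse_manifest_lines(text: str) -> list[tuple[str, list[str]]]:
    """Split an unsigned manifest into (line_type, arguments) pairs."""
    # Tolerate a single newline after the last line.
    if text.endswith("\n"):
        text = text[:-1]
    parsed = []
    previous = None
    for line in text.split("\n"):
        fields = line.split(" ")
        # An empty field means a leading, trailing, or doubled space.
        if any(field == "" for field in fields):
            raise ValueError(f"surplus whitespace in line: {line!r}")
        if len(fields[0]) != 1:
            raise ValueError(f"line type must be one character: {line!r}")
        # Strictly increasing order also rules out duplicate lines.
        # Assumption: lines are compared byte-wise.
        if previous is not None and line.encode() <= previous.encode():
            raise ValueError(f"line out of order or duplicated: {line!r}")
        previous = line
        parsed.append((fields[0], fields[1:]))
    return parsed
```

<p>
The sketch checks only the generic rules; it does not verify that a
given line type is allowed or that it carries the right number of
arguments.
</p>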

<p>
Allowed lines in the manifest are as follows:
</p>

<blockquote>
<b>C</b> <i>checkin-comment</i><br>
<b>D</b> <i>time-and-date-stamp</i><br>
<b>F</b> <i>filename</i> <i>SHA1-hash</i><br>
<b>P</b> <i>SHA1-hash</i>+<br>
<b>R</b> <i>repository-checksum</i><br>
<b>U</b> <i>user-login</i><br>
<b>Z</b> <i>manifest-checksum</i>
</blockquote>

<p>
A manifest must have exactly one C-line.  The sole argument to
the C-line is a check-in comment that describes the baseline that
the manifest defines.  The check-in comment is text.  The following
escape sequences are applied to the text:
A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73).  A
newline (ASCII 0x0a) is "\n" (ASCII 0x5C, 0x6E).  A backslash 
(ASCII 0x5C) is represented as two backslashes "\\".  Apart from
space and newline, no other whitespace characters are allowed in
the check-in comment.  Nor are any unprintable characters allowed
in the comment.
</p>
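<p>
The escape rules above can be illustrated with a pair of helper
functions.  The names escape_text and unescape_text are hypothetical,
and the use of Python's str.isprintable() as the test for "unprintable"
characters is an assumption of this sketch, not something the format
specifies.
</p>

```python
_ESCAPES = {"\\": "\\\\", " ": "\\s", "\n": "\\n"}
_UNESCAPES = {"\\": "\\", "s": " ", "n": "\n"}


def escape_text(text: str) -> str:
    """Encode comment text for use as a manifest argument."""
    out = []
    for ch in text:
        if ch in _ESCAPES:
            out.append(_ESCAPES[ch])
        elif not ch.isprintable():
            # Tabs, control characters, and the like are rejected.
            raise ValueError(f"character not allowed: {ch!r}")
        else:
            out.append(ch)
    return "".join(out)


def unescape_text(arg: str) -> str:
    """Decode a manifest argument back into the original text."""
    out = []
    i = 0
    while i < len(arg):
        if arg[i] == "\\":
            # A backslash must be followed by "s", "n", or a backslash.
            if i + 1 >= len(arg) or arg[i + 1] not in _UNESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(_UNESCAPES[arg[i + 1]])
            i += 2
        else:
            out.append(arg[i])
            i += 1
    return "".join(out)
```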

<p>
A manifest must have exactly one D-line.  The sole argument to
the D-line is a date-time stamp in the ISO8601 format.  The
date and time should be in coordinated universal time (UTC).
The format is:
</p>

<blockquote>
<i>YYYY</i><b>-</b><i>MM</i><b>-</b><i>DD</i><b>T</b><i>HH</i><b>:</b><i>MM</i><b>:</b><i>SS</i>
</blockquote>
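<p>
A D-line in this format might be produced and read back as sketched
below.  The function names format_d_line and parse_d_argument are
hypothetical.  The sketch assumes the caller supplies a timezone-aware
datetime, which is converted to UTC before formatting.
</p>

```python
from datetime import datetime, timezone

_D_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_d_line(moment: datetime) -> str:
    """Build a D-line from a timezone-aware datetime."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    # Convert to UTC, then drop the offset from the printed form.
    return "D " + moment.astimezone(timezone.utc).strftime(_D_FORMAT)


def parse_d_argument(arg: str) -> datetime:
    """Parse the argument of a D-line into a UTC datetime."""
    return datetime.strptime(arg, _D_FORMAT).replace(tzinfo=timezone.utc)
```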

<p>
A manifest has zero or more F-lines.  Each F-line defines a file
(other than the manifest itself) which is part of the baseline that
the manifest defines.  There are two arguments.  The first argument
is the pathname of the file in the baseline relative to the root
of the project file hierarchy.  No ".." or "." directories are allowed
within the filename.  Space characters are escaped as in C-line
comment text.  Backslash characters and newlines are not allowed
within filenames.  The directory separator character is a forward
slash (ASCII 0x2F).  The second argument to the F-line is the
full 40-character hexadecimal SHA1 hash of the file content.  
Lower-case letters abcdef are used for the higher digits of the
hexadecimal, matching the output of sha1sum.
</p>
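<p>
The filename rules for an F-line could be enforced along these lines.
The name format_f_line is hypothetical.  This sketch checks only that
the hash is 40 hexadecimal characters and leaves the letter case of
the hash unchecked.
</p>

```python
import re

_SHA1_HEX = re.compile(r"[0-9a-fA-F]{40}")


def format_f_line(pathname: str, sha1_hex: str) -> str:
    """Build an F-line from a project-relative pathname and a SHA1 hash."""
    if "\\" in pathname or "\n" in pathname:
        raise ValueError("backslash and newline are not allowed in filenames")
    # The directory separator is always a forward slash.
    if any(part in (".", "..") for part in pathname.split("/")):
        raise ValueError('"." and ".." directories are not allowed')
    if not _SHA1_HEX.fullmatch(sha1_hex):
        raise ValueError("hash must be 40 hexadecimal characters")
    # Spaces are escaped the same way as in C-line comment text.
    return "F " + pathname.replace(" ", "\\s") + " " + sha1_hex
```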

<p>
A manifest has zero or one P-lines.  Most manifests have one P-line.
The P-line has a varying number of arguments that
define other manifests from which the current manifest
is derived.  Each argument is a 40-character lowercase 
hexadecimal SHA1 of a predecessor manifest.  All arguments
to the P-line must be unique to that line.
The first predecessor is the manifest's direct ancestor.
Other arguments define manifests with which the first was
merged to yield the current manifest.  Most manifests have
a P-line with a single argument.  The first manifest in the
project has no ancestors and thus has no P-line.
</p>
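<p>
A reader of the P-line might separate the direct ancestor from the
merged-in manifests as sketched here.  The name parse_p_line is
hypothetical, and the sketch assumes the line has already been split
off from the rest of the manifest without its newline.
</p>

```python
import re

_SHA1_LOWER = re.compile(r"[0-9a-f]{40}")


def parse_p_line(line: str) -> tuple[str, list[str]]:
    """Return (direct_ancestor, merged_manifests) from a P-line."""
    fields = line.split(" ")
    if fields[0] != "P" or len(fields) < 2:
        raise ValueError("not a P-line with at least one argument")
    hashes = fields[1:]
    for value in hashes:
        if not _SHA1_LOWER.fullmatch(value):
            raise ValueError(f"not a lowercase SHA1 hash: {value!r}")
    # Every predecessor may be listed only once.
    if len(set(hashes)) != len(hashes):
        raise ValueError("duplicate predecessor on P-line")
    return hashes[0], hashes[1:]
```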

<p>
A manifest may optionally have a single R-line.  The R-line has
a single argument which is the MD5 checksum of all files in 
the baseline except the manifest itself.  The checksum is expressed
as 32 characters of lowercase hexadecimal.   The checksum is
computed as follows:  For each file in the baseline (except for
the manifest itself) in strict sorted lexicographical order, 
take the pathname of the file relative to the root of the
repository, append a single space (ASCII 0x20), the
size of the file in ASCII decimal, a single newline
character (ASCII 0x0A), and the complete text of the file.
Compute the MD5 checksum of the result.
</p>
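<p>
The checksum procedure above can be sketched as follows.  The name
repository_checksum is hypothetical.  Several details are assumptions
of this sketch rather than statements of the format: pathnames are
encoded as UTF-8, they are used unescaped, the sort is byte-wise on
the encoded pathname, and the per-file records are concatenated into
a single MD5 computation.
</p>

```python
import hashlib


def repository_checksum(files: dict[str, bytes]) -> str:
    """Compute the R-line argument for a mapping of pathname -> content."""
    md5 = hashlib.md5()
    # Assumption: files are ordered byte-wise by UTF-8 pathname.
    for pathname in sorted(files, key=lambda p: p.encode("utf-8")):
        content = files[pathname]
        # Record header: pathname, one space, decimal size, one newline.
        md5.update(f"{pathname} {len(content)}\n".encode("utf-8"))
        md5.update(content)
    return md5.hexdigest()
```

<p>
The manifest itself is simply left out of the mapping passed to the
function.
</p>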

<p>
Each manifest has a single U-line.  The argument to the U-line is
the login of the user who created the manifest.  The login name
is encoded using the same character escapes as are used for the
check-in comment argument to the C-line.
</p>

<p>
A manifest has an optional Z-line as its last line.  The argument
to the Z-line is a 32-character lowercase hexadecimal MD5 hash
of all prior lines of the manifest up to and including the newline 
character that immediately precedes the "Z".  The Z-line is just
a sanity check to prove that the manifest is well-formed and
consistent.
</p>
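<p>
Appending and checking a Z-line might look like the sketch below.  The
names append_z_line and verify_z_line are hypothetical.  The sketch
assumes an unsigned manifest encoded as UTF-8 and writes a newline
after the Z-line, which the description above does not require.
</p>

```python
import hashlib


def append_z_line(body: str) -> str:
    """Append a Z-line to manifest text that ends with a newline."""
    if not body.endswith("\n"):
        raise ValueError("prior lines must end with a newline")
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    return body + "Z " + digest + "\n"


def verify_z_line(manifest: str) -> bool:
    """Return True if the manifest ends with a correct Z-line."""
    head, separator, tail = manifest.rpartition("\nZ ")
    if not separator:
        return False
    # The hash covers everything through the newline before the "Z".
    covered = (head + "\n").encode("utf-8")
    return tail.rstrip("\n") == hashlib.md5(covered).hexdigest()
```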

<h2>2.0 Trouble Tickets</h2>

<p>
Each trouble ticket is a file in the repository and appears in
a manifest for every baseline in which the ticket exists.
Trouble tickets occur in a specific subdirectory of the file
hierarchy.  The name of the subdirectory that contains tickets
is part of the local state of each repository.  The filename
of each trouble ticket has a ".tkt" suffix.  The trouble ticket
has a particular file format defined below.
</p>

<i>To be continued...</i>

<h2>3.0 Wiki Pages</h2>

<p>
Each wiki page is a file in the repository and appears in
a manifest for every baseline in which that wiki page exists.
Wiki pages occur in a specific subdirectory of the file
hierarchy.  The name of the subdirectory that contains wiki pages
is part of the local state of each repository.  The filename
of each wiki page has a ".wiki" suffix.  The base name of
the file is the name of the wiki page.  The wiki pages
have a particular file format defined below.
</p>

<i>To be continued...</i>