Artifact 9924363ba6277701b9665475b2bfeb8ae1d90d24 (file www/fileformat.wiki), part of check-in [6fe981ec56] "typo" by bch on 2009-04-07 04:12:04, and also of check-in [ece2c766a2] "Merge in wiki typo fixes." by drh on 2009-04-11 12:53:00.
<h1 align="center">
Fossil File Formats
</h1>
<p>The global state of a fossil repository is kept simple so that it can
endure in useful form for decades or centuries.
A fossil repository is intended to be readable,
searchable, and extensible by people not yet born.</p>
<p>
The global state of a fossil repository is an unordered
set of <i>artifacts</i>.
An artifact might be a source code file, the text of a wiki page,
part of a trouble ticket, or one of several special control artifacts
used to show the relationships between other artifacts within the
project. Each artifact is normally represented on disk as a separate
file. Artifacts can be text or binary.
</p>
<p>
In addition to the global state,
each fossil repository also contains local state.
The local state consists of web-page formatting
preferences, authorized users, ticket display and reporting formats,
and so forth. The global state is shared in common among all
repositories for the same project, whereas the local state is often
different in separate repositories.
The local state is not versioned and is not synchronized
with the global state.
The local state is not composed of artifacts and is not intended to be enduring.
This document is concerned with global state only. Local state is only
mentioned here in order to distinguish it from global state.
</p>
<p>
Each artifact in the repository is named by its SHA1 hash.
No prefixes or meta information are added to an artifact before
its hash is computed. The name of an artifact in the repository
is exactly the same SHA1 hash that is computed by sha1sum
on the file as it exists in your source tree.</p>
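<p>A minimal Python sketch of this naming rule follows. The helper name artifact_id is hypothetical. It hashes the raw bytes and nothing else, so the result should match what sha1sum prints for the same file.</p>

```python
import hashlib


def artifact_id(content: bytes) -> str:
    """Return the name of an artifact: the lowercase hex SHA1 of its bytes.

    No header or metadata is hashed along with the content, so the
    result is the same value that sha1sum prints for the file.
    """
    return hashlib.sha1(content).hexdigest()
```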
<p>
Some artifacts have a particular format which gives them special
meaning to fossil. Fossil recognizes:</p>
<ul>
<li> Manifests </li>
<li> Clusters </li>
<li> Control Artifacts </li>
<li> Wiki Pages </li>
<li> Ticket Changes </li>
</ul>
<p>These five artifact types are described in the sections that follow.</p>
<p>In the current implementation (as of 2009-01-25) the artifacts that
make up a fossil repository are stored as delta- and zlib-compressed
blobs in an <a href="http://www.sqlite.org/">SQLite</a> database. This
is an implementation detail and might change in a future release. For
the purpose of this article "file format" means the format of the artifacts,
not how the artifacts are stored on disk. It is the artifact format that
is intended to be enduring. The specifics of how artifacts are stored on
disk, though stable, are not intended to live as long as the
artifact format.</p>
<h2>1.0 The Manifest</h2>
<p>A manifest defines a check-in or version of the project
source tree. The manifest contains a list of artifacts for
each file in the project and the corresponding filenames, as
well as information such as parent check-ins, the name of the
programmer who created the check-in, the date and time when
the check-in was created, and any check-in comments associated
with the check-in.</p>
<p>
Any artifact in the repository that follows the syntactic rules
of a manifest is a manifest. Note that a manifest can
be both a real manifest and also a content file, though this
is rare.
</p>
<p>
A manifest is a text file. Newline characters
(ASCII 0x0a) separate the file into "cards".
Each card begins with a single
character "card type". Zero or more arguments may follow
the card type. All arguments are separated from each other
and from the card-type character by a single space
character. There is no surplus white space between arguments
and no leading or trailing whitespace except for the newline
character that acts as the card separator.
</p>
<p>
All cards of the manifest occur in strict sorted lexicographical order.
No card may be duplicated.
The entire manifest may be PGP clear-signed, but otherwise it
may contain no additional text or data beyond what is described here.
</p>
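<p>The card rules above can be checked mechanically. The following is a sketch in Python under two stated assumptions: the manifest is not PGP-signed, and no card carries a multi-line payload, so every newline ends one card. The function name split_cards is hypothetical, and a single newline after the final card is tolerated.</p>

```python
def split_cards(text: str) -> list[str]:
    """Split an unsigned manifest into cards, enforcing the syntax rules.

    Assumes no PGP signature and no multi-line card payloads, so every
    newline ends exactly one card.
    """
    # Tolerate a single newline after the final card.
    body = text[:-1] if text.endswith("\n") else text
    cards = body.split("\n")
    for card in cards:
        if not card:
            raise ValueError("empty card")
        # No leading, trailing, or doubled whitespace anywhere in a card.
        if card != card.strip() or "  " in card:
            raise ValueError(f"surplus whitespace in card: {card!r}")
        # The card type is one character, followed by a space if the
        # card has any arguments.
        if len(card) > 1 and card[1] != " ":
            raise ValueError(f"card type must be one character: {card!r}")
    # Python compares str by code point, which matches UTF-8 byte order.
    # Requiring strictly increasing order also rules out duplicate cards.
    for prev, cur in zip(cards, cards[1:]):
        if prev >= cur:
            raise ValueError(f"cards out of order: {prev!r}, {cur!r}")
    return cards
```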
<p>
Allowed cards in the manifest are as follows:
</p>
<blockquote>
<b>C</b> <i>checkin-comment</i><br>
<b>D</b> <i>time-and-date-stamp</i><br>
<b>F</b> <i>filename</i> <i>SHA1-hash</i> <i>permissions</i> <i>old-name</i><br>
<b>P</b> <i>SHA1-hash</i>+<br>
<b>R</b> <i>repository-checksum</i><br>
<b>T</b> (<b>+</b>|<b>-</b>|<b>*</b>)<i>tag-name <b>*</b> ?value?</i><br>
<b>U</b> <i>user-login</i><br>
<b>Z</b> <i>manifest-checksum</i>
</blockquote>
<p>
A manifest must have exactly one C-card. The sole argument to
the C-card is a check-in comment that describes the check-in that
the manifest defines. The check-in comment is text. The following
escape sequences are applied to the text:
A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73). A
newline (ASCII 0x0a) is "\n" (ASCII 0x5C, 0x6E). A backslash
(ASCII 0x5C) is represented as two backslashes "\\". Apart from
space and newline, no other whitespace characters are allowed in
the check-in comment. Nor are any unprintable characters allowed
in the comment.
</p>
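<p>A sketch of these escapes in Python, with hypothetical helper names fossil_escape and fossil_unescape. The encoder rejects any other whitespace or unprintable character, as the rule above requires; the decoder rejects any backslash sequence other than the three defined ones.</p>

```python
_UNESCAPE = {"s": " ", "n": "\n", "\\": "\\"}


def fossil_escape(text: str) -> str:
    """Encode text as a C-card argument."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == " ":
            out.append("\\s")
        elif ch == "\n":
            out.append("\\n")
        elif ch.isspace() or not ch.isprintable():
            # Tabs, carriage returns, and control characters are not allowed.
            raise ValueError(f"character not allowed: {ch!r}")
        else:
            out.append(ch)
    return "".join(out)


def fossil_unescape(arg: str) -> str:
    """Decode a C-card argument back into plain text."""
    out = []
    i = 0
    while i < len(arg):
        if arg[i] == "\\":
            code = arg[i + 1:i + 2]
            if code not in _UNESCAPE:
                raise ValueError(f"bad escape at offset {i}")
            out.append(_UNESCAPE[code])
            i += 2
        else:
            out.append(arg[i])
            i += 1
    return "".join(out)
```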
<p>
A manifest must have exactly one D-card. The sole argument to
the D-card is a date-time stamp in the ISO 8601 format. The
date and time should be in coordinated universal time (UTC).
The format is:
</p>
<blockquote>
<i>YYYY</i><b>-</b><i>MM</i><b>-</b><i>DD</i><b>T</b><i>HH</i><b>:</b><i>MM</i><b>:</b><i>SS</i>
</blockquote>
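<p>In Python this stamp corresponds to the strftime pattern "%Y-%m-%dT%H:%M:%S" applied to a UTC time. The sketch below, with hypothetical names format_stamp and parse_stamp, converts a timezone-aware datetime to UTC before formatting and refuses naive datetimes, whose zone would be ambiguous.</p>

```python
from datetime import datetime, timedelta, timezone

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_stamp(when: datetime) -> str:
    """Render a timezone-aware datetime as a D-card argument in UTC."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return when.astimezone(timezone.utc).strftime(STAMP_FORMAT)


def parse_stamp(arg: str) -> datetime:
    """Parse a D-card argument, treating it as UTC."""
    return datetime.strptime(arg, STAMP_FORMAT).replace(tzinfo=timezone.utc)


cest = timezone(timedelta(hours=2))
print("D " + format_stamp(datetime(2009, 4, 7, 6, 12, 4, tzinfo=cest)))
# prints "D 2009-04-07T04:12:04"
```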
<p>
A manifest has zero or more F-cards. Each F-card defines a file
(other than the manifest itself) which is part of the check-in that
the manifest defines. There are two, three, or four arguments.
The first argument
is the pathname of the file in the check-in relative to the root
of the project file hierarchy. No ".." or "." directories are allowed
within the filename. Space characters are escaped as in C-card
comment text. Backslash characters and newlines are not allowed
within filenames. The directory separator character is a forward
slash (ASCII 0x2F). The second argument to the F-card is the
full 40-character lower-case hexadecimal SHA1 hash of the content
artifact. The optional 3rd argument defines any special access
permissions associated with the file. The only special code currently
defined is "x" which means that the file is executable. All files are
always readable and writable. This can be expressed by "w" permission
if desired but is optional.
The optional 4th argument is the name of the same file as it existed in
the parent check-in. If the name of the file is unchanged from its
parent, then the 4th argument is omitted.
</p>
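<p>A sketch of an F-card parser in Python follows. FileEntry, decode_filename, and parse_f_card are hypothetical names. Because backslashes are otherwise forbidden in filenames, undoing the "\s" escape is a plain substring replacement. Rejecting empty path components (such as a leading slash) is an extra assumption beyond the "." and ".." rule stated above.</p>

```python
import re
from typing import NamedTuple, Optional

SHA1_HEX = re.compile(r"[0-9a-f]{40}")


class FileEntry(NamedTuple):
    name: str
    uuid: str
    perms: str               # "" when the optional 3rd argument is absent
    old_name: Optional[str]  # None when the file was not renamed


def decode_filename(arg: str) -> str:
    """Undo the \\s escape; backslashes are otherwise forbidden."""
    name = arg.replace("\\s", " ")
    if "\\" in name:
        raise ValueError(f"backslash in filename: {arg!r}")
    # Paths are relative to the project root; reject "." and ".."
    # components (empty components are also rejected, as an assumption).
    if any(part in ("", ".", "..") for part in name.split("/")):
        raise ValueError(f"bad path component in {name!r}")
    return name


def parse_f_card(card: str) -> FileEntry:
    fields = card.split(" ")
    if fields[0] != "F" or not 3 <= len(fields) <= 5:
        raise ValueError(f"malformed F card: {card!r}")
    if not SHA1_HEX.fullmatch(fields[2]):
        raise ValueError(f"bad SHA1 hash: {fields[2]!r}")
    perms = fields[3] if len(fields) >= 4 else ""
    old_name = decode_filename(fields[4]) if len(fields) == 5 else None
    return FileEntry(decode_filename(fields[1]), fields[2], perms, old_name)
```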
<p>
A manifest has zero or more N-cards. Each N-card records a name change
to one of the files in the manifest. The first argument to the N-card is
the name of the file in the parent check-in. The second argument is the
name of the file in the check-in defined by the manifest.
</p>
<p>
A manifest has zero or one P-card. Most manifests have one P-card.
The P-card has a varying number of arguments that
define other manifests from which the current manifest
is derived. Each argument is a 40-character lowercase
hexadecimal SHA1 of the predecessor manifest. All arguments
to the P-card must be distinct.
The first predecessor is the direct ancestor of the manifest.
Other arguments define manifests with which the first was
merged to yield the current manifest. Most manifests have
a P-card with a single argument. The first manifest in the
project has no ancestors and thus has no P-card.
</p>
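<p>As a sketch (parse_p_card is a hypothetical name), a P card can be split into the direct ancestor and the list of merged-in manifests, checking that each argument is a 40-character lowercase SHA1 and that no argument repeats.</p>

```python
import re

SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def parse_p_card(card: str) -> tuple[str, list[str]]:
    """Return (direct ancestor, merged-in manifests) from a P card."""
    fields = card.split(" ")
    parents = fields[1:]
    if fields[0] != "P" or not parents:
        raise ValueError(f"malformed P card: {card!r}")
    for p in parents:
        if not SHA1_HEX.fullmatch(p):
            raise ValueError(f"bad SHA1 hash: {p!r}")
    if len(set(parents)) != len(parents):
        raise ValueError("duplicate argument in P card")
    return parents[0], parents[1:]
```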
<p>
A manifest may optionally have a single R-card. The R-card has
a single argument which is the MD5 checksum of all files in
the check-in except the manifest itself. The checksum is expressed
as 32 characters of lowercase hexadecimal. The checksum is
computed as follows: For each file in the check-in (except for
the manifest itself) in strict sorted lexicographical order,
take the pathname of the file relative to the root of the
repository, append a single space (ASCII 0x20), the
size of the file in ASCII decimal, a single newline
character (ASCII 0x0A), and the complete text of the file.
Compute the MD5 checksum of the result.
</p>
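<p>The R-card procedure translates directly into a streaming MD5 computation. In this Python sketch, r_card_checksum is a hypothetical name; it assumes the caller supplies a mapping from unescaped pathnames to file contents (with the manifest excluded) and that pathnames are encoded as UTF-8 before hashing.</p>

```python
import hashlib


def r_card_checksum(files: dict[str, bytes]) -> str:
    """Compute the R-card MD5 over a check-in's files (manifest excluded)."""
    md5 = hashlib.md5()
    # Python orders str by code point, which matches UTF-8 byte order.
    for name in sorted(files):
        content = files[name]
        # "<pathname> <size>\n" followed by the complete file text.
        md5.update(f"{name} {len(content)}\n".encode("utf-8"))
        md5.update(content)
    return md5.hexdigest()
```

<p>Feeding the pieces to one MD5 object is equivalent to hashing their concatenation, so the whole check-in never has to be held in memory as a single string.</p>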
<p>
A manifest might contain one or more T-cards used to set tags or
properties on the check-in. The format of the T-card is the same as
described in <i>Control Artifacts</i> section below, except that the
second argument is the single character "<b>*</b>" instead of an
artifact ID. The <b>*</b> in place of the artifact ID indicates that
the tag or property applies to the current artifact. It is not
possible to encode the current artifact ID as part of an artifact,
since the act of inserting the artifact ID would change the artifact ID,
hence a <b>*</b> is used to represent "self". T-cards are typically
added to manifests in order to set the <b>branch</b> property and a
symbolic name when the check-in is intended to start a new branch.
</p>
<p>
Each manifest has a single U-card. The argument to the U-card is
the login of the user who created the manifest. The login name
is encoded using the same character escapes as is used for the
check-in comment argument to the C-card.
</p>
<p>
A manifest has an optional Z-card as its last line. The argument
to the Z-card is a 32-character lowercase hexadecimal MD5 hash
of all prior lines of the manifest up to and including the newline
character that immediately precedes the "Z". The Z-card is just
a sanity check to prove that the manifest is well-formed and
consistent.
</p>
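<p>A sketch of adding and verifying a Z card in Python, with hypothetical names add_z_card and check_z_card. It assumes the text is encoded as UTF-8 and that the Z card itself is followed by a single newline. Because the Z card is always last, searching backwards for the final "\nZ " locates it even if earlier content happens to contain that sequence.</p>

```python
import hashlib


def add_z_card(body: str) -> str:
    """Append a Z card; body is every prior card, newline-terminated."""
    if not body.endswith("\n"):
        raise ValueError("body must end with a newline")
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    return f"{body}Z {digest}\n"


def check_z_card(text: str) -> bool:
    """Verify the Z card on the last line of text."""
    idx = text.rfind("\nZ ")
    if idx < 0:
        return False
    # Everything up to and including the newline before "Z" is hashed.
    body, z_line = text[: idx + 1], text[idx + 1:]
    if z_line.endswith("\n"):
        z_line = z_line[:-1]
    return z_line == "Z " + hashlib.md5(body.encode("utf-8")).hexdigest()
```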
<h2>2.0 Clusters</h2>
<p>
A cluster is an artifact that declares the existence of other artifacts.
Clusters are used during repository synchronization to help
reduce network traffic. As such, clusters are an optimization and
may be removed from a repository without loss or damage to the
underlying project code.
</p>
<p>
Clusters follow a syntax that is very similar to manifests.
A Cluster is a line-oriented text file. Newline characters
(ASCII 0x0a) separate the artifact into cards. Each card begins with a single
character "card type". Zero or more arguments may follow
the card type. All arguments are separated from each other
and from the card-type character by a single space
character. There is no surplus white space between arguments
and no leading or trailing whitespace except for the newline
character that acts as the card separator.
All cards of a cluster occur in strict sorted lexicographical order.
No card may be duplicated.
The cluster may not contain additional text or data beyond
what is described here.
Unlike manifests, clusters are never PGP signed.
</p>
<p>
Allowed cards in the cluster are as follows:
</p>
<blockquote>
<b>M</b> <i>artifact-id</i><br />
<b>Z</b> <i>checksum</i>
</blockquote>
<p>
A cluster contains one or more "M" cards followed by a single "Z"
line. Each M card has a single argument which is the artifact ID of
another artifact in the repository. The Z card works exactly like
the Z card of a manifest. The argument to the Z card is the
lower-case hexadecimal representation of the MD5 checksum of all
prior cards in the cluster. Note that the Z card is required
on a cluster.
</p>
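<p>Building a cluster is mostly a matter of sorting and deduplicating artifact IDs, since the M cards must be in strict order with no repeats. This Python sketch uses the hypothetical name make_cluster and assumes the Z card is followed by a newline.</p>

```python
import hashlib
import re

SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def make_cluster(artifact_ids) -> str:
    """Build a cluster artifact listing the given artifact IDs."""
    ids = sorted(set(artifact_ids))  # strict order, no duplicate M cards
    if not ids:
        raise ValueError("a cluster needs at least one M card")
    for a in ids:
        if not SHA1_HEX.fullmatch(a):
            raise ValueError(f"bad artifact ID: {a!r}")
    body = "".join(f"M {a}\n" for a in ids)
    # The required Z card is the MD5 of all prior cards.
    return body + "Z " + hashlib.md5(body.encode("ascii")).hexdigest() + "\n"
```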
<h2>3.0 Control Artifacts</h2>
<p>
Control artifacts are used to assign properties to other artifacts
within the repository. The basic format of a control artifact is
the same as a manifest or cluster. A control artifact is a text
file divided into cards by newline characters. Each card has a
single-character card type followed by arguments. Spaces separate
the card type and the arguments. No surplus whitespace is allowed.
All cards must occur in strict lexicographical order.
</p>
<p>
Allowed cards in a control artifact are as follows:
</p>
<blockquote>
<b>D</b> <i>time-and-date-stamp</i><br />
<b>T</b> (<b>+</b>|<b>-</b>|<b>*</b>)<i>tag-name artifact-id ?value?</i><br />
<b>Z</b> <i>checksum</i><br />
</blockquote>
<p>
A control artifact must have one D card and one Z card and
one or more T cards. No other cards or other text is
allowed in a control artifact. Control artifacts might be PGP
clearsigned.</p>
<p>The D card and the Z card of a control artifact are the same
as in a manifest.</p>
<p>The T card represents a "tag" or property that is applied to
some other artifact. The T card has two or three arguments. The
first argument is the tag name. The first character of the tag
name is either "+", "-", or "*". A "+" means the tag should be added
to the artifact. A "-" means the tag should be removed.
The "*" character means the tag should be added to the artifact
and all direct descendants (but not branches) of the artifact down
to but not including the first descendant that contains a
more recent "-" tag with the same name.
The second argument is the 40-character lowercase artifact ID of the
artifact to which the tag is to be applied.
The optional third argument is the value of the tag. A tag
without a value is a boolean.</p>
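<p>A T card splits into four parts: the prefix character, the tag name, the target, and an optional value. In this Python sketch, Tag and parse_t_card are hypothetical names. The target may also be the "*" used for "self" inside a manifest. The value is returned exactly as written on the card, since this section does not specify an escaping rule for it.</p>

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    kind: str             # "+" add, "-" remove, "*" add and propagate
    name: str
    target: str           # artifact ID, or "*" for "self" inside a manifest
    value: Optional[str]  # None for a boolean tag


def parse_t_card(card: str) -> Tag:
    """Split a T card into its parts; the value is returned as written."""
    fields = card.split(" ")
    if fields[0] != "T" or len(fields) not in (3, 4):
        raise ValueError(f"malformed T card: {card!r}")
    tag = fields[1]
    if len(tag) < 2 or tag[0] not in "+-*":
        raise ValueError(f"bad tag: {tag!r}")
    value = fields[3] if len(fields) == 4 else None
    return Tag(tag[0], tag[1:], fields[2], value)
```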
<p>When two or more tags with the same name are applied to the
same artifact, the tag with the latest (most recent) date is
used.</p>
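<p>The "latest date wins" rule can be sketched for tags applied directly to one artifact. TagEvent and current_tags are hypothetical names, and each event's date is assumed to come from the D card of the artifact that carried the tag. Propagation of "*" tags to descendants is not modeled, and when two events share a date the one seen first is kept, since this section does not say how ties are broken.</p>

```python
from typing import NamedTuple, Optional


class TagEvent(NamedTuple):
    date: str             # D-card stamp of the artifact carrying the tag
    kind: str             # "+", "-" or "*"
    name: str
    value: Optional[str]


def current_tags(events: list[TagEvent]) -> dict[str, Optional[str]]:
    """Resolve tags applied directly to one artifact: latest date wins."""
    latest: dict[str, TagEvent] = {}
    for ev in events:
        # Fixed-width ISO 8601 stamps compare chronologically as strings.
        if ev.name not in latest or ev.date > latest[ev.name].date:
            latest[ev.name] = ev
    # A tag whose most recent event is a "-" is no longer in effect.
    return {n: ev.value for n, ev in latest.items() if ev.kind != "-"}
```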
<p>Some tags have special meaning. The "comment" tag when applied
to a check-in will override the check-in comment of that check-in
for display purposes.</p>
<a name="wikichng"></a>
<h2>4.0 Wiki Pages</h2>
<p>A wiki page is an artifact with a format similar to manifests,
clusters, and control artifacts. The artifact is divided into
cards by newline characters. The format of each card is as in
manifests, clusters, and control artifacts. Wiki artifacts accept
the following card types:</p>
<blockquote>
<b>D</b> <i>time-and-date-stamp</i><br />
<b>L</b> <i>wiki-title</i><br />
<b>P</b> <i>parent-artifact-id</i>+<br />
<b>U</b> <i>user-name</i><br />
<b>W</b> <i>size</i> <b>\n</b> <i>text</i> <b>\n</b><br />
<b>Z</b> <i>checksum</i>
</blockquote>
<p>The D card is the date and time when the wiki page was edited.
The P card specifies the parent wiki pages, if any. The L card
gives the name of the wiki page. The U card specifies the login
of the user who made this edit to the wiki page. The Z card is
the usual checksum over the entire artifact.</p>
<p>The W card is used to specify the text of the wiki page. The
argument to the W card is an integer which is the number of bytes
of text in the wiki page. That text follows the newline character
that terminates the W card. The wiki text is always followed by one
extra newline.</p>
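<p>Because the wiki text may itself contain newlines, a wiki artifact cannot be split naively into lines; the byte count on the W card says where the text ends. The Python sketch below uses hypothetical names make_wiki_artifact and wiki_text. It assumes a new page (no P card), a title and user that are already escaped, UTF-8 text, and a newline after the Z card.</p>

```python
import hashlib


def make_wiki_artifact(date: str, title: str, user: str, text: str) -> bytes:
    """Build a wiki artifact for a new page (no P card).

    title and user are assumed to be already encoded with the C-card escapes.
    """
    data = text.encode("utf-8")
    body = f"D {date}\nL {title}\nU {user}\nW {len(data)}\n".encode("utf-8")
    body += data + b"\n"  # the one extra newline after the wiki text
    return body + b"Z " + hashlib.md5(body).hexdigest().encode("ascii") + b"\n"


def wiki_text(artifact: bytes) -> str:
    """Extract the page text using the byte count on the W card."""
    # The first "\nW " is the W card itself, since the text follows it.
    start = artifact.index(b"\nW ") + 1
    eol = artifact.index(b"\n", start)
    size = int(artifact[start + 2:eol])
    text = artifact[eol + 1:eol + 1 + size]
    if artifact[eol + 1 + size:eol + 2 + size] != b"\n":
        raise ValueError("wiki text must be followed by a newline")
    return text.decode("utf-8")
```

<p>Note that the size counts bytes, not characters, so non-ASCII text produces a larger number than its character count.</p>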
<a name="tktchng"></a>
<h2>5.0 Ticket Changes</h2>
<p>A ticket-change artifact represents a change to a trouble ticket.
The following cards are allowed in a ticket change artifact:</p>
<blockquote>
<b>D</b> <i>time-and-date-stamp</i><br />
<b>J</b> ?<b>+</b>?<i>name</i> ?<i>value</i>?<br />
<b>K</b> <i>ticket-id</i><br />
<b>U</b> <i>user-name</i><br />
<b>Z</b> <i>checksum</i>
</blockquote>
<p>
The D card is the usual date and time stamp and represents the point
in time when the change was entered. The U card is the login of the
programmer who entered this change. The Z card is the checksum over
the entire artifact.</p>
<p>
Every ticket has a unique ID. The ticket to which this change is applied
is specified by the K card. A ticket exists if it contains one or
more changes. The first "change" to a ticket is what brings the
ticket into existence.</p>
<p>
J cards specify changes to the "value" of "fields" in the ticket.
If the <i>value</i> parameter of the J card is omitted, then the
field is set to an empty string.
Each fossil server has a ticket configuration which specifies the fields it
understands. The ticket configuration is part of the local state for
the repository and thus can vary from one repository to another.
Hence a J card might specify a <i>field</i> that does not exist in the
local ticket configuration. If a J card specifies a <i>field</i> that
is not in the local configuration, then that J card
is simply ignored.</p>
<p>
The first argument of the J card is the field name. The second
argument is the field value. If the field name begins with "+" then
the value is appended to the prior value. Otherwise, the value
on the J card replaces any previous value of the field.
The field name and value are both encoded using the character
escapes defined for the C card of a manifest.
</p>
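<p>Applying the J cards of one ticket change might look like the following Python sketch. The names unescape and apply_j_cards are hypothetical, and the known_fields set stands in for the local ticket configuration. It includes a minimal decoder for the C-card escapes, ignores fields the configuration does not list, and assumes the caller applies ticket changes in chronological order of their D cards.</p>

```python
_UNESCAPE = {"s": " ", "n": "\n", "\\": "\\"}


def unescape(arg: str) -> str:
    """Decode the C-card escapes (\\s, \\n and a doubled backslash)."""
    out, i = [], 0
    while i < len(arg):
        if arg[i] == "\\":
            code = arg[i + 1:i + 2]
            if code not in _UNESCAPE:
                raise ValueError(f"bad escape at offset {i}")
            out.append(_UNESCAPE[code])
            i += 2
        else:
            out.append(arg[i])
            i += 1
    return "".join(out)


def apply_j_cards(ticket: dict[str, str], j_cards: list[str],
                  known_fields: set[str]) -> dict[str, str]:
    """Apply one change's J cards to a ticket's field values, in place."""
    for card in j_cards:
        parts = card.split(" ")
        if parts[0] != "J" or len(parts) not in (2, 3):
            raise ValueError(f"malformed J card: {card!r}")
        name = unescape(parts[1])
        value = unescape(parts[2]) if len(parts) == 3 else ""  # omitted -> ""
        append = name.startswith("+")
        if append:
            name = name[1:]
        if name not in known_fields:
            continue  # field not in the local ticket configuration
        ticket[name] = (ticket.get(name, "") + value) if append else value
    return ticket
```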