Artifact d48039839878b9c4a4be9ebf1b16b45fe2db5793
File www/fileformat.html part of check-in [f9f7cf5684] - The autosync setting understands values like "on", "off", "true", and "false" in addition to 0 and 1. Updates to the documentation. by drh on 2007-11-24 02:45:39.
Fossil File Formats
The global state of a fossil repository is determined by an unordered set of files. A file in fossil is called an "artifact". An artifact might be a source code file, the text of a wiki page, part of a trouble ticket, or one of several special control artifacts used to show the relationships between other artifacts within the project. Artifacts can be text or binary.
Each artifact in the repository is named by its SHA1 hash. No prefix or meta-information is added to an artifact before its hash is computed. The name of an artifact in the repository is exactly the same SHA1 hash that is computed by sha1sum on the file as it exists in your source tree.
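As a minimal sketch of the naming rule above, the following Python computes an artifact name from raw file content. The function name artifact_name is hypothetical and is not part of fossil.

```python
import hashlib


def artifact_name(content: bytes) -> str:
    """Return the 40-character lowercase hex SHA1 of the raw artifact bytes.

    Nothing is prepended or appended before hashing, so the result should
    match what sha1sum prints for the same file.
    """
    return hashlib.sha1(content).hexdigest()
```

Because the hash covers only the bytes of the file, two files with identical content get the same name regardless of their filenames.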
Some artifacts have a particular format which qualifies them as "manifests". A manifest assigns filenames to a subset of the artifacts in the repository, in order to provide a snapshot of the state of the project at a point in time. Each manifest corresponds to a version or baseline of the project.
1.0 The Manifest
Any artifact in the repository that follows the syntactic rules of a manifest is a manifest. Note that a manifest can be both a real manifest and also a content file, though this is rare.
A manifest is a line-oriented text file. Newline characters (ASCII 0x0a) separate lines. Each line is called a "card". Each card begins with a single character "card type". Zero or more arguments may follow the card type. All arguments are separated from each other and from the card-type character by a single space character. There is no surplus white space between arguments and no leading or trailing whitespace except for the newline character that acts as the card separator.
All cards of the manifest occur in strict sorted lexicographical order. No card may be duplicated. The entire manifest may be PGP clear-signed, but otherwise it may contain no additional text or data beyond what is described here.
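The syntactic rules above can be sketched as a small card splitter. This is an illustration only, not fossil's parser. It assumes that every card, including the last, is terminated by a newline, that "sorted lexicographical order" means comparing whole lines as strings, and that any PGP signature has already been removed. The name split_cards is hypothetical.

```python
def split_cards(text: str) -> list[list[str]]:
    """Split an unsigned manifest into cards: [card_type, arg1, arg2, ...]."""
    # Assumption: the final card is also followed by a newline.
    if not text.endswith("\n"):
        raise ValueError("artifact must end with a newline")
    lines = text[:-1].split("\n")

    # Strictly increasing order rules out both misordering and duplicates.
    for prev, cur in zip(lines, lines[1:]):
        if not prev < cur:
            raise ValueError(f"cards out of order or duplicated: {cur!r}")

    cards = []
    for line in lines:
        fields = line.split(" ")
        # The card type is one character; an empty field means surplus
        # whitespace (leading, trailing, or doubled spaces).
        if len(fields[0]) != 1 or "" in fields[1:]:
            raise ValueError(f"malformed card: {line!r}")
        cards.append(fields)
    return cards
```

Splitting on a single space is safe here because spaces inside arguments are escaped rather than stored literally.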
Allowed cards in the manifest are as follows:
C checkin-comment
D time-and-date-stamp
F filename SHA1-hash
P SHA1-hash+
R repository-checksum
U user-login
Z manifest-checksum
A manifest must have exactly one C-card. The sole argument to the C-card is a check-in comment that describes the check-in that the manifest defines. The check-in comment is text. The following escape sequences are applied to the text: A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73). A newline (ASCII 0x0A) is represented as "\n" (ASCII 0x5C, 0x6E). A backslash (ASCII 0x5C) is represented as two backslashes "\\". Apart from space and newline, no other whitespace characters are allowed in the check-in comment, nor are any unprintable characters.
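The three escape sequences can be sketched as a pair of helper functions. The names escape and unescape are hypothetical, and the sketch does not check for disallowed whitespace or unprintable characters.

```python
def escape(text: str) -> str:
    """Encode a comment for use as a card argument."""
    # Backslashes must be doubled first so that the backslashes introduced
    # by the "\s" and "\n" replacements are not doubled again.
    return text.replace("\\", "\\\\").replace(" ", "\\s").replace("\n", "\\n")


def unescape(arg: str) -> str:
    """Decode a card argument back into the original text."""
    mapping = {"s": " ", "n": "\n", "\\": "\\"}
    out = []
    i = 0
    while i < len(arg):
        if arg[i] == "\\":
            # A backslash must be followed by one of: s, n, backslash.
            if i + 1 >= len(arg) or arg[i + 1] not in mapping:
                raise ValueError(f"bad escape at offset {i}")
            out.append(mapping[arg[i + 1]])
            i += 2
        else:
            out.append(arg[i])
            i += 1
    return "".join(out)
```

Decoding scans left to right instead of chaining replacements, since a naive chain of replace calls would misread an escaped backslash followed by the letter "s" or "n".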
A manifest must have exactly one D-card. The sole argument to the D-card is a date-time stamp in the ISO8601 format. The date and time should be in coordinated universal time (UTC). The format is:
YYYY-MM-DDTHH:MM:SS
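As an illustration, a D-card line in this format could be produced from a timezone-aware Python datetime as follows. The function name d_card is hypothetical.

```python
from datetime import datetime, timezone


def d_card(when: datetime) -> str:
    """Format a timezone-aware datetime as a D-card line (no newline)."""
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC, then drop sub-second precision via the format string.
    utc = when.astimezone(timezone.utc)
    return "D " + utc.strftime("%Y-%m-%dT%H:%M:%S")
```

Rejecting naive datetimes avoids silently treating local time as UTC.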
A manifest has zero or more F-cards. Each F-card defines a file (other than the manifest itself) which is part of the baseline that the manifest defines. There are two arguments. The first argument is the pathname of the file in the baseline relative to the root of the project file hierarchy. No ".." or "." directories are allowed within the filename. Space characters are escaped as in C-card comment text. Backslash characters and newlines are not allowed within filenames. The directory separator character is a forward slash (ASCII 0x2F). The second argument to the F-card is the full 40-character lowercase hexadecimal SHA1 hash of the content artifact.
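A sketch of the F-card rules as a formatting function follows. The name f_card is hypothetical. Rejecting empty path components (a leading slash or a doubled slash) is an assumption of this sketch and is not stated in the text above.

```python
import re

SHA1_RE = re.compile(r"[0-9a-f]{40}")


def f_card(path: str, sha1: str) -> str:
    """Build an F-card line (without the trailing newline)."""
    if "\\" in path or "\n" in path:
        raise ValueError("backslash and newline are not allowed in filenames")
    # "." and ".." are forbidden by the format; "" is this sketch's assumption.
    if any(part in ("", ".", "..") for part in path.split("/")):
        raise ValueError(f"invalid path component in {path!r}")
    if not SHA1_RE.fullmatch(sha1):
        raise ValueError("hash must be 40 lowercase hexadecimal characters")
    # Only spaces need escaping, since backslash and newline were rejected.
    return "F " + path.replace(" ", "\\s") + " " + sha1
```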
A manifest has zero or one P-card. Most manifests have one P-card. The P-card has a varying number of arguments that define other manifests from which the current manifest is derived. Each argument is a 40-character lowercase hexadecimal SHA1 of a predecessor manifest. All arguments to the P-card must be unique to that line. The first predecessor is the direct ancestor of the manifest. Other arguments define manifests with which the first was merged to yield the current manifest. Most manifests have a P-card with a single argument. The first manifest in the project has no ancestors and thus has no P-card.
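To make the roles of the P-card arguments concrete, here is a small sketch that separates the direct ancestor from the merged-in manifests. The function name parse_p_card is hypothetical.

```python
import re

SHA1_RE = re.compile(r"[0-9a-f]{40}")


def parse_p_card(line: str) -> tuple[str, list[str]]:
    """Return (direct_ancestor, merged_manifests) from a P-card line."""
    fields = line.split(" ")
    if fields[0] != "P" or len(fields) < 2:
        raise ValueError("not a P-card with at least one argument")
    hashes = fields[1:]
    if not all(SHA1_RE.fullmatch(h) for h in hashes):
        raise ValueError("arguments must be 40-character lowercase hex SHA1s")
    if len(set(hashes)) != len(hashes):
        raise ValueError("P-card arguments must be unique")
    # The first argument is the direct ancestor; the rest were merged in.
    return hashes[0], hashes[1:]
```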
A manifest may optionally have a single R-card. The R-card has a single argument which is the MD5 checksum of all files in the baseline except the manifest itself. The checksum is expressed as 32 characters of lowercase hexadecimal. The checksum is computed as follows: For each file in the baseline (except for the manifest itself) in strict sorted lexicographical order, take the pathname of the file relative to the root of the repository, append a single space (ASCII 0x20), the size of the file in ASCII decimal, a single newline character (ASCII 0x0A), and the complete text of the file. Compute the MD5 checksum of the result.
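The R-card computation described above can be sketched as follows. This assumes that files are ordered by pathname, that the pathname is hashed unescaped, and that pathnames are encoded as UTF-8. None of these details is stated explicitly in the text, and the name repository_checksum is hypothetical.

```python
import hashlib


def repository_checksum(files: dict[str, bytes]) -> str:
    """Compute an R-card checksum from a mapping of pathname to content."""
    md5 = hashlib.md5()
    # Assumption: "sorted lexicographical order" refers to the pathnames.
    for path in sorted(files):
        content = files[path]
        # Header: pathname, one space, size in ASCII decimal, one newline.
        md5.update(f"{path} {len(content)}\n".encode("utf-8"))
        # Followed by the complete content of the file.
        md5.update(content)
    return md5.hexdigest()
```

Feeding the hash incrementally gives the same result as hashing one concatenated buffer, without holding the whole baseline in memory at once.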
Each manifest has a single U-card. The argument to the U-card is the login of the user who created the manifest. The login name is encoded using the same character escapes as are used for the check-in comment argument to the C-card.
A manifest has an optional Z-card as its last line. The argument to the Z-card is a 32-character lowercase hexadecimal MD5 hash of all prior lines of the manifest up to and including the newline character that immediately precedes the "Z". The Z-card is just a sanity check to prove that the manifest is well-formed and consistent.
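A minimal sketch of generating and checking a Z-card follows. It assumes that the Z-card line itself ends with a newline and that the manifest is not PGP clear-signed. The names add_z_card and verify_z_card are hypothetical.

```python
import hashlib

# "Z", one space, 32 hex digits, and the assumed trailing newline.
Z_LINE_LEN = 1 + 1 + 32 + 1


def add_z_card(body: bytes) -> bytes:
    """Append a Z-card covering every byte of body."""
    digest = hashlib.md5(body).hexdigest().encode("ascii")
    return body + b"Z " + digest + b"\n"


def verify_z_card(artifact: bytes) -> bool:
    """Check that the final line is a Z-card matching all prior lines."""
    if len(artifact) < Z_LINE_LEN:
        return False
    body, z_line = artifact[:-Z_LINE_LEN], artifact[-Z_LINE_LEN:]
    # The "Z" must start a line, so the body must end with a newline.
    if body and not body.endswith(b"\n"):
        return False
    if not z_line.startswith(b"Z ") or not z_line.endswith(b"\n"):
        return False
    return z_line[2:-1] == hashlib.md5(body).hexdigest().encode("ascii")
```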
2.0 Clusters
A cluster is an artifact that declares the existence of other artifacts. Clusters are used during repository synchronization to help reduce network traffic.
Clusters follow a syntax that is very similar to manifests. A cluster is a line-oriented text file. Newline characters (ASCII 0x0a) separate the artifact into cards. Each card begins with a single character "card type". Zero or more arguments may follow the card type. All arguments are separated from each other and from the card-type character by a single space character. There is no surplus white space between arguments and no leading or trailing whitespace except for the newline character that acts as the card separator. All cards of a cluster occur in strict sorted lexicographical order. No card may be duplicated. The cluster may not contain additional text or data beyond what is described here. Unlike manifests, clusters are never PGP signed.
Allowed cards in the cluster are as follows:
M uuid
Z checksum
A cluster contains one or more M cards followed by a single Z card. Each M card has a single argument which is the UUID of another artifact in the repository. The Z card works exactly like the Z card of a manifest: its argument is the lowercase hexadecimal representation of the MD5 checksum of all prior cards in the cluster. Note that, unlike in a manifest, the Z card is required in a cluster.
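Putting the cluster rules together, a cluster could be assembled roughly as shown here. The sketch assumes that UUIDs are 40-character lowercase hexadecimal SHA1 names and that the Z card ends with a newline. The function name build_cluster is hypothetical.

```python
import hashlib


def build_cluster(uuids: list[str]) -> bytes:
    """Build a cluster artifact listing the given artifact UUIDs."""
    # Sorting the UUIDs sorts the M cards; the set removes duplicates.
    unique = sorted(set(uuids))
    if not unique:
        raise ValueError("a cluster needs at least one M card")
    body = "".join(f"M {uuid}\n" for uuid in unique).encode("ascii")
    # The required Z card covers all of the M cards that precede it.
    digest = hashlib.md5(body).hexdigest().encode("ascii")
    return body + b"Z " + digest + b"\n"
```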
3.0 Control Artifacts
Control artifacts are used to assign properties to other artifacts within the repository. The basic format of a control artifact is the same as a manifest or cluster. A control artifact is a text file divided into cards by newline characters. Each card has a single-character card type followed by arguments. Spaces separate the card type and the arguments. No surplus whitespace is allowed. All cards must occur in strict lexicographical order.
Allowed cards in a control artifact are as follows:
D time-and-date-stamp
T tag-name uuid ?value?
Z checksum
A control artifact must have one D card, one Z card, and one or more T cards. No other cards or text are allowed in a control artifact. Control artifacts may be PGP clear-signed.
The D card and the Z card of a control artifact are the same as in a manifest.
The T card represents a "tag" or property that is applied to some other artifact. The T card has two or three arguments. The first argument is the tag name. The first character of the tag is either "+", "-", or "*". A "+" means the tag should be added to the artifact. The "-" means the tag should be removed. The "*" character means the tag should be added to the artifact and all direct descendants (but not branches) of the artifact. The second argument is the 40-character lowercase UUID of the artifact to which the tag is to be applied. The optional third argument is the value of the tag. A tag without a value is considered to be a boolean.
When two or more tags with the same name are applied to the same artifact, the tag with the latest (most recent) date is used.
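The T card layout and the most-recent-wins rule can be sketched as follows. The names Tag, parse_t_card, and latest_tag are hypothetical. The sketch leaves the value exactly as written in the card, and it relies on the fact that D-card timestamps in the YYYY-MM-DDTHH:MM:SS format compare chronologically when compared as strings.

```python
from typing import Iterable, NamedTuple, Optional


class Tag(NamedTuple):
    action: str           # "+" add, "-" remove, "*" add and propagate
    name: str             # tag name without the leading action character
    target: str           # UUID of the artifact the tag applies to
    value: Optional[str]  # None means the tag is a boolean


def parse_t_card(line: str) -> Tag:
    """Parse one T card line (without the trailing newline)."""
    fields = line.split(" ")
    if fields[0] != "T" or len(fields) not in (3, 4):
        raise ValueError("a T card has two or three arguments")
    tag, target = fields[1], fields[2]
    if len(tag) < 2 or tag[0] not in ("+", "-", "*"):
        raise ValueError('tag must start with "+", "-", or "*"')
    value = fields[3] if len(fields) == 4 else None
    return Tag(tag[0], tag[1:], target, value)


def latest_tag(entries: Iterable[tuple[str, Tag]]) -> Tag:
    """Pick the winning tag from (d_card_timestamp, tag) pairs.

    All entries are expected to share the same tag name and target.
    """
    return max(entries, key=lambda entry: entry[0])[1]
```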
Some tags have special meaning. The "comment" tag when applied to a baseline will override the check-in comment of that baseline for display purposes.
4.0 Wiki Pages
A wiki page is an artifact in a format similar to manifests, clusters, and control artifacts. The artifact is divided into cards by newline characters. The format of each card is as in manifests, clusters, and control artifacts. Wiki artifacts accept the following card types:
D time-and-date-stamp
L wiki-title
U user-name
W size \n text \n
Z checksum
5.0 Ticket Changes
D time-and-date-stamp
J ?+?name value
K ticket-uuid
U user-name
Z checksum