Artifact 85a1fd60fbebfd33ae1406e9d22130ad36fc4909

File www/fileformat.html part of check-in [dbda8d6ce9] - Initial check-in of m1 sources. by drh on 2007-07-21 14:10:57.

<html>
<head>
<title>Fossil File Formats</title>
</head>
<body bgcolor="white">
<h1 align="center">
Fossil File Formats
</h1>

<p>
The global state of a Fossil repository is determined by an unordered
set of content files.  Each of these files follows the format defined
in this document.
</p>

<h2>1.0 General Formatting Rules</h2>

<p>
Fossil content files consist of a header, a blank line, and optional
content.
</p>

<p>
The header is divided into "properties" by newline ('\n', 0x0a)
characters.  Each header property is divided into tokens by space (' ', 0x20)
characters.  The first token of each property is the property name.
Subsequent tokens (if any) are arguments to the property.
</p>

<p>
The blank line that separates the header from the content can be
thought of as a property line that contains no tokens.  Everything
that follows the newline character that terminates the blank line
is content.  The blank line is always present but the content is
optional.
</p>
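<p>
The layout described above can be sketched as a small parser.  This is
only an illustration, not Fossil's own code: the function name
parse_content_file is hypothetical, and the sketch assumes the whole file
is available as a byte string.  Tokens are returned in their encoded form.
</p>

```python
def parse_content_file(data: bytes):
    """Split a content file into (properties, content).

    properties is a list of token lists, one per header line.
    Tokens are returned still encoded (escapes are not expanded here).
    """
    # The header ends at the first blank line, which shows up as the
    # first occurrence of two consecutive newline characters.
    header, blank, content = data.partition(b"\n\n")
    if not blank:
        raise ValueError("missing blank line after header")
    # Properties are separated by newlines, tokens by single spaces.
    properties = [line.split(b" ") for line in header.split(b"\n")]
    return properties, content
```

<p>
Because encoded tokens never contain a literal newline, the first empty
line can only be the header terminator, so splitting at the first pair of
newlines is safe even when the content itself contains blank lines.
</p>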

<p>
All tokens in a property line are encoded to escape special characters.
The encoding is as follows:
</p>

<blockquote>
<table border="1">
<tr><th>Input Character</th><th>Encoded As</th></tr>
<tr><td align="center"> space (0x20) </td><td align="center"> \s </td></tr>
<tr><td align="center"> newline (0x0A) </td><td align="center"> \n </td></tr>
<tr><td align="center"> carriage return (0x0D) </td><td align="center"> \r </td></tr>
<tr><td align="center"> tab (0x09) </td><td align="center"> \t </td></tr>
<tr><td align="center"> vertical tab (0x0B) </td><td align="center"> \v </td></tr>
<tr><td align="center"> formfeed (0x0C) </td><td align="center"> \f </td></tr>
<tr><td align="center"> nul (0x00) </td><td align="center"> \0 </td></tr>
<tr><td align="center"> backslash (0x5C) </td><td align="center"> \\ </td></tr>
</table>
</blockquote>

<p>
Characters other than the ones shown in the table above are passed through
the encoder without change.
</p>
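<p>
A minimal sketch of the token encoding in the table above follows.  The
names encode_token and decode_token are hypothetical.  The decoder's
behavior on an unknown or truncated escape sequence is an assumption,
since this document does not say how malformed input is handled; here it
is rejected.
</p>

```python
# Escape table copied from the encoding table in this document.
_ESCAPES = {
    0x20: b"\\s",   # space
    0x0A: b"\\n",   # newline
    0x0D: b"\\r",   # carriage return
    0x09: b"\\t",   # tab
    0x0B: b"\\v",   # vertical tab
    0x0C: b"\\f",   # formfeed
    0x00: b"\\0",   # nul
    0x5C: b"\\\\",  # backslash
}
# Reverse map: the byte that follows a backslash -> the original byte.
_UNESCAPES = {enc[1]: raw for raw, enc in _ESCAPES.items()}


def encode_token(raw: bytes) -> bytes:
    """Escape special bytes; every other byte passes through unchanged."""
    return b"".join(_ESCAPES.get(b, bytes([b])) for b in raw)


def decode_token(encoded: bytes) -> bytes:
    """Invert encode_token (assumption: malformed escapes are an error)."""
    out = bytearray()
    it = iter(encoded)
    for b in it:
        if b == 0x5C:
            nxt = next(it, None)  # None if the backslash is the last byte
            if nxt not in _UNESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(_UNESCAPES[nxt])
        else:
            out.append(b)
    return bytes(out)
```

<p>
After encoding, a token contains no literal space or newline, which is
what allows the header to be split on those two characters.
</p>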

<p>
All property names are unpunctuated lower-case ASCII strings.
The properties appear in the header in sorted order (using
memcmp() as the comparison function) except for the "signature"
property, which always occurs first.
</p>
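<p>
The ordering rule can be sketched as follows.  The function name
order_properties is hypothetical, and sorting on the property name alone
(rather than on the whole line) is an assumption.  Python compares byte
strings lexicographically by unsigned byte value, which stands in here
for a byte-wise comparison in C.
</p>

```python
def order_properties(lines):
    """Return header lines with "signature" first and the rest sorted.

    Each element of lines is one encoded property line (bytes) without
    its trailing newline.
    """
    def name(line):
        # The property name is the first space-separated token.
        return line.split(b" ", 1)[0]

    signature = [ln for ln in lines if name(ln) == b"signature"]
    # Assumption: the sort key is the property name, compared byte-wise.
    others = sorted((ln for ln in lines if name(ln) != b"signature"), key=name)
    return signature + others
```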

<h2>2.0 Common Properties</h2>

<p>
Every content file has a "time" property.  The argument to the
time property is an integer giving the number of seconds between
1970-01-01 00:00:00 UTC and the moment the content file was created.  For example:
</p>

<blockquote>
time 1181404746
</blockquote>
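<p>
As an illustration, a time property can be converted to and from a
calendar timestamp like this.  The function names are hypothetical, and
the sketch assumes the argument is a plain decimal integer.
</p>

```python
from datetime import datetime, timezone


def format_time_property(when: datetime) -> str:
    """Build a time property line from a timezone-aware datetime."""
    if when.tzinfo is None:
        # A naive datetime would be read as local time, so reject it.
        raise ValueError("expected a timezone-aware datetime")
    return "time %d" % int(when.timestamp())


def parse_time_property(line: str) -> datetime:
    """Return the UTC datetime encoded in a time property line."""
    name, arg = line.split(" ")
    if name != "time":
        raise ValueError("not a time property")
    return datetime.fromtimestamp(int(arg), tz=timezone.utc)
```

<p>
Applied to the example above, parse_time_property("time 1181404746")
yields 2007-06-09 15:59:06 UTC.
</p>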

<p>
Every content file has a "type" property.  The argument to the
type property defines the purpose of the content file.  The
argument is a string such as "version", "folder", "file", or "user".
</p>

<p>
The first property of a content file is the digital signature, whose
name is "signature".  It takes two arguments.
The first argument is the SHA256 hash of the content file that defines
the user who signed this file.  User records are self-signed,
so for them the first argument is simply "*".  The second
argument is the digital signature of an SHA256 hash of the entire
file (header and content) except for the signature line itself.
</p>
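<p>
The hash that gets signed can be sketched as below.  The function name
signed_digest is hypothetical.  The sketch assumes that "the signature
line" means the first line together with its terminating newline, and
that the hash is written in hexadecimal; neither detail is spelled out
in this document.  The signature algorithm itself is not specified here,
so the sketch stops at the hash.
</p>

```python
import hashlib


def signed_digest(data: bytes) -> str:
    """SHA256 of a content file with its signature line removed."""
    # Assumption: the signature line is the first line of the file,
    # including the newline that terminates it.
    first, newline, rest = data.partition(b"\n")
    if not newline or first.split(b" ")[0] != b"signature":
        raise ValueError("file does not start with a signature property")
    return hashlib.sha256(rest).hexdigest()
```

<p>
Because the signature line is excluded, the digest is the same no matter
what the two signature arguments contain.
</p>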