Fossil: Artifact Content

Artifact 4b7a0c18217f58fb6793f31988683bbd915c5d3f

File www/sync.html part of check-in [f76192b245] - Pulled the latest CLI, website, and sqlite changes into the importer branch. by aku on 2007-09-17 01:00:32.
<html>
<head>
<title>The Fossil Sync Protocol</title>
</head>
<body bgcolor="white">
<p>[ <a href="index.html">Index</a> ]</p>
<hr>
<h1 align="center">The Fossil Sync Protocol</h1>

<p>Fossil supports commands <b>push</b>, <b>pull</b>, and <b>sync</b>
for transferring information from one repository to another.  The
command is run on the client repository.  A URL for the server repository
is specified as part of the command.  This document describes what happens
behind the scenes in order to synchronize the information on the two
repositories.</p>

<h2>1.0 Transport</h2>

<p>All communication between client and server is via HTTP requests.
The server is listening for incoming HTTP requests.  The client
issues one or more HTTP requests and receives replies for each
request.</p>

<p>The server might be running as an independent server
using the <b>server</b> command, or it might be launched from
inetd or xinetd using the <b>http</b> command.  Or the server might
be launched from CGI.  The details of how the server is configured
to "listen" for incoming HTTP requests is immaterial.  The important
point is that the server is listening for requests and the client
is the issuer of the requests.</p>

<p>A single push, pull, or sync might involve multiple HTTP requests.
The client maintains state between all requests.  But on the server
side, each request is independent.  The server does not preserve
any information about the client from one request to the next.</p>

<h3>1.1 Server Identification</h3>

<p>The server is identified by a URL argument that accompanies the
push, pull, or sync command on the client.  (As a convenience to
users, the URL can be omitted on the client command and the same URL
from the most recent push, pull, or sync will be reused.  This saves
typing in the common case where the client does multiple syncs to
the same server.)</p>

<p>The client modifies the URL by appending the method name "<b>/xfer</b>"
to the end.  For example, if the URL specified on the client command
line is</p>

<blockquote>
http://fossil-scm.hwaci.com/fossil
</blockquote>

<p>Then the URL that is really used to do the synchronization will
be:</p>

<blockquote>
http://fossil-scm.hwaci.com/fossil/xfer
</blockquote>

<h3>1.2 HTTP Request Format</h3>

<p>The client always sends a POST request to the server.  The
general format of the POST request is as follows:</p>

<blockquote><pre>
POST /fossil/xfer HTTP/1.0
Host: fossil-scm.hwaci.com:80
Content-Type: application/x-fossil
Content-Length: 4216

<i>content...</i>
</pre></blockquote>

<p>In the example above, the pathname given after the POST keyword
on the first line is a copy of the URL pathname.  The Host: parameter
is also taken from the URL.  The content type is always either
"application/x-fossil" or "application/x-fossil-debug".  The "x-fossil"
content type is the default.  The only difference is that "x-fossil"
content is compressed using zlib whereas "x-fossil-debug" is sent
uncompressed.</p>

<p>A typical reply from the server might look something like this:</p>

<blockquote><pre>
HTTP/1.0 200 OK
Date: Mon, 10 Sep 2007 12:21:01 GMT
Connection: close
Cache-control: private
Content-Type: application/x-fossil; charset=US-ASCII
Content-Length: 265

<i>content...</i>
</pre></blockquote>

<p>The content type of the reply is always the same as the content type
of the request.</p>

<h2>2.0 Fossil Synchronization Content</h2>

<p>A synchronization request between a client and server consists of
one or more HTTP requests as described in the previous section.  This
section details the "x-fossil" content type.</p>

<h3>2.1 Line-oriented Format</h3>

<p>The x-fossil content type consists of zero or more "cards".  Cards
are separate by the newline character ("\n").  Leading and trailing
whitespace on a card is ignored.  Blank cards are ignored.</p>

<p>Each card is divided into zero or more space separated tokens.
The first token on each card is the operator.  Subsequent tokens
are arguments.  The set of operators understood by servers is slightly
different from the operators understood by clients, though the two
are very similar.</p>

<h3>2.2 Login Cards</h3>

<p>Every message from client to server begins with one or more login
cards.  Each login card has the following format:</p>

<blockquote>
<b>login</b>  <i>userid  nonce  signature</i>
</blockquote>

<p>The userid is the name of the user that is requesting service
from the server.  The nonce is the SHA1 hash of the remainder of
the message - all text that follows the newline character that
terminates the login card.  The signature is the SHA1 hash of
the concatenation of the nonce and the users password.</p>

<p>For each login card, the server looks up the user and verifies
that the nonce matches the SHA1 hash of the remainder of the
message.  It then checks the signature hash to make sure the 
signature matches.  If everything
checks out, then the client is granted all privileges of the
specified user.</p>

<p>Privileges are cumulative.  There can be multiple successful
login cards.  The session privileges are the bit-wise OR of the
privileges of each individual login.</p>

<h3>2.3 File Cards</h3>

<p>Repository content records or files are transferred using
a "file" card.  File cards come in two different formats depending
on whether the file is sent directly or as a delta from some
other file.</p>

<blockquote>
<b>file</b> <i>uuid size</i> <b>\n</b> <i>content</i><br>
<b>file</b> <i>uuid delta-uuid size</i> <b>\n</b> <i>content</i>
</blockquote>

<p>File cards are different from all other cards in that they
followed by in-line "payload" data.  The content of the file
or the file delta consists of the first <i>size</i> bytes of the
x-fossil content that immediately follow the newline that
terminates the file card.  No other cards have this characteristic.
</p>

<p>The first argument of a file card is the UUID of the file that
is being transferred.  The UUID is the lower-case hexadecimal
representation of the SHA1 hash of the entire file content.
The last argument of the file card is the number of bytes of
payload that immediately follow the file card.  If the file
card has only two arguments, that means the payload is the
complete content of the file.  If the file card has three
arguments, then the payload is a delta and second argument is
the UUID of another file that is the source of the delta.</p>

<p>File cards are sent in both directions: client to server and
server to client.  A delta might be sent before the source of
the delta, so both client and server should remember deltas
and be able to apply them when their source arrives.</p>

<h3>2.4 Push and Pull Cards</h3>

<p>Among of the first cards in a client-to-server message are
the push and pull cards.  The push card tell the server that
the client is pushing content.  The pull card tell the server
that the client wants to pull content.  In the event of a sync,
both cards are sent.  The format is as follows:</p>

<blockquote>
<b>push</b> <i>servercode projectcode</i><br>
<b>pull</b> <i>servercode projectcode</i>
</blockquote>

<p>The <i>servercode</i> argument is the repository ID for the
client.  The server will only allow the transaction to proceed
if the servercode is different from its own servercode.  This
prevents a sync-loop.  The <i>projectcode</i> is the identifier
of the software project that the client repository contains.
The projectcode for the client and server must match in order
for the transaction to proceed.</p>

<p>The server will also send a push card back to the client
during a clone.  This is how the client determines what project
code to put in the new repository it is constructing.</p>

<h3>2.5 Clone Cards</h3>

<p>A clone card works like a pull card in that it is sent from
client to server in order to tell the server that the client
wants to pull content.  But unlike the pull card, the clone
card has no arguments.</p>

<blockquote>
<b>clone</b>
</blockquote>

<p>In response to a clone message, the server also sends the client
a push message so that the client can discover the projectcode for
this project.</p>

<h3>2.6 Igot Cards</h3>

<p>An igot card can be sent from either client to server or from
server to client in order to indicate that the sender holds a copy
of a particular file.  The format is:</p>

<blockquote>
<b>igot</b> <i>uuid</i>
</blockquote>

<p>The argument of the igot card is the UUID of the file that
the sender possesses.
The receiver of an igot card will typically check to see if
it also holds the same file and if not it will request the file
using a gimme card in either the reply or in the next message.</p>

<h3>2.7 Gimme Cards</h3>

<p>A gimme card is sent from either client to server or from server
to client.  The gimme card asks the receiver to send a particular
file back to the sender.  The format of a gimme card is this:</p>

<blockquote>
<b>gimme</b> <i>uuid</i>
</blockquote>

<p>The argument to the gimme card is the UUID of the file that
the sender wants.  The receiver will typically respond to a
gimme card by sending a file card in its reply or in the next
message.</p>

<h3>2.8 Cookie Cards</h3>

<p>A cookie card can be used by a server to record a small amount
of state information on a client.  The server sends a cookie to the
client.  The client sends the same cookie back to the server on
its next request.  The cookie card has a single argument which
is its payload.</p>

<blockquote>
<b>cookie</b> <i>payload</i>
</blockquote>

<p>The client is not required to return the cookie to the server on
its next request.  Or the client might send a cookie from a different
server on the next request.  So the server must not depend on the
cookie and the server must structure the cookie payload in such
a way that it can tell if the cookie it sees is its own cookie or
a cookie from another server.  (Typically the server will embed
its servercode as part of the cookie.)</p>

<h3>2.9 Error Cards</h3>

<p>If the server discovers anything wrong with a request, it generates
an error card in its reply.  When the client sees the error card,
it displays an error message to the user and aborts the sync
operation.  An error card looks like this:</p>

<blockquote>
<b>error</b> <i>error-message</i>
</blockquote>

<p>The error message is English text that is encoded in order to
be a single token.
A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73).  A
newline (ASCII 0x0a) is "\n" (ASCII 0x6C, x6E).  A backslash 
(ASCII 0x5C) is represented as two backslashes "\\".  Apart from
space and newline, no other whitespace characters nor any
unprintable characters are allowed in
the error message.</p>

<h3>2.10 Unknown Cards</h3>

<p>If either the client or the server sees a card that is not
described above, then it generates an error and aborts.</p>

<h2>3.0 Phantoms And Clusters</h2>

<p>When a repository knows that a file exists and knows the UUID of
that file, but it does not know the file content, then it stores that
file as a "phantom".  A repository will typically create a phantom when
it receives an igot card for a file that it does not hold or when it
receives a file card that references a delta source that it does not
hold.  When a server is generating its reply or when a client is
generating a new request, it will usually send gimme cards for every
phantom that it holds.</p>

<p>A cluster is a special file that tells of the existence of other
files.  Any file in the repository that follows the syntactic rules
of a cluster is considered a cluster.</p>

<p>A cluster is a line oriented file.  Each line of a cluster
is a card.  The cards are separated by the newline ("\n") character.
Each card consists of a single character card type, a space, and a
single argument.  No extra whitespace and no trailing or leading
whitespace is allowed.  All cards in the cluster must occur in
strict lexicographical order.</p>

<p>A cluster consists of one or more "M" cards followed by a single
"Z" card.  Each M card holds an argument which is a UUID for a file
in the repository.  The Z card has a single argument which is the
lower-case hexadecimal representation of the MD5 checksum of all
preceding M cards up to and included the newline character that
occurred just before the Z that starts the Z card.</p>

<p>Any file that does not match the specifications of a cluster
exactly is not a cluster.  There must be no extra whitespace in
the file.  There must be one or more M cards.  There must be a
single Z card with a correct MD5 checksum.  And all cards must
be in strict lexicographical order.</p>

<h3>3.1 The Unclustered Table</h3>

<p>Every repository maintains a table named "<b>unclustered</b>"
which records the identity of every file and phantom it holds that is not
mentioned in a cluster.  The entries in the unclustered table can
be thought of as leaves on a tree of files.  Some of the unclustered
files will be clusters.  Those clusters may contain other clusters,
which might contain still more clusters, and so forth.  Beginning
with the files in the unclustered table, one can follow the chain
of clusters to find every file in the repository.</p>

<h2>4.0 Synchronization Strategies</h2>

<h3>4.1 Pull</h3>

<p>A typical pull operation proceeds as shown below.  Details
of the actual implementation may very slightly but the gist of
a pull is captured in the following steps:</p>

<ol>
<li>The client sends login and pull cards.
<li>The client sends a cookie card if it has previously received a cookie.
<li>The client sends gimme cards for every phantom that it holds.
<hr>
<li>The server checks the login password and rejects the session if
the user does not have permission to pull.
<li>If the number entries in the unclustered table on the server is
greater than 100, then the server constructs a new cluster file to
cover all those unclustered entries.
<li>The server sends file cards for every gimme card it received
from the client.
<li>The server sends ihave cards for every file in its unclustered
table that is not a phantom.
<hr>
<li>The client adds the content of file cards to its repository.
<li>The client creates a phantom for every ihave card in the server reply
that mentions a file that the client does not possess.
<li>The client creates a phantom for the delta source of file cards when
the delta source is a file that the client does not possess.
</ol>

<p>These ten steps represent a single HTTP round-trip request.
The first three steps are the processing that occurs on the client
to generate the request.  The middle four steps are processing
that occurs on the server to interpret the request and generate a
reply.  And the last three steps are the processing that the
client does to interpret the reply.</p>

<p>During a pull, the client will keep sending HTTP requests
until it holds all files that exist on the server.</p>

<p>Note that the server tries
to limit the size of its reply message to something reasonable
(usually about 1MB) so that it might stop sending file cards as
described in step (6) if the reply becomes too large.</p>

<p>Step (5) is the only way in which new clusters can be created.
By only creating clusters on the server, we hope to minimize the
amount of overlap between clusters in the common configuration where
there is a single server and many clients.  The same synchronization
protocol will continue to work even if there are multiple servers
or if servers and clients sometimes change roles.  The only negative
effects of these unusual arrangements is that more than the minimum
number of clusters might be generated.</p>

<h3>4.2 Push</h3>

<p>A typical push operation proceeds roughly as shown below.  As
with a pull, the actual implementation may vary slightly.</p>

<ol>
<li>The client sends login and push cards.
<li>The client sends file cards for any files that it holds that have
never before been pushed - files that come from local check-ins.
<li>If this is the second or later cycle in a push, then the
client sends file cards for any gimme cards that the server sent
in the previous cycle.
<li>The client sends igot cards for every file in its unclustered table
that is not a phantom.
<hr>
<li>The server checks the login and push cards and issues an error if
anything is amiss.
<li>The server accepts file cards from the client and adds those files
to its repository.
<li>The server creates phantoms for igot cards that mention files it
does not possess or for file cards that mention delta source files that
it does not possess.
<li>The server issues gimme cards for all phantoms.
<hr>
<li>The client remembers the gimme cards from the server so that it
can generate file cards in reply on the next cycle.
</ol>

<p>As with a pull, the steps of a push operation repeat until the
server knows all files that exist on the client.  Also, as with
pull, the client attempts to keep the size of the request from
growing too large by suppressing file cards once the
size of the request reaches 1MB.</p>

<h3>4.3 Sync</h3>

<p>A sync is just a pull and a push that happen at the same time.
The first three steps of a pull are combined with the first five steps
of a push.  Steps (4) through (7) of a pull are combined with steps
(5) through (8) of a push.  And steps (8) through (10) of a pull
are combined with step (9) of a push.</p>

<h2>5.0 Summary</h2>

<p>Here are the key points of the synchronization protocol:</p>

<ol>
<li>The client sends one or more PUSH HTTP requests to the server.
    The request and reply content type is "application/x-fossil".
<li>HTTP request content is compressed using zlib.
<li>The content of request and reply consists of cards with one
    card per line.  
<li>Card formats are:
    <ul>
    <li> <b>login</b> <i>userid nonce signature</i>
    <li> <b>push</b> <i>servercode projectcode</i>
    <li> <b>pull</b> <i>servercode projectcode</i>
    <li> <b>clone</b>
    <li> <b>file</b> <i>uuid size</i> <b>\n</b> <i>content</i>
    <li> <b>file</b> <i>uuid delta-uuid size</i> <b>\n</b> <i>content</i>
    <li> <b>igot</b> <i>uuid</i>
    <li> <b>gimme</b> <i>uuid</i>
    <li> <b>cookie</b>  <i>cookie-text</i>
    <li> <b>error</b> <i>error-message</i>
    </ul>
<li>Phantoms are files that a repository knows exist but does not possess.
<li>Clusters are files that contain the UUIDs of other files.
<li>Clusters are created automatically on the server during a pull.
<li>Repositories keep track of all files that are not named in any
cluster and send igot messages for those files.
<li>Repositories keep track of all the phantoms they hold and send
gimme messages for those files.
</ol>

</body>
</html>