Check-in [26131cfcc5]
Overview

SHA1 Hash:26131cfcc5fe3b5f852b5c14150cb7988f9117f6
Date: 2007-09-10 00:39:28
User: drh
Comment:Add a first draft of the synchronization protocol document. Unproofed.

Changes

Modified www/index.html from [976b9491be] to [a0290afac4].

@@ -90,9 +90,10 @@
 file stored in the repository.</li>
 <li>The <a href="delta_format.html">format of deltas</a> used to
 efficiently store changes between file revisions.</li>
 <li>The <a href="delta_encoder_algorithm.html">encoder algorithm</a> used to
 efficiently generate deltas.</li>
+<li>The <a href="sync.html">synchronization protocol</a>.
 </ul>
 
 </body>
 </html>

Added www/sync.html version [ca710f907d]

@@ -1,1 +1,463 @@
+<html>
+<head>
+<title>The Fossil Sync Protocol</title>
+</head>
+<body bgcolor="white">
+<h1 align="center">The Fossil Sync Protocol</h1>
+
+<p>Fossil supports commands <b>push</b>, <b>pull</b>, and <b>sync</b>
+for transferring information from one repository to another.  The
+command is run on the client repository.  A URL for the server repository
+is specified as part of the command.  This document describes what happens
+behind the scenes in order to synchronize the information on the two
+repositories.</p>
+
+<h2>1.0 Transport</h2>
+
+<p>All communication between client and server is via HTTP requests.
+The server is listening for incoming HTTP requests.  The client
+issues one or more HTTP requests and receives replies for each
+request.</p>
+
+<p>The server might be running as an independent server
+using the <b>server</b> command, or it might be launched from
+inetd or xinetd using the <b>http</b> command.  Or the server might
+be launched from CGI.  The details of how the server is configured
+to "listen" for incoming HTTP requests are immaterial.  The important
+point is that the server is listening for requests and the client
+is the issuer of the requests.</p>
+
+<p>A single push, pull, or sync might involve multiple HTTP requests.
+The client maintains state between all requests.  But on the server
+side, each request is independent.  The server does not preserve
+any information about the client from one request to the next.</p>
+
+<h3>1.1 Server Identification</h3>
+
+<p>The server is identified by a URL argument that accompanies the
+push, pull, or sync command on the client.  (As a convenience to
+users, the URL can be omitted on the client command and the same URL
+from the most recent push, pull, or sync will be reused.  This saves
+typing in the common case where the client does multiple syncs to
+the same server.)</p>
+
+<p>The client modifies the URL by appending the method name "<b>/xfer</b>"
+to the end.  For example, if the URL specified on the client command
+line is</p>
+
+<blockquote>
+http://fossil-scm.hwaci.com/fossil
+</blockquote>
+
+<p>Then the URL that is really used to do the synchronization will
+be:</p>
+
+<blockquote>
+http://fossil-scm.hwaci.com/fossil/xfer
+</blockquote>
+
+<h3>1.2 HTTP Request Format</h3>
+
+<p>The client always sends a POST request to the server.  The
+general format of the POST request is as follows:</p>
+
+<blockquote><pre>
+POST /fossil/xfer HTTP/1.0
+Host: fossil-scm.hwaci.com:80
+Content-Type: application/x-fossil
+Content-Length: 4216
+
+<i>content...</i>
+</pre></blockquote>
+
+<p>In the example above, the pathname given after the POST keyword
+on the first line is a copy of the URL pathname.  The Host: parameter
+is also taken from the URL.  The content type is always either
+"application/x-fossil" or "application/x-fossil-debug".  The "x-fossil"
+content type is the default.  The only difference is that "x-fossil"
+content is compressed using zlib whereas "x-fossil-debug" is sent
+uncompressed.</p>
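
The request format in sections 1.1 and 1.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Fossil's implementation: the function name `build_xfer_request` is hypothetical, and the document does not specify the exact framing of the compressed body, so a plain `zlib.compress` stream is assumed here.

```python
import zlib
from urllib.parse import urlsplit


def build_xfer_request(repo_url: str, cards: str, debug: bool = False) -> bytes:
    """Return the raw bytes of one POST request in the shape shown in section 1.2."""
    url = urlsplit(repo_url)
    # Section 1.1: the method name "/xfer" is appended to the URL path.
    path = url.path.rstrip("/") + "/xfer"
    port = url.port or 80
    payload = cards.encode("ascii")
    if debug:
        # "x-fossil-debug" content is sent uncompressed.
        content_type = "application/x-fossil-debug"
    else:
        # "x-fossil" content is zlib-compressed (bare zlib stream is an assumption).
        content_type = "application/x-fossil"
        payload = zlib.compress(payload)
    header = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {url.hostname}:{port}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return header.encode("ascii") + payload
```

Passing `debug=True` makes the body readable on the wire, which matches the stated purpose of the "x-fossil-debug" content type.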
+
+<p>A typical reply from the server might look something like this:</p>
+
+<blockquote><pre>
+HTTP/1.0 200 OK
+Date: Mon, 10 Sep 2007 12:21:01 GMT
+Connection: close
+Cache-control: private
+Content-Type: application/x-fossil; charset=US-ASCII
+Content-Length: 265
+
+<i>content...</i>
+</pre></blockquote>
+
+<p>The content type of the reply is always the same as the content type
+of the request.</p>
+
+<h2>2.0 Fossil Synchronization Content</h2>
+
+<p>A synchronization request between a client and server consists of
+one or more HTTP requests as described in the previous section.  This
+section details the "x-fossil" content type.</p>
+
+<h3>2.1 Line-oriented Format</h3>
+
+<p>The x-fossil content type consists of zero or more "cards".  Cards
+are separated by the newline character ("\n").  Leading and trailing
+whitespace on a card is ignored.  Blank cards are ignored.</p>
+
+<p>Each card is divided into zero or more space-separated tokens.
+The first token on each card is the operator.  Subsequent tokens
+are arguments.  The set of operators understood by servers is slightly
+different from the operators understood by clients, though the two
+are very similar.</p>
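
The line-oriented rules of section 2.1 can be sketched as a small tokenizer. This is an illustration only: `parse_cards` is a hypothetical name, and the sketch deliberately ignores the in-line payload that follows file cards (section 2.3).

```python
def parse_cards(text: str) -> list[tuple[str, list[str]]]:
    """Split x-fossil text into (operator, arguments) pairs."""
    cards = []
    for line in text.split("\n"):
        # Leading and trailing whitespace on a card is ignored;
        # tokens are separated by spaces, and empty tokens are dropped.
        tokens = [tok for tok in line.strip().split(" ") if tok]
        if not tokens:
            continue  # blank cards are ignored
        cards.append((tokens[0], tokens[1:]))
    return cards
```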
+
+<h3>2.2 Login Cards</h3>
+
+<p>Every message from client to server begins with one or more login
+cards.  Each login card has the following format:</p>
+
+<blockquote>
+<b>login</b>  <i>userid  nonce  signature</i>
+</blockquote>
+
+<p>The userid is the name of the user that is requesting service
+from the server.  The nonce is a random one-use hexadecimal number.
+The signature is the SHA1 hash of the user's password.</p>
+
+<p>For each login card, the server looks up the user and verifies
+that the nonce has never before been used.  It then checks the
+signature hash to make sure the signature matches.  If everything
+checks out, then the client is granted all privileges of the
+specified user.</p>
+
+<p>Privileges are cumulative.  There can be multiple successful
+login cards.  The session privileges are the bit-wise OR of the
+privileges of each individual login.</p>
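
The cumulative-privilege rule can be illustrated with a few lines of Python. The privilege bit names and values below are hypothetical, since the document does not enumerate them; only the bit-wise OR is taken from the text.

```python
# Hypothetical privilege bits, chosen for illustration only.
CAP_PULL = 0x01
CAP_PUSH = 0x02
CAP_CLONE = 0x04


def session_privileges(granted: list[int]) -> int:
    """Combine the privileges of every successful login card."""
    privileges = 0
    for caps in granted:
        privileges |= caps  # privileges are cumulative (bit-wise OR)
    return privileges
```

For example, one login granting only pull and a second granting push and pull together yield a session that may do both.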
+
+<h3>2.3 File Cards</h3>
+
+<p>Repository content records or files are transferred using
+a "file" card.  File cards come in two different formats depending
+on whether the file is sent directly or as a delta from some
+other file.</p>
+
+<blockquote>
+<b>file</b> <i>uuid size</i> <b>\n</b> <i>content</i><br>
+<b>file</b> <i>uuid delta-uuid size</i> <b>\n</b> <i>content</i>
+</blockquote>
+
+<p>File cards are different from all other cards in that they are
+followed by in-line "payload" data.  The content of the file
+or the file delta consists of the first <i>size</i> bytes of the
+x-fossil content that immediately follow the newline that
+terminates the file card.  No other cards have this characteristic.
+</p>
+
+<p>The first argument of a file card is the UUID of the file that
+is being transferred.  The UUID is the lower-case hexadecimal
+representation of the SHA1 hash of the entire file content.
+The last argument of the file card is the number of bytes of
+payload that immediately follow the file card.  If the file
+card has only two arguments, that means the payload is the
+complete content of the file.  If the file card has three
+arguments, then the payload is a delta and the second argument is
+the UUID of another file that is the source of the delta.</p>
+
+<p>File cards are sent in both directions: client to server and
+server to client.  A delta might be sent before the source of
+the delta, so both client and server should remember deltas
+and be able to apply them when their source arrives.</p>
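
A sketch of how file cards and their payloads might be written and read, following section 2.3. The helper names (`file_uuid`, `encode_file_card`, `read_cards`) are hypothetical, only the two-argument (non-delta) form is generated, and applying deltas is left out.

```python
import hashlib


def file_uuid(content: bytes) -> str:
    """UUID of a file: lower-case hex SHA1 of the entire content."""
    return hashlib.sha1(content).hexdigest()


def encode_file_card(content: bytes) -> bytes:
    """Build a two-argument file card followed by its payload."""
    header = f"file {file_uuid(content)} {len(content)}\n".encode("ascii")
    return header + content


def read_cards(data: bytes):
    """Yield (operator, args, payload); payload is None except for file cards."""
    pos = 0
    while pos < len(data):
        end = data.find(b"\n", pos)
        if end < 0:
            end = len(data)
        tokens = data[pos:end].decode("ascii").split()
        pos = end + 1
        if not tokens:
            continue  # blank card
        payload = None
        if tokens[0] == "file":
            # The last argument is the payload size; the payload is the
            # bytes immediately after the newline that ends the card.
            size = int(tokens[-1])
            payload = data[pos:pos + size]
            pos += size
        yield tokens[0], tokens[1:], payload
```

Because the payload is consumed by byte count, newlines inside file content are never mistaken for card separators.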
+
+<h3>2.4 Push and Pull Cards</h3>
+
+<p>Among the first cards in a client-to-server message are
+the push and pull cards.  The push card tells the server that
+the client is pushing content.  The pull card tells the server
+that the client wants to pull content.  In the event of a sync,
+both cards are sent.  The format is as follows:</p>
+
+<blockquote>
+<b>push</b> <i>servercode projectcode</i><br>
+<b>pull</b> <i>servercode projectcode</i>
+</blockquote>
+
+<p>The <i>servercode</i> argument is the repository ID for the
+client.  The server will only allow the transaction to proceed
+if the servercode is different from its own servercode.  This
+prevents a sync-loop.  The <i>projectcode</i> is the identifier
+of the software project that the client repository contains.
+The projectcode for the client and server must match in order
+for the transaction to proceed.</p>
+
+<p>The server will also send a push card back to the client
+during a clone.  This is how the client determines what project
+code to put in the new repository it is constructing.</p>
+
+<h3>2.5 Clone Cards</h3>
+
+<p>A clone card works like a pull card in that it is sent from
+client to server in order to tell the server that the client
+wants to pull content.  But unlike the pull card, the clone
+card has no arguments.</p>
+
+<blockquote>
+<b>clone</b>
+</blockquote>
+
+<p>In response to a clone message, the server also sends the client
+a push message so that the client can discover the projectcode for
+this project.</p>
+
+<h3>2.6 Igot Cards</h3>
+
+<p>An igot card can be sent from either client to server or from
+server to client in order to indicate that the sender holds a copy
+of a particular file.  The format is:</p>
+
+<blockquote>
+<b>igot</b> <i>uuid</i>
+</blockquote>
+
+<p>The argument of the igot card is the UUID of the file that
+the sender possesses.
+The receiver of an igot card will typically check to see if
+it also holds the same file and if not it will request the file
+using a gimme card in either the reply or in the next message.</p>
+
+<h3>2.7 Gimme Cards</h3>
+
+<p>A gimme card is sent from either client to server or from server
+to client.  The gimme card asks the receiver to send a particular
+file back to the sender.  The format of a gimme card is this:</p>
+
+<blockquote>
+<b>gimme</b> <i>uuid</i>
+</blockquote>
+
+<p>The argument to the gimme card is the UUID of the file that
+the sender wants.  The receiver will typically respond to a
+gimme card by sending a file card in its reply or in the next
+message.</p>
+
+<h3>2.8 Cookie Cards</h3>
+
+<p>A cookie card can be used by a server to record a small amount
+of state information on a client.  The server sends a cookie to the
+client.  The client sends the same cookie back to the server on
+its next request.  The cookie card has a single argument which
+is its payload.</p>
+
+<blockquote>
+<b>cookie</b> <i>payload</i>
+</blockquote>
+
+<p>The client is not required to return the cookie to the server on
+its next request.  Or the client might send a cookie from a different
+server on the next request.  So the server must not depend on the
+cookie and the server must structure the cookie payload in such
+a way that it can tell if the cookie it sees is its own cookie or
+a cookie from another server.  (Typically the server will embed
+its servercode as part of the cookie.)</p>
+
+<h3>2.9 Error Cards</h3>
+
+<p>If the server discovers anything wrong with a request, it generates
+an error card in its reply.  When the client sees the error card,
+it displays an error message to the user and aborts the sync
+operation.  An error card looks like this:</p>
+
+<blockquote>
+<b>error</b> <i>error-message</i>
+</blockquote>
+
+<p>The error message is English text that is encoded in order to
+be a single token.
+A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73).  A
+newline (ASCII 0x0a) is "\n" (ASCII 0x5C, 0x6E).  A backslash
+(ASCII 0x5C) is represented as two backslashes "\\".  Apart from
+space and newline, no other whitespace characters nor any
+unprintable characters are allowed in
+the error message.</p>
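
The escaping rules for error messages can be sketched as an encode/decode pair. The function names are hypothetical; the three escapes are the ones listed in section 2.9.

```python
def encode_error(message: str) -> str:
    """Encode an error message so that it forms a single token."""
    # Escape backslashes first so the escapes added below are not re-escaped.
    return (message.replace("\\", "\\\\")
                   .replace(" ", "\\s")
                   .replace("\n", "\\n"))


def decode_error(token: str) -> str:
    """Reverse encode_error()."""
    escapes = {"s": " ", "n": "\n", "\\": "\\"}
    out = []
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\" and i + 1 < len(token) and token[i + 1] in escapes:
            out.append(escapes[token[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```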
+
+<h3>2.10 Unknown Cards</h3>
+
+<p>If either the client or the server sees a card that is not
+described above, then it generates an error and aborts.</p>
+
+<h2>3.0 Phantoms And Clusters</h2>
+
+<p>When a repository knows that a file exists and knows the UUID of
+that file, but it does not know the file content, then it stores that
+file as a "phantom".  A repository will typically create a phantom when
+it receives an igot card for a file that it does not hold or when it
+receives a file card that references a delta source that it does not
+hold.  When a server is generating its reply or when a client is
+generating a new request, it will usually send gimme cards for every
+phantom that it holds.</p>
+
+<p>A cluster is a special file that tells of the existence of other
+files.  Any file in the repository that follows the syntactic rules
+of a cluster is considered a cluster.</p>
+
+<p>A cluster is a line-oriented file.  Each line of a cluster
+is a card.  The cards are separated by the newline ("\n") character.
+Each card consists of a single character card type, a space, and a
+single argument.  No extra whitespace and no trailing or leading
+whitespace is allowed.  All cards in the cluster must occur in
+strict lexicographical order.</p>
+
+<p>A cluster consists of one or more "M" cards followed by a single
+"Z" card.  Each M card holds an argument which is a UUID for a file
+in the repository.  The Z card has a single argument which is the
+lower-case hexadecimal representation of the MD5 checksum of all
+preceding M cards up to and including the newline character that
+occurred just before the Z that starts the Z card.</p>
+
+<p>Any file that does not match the specifications of a cluster
+exactly is not a cluster.  There must be no extra whitespace in
+the file.  There must be one or more M cards.  There must be a
+single Z card with a correct MD5 checksum.  And all cards must
+be in strict lexicographical order.</p>
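
The cluster rules above can be sketched as a builder and a validator. The names `build_cluster` and `is_cluster` are hypothetical. The sketch assumes the file ends with a newline after the Z card, which the text does not state explicitly, and it does not check that each M argument is a well-formed UUID.

```python
import hashlib


def build_cluster(uuids: list[str]) -> bytes:
    """Build a cluster: sorted M cards followed by one Z card."""
    body = "".join(f"M {u}\n" for u in sorted(set(uuids))).encode("ascii")
    if not body:
        raise ValueError("a cluster needs at least one M card")
    # The Z card carries the MD5 of all M cards, including the final newline.
    return body + b"Z " + hashlib.md5(body).hexdigest().encode("ascii") + b"\n"


def is_cluster(data: bytes) -> bool:
    """Return True only if data follows the cluster syntax exactly."""
    if not data.endswith(b"\n"):
        return False  # assumption: trailing newline after the Z card
    lines = data[:-1].split(b"\n")
    if len(lines) < 2:
        return False  # need one or more M cards plus the Z card
    *m_cards, z_card = lines
    for card in m_cards:
        # Exactly "M", one space, and a single argument with no whitespace.
        if not card.startswith(b"M ") or card[2:].split() != [card[2:]]:
            return False
    # Strict lexicographical order also rules out duplicate cards.
    if any(a >= b for a, b in zip(m_cards, m_cards[1:])):
        return False
    body = b"".join(card + b"\n" for card in m_cards)
    return z_card == b"Z " + hashlib.md5(body).hexdigest().encode("ascii")
```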
+
+<h3>3.1 The Unclustered Table</h3>
+
+<p>Every repository maintains a table named "<b>unclustered</b>"
+which records the identity of every file and phantom it holds that is not
+mentioned in a cluster.  The entries in the unclustered table can
+be thought of as leaves on a tree of files.  Some of the unclustered
+files will be clusters.  Those clusters may contain other clusters,
+which might contain still more clusters, and so forth.  Beginning
+with the files in the unclustered table, one can follow the chain
+of clusters to find every file in the repository.</p>
+
+<h2>4.0 Synchronization Strategies</h2>
+
+<h3>4.1 Pull</h3>
+
+<p>A typical pull operation proceeds as shown below.  Details
+of the actual implementation may vary slightly but the gist of
+a pull is captured in the following steps:</p>
+
+<ol>
+<li>The client sends login and pull cards.
+<li>The client sends a cookie card if it has previously received a cookie.
+<li>The client sends gimme cards for every phantom that it holds.
+<hr>
+<li>The server checks the login password and rejects the session if
+the user does not have permission to pull.
+<li>If the number of entries in the unclustered table on the server is
+greater than 100, then the server constructs a new cluster file to
+cover all those unclustered entries.
+<li>The server sends file cards for every gimme card it received
+from the client.
+<li>The server sends igot cards for every file in its unclustered
+table that is not a phantom.
+<hr>
+<li>The client adds the content of file cards to its repository.
+<li>The client creates a phantom for every igot card in the server reply
+that mentions a file that the client does not possess.
+<li>The client creates a phantom for the delta source of file cards when
+the delta source is a file that the client does not possess.
+</ol>
+
+<p>These ten steps represent a single HTTP round-trip request.
+The first three steps are the processing that occurs on the client
+to generate the request.  The middle four steps are processing
+that occurs on the server to interpret the request and generate a
+reply.  And the last three steps are the processing that the
+client does to interpret the reply.</p>
+
+<p>During a pull, the client will keep sending HTTP requests
+until it holds all files that exist on the server.</p>
+
+<p>Note that the server tries
+to limit the size of its reply message to something reasonable
+(usually about 1MB), so it might stop sending file cards as
+described in step (6) if the reply becomes too large.</p>
+
+<p>Step (5) is the only way in which new clusters can be created.
+By only creating clusters on the server, we hope to minimize the
+amount of overlap between clusters in the common configuration where
+there is a single server and many clients.  The same synchronization
+protocol will continue to work even if there are multiple servers
+or if servers and clients sometimes change roles.  The only negative
+effect of these unusual arrangements is that more than the minimum
+number of clusters might be generated.</p>
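
The client side of a pull reply (steps 8 through 10) can be sketched with an in-memory model. Everything here is illustrative: `process_pull_reply` is a hypothetical name, the repository is modeled as a dict plus a set, and delta payloads are stored as received rather than applied.

```python
def process_pull_reply(files: dict, phantoms: set, reply_cards: list) -> list:
    """Update the client's state from one server reply.

    files maps uuid -> stored bytes; phantoms holds uuids known to exist
    but not held; reply_cards is a list of (operator, args, payload).
    Returns the gimme cards to send in the next request.
    """
    for op, args, payload in reply_cards:
        if op == "file":
            uuid = args[0]
            files[uuid] = payload        # step 8 (delta application omitted)
            phantoms.discard(uuid)       # the file is no longer a phantom
            if len(args) == 3 and args[1] not in files:
                phantoms.add(args[1])    # step 10: unknown delta source
        elif op == "igot" and args[0] not in files:
            phantoms.add(args[0])        # step 9: server holds a file we lack
    # Step 3 of the next round: one gimme card per phantom.
    return [f"gimme {uuid}" for uuid in sorted(phantoms)]
```

Under this model the pull loop ends when a reply leaves the phantom set empty, which corresponds to the client holding every file the server has announced.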
+
+<h3>4.2 Push</h3>
+
+<p>A typical push operation proceeds roughly as shown below.  As
+with a pull, the actual implementation may vary slightly.</p>
+
+<ol>
+<li>The client sends login and push cards.
+<li>The client sends file cards for any files that it holds that have
+never before been pushed - files that come from local check-ins.
+<li>If this is the second or later cycle in a push, then the
+client sends file cards for any gimme cards that the server sent
+in the previous cycle.
+<li>The client sends igot cards for every file in its unclustered table
+that is not a phantom.
+<hr>
+<li>The server checks the login and push cards and issues an error if
+anything is amiss.
+<li>The server accepts file cards from the client and adds those files
+to its repository.
+<li>The server creates phantoms for igot cards that mention files it
+does not possess or for file cards that mention delta source files that
+it does not possess.
+<li>The server issues gimme cards for all phantoms.
+<hr>
+<li>The client remembers the gimme cards from the server so that it
+can generate file cards in reply on the next cycle.
+</ol>
+
+<p>As with a pull, the steps of a push operation repeat until the
+server knows all files that exist on the client.  Also, as with
+pull, the client attempts to keep the size of the request from
+growing too large by suppressing file cards once the
+size of the request reaches 1MB.</p>
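
The size limit on push requests can be sketched as follows. The name `build_push_payload` and the exact byte threshold are assumptions; the text only says that file cards are suppressed once the request reaches 1MB.

```python
MAX_REQUEST = 1_000_000  # "1MB"; the precise threshold is an assumption


def build_push_payload(header_cards: bytes, file_cards: list[bytes],
                       limit: int = MAX_REQUEST) -> tuple[bytes, list[bytes]]:
    """Append file cards until the request reaches the limit.

    Returns the request body and the file cards held back for a later cycle.
    """
    parts = [header_cards]
    size = len(header_cards)
    deferred = []
    for card in file_cards:
        if size >= limit:
            deferred.append(card)  # suppressed; send in the next cycle
        else:
            parts.append(card)
            size += len(card)
    return b"".join(parts), deferred
```

Since the check happens before each card is added, a request can exceed the limit by at most one file card.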
+
+<h3>4.3 Sync</h3>
+
+<p>A sync is just a pull and a push that happen at the same time.
+The first three steps of a pull are combined with the first four steps
+of a push.  Steps (4) through (7) of a pull are combined with steps
+(5) through (8) of a push.  And steps (8) through (10) of a pull
+are combined with step (9) of a push.</p>
+
+<h2>5.0 Summary</h2>
+
+<p>Here are the key points of the synchronization protocol:</p>
+
+<ol>
+<li>The client sends one or more HTTP POST requests to the server.
+    The request and reply content type is "application/x-fossil".
+<li>HTTP request content is compressed using zlib.
+<li>The content of request and reply consists of cards with one
+    card per line.
+<li>Card formats are:
+    <ul>
+    <li> <b>login</b> <i>userid nonce signature</i>
+    <li> <b>push</b> <i>servercode projectcode</i>
+    <li> <b>pull</b> <i>servercode projectcode</i>
+    <li> <b>clone</b>
+    <li> <b>file</b> <i>uuid size</i> <b>\n</b> <i>content</i>
+    <li> <b>file</b> <i>uuid delta-uuid size</i> <b>\n</b> <i>content</i>
+    <li> <b>igot</b> <i>uuid</i>
+    <li> <b>gimme</b> <i>uuid</i>
+    <li> <b>cookie</b> <i>payload</i>
+    <li> <b>error</b> <i>error-message</i>
+    </ul>
+<li>Phantoms are files that a repository knows exist but does not possess.
+<li>Clusters are files that contain the UUIDs of other files.
+<li>Clusters are created automatically on the server during a pull.
+<li>Repositories keep track of all files that are not named in any
+cluster and send igot messages for those files.
+<li>Repositories keep track of all the phantoms they hold and send
+gimme messages for those files.
+</ol>
 
+</body>
+</html>