Overview
SHA1 Hash: | 26131cfcc5fe3b5f852b5c14150cb7988f9117f6 |
---|---|
Date: | 2007-09-10 00:39:28 |
User: | drh |
Comment: | Add a first draft of the synchronization protocol document. Unproofed. |
Tags And Properties
- branch=trunk inherited from [a28c83647d]
- sym-trunk inherited from [a28c83647d]
Changes
Modified www/index.html from [976b9491be] to [a0290afac4].
@@ -90,9 +90,10 @@ file stored in the repository.</li> <li>The <a href="delta_format.html">format of deltas</a> used to efficiently store changes between file revisions.</li> <li>The <a href="delta_encoder_algorithm.html">encoder algorithm</a> used to efficiently generate deltas.</li> +<li>The <a href="sync.html">synchronization protocol</a>. </ul> </body> </html>
Added www/sync.html version [ca710f907d]
@@ -1,1 +1,463 @@ +<html> +<head> +<title>The Fossil Sync Protocol</title> +</head> +<body bgcolor="white"> +<h1 align="center">The Fossil Sync Protocol</h1> + +<p>Fossil supports the commands <b>push</b>, <b>pull</b>, and <b>sync</b> +for transferring information from one repository to another. The +command is run on the client repository. A URL for the server repository +is specified as part of the command. This document describes what happens +behind the scenes in order to synchronize the information on the two +repositories.</p> + +<h2>1.0 Transport</h2> + +<p>All communication between client and server is via HTTP requests. +The server is listening for incoming HTTP requests. The client +issues one or more HTTP requests and receives a reply for each +request.</p> + +<p>The server might be running as an independent server +using the <b>server</b> command, or it might be launched from +inetd or xinetd using the <b>http</b> command. Or the server might +be launched from CGI. The details of how the server is configured +to "listen" for incoming HTTP requests are immaterial. The important +point is that the server is listening for requests and the client +is the issuer of the requests.</p> + +<p>A single push, pull, or sync might involve multiple HTTP requests. +The client maintains state between all requests. But on the server +side, each request is independent. The server does not preserve +any information about the client from one request to the next.</p> + +<h3>1.1 Server Identification</h3> + +<p>The server is identified by a URL argument that accompanies the +push, pull, or sync command on the client. (As a convenience to +users, the URL can be omitted on the client command and the same URL +from the most recent push, pull, or sync will be reused. This saves +typing in the common case where the client does multiple syncs to +the same server.)</p> + +<p>The client modifies the URL by appending the method name "<b>/xfer</b>" +to the end. 
For example, if the URL specified on the client command +line is</p> + +<blockquote> +http://fossil-scm.hwaci.com/fossil +</blockquote> + +<p>then the URL that is actually used to do the synchronization will +be:</p> + +<blockquote> +http://fossil-scm.hwaci.com/fossil/xfer +</blockquote> + +<h3>1.2 HTTP Request Format</h3> + +<p>The client always sends a POST request to the server. The +general format of the POST request is as follows:</p> + +<blockquote><pre> +POST /fossil/xfer HTTP/1.0 +Host: fossil-scm.hwaci.com:80 +Content-Type: application/x-fossil +Content-Length: 4216 + +<i>content...</i> +</pre></blockquote> + +<p>In the example above, the pathname given after the POST keyword +on the first line is a copy of the URL pathname. The Host: parameter +is also taken from the URL. The content type is always either +"application/x-fossil" or "application/x-fossil-debug". The "x-fossil" +content type is the default. The only difference is that "x-fossil" +content is compressed using zlib whereas "x-fossil-debug" is sent +uncompressed.</p> + +<p>A typical reply from the server might look something like this:</p> + +<blockquote><pre> +HTTP/1.0 200 OK +Date: Mon, 10 Sep 2007 12:21:01 GMT +Connection: close +Cache-control: private +Content-Type: application/x-fossil; charset=US-ASCII +Content-Length: 265 + +<i>content...</i> +</pre></blockquote> + +<p>The content type of the reply is always the same as the content type +of the request.</p> + +<h2>2.0 Fossil Synchronization Content</h2> + +<p>A synchronization request between a client and server consists of +one or more HTTP requests as described in the previous section. This +section details the "x-fossil" content type.</p> + +<h3>2.1 Line-oriented Format</h3> + +<p>The x-fossil content type consists of zero or more "cards". Cards +are separated by the newline character ("\n"). Leading and trailing +whitespace on a card is ignored. 
Blank cards are ignored.</p> + +<p>Each card is divided into zero or more space-separated tokens. +The first token on each card is the operator. Subsequent tokens +are arguments. The set of operators understood by servers is slightly +different from the operators understood by clients, though the two +are very similar.</p> + +<h3>2.2 Login Cards</h3> + +<p>Every message from client to server begins with one or more login +cards. Each login card has the following format:</p> + +<blockquote> +<b>login</b> <i>userid nonce signature</i> +</blockquote> + +<p>The userid is the name of the user that is requesting service +from the server. The nonce is a random one-use hexadecimal number. +The signature is the SHA1 hash of the user's password.</p> + +<p>For each login card, the server looks up the user and verifies +that the nonce has never before been used. It then checks the +signature hash to make sure the signature matches. If everything +checks out, then the client is granted all privileges of the +specified user.</p> + +<p>Privileges are cumulative. There can be multiple successful +login cards. The session privileges are the bit-wise OR of the +privileges of each individual login.</p> + +<h3>2.3 File Cards</h3> + +<p>Repository content records or files are transferred using +a "file" card. File cards come in two different formats depending +on whether the file is sent directly or as a delta from some +other file.</p> + +<blockquote> +<b>file</b> <i>uuid size</i> <b>\n</b> <i>content</i><br> +<b>file</b> <i>uuid delta-uuid size</i> <b>\n</b> <i>content</i> +</blockquote> + +<p>File cards are different from all other cards in that they +are followed by in-line "payload" data. The content of the file +or the file delta consists of the first <i>size</i> bytes of the +x-fossil content that immediately follow the newline that +terminates the file card. No other cards have this characteristic. 
+</p> + +<p>The first argument of a file card is the UUID of the file that +is being transferred. The UUID is the lower-case hexadecimal +representation of the SHA1 hash of the entire file content. +The last argument of the file card is the number of bytes of +payload that immediately follow the file card. If the file +card has only two arguments, that means the payload is the +complete content of the file. If the file card has three +arguments, then the payload is a delta and the second argument is +the UUID of another file that is the source of the delta.</p> + +<p>File cards are sent in both directions: client to server and +server to client. A delta might be sent before the source of +the delta, so both client and server should remember deltas +and be able to apply them when their source arrives.</p> + +<h3>2.4 Push and Pull Cards</h3> + +<p>Among the first cards in a client-to-server message are +the push and pull cards. The push card tells the server that +the client is pushing content. The pull card tells the server +that the client wants to pull content. In the event of a sync, +both cards are sent. The format is as follows:</p> + +<blockquote> +<b>push</b> <i>servercode projectcode</i><br> +<b>pull</b> <i>servercode projectcode</i> +</blockquote> + +<p>The <i>servercode</i> argument is the repository ID for the +client. The server will only allow the transaction to proceed +if the servercode is different from its own servercode. This +prevents a sync-loop. The <i>projectcode</i> is the identifier +of the software project that the client repository contains. +The projectcode for the client and server must match in order +for the transaction to proceed.</p> + +<p>The server will also send a push card back to the client +during a clone. 
This is how the client determines what project +code to put in the new repository it is constructing.</p> + +<h3>2.5 Clone Cards</h3> + +<p>A clone card works like a pull card in that it is sent from +client to server in order to tell the server that the client +wants to pull content. But unlike the pull card, the clone +card has no arguments.</p> + +<blockquote> +<b>clone</b> +</blockquote> + +<p>In response to a clone message, the server also sends the client +a push message so that the client can discover the projectcode for +this project.</p> + +<h3>2.6 Igot Cards</h3> + +<p>An igot card can be sent from either client to server or from +server to client in order to indicate that the sender holds a copy +of a particular file. The format is:</p> + +<blockquote> +<b>igot</b> <i>uuid</i> +</blockquote> + +<p>The argument of the igot card is the UUID of the file that +the sender possesses. +The receiver of an igot card will typically check to see if +it also holds the same file and if not it will request the file +using a gimme card in either the reply or in the next message.</p> + +<h3>2.7 Gimme Cards</h3> + +<p>A gimme card is sent from either client to server or from server +to client. The gimme card asks the receiver to send a particular +file back to the sender. The format of a gimme card is this:</p> + +<blockquote> +<b>gimme</b> <i>uuid</i> +</blockquote> + +<p>The argument to the gimme card is the UUID of the file that +the sender wants. The receiver will typically respond to a +gimme card by sending a file card in its reply or in the next +message.</p> + +<h3>2.8 Cookie Cards</h3> + +<p>A cookie card can be used by a server to record a small amount +of state information on a client. The server sends a cookie to the +client. The client sends the same cookie back to the server on +its next request. 
The cookie card has a single argument which +is its payload.</p> + +<blockquote> +<b>cookie</b> <i>payload</i> +</blockquote> + +<p>The client is not required to return the cookie to the server on +its next request. Or the client might send a cookie from a different +server on the next request. So the server must not depend on the +cookie and the server must structure the cookie payload in such +a way that it can tell if the cookie it sees is its own cookie or +a cookie from another server. (Typically the server will embed +its servercode as part of the cookie.)</p> + +<h3>2.9 Error Cards</h3> + +<p>If the server discovers anything wrong with a request, it generates +an error card in its reply. When the client sees the error card, +it displays an error message to the user and aborts the sync +operation. An error card looks like this:</p> + +<blockquote> +<b>error</b> <i>error-message</i> +</blockquote> + +<p>The error message is English text that is encoded in order to +be a single token. +A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73). A +newline (ASCII 0x0A) is represented as "\n" (ASCII 0x5C, 0x6E). A backslash +(ASCII 0x5C) is represented as two backslashes "\\". Apart from +space and newline, no other whitespace characters nor any +unprintable characters are allowed in +the error message.</p> + +<h3>2.10 Unknown Cards</h3> + +<p>If either the client or the server sees a card that is not +described above, then it generates an error and aborts.</p> + +<h2>3.0 Phantoms And Clusters</h2> + +<p>When a repository knows that a file exists and knows the UUID of +that file, but it does not know the file content, then it stores that +file as a "phantom". A repository will typically create a phantom when +it receives an igot card for a file that it does not hold or when it +receives a file card that references a delta source that it does not +hold. 
When a server is generating its reply or when a client is +generating a new request, it will usually send gimme cards for every +phantom that it holds.</p> + +<p>A cluster is a special file that tells of the existence of other +files. Any file in the repository that follows the syntactic rules +of a cluster is considered a cluster.</p> + +<p>A cluster is a line-oriented file. Each line of a cluster +is a card. The cards are separated by the newline ("\n") character. +Each card consists of a single-character card type, a space, and a +single argument. No extra whitespace and no trailing or leading +whitespace is allowed. All cards in the cluster must occur in +strict lexicographical order.</p> + +<p>A cluster consists of one or more "M" cards followed by a single +"Z" card. Each M card holds an argument which is a UUID for a file +in the repository. The Z card has a single argument which is the +lower-case hexadecimal representation of the MD5 checksum of all +preceding M cards up to and including the newline character that +occurred just before the Z that starts the Z card.</p> + +<p>Any file that does not match the specifications of a cluster +exactly is not a cluster. There must be no extra whitespace in +the file. There must be one or more M cards. There must be a +single Z card with a correct MD5 checksum. And all cards must +be in strict lexicographical order.</p> + +<h3>3.1 The Unclustered Table</h3> + +<p>Every repository maintains a table named "<b>unclustered</b>" +which records the identity of every file and phantom it holds that is not +mentioned in a cluster. The entries in the unclustered table can +be thought of as leaves on a tree of files. Some of the unclustered +files will be clusters. Those clusters may contain other clusters, +which might contain still more clusters, and so forth. 
Beginning +with the files in the unclustered table, one can follow the chain +of clusters to find every file in the repository.</p> + +<h2>4.0 Synchronization Strategies</h2> + +<h3>4.1 Pull</h3> + +<p>A typical pull operation proceeds as shown below. Details +of the actual implementation may vary slightly but the gist of +a pull is captured in the following steps:</p> + +<ol> +<li>The client sends login and pull cards. +<li>The client sends a cookie card if it has previously received a cookie. +<li>The client sends gimme cards for every phantom that it holds. +<hr> +<li>The server checks the login password and rejects the session if +the user does not have permission to pull. +<li>If the number of entries in the unclustered table on the server is +greater than 100, then the server constructs a new cluster file to +cover all those unclustered entries. +<li>The server sends file cards for every gimme card it received +from the client. +<li>The server sends igot cards for every file in its unclustered +table that is not a phantom. +<hr> +<li>The client adds the content of file cards to its repository. +<li>The client creates a phantom for every igot card in the server reply +that mentions a file that the client does not possess. +<li>The client creates a phantom for the delta source of file cards when +the delta source is a file that the client does not possess. +</ol> + +<p>These ten steps represent a single HTTP round-trip request. +The first three steps are the processing that occurs on the client +to generate the request. The middle four steps are processing +that occurs on the server to interpret the request and generate a +reply. 
And the last three steps are the processing that the +client does to interpret the reply.</p> + +<p>During a pull, the client will keep sending HTTP requests +until it holds all files that exist on the server.</p> + +<p>Note that the server tries +to limit the size of its reply message to something reasonable +(usually about 1MB), so it might stop sending file cards as +described in step (6) if the reply becomes too large.</p> + +<p>Step (5) is the only way in which new clusters can be created. +By only creating clusters on the server, we hope to minimize the +amount of overlap between clusters in the common configuration where +there is a single server and many clients. The same synchronization +protocol will continue to work even if there are multiple servers +or if servers and clients sometimes change roles. The only negative +effect of these unusual arrangements is that more than the minimum +number of clusters might be generated.</p> + +<h3>4.2 Push</h3> + +<p>A typical push operation proceeds roughly as shown below. As +with a pull, the actual implementation may vary slightly.</p> + +<ol> +<li>The client sends login and push cards. +<li>The client sends file cards for any files that it holds that have +never before been pushed - files that come from local check-ins. +<li>If this is the second or later cycle in a push, then the +client sends file cards for any gimme cards that the server sent +in the previous cycle. +<li>The client sends igot cards for every file in its unclustered table +that is not a phantom. +<hr> +<li>The server checks the login and push cards and issues an error if +anything is amiss. +<li>The server accepts file cards from the client and adds those files +to its repository. +<li>The server creates phantoms for igot cards that mention files it +does not possess or for file cards that mention delta source files that +it does not possess. +<li>The server issues gimme cards for all phantoms. 
+<hr> +<li>The client remembers the gimme cards from the server so that it +can generate file cards in reply on the next cycle. +</ol> + +<p>As with a pull, the steps of a push operation repeat until the +server knows all files that exist on the client. Also, as with +pull, the client attempts to keep the size of the request from +growing too large by suppressing file cards once the +size of the request reaches 1MB.</p> + +<h3>4.3 Sync</h3> + +<p>A sync is just a pull and a push that happen at the same time. +The first three steps of a pull are combined with the first four steps +of a push. Steps (4) through (7) of a pull are combined with steps +(5) through (8) of a push. And steps (8) through (10) of a pull +are combined with step (9) of a push.</p> + +<h2>5.0 Summary</h2> + +<p>Here are the key points of the synchronization protocol:</p> + +<ol> +<li>The client sends one or more HTTP POST requests to the server. + The request and reply content type is "application/x-fossil". +<li>HTTP request content is compressed using zlib. +<li>The content of request and reply consists of cards with one + card per line. +<li>Card formats are: + <ul> + <li> <b>login</b> <i>userid nonce signature</i> + <li> <b>push</b> <i>servercode projectcode</i> + <li> <b>pull</b> <i>servercode projectcode</i> + <li> <b>clone</b> + <li> <b>file</b> <i>uuid size</i> <b>\n</b> <i>content</i> + <li> <b>file</b> <i>uuid delta-uuid size</i> <b>\n</b> <i>content</i> + <li> <b>igot</b> <i>uuid</i> + <li> <b>gimme</b> <i>uuid</i> + <li> <b>cookie</b> <i>cookie-text</i> + <li> <b>error</b> <i>error-message</i> + </ul> +<li>Phantoms are files that a repository knows exist but does not possess. +<li>Clusters are files that contain the UUIDs of other files. +<li>Clusters are created automatically on the server during a pull. +<li>Repositories keep track of all files that are not named in any +cluster and send igot messages for those files. 
+<li>Repositories keep track of all the phantoms they hold and send +gimme messages for those files. +</ol> +</body> +</html>
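Illustration (not part of the check-in): the card rules from sections 2.1 and 2.3 of the added document can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not Fossil's implementation. The function name parse_cards and the tuple layout are hypothetical, the input is assumed to be x-fossil content that has already been decompressed, and tokens are assumed to be plain ASCII.

```python
def parse_cards(data: bytes):
    """Split uncompressed x-fossil content into (operator, args, payload) tuples.

    Hypothetical helper for illustration only. Cards are newline-separated,
    surrounding whitespace is ignored, and blank cards are skipped. A "file"
    card is followed by exactly `size` bytes of in-line payload.
    """
    cards = []
    pos = 0
    while pos < len(data):
        # A card runs up to the next newline (or to the end of the content).
        end = data.find(b"\n", pos)
        if end == -1:
            end = len(data)
        # split() drops leading/trailing whitespace and separates the tokens.
        tokens = data[pos:end].decode("ascii").split()
        pos = end + 1
        if not tokens:
            continue  # blank cards are ignored
        op, args = tokens[0], tokens[1:]
        payload = None
        if op == "file":
            # Two arguments: full content. Three arguments: delta + source UUID.
            if len(args) not in (2, 3):
                raise ValueError("file card needs 2 or 3 arguments")
            size = int(args[-1])  # the last argument is the payload byte count
            payload = data[pos:pos + size]
            if len(payload) != size:
                raise ValueError("truncated file payload")
            pos += size  # the payload is raw bytes, not cards, so skip over it
        cards.append((op, args, payload))
    return cards
```

For instance, parse_cards(b"igot abc\nfile abc 5\nhello") yields an igot card with no payload followed by a file card whose payload is b"hello". Reading the payload by byte count rather than by line is what lets file content contain newlines without being mistaken for further cards.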