Check-in [adc0b3bfb0]
Overview

SHA1 Hash:adc0b3bfb0daaa7677179c5c48b062d88536fe4d
Date: 2008-07-15 14:33:48
User: drh
Comment:Additional documentation updates.
Timelines: ancestors | descendants | both | trunk

Tags And Properties
Changes

Modified www/delta_encoder_algorithm.wiki from [a5b01a2840] to [132f9016a1].

@@ -15,11 +15,11 @@
 not necessary to understand encoder operation, can be found in the
 companion specification titled "<a href="delta_format.wiki">Fossil
 Delta Format</a>".
 </p>
 
-<p>The entire algorithm is inspired
+<p>The algorithm is inspired
 by <a href="http://samba.anu.edu.au/rsync/">rsync</a>.</p>
 
 <a name="argresparam"></a><h2>1.0 Arguments, Results, and Parameters</h2>
 
 <p>The encoder takes two byte-sequences as input, the "original", and

Modified www/fileformat.wiki from [346d6f3293] to [9be529e4b5].

@@ -1,16 +1,22 @@
 <h1 align="center">
 Fossil File Formats
 </h1>
 
+<p>The state of a fossil repository is kept simple so that it can
+endure in useful form for decades or centuries.
+A fossil repository is intended to be readable,
+searchable, and extensible by people not yet born.</p>
+
 <p>
 The global state of a fossil repository is determined by an unordered
-set of artifacts.
+set of <i>artifacts</i>.
 An artifact might be a source code file, the text of a wiki page,
 part of a trouble ticket, or one of several special control artifacts
 used to show the relationships between other artifacts within the
-project.  Artifacts can be text or binary.
+project.  Each artifact is normally represented on disk as a separate
+file.  Artifacts can be text or binary.
 </p>
 
 <p>
 Each artifact in the repository is named by its SHA1 hash.
 No prefixes or meta information is added to a artifact before
@@ -167,11 +173,13 @@
 <h2>2.0 Clusters</h2>
 
 <p>
 A cluster is an artifact that declares the existence of other artifacts.
 Clusters are used during repository synchronization to help
-reduce network traffic.
+reduce network traffic.  As such, clusters are an optimization and
+may be removed from a repository without loss or damage to the
+underlying project code.
 </p>
 
 <p>
 Clusters follow a syntax that is very similar to manifests.
 A Cluster is a line-oriented text file.  Newline characters
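
Note on the www/fileformat.wiki change above: the text states that each artifact is named by its SHA1 hash and is normally stored on disk as a separate file. A minimal Python sketch of that naming rule follows. It assumes the hash is taken over the raw artifact bytes with nothing prepended; the function name `artifact_name` is hypothetical and is not part of Fossil.

```python
import hashlib


def artifact_name(content: bytes) -> str:
    """Return the 40-character lowercase hex SHA1 digest of the raw bytes.

    Assumption: the artifact is hashed exactly as stored, with no
    prefix or meta information added first.
    """
    return hashlib.sha1(content).hexdigest()


if __name__ == "__main__":
    # Text and binary artifacts are treated identically: both are just bytes.
    print(artifact_name(b"abc"))
    print(artifact_name(bytes([0, 1, 2, 255])))
```

Because the name depends only on the content, two repositories that hold the same bytes agree on the artifact's name without any coordination.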

Modified www/index.wiki from [0332519f4d] to [a45f7bf59e].

@@ -4,11 +4,11 @@
 Fossil is a new
 <a href="http://en.wikipedia.org/wiki/Revision_control">
 distributed software revision control system</a> that includes an integrated
 <a href="http://en.wikipedia.org/wiki/Wiki">Wiki</a> and an integrated
 <a href="http://en.wikipedia.org/wiki/Bugtracker">
-bug-tracking system</a> all in a single easy-to-use stand-alone
+bug-tracking system</a> all in a single, easy-to-use, stand-alone
 executable.
 (NB: The bug-tracker component is not yet completely functional, but
 we expect it to be available soon.)
 Fossil is
 <a href="http://www.fossil-scm.org/fossil/timeline">self-hosting</a>
@@ -35,29 +35,30 @@
 or all three at the same time</li>
 <li>Integrated bug tracking and wiki, along the lines of
 <a href="http://www.cvstrac.org/">CVSTrac</a> and
 <a href="http://www.edgewall.com/trac/">Trac</a>.</li>
 <li>Built-in web interface that supports deep archaeological digs through
-historical source code.</li>
+the project history.</li>
 <li>All network communication via
 <a href="http://en.wikipedia.org/wiki/HTTP">HTTP</a>
 (so that everything works from behind restrictive firewalls).</li>
-<li>Everything included in a single self-contained executable -
-    trivial to install</li>
+<li>Everything (client, server, and utilities) is included in a
+single self-contained executable - trivial to install</li>
 <li>Server runs as <a href="http://www.w3.org/CGI/">CGI</a>, using
-<a href="http://en.wikipedia.org/wiki/inetd">inetd</a> or
-<a href="http://www.xinetd.org/">xinetd</a> or using its own built-in,
+<a href="http://en.wikipedia.org/wiki/inetd">inetd</a>/<a
+ href="http://www.xinetd.org/">xinetd</a>, or using its own built-in,
 standalone web server.</li>
 <li>An entire project contained in single disk file (which also
 happens to be an <a href="http://www.sqlite.org/">SQLite</a> database.)</li>
 <li>Trivial to setup and administer</li>
 <li>Files and versions are identified by their
 <a href="http://en.wikipedia.org/wiki/SHA-1">SHA1</a> signature.
 Any unique prefix is sufficient to identify a file
 or version - usually the first 4 or 5 characters suffice.</li>
-<li>The file format is trival and requires nothing more complex
-than a text editor and the "sha1sum" command-line utility to decode.</li>
+<li>The <a href="fileformat.wiki">file format</a> is designed to be enduring.
+It is deliberately kept simple, requiring nothing more complex
+than a text editor and an SHA1 checksum generator to encode or decode.</li>
 <li>Automatic <a href="selfcheck.wiki">self-check</a>
 on repository changes makes it exceedingly
 unlikely that data will ever be lost because of a software bug.</li>
 </ul>
 
@@ -73,16 +74,15 @@
 (example: <a href="http://www.he.net/hosting.html">Hurricane Electric</a>)
 that provides nothing more than web space and CGI capability.
 Here is <a href="http://www.hwaci.com/cgi-bin/fossil/timeline">a demo</a>.</li>
 <li>Fossil should provide in-depth historical and status information about the
 project through a web interface</li>
-<li>The integration of <a href="http://wiki.org/wiki.cgi?WhatIsWiki">Wiki</a>
-and the ability to safely support anonymous check-in are features sometimes
-described as
-<a href="http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html">Web 2.0</a>.
-Fossil attempts to better capture "collective intelligence" and
-"the wisdom of crowds" by opening up write access to the masses.</li>
+<li>Fossil should provide an historical record of a project that endures
+for decades or centuries and across multiple generations of hardware
+and software.</li>
+<li>Fossil should be easily adaptable to different workflows.  Fossil
+implements mechanism, not policy.</li>
 </ul>
 
 <p>User Links:</p>
 
 <ul>

Modified www/sync.wiki from [5cff62ccf6] to [e3ece24fa5].

@@ -5,11 +5,28 @@
 command is run on the client repository.  A URL for the server repository
 is specified as part of the command.  This document describes what happens
 behind the scenes in order to synchronize the information on the two
 repositories.</p>
 
-<h2>1.0 Transport</h2>
+<h2>1.0 Overview</h2>
+
+<p>The global state of a fossil repository consists of an unordered
+collection of artifacts.  Each artifact is identified by its SHA1 hash.
+Synchronization is simply the process of sharing artifacts between
+servers so that all servers have copies of all artifacts.  Because
+artifacts are unordered, the order in which artifacts are received
+at a server is inconsequential.  It is assumed that the SHA1 hashes
+of artifacts are unique - that every artifact has a different SHA1 hash.
+To a first approximation, synchronization proceeds by sharing lists of
+SHA1 hashes of available artifacts, then sharing those artifacts that
+are not found on one side or the other of the connection.  In practice,
+a repository might contain millions of artifacts.  The list of
+SHA1 hashes for this many artifacts can be large.  So optimizations are
+employed that usually reduce the number of SHA1 hashes that need to be
+shared to a few hundred.</p>
+
+<h2>2.0 Transport</h2>
 
 <p>All communication between client and server is via HTTP requests.
 The server is listening for incoming HTTP requests.  The client
 issues one or more HTTP requests and receives replies for each
 request.</p>
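
Note on the new Overview section in this hunk: to a first approximation, synchronization is an exchange of hash lists followed by a transfer of whatever each side lacks. The Python sketch below models only that idea, representing each repository as a dict from SHA1 hex digest to artifact bytes. It ignores the HTTP transport, the card format, and the optimizations the text mentions for shortening the hash lists; `sync_naive`, `make_repo`, and the dict representation are illustrative assumptions, not Fossil's implementation.

```python
import hashlib


def sha1_hex(content: bytes) -> str:
    """Name an artifact by the SHA1 hash of its bytes."""
    return hashlib.sha1(content).hexdigest()


def make_repo(*artifacts: bytes) -> dict:
    """Build a toy repository: an unordered map from hash to content."""
    return {sha1_hex(a): a for a in artifacts}


def sync_naive(a: dict, b: dict) -> None:
    """Share artifacts so both sides end up holding the union.

    Each side learns the other's full hash list, then receives the
    artifacts it does not already hold. Both dicts are updated in place.
    """
    missing_on_b = a.keys() - b.keys()  # hashes only side a holds
    missing_on_a = b.keys() - a.keys()  # hashes only side b holds
    for h in missing_on_b:
        b[h] = a[h]
    for h in missing_on_a:
        a[h] = b[h]


if __name__ == "__main__":
    client = make_repo(b"file one", b"shared file")
    server = make_repo(b"shared file", b"file three")
    sync_naive(client, server)
    print(len(client), len(server))
```

Since the artifacts are unordered, the order of the two loops does not matter. The cost of this naive version is that both full hash lists are exchanged, which is the overhead the paragraph says the real protocol works to reduce.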
@@ -25,11 +42,11 @@
 <p>A single push, pull, or sync might involve multiple HTTP requests.
 The client maintains state between all requests.  But on the server
 side, each request is independent.  The server does not preserve
 any information about the client from one request to the next.</p>
 
-<h3>1.1 Server Identification</h3>
+<h3>2.1 Server Identification</h3>
 
 <p>The server is identified by a URL argument that accompanies the
 push, pull, or sync command on the client.  (As a convenience to
 users, the URL can be omitted on the client command and the same URL
 from the most recent push, pull, or sync will be reused.  This saves
@@ -49,11 +66,11 @@
 
 <blockquote>
 http://fossil-scm.hwaci.com/fossil/xfer
 </blockquote>
 
-<h3>1.2 HTTP Request Format</h3>
+<h3>2.2 HTTP Request Format</h3>
 
 <p>The client always sends a POST request to the server.  The
 general format of the POST request is as follows:</p>
 
 <blockquote><pre>
@@ -87,17 +104,17 @@
 </pre></blockquote>
 
 <p>The content type of the reply is always the same as the content type
 of the request.</p>
 
-<h2>2.0 Fossil Synchronization Content</h2>
+<h2>3.0 Fossil Synchronization Content</h2>
 
 <p>A synchronization request between a client and server consists of
 one or more HTTP requests as described in the previous section.  This
 section details the "x-fossil" content type.</p>
 
-<h3>2.1 Line-oriented Format</h3>
+<h3>3.1 Line-oriented Format</h3>
 
 <p>The x-fossil content type consists of zero or more "cards".  Cards
 are separated by the newline character ("\n").  Leading and trailing
 whitespace on a card is ignored.  Blank cards are ignored.</p>
 
@@ -105,11 +122,11 @@
 The first token on each card is the operator.  Subsequent tokens
 are arguments.  The set of operators understood by servers is slightly
 different from the operators understood by clients, though the two
 are very similar.</p>
 
-<h3>2.2 Login Cards</h3>
+<h3>3.2 Login Cards</h3>
 
 <p>Every message from client to server begins with one or more login
 cards.  Each login card has the following format:</p>
 
 <blockquote>
@@ -131,11 +148,11 @@
 
 <p>Privileges are cumulative.  There can be multiple successful
 login cards.  The session privileges are the bit-wise OR of the
 privileges of each individual login.</p>
 
-<h3>2.3 File Cards</h3>
+<h3>3.3 File Cards</h3>
 
 <p>Repository content records or files are transferred using
 a "file" card.  File cards come in two different formats depending
 on whether the file is sent directly or as a delta from some
 other file.</p>
@@ -165,11 +182,11 @@
 <p>File cards are sent in both directions: client to server and
 server to client.  A delta might be sent before the source of
 the delta, so both client and server should remember deltas
 and be able to apply them when their source arrives.</p>
 
-<h3>2.4 Push and Pull Cards</h3>
+<h3>3.4 Push and Pull Cards</h3>
 
 <p>Among the first cards in a client-to-server message are
 the push and pull cards.  The push card tells the server that
 the client is pushing content.  The pull card tells the server
 that the client wants to pull content.  In the event of a sync,
@@ -190,11 +207,11 @@
 
 <p>The server will also send a push card back to the client
 during a clone.  This is how the client determines what project
 code to put in the new repository it is constructing.</p>
 
-<h3>2.5 Clone Cards</h3>
+<h3>3.5 Clone Cards</h3>
 
 <p>A clone card works like a pull card in that it is sent from
 client to server in order to tell the server that the client
 wants to pull content.  But unlike the pull card, the clone
 card has no arguments.</p>
@@ -205,11 +222,11 @@
 
 <p>In response to a clone message, the server also sends the client
 a push message so that the client can discover the projectcode for
 this project.</p>
 
-<h3>2.6 Igot Cards</h3>
+<h3>3.6 Igot Cards</h3>
 
 <p>An igot card can be sent from either client to server or from
 server to client in order to indicate that the sender holds a copy
 of a particular file.  The format is:</p>
 
@@ -221,11 +238,11 @@
 the sender possesses.
 The receiver of an igot card will typically check to see if
 it also holds the same file and if not it will request the file
 using a gimme card in either the reply or in the next message.</p>
 
-<h3>2.7 Gimme Cards</h3>
+<h3>3.7 Gimme Cards</h3>
 
 <p>A gimme card is sent from either client to server or from server
 to client.  The gimme card asks the receiver to send a particular
 file back to the sender.  The format of a gimme card is this:</p>
 
@@ -236,11 +253,11 @@
 <p>The argument to the gimme card is the UUID of the file that
 the sender wants.  The receiver will typically respond to a
 gimme card by sending a file card in its reply or in the next
 message.</p>
 
-<h3>2.8 Cookie Cards</h3>
+<h3>3.8 Cookie Cards</h3>
 
 <p>A cookie card can be used by a server to record a small amount
 of state information on a client.  The server sends a cookie to the
 client.  The client sends the same cookie back to the server on
 its next request.  The cookie card has a single argument which
@@ -256,11 +273,11 @@
 cookie and the server must structure the cookie payload in such
 a way that it can tell if the cookie it sees is its own cookie or
 a cookie from another server.  (Typically the server will embed
 its servercode as part of the cookie.)</p>
 
-<h3>2.9 Error Cards</h3>
+<h3>3.9 Error Cards</h3>
 
 <p>If the server discovers anything wrong with a request, it generates
 an error card in its reply.  When the client sees the error card,
 it displays an error message to the user and aborts the sync
 operation.  An error card looks like this:</p>
@@ -276,16 +293,16 @@
 (ASCII 0x5C) is represented as two backslashes "\\".  Apart from
 space and newline, no other whitespace characters nor any
 unprintable characters are allowed in
 the error message.</p>
 
-<h3>2.10 Unknown Cards</h3>
+<h3>3.10 Unknown Cards</h3>
 
 <p>If either the client or the server sees a card that is not
 described above, then it generates an error and aborts.</p>
 
-<h2>3.0 Phantoms And Clusters</h2>
+<h2>4.0 Phantoms And Clusters</h2>
 
 <p>When a repository knows that a file exists and knows the UUID of
 that file, but it does not know the file content, then it stores that
 file as a "phantom".  A repository will typically create a phantom when
 it receives an igot card for a file that it does not hold or when it
@@ -316,11 +333,11 @@
 exactly is not a cluster.  There must be no extra whitespace in
 the file.  There must be one or more M cards.  There must be a
 single Z card with a correct MD5 checksum.  And all cards must
 be in strict lexicographical order.</p>
 
-<h3>3.1 The Unclustered Table</h3>
+<h3>4.1 The Unclustered Table</h3>
 
 <p>Every repository maintains a table named "<b>unclustered</b>"
 which records the identity of every file and phantom it holds that is not
 mentioned in a cluster.  The entries in the unclustered table can
 be thought of as leaves on a tree of files.  Some of the unclustered
@@ -327,13 +344,13 @@
 files will be clusters.  Those clusters may contain other clusters,
 which might contain still more clusters, and so forth.  Beginning
 with the files in the unclustered table, one can follow the chain
 of clusters to find every file in the repository.</p>
 
-<h2>4.0 Synchronization Strategies</h2>
-
-<h3>4.1 Pull</h3>
+<h2>5.0 Synchronization Strategies</h2>
+
+<h3>5.1 Pull</h3>
 
 <p>A typical pull operation proceeds as shown below.  Details
 of the actual implementation may vary slightly but the gist of
 a pull is captured in the following steps:</p>
 
@@ -381,11 +398,11 @@
 protocol will continue to work even if there are multiple servers
 or if servers and clients sometimes change roles.  The only negative
 effects of these unusual arrangements is that more than the minimum
 number of clusters might be generated.</p>
 
-<h3>4.2 Push</h3>
+<h3>5.2 Push</h3>
 
 <p>A typical push operation proceeds roughly as shown below.  As
 with a pull, the actual implementation may vary slightly.</p>
 
 <ol>
@@ -415,19 +432,19 @@
 server knows all files that exist on the client.  Also, as with
 pull, the client attempts to keep the size of the request from
 growing too large by suppressing file cards once the
 size of the request reaches 1MB.</p>
 
-<h3>4.3 Sync</h3>
+<h3>5.3 Sync</h3>
 
 <p>A sync is just a pull and a push that happen at the same time.
 The first three steps of a pull are combined with the first five steps
 of a push.  Steps (4) through (7) of a pull are combined with steps
 (5) through (8) of a push.  And steps (8) through (10) of a pull
 are combined with step (9) of a push.</p>
 
-<h2>5.0 Summary</h2>
+<h2>6.0 Summary</h2>
 
 <p>Here are the key points of the synchronization protocol:</p>
 
 <ol>
 <li>The client sends one or more PUSH HTTP requests to the server.