File Annotation
Not logged in
16094f7ebc 2008-05-16       drh: <nowiki>
d87ca60c58 2008-05-15   stephan: <h1 align="center">
d87ca60c58 2008-05-15   stephan: Fossil Delta Format
d87ca60c58 2008-05-15   stephan: </h1>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <p>Fossil achieves efficient storage and low-bandwidth synchronization
16094f7ebc 2008-05-16       drh: through the use of delta-compression.  Instead of storing
16094f7ebc 2008-05-16       drh: or transmitting the complete content of an artifact, fossil stores or
16094f7ebc 2008-05-16       drh: transmits only the changes relative to a related artifact.
16094f7ebc 2008-05-16       drh: </p>
16094f7ebc 2008-05-16       drh: 
16094f7ebc 2008-05-16       drh: <p>This document describes the delta-encoding format used by fossil.
16094f7ebc 2008-05-16       drh: The intended audience is developers working on either
16094f7ebc 2008-05-16       drh: <a href="index.wiki">fossil</a> itself, or on tools compatible with
16094f7ebc 2008-05-16       drh: fossil.</p>
16094f7ebc 2008-05-16       drh: 
16094f7ebc 2008-05-16       drh: <p>Note that the delta-encoding is not a fundamental element of the
16094f7ebc 2008-05-16       drh: state of a fossil repository.  A state of a fossil repository is
16094f7ebc 2008-05-16       drh: defined by the uncompressed and undeltaed content of all artifacts.
16094f7ebc 2008-05-16       drh: The fact the artifacts
16094f7ebc 2008-05-16       drh: are stored on disk using this delta-encoding format is merely an
16094f7ebc 2008-05-16       drh: optimization.  One could, in theory, create an entirely new and
16094f7ebc 2008-05-16       drh: compatible implementation of fossil that used a different delta-encoding
16094f7ebc 2008-05-16       drh: or did no delta-encoding at all.  However, experience has shown that
16094f7ebc 2008-05-16       drh: the delta-encoding described here is both efficient to compute and
16094f7ebc 2008-05-16       drh: results in very small deltas, so its continued use is recommended.</p>
16094f7ebc 2008-05-16       drh: 
16094f7ebc 2008-05-16       drh: <a name="structure"></a><h2>1.0 Structure</h2>
d87ca60c58 2008-05-15   stephan: <img src="delta1.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>A delta consists of three parts, a "header", a "trailer", and a
d87ca60c58 2008-05-15   stephan: "segment-list" between them.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>Both header and trailer provide information about the target
d87ca60c58 2008-05-15   stephan: helping the decoder, and the segment-list describes how the target can
d87ca60c58 2008-05-15   stephan: be constructed from the original.</p>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="header"></a><h3>1.1 Header</h3>
d87ca60c58 2008-05-15   stephan: <img src="delta6.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The header consists of a single number followed by a newline
d87ca60c58 2008-05-15   stephan: character (ASCII 0x0a). The number is the length of the target in
d87ca60c58 2008-05-15   stephan: bytes.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>This means that, given a delta, the decoder can compute the size of
d87ca60c58 2008-05-15   stephan: the target (and allocate any necessary memory based on that) by simply
d87ca60c58 2008-05-15   stephan: reading the first line of the delta and decoding the number found
d87ca60c58 2008-05-15   stephan: there. In other words, before it has to decode everything else.</p>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="trailer"></a><h3>1.2 Trailer</h3>
d87ca60c58 2008-05-15   stephan: <img src="delta5.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The trailer consists of a single number followed by a semicolon (ASCII
d87ca60c58 2008-05-15   stephan: 0x3b). This number is a checksum of the target and can be used by a
d87ca60c58 2008-05-15   stephan: decoder to verify that the delta applied correctly, reconstructing the
d87ca60c58 2008-05-15   stephan: target from the original.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The checksum is computed by treating the target as a series of
d87ca60c58 2008-05-15   stephan: 32-bit integer numbers (MSB first), and summing these up, modulo
d87ca60c58 2008-05-15   stephan: 2^32-1. A target whose length is not a multiple of 4 is padded with
d87ca60c58 2008-05-15   stephan: 0-bytes (ASCII 0x00) at the end.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>By putting this information at the end of the delta a decoder has
d87ca60c58 2008-05-15   stephan: it available immediately after the target has been reconstructed
d87ca60c58 2008-05-15   stephan: fully.</p>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="slist"></a><h3>1.3 Segment-List</h3>
d87ca60c58 2008-05-15   stephan: <img src="delta2.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The segment-list of a delta describes how to create the target from
d87ca60c58 2008-05-15   stephan: the original by a combination of inserting literal byte-sequences and
d87ca60c58 2008-05-15   stephan: copying ranges of bytes from the original. This is there the
d87ca60c58 2008-05-15   stephan: compression takes place, by encoding the large common parts of
d87ca60c58 2008-05-15   stephan: original and target in small copy instructions.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The target is constructed from beginning to end, with the data
d87ca60c58 2008-05-15   stephan: generated by each instruction appended after the data of all previous
d87ca60c58 2008-05-15   stephan: instructions, with no gaps.</p>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="insertlit"></a><h4>1.3.1 Insert Literal</h4>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>A literal is specified by two elements, the size of the literal in
d87ca60c58 2008-05-15   stephan: bytes, and the bytes of the literal itself.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <img src="delta4.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: <p>The length is written first, followed by a colon character (ASCII
d87ca60c58 2008-05-15   stephan: 0x3a), followed by the bytes of the literal.</p>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="copyrange"></a><h4>1.3.2 Copy Range</h4>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>A range to copy is specified by two numbers, the offset of the
d87ca60c58 2008-05-15   stephan: first byte in the original to copy, and the size of the range, in
d87ca60c58 2008-05-15   stephan: bytes. The size zero is special, its usage indicates that the range
d87ca60c58 2008-05-15   stephan: extends to the end of the original.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <img src="delta3.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: <p>The length is written first, followed by an "at" character (ASCII
d87ca60c58 2008-05-15   stephan: 0x40), then the offset, followed by a comma (ASCII 0x2c).</p>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="intcoding"></a><h2>2.0 Encoding of integers</h2>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: The format currently handles only 32 bit integer numbers. They are
d87ca60c58 2008-05-15   stephan: written base-64 encoded, MSB first, and without leading
d87ca60c58 2008-05-15   stephan: "0"-characters, except if they are significant (i.e. 0 => "0").
d87ca60c58 2008-05-15   stephan: </p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: The base-64 coding is described in
d87ca60c58 2008-05-15   stephan: <a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>.
d87ca60c58 2008-05-15   stephan: </p>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="examples"></a><h2>3.0 Examples</h2>
16094f7ebc 2008-05-16       drh: 
16094f7ebc 2008-05-16       drh: <a name="examplesint"></a><h3>3.1 Integer encoding</h3>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <table border=1>
d87ca60c58 2008-05-15   stephan: <tr>
d87ca60c58 2008-05-15   stephan: <th>Value</th>
d87ca60c58 2008-05-15   stephan: <th>Encoding</th>
d87ca60c58 2008-05-15   stephan: </tr>
d87ca60c58 2008-05-15   stephan: <tr>
d87ca60c58 2008-05-15   stephan: <td>0</td>
d87ca60c58 2008-05-15   stephan: <td>0</td>
d87ca60c58 2008-05-15   stephan: </tr>
d87ca60c58 2008-05-15   stephan: <tr>
d87ca60c58 2008-05-15   stephan: <td>6246</td>
d87ca60c58 2008-05-15   stephan: <td>1Xb</td>
d87ca60c58 2008-05-15   stephan: </tr>
d87ca60c58 2008-05-15   stephan: <tr>
d87ca60c58 2008-05-15   stephan: <td>-1101438770</td>
d87ca60c58 2008-05-15   stephan: <td>2zMM3E</td>
d87ca60c58 2008-05-15   stephan: </tr>
d87ca60c58 2008-05-15   stephan: </table>
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="examplesdelta"></a><h3>3.2 Delta encoding</h3>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>An example of a delta using the specified encoding is:</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <table border=1><tr><td><pre>
d87ca60c58 2008-05-15   stephan: 1Xb
d87ca60c58 2008-05-15   stephan: 4E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre>
d87ca60c58 2008-05-15   stephan: </td></tr></table>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>This can be taken apart into the following parts:</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <table border=1>
d87ca60c58 2008-05-15   stephan: <tr><th>What  </th> <th>Encoding         </th><th>Meaning </th><th>Details</th></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>Header</td> <td>1Xb              </td><td>Size    </td><td> 6246	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>S-List</td> <td>4E@0,	         </td><td>Copy    </td><td> 270 @ 0	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>2:th	         </td><td>Literal </td><td> 2 'th'	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>FN@4C,	         </td><td>Copy    </td><td> 983 @ 268	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>6:scenda         </td><td>Literal </td><td> 6 'scenda'	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>1B@Jd,	         </td><td>Copy    </td><td> 75 @ 1256	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>6:scenda         </td><td>Literal </td><td> 6 'scenda'	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>5x@Kt,	         </td><td>Copy    </td><td> 380 @ 1336	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>6:pieces	 </td><td>Literal </td><td> 6 'pieces'	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>79@Qt,	         </td><td>Copy    </td><td> 457 @ 1720     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>F: Example: eskil</td><td>Literal </td><td> 15 ' Example: eskil'</td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>~E@Y0,           </td><td>Copy    </td><td>  4046 @ 2176        </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>Trailer</td><td>2zMM3E           </td><td>Ckecksum</td><td> -1101438770         </td></tr>
d87ca60c58 2008-05-15   stephan: </table>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The unified diff behind the above delta is</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <table border=1><tr><td><pre>
d87ca60c58 2008-05-15   stephan: bluepeak:(761) ~/Projects/Tcl/Fossil/Devel/devel > diff -u ../DELTA/old ../DELTA/new
d87ca60c58 2008-05-15   stephan: --- ../DELTA/old        2007-08-23 21:14:40.000000000 -0700
d87ca60c58 2008-05-15   stephan: +++ ../DELTA/new        2007-08-23 21:14:33.000000000 -0700
d87ca60c58 2008-05-15   stephan: @@ -5,7 +5,7 @@
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:   *  If the server does not have write permission on the database
d87ca60c58 2008-05-15   stephan:      file, or on the directory containing the database file (and
d87ca60c58 2008-05-15   stephan: -    it is thus unable to update database because it cannot create
d87ca60c58 2008-05-15   stephan: +    it is thus unable to update the database because it cannot create
d87ca60c58 2008-05-15   stephan:      a rollback journal) then it currently fails silently on a push.
d87ca60c58 2008-05-15   stephan:      It needs to return a helpful error.
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: @@ -27,8 +27,8 @@
d87ca60c58 2008-05-15   stephan:   *  Additional information displayed for the "vinfo" page:
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:       +  All leaves of this version that are not included in the
9eb6ea75c1 2008-11-11    kejoki: -        descendant list.  With date, user, comment, and hyperlink.
9eb6ea75c1 2008-11-11    kejoki: -        Leaves in the descendant table should be marked as such.
d87ca60c58 2008-05-15   stephan: +        descendant list.  With date, user, comment, and hyperlink.
d87ca60c58 2008-05-15   stephan: +        Leaves in the descendant table should be marked as such.
d87ca60c58 2008-05-15   stephan:          See the compute_leaves() function to see how to find all
d87ca60c58 2008-05-15   stephan:          leaves.
d87ca60c58 2008-05-15   stephan:       +  Add file diff links to the file change list.
d87ca60c58 2008-05-15   stephan: @@ -37,7 +37,7 @@
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:   *  The /xfer handler (for push, pull, and clone) does not do
d87ca60c58 2008-05-15   stephan:      delta compression.  This results in excess bandwidth usage.
d87ca60c58 2008-05-15   stephan: -    There are some code in xfer.c that are sketches of ideas on
d87ca60c58 2008-05-15   stephan: +    There are some pieces in xfer.c that are sketches of ideas on
d87ca60c58 2008-05-15   stephan:      how to do delta compression, but nothing has been implemented.
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:   *  Enhancements to the diff and tkdiff commands in the cli.
d87ca60c58 2008-05-15   stephan: @@ -45,7 +45,7 @@
d87ca60c58 2008-05-15   stephan:      single file.  Allow diffs against any two arbitrary versions,
d87ca60c58 2008-05-15   stephan:      not just diffs against the current check-out.  Allow
d87ca60c58 2008-05-15   stephan:      configuration options to replace tkdiff with some other
d87ca60c58 2008-05-15   stephan: -    visual differ of the users choice.
d87ca60c58 2008-05-15   stephan: +    visual differ of the users choice. Example: eskil.
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:   *  Ticketing interface (expand this bullet)
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: </pre></td></tr></table>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: 
16094f7ebc 2008-05-16       drh: <a name="notes"></a><h2>Notes</h2>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <ul>
d87ca60c58 2008-05-15   stephan: <li>Pure text files generate a pure text delta.
d87ca60c58 2008-05-15   stephan: </li>
d87ca60c58 2008-05-15   stephan: <li>Binary files generate a delta that may contain some binary data.
d87ca60c58 2008-05-15   stephan: </li>
16094f7ebc 2008-05-16       drh: <li>The delta encoding does not attempt to compress the content
16094f7ebc 2008-05-16       drh: It was considered to be much
16094f7ebc 2008-05-16       drh: more sensible to do compression using a separate general-purpose
16094f7ebc 2008-05-16       drh: compression library, like <a href="http://www.zlib.net">zlib</a>.
d87ca60c58 2008-05-15   stephan: </li>
d87ca60c58 2008-05-15   stephan: </ul>