File Annotation
Not logged in
d87ca60c58 2008-05-15   stephan: <h1 align="center">
d87ca60c58 2008-05-15   stephan: Fossil Delta Format
d87ca60c58 2008-05-15   stephan: </h1>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>A key component for the efficient storage of multiple revisions of
d87ca60c58 2008-05-15   stephan: a file in fossil repositories is the use of delta-compression, i.e. to
d87ca60c58 2008-05-15   stephan: store only the changes between revisions instead of the whole
d87ca60c58 2008-05-15   stephan: file.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>This document describes the format used to encode such changes,
d87ca60c58 2008-05-15   stephan: also known as "delta". It is targeted at developers working on either
d87ca60c58 2008-05-15   stephan: <a href="index.html">fossil</a> itself, or on tools compatible with
d87ca60c58 2008-05-15   stephan: it.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="structure"><h2>1.0 Structure</h2>
d87ca60c58 2008-05-15   stephan: <img src="delta1.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>A delta consists of three parts, a "header", a "trailer", and a
d87ca60c58 2008-05-15   stephan: "segment-list" between them.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>Both header and trailer provide information about the target
d87ca60c58 2008-05-15   stephan: helping the decoder, and the segment-list describes how the target can
d87ca60c58 2008-05-15   stephan: be constructed from the original.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="header"><h3>1.1 Header</h3>
d87ca60c58 2008-05-15   stephan: <img src="delta6.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The header consists of a single number followed by a newline
d87ca60c58 2008-05-15   stephan: character (ASCII 0x0a). The number is the length of the target in
d87ca60c58 2008-05-15   stephan: bytes.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>This means that, given a delta, the decoder can compute the size of
d87ca60c58 2008-05-15   stephan: the target (and allocate any necessary memory based on that) by simply
d87ca60c58 2008-05-15   stephan: reading the first line of the delta and decoding the number found
d87ca60c58 2008-05-15   stephan: there. In other words, before it has to decode everything else.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="trailer"><h3>1.2 Trailer</h3>
d87ca60c58 2008-05-15   stephan: <img src="delta5.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The trailer consists of a single number followed by a semicolon (ASCII
d87ca60c58 2008-05-15   stephan: 0x3b). This number is a checksum of the target and can be used by a
d87ca60c58 2008-05-15   stephan: decoder to verify that the delta applied correctly, reconstructing the
d87ca60c58 2008-05-15   stephan: target from the original.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The checksum is computed by treating the target as a series of
d87ca60c58 2008-05-15   stephan: 32-bit integer numbers (MSB first), and summing these up, modulo
d87ca60c58 2008-05-15   stephan: 2^32-1. A target whose length is not a multiple of 4 is padded with
d87ca60c58 2008-05-15   stephan: 0-bytes (ASCII 0x00) at the end.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>By putting this information at the end of the delta a decoder has
d87ca60c58 2008-05-15   stephan: it available immediately after the target has been reconstructed
d87ca60c58 2008-05-15   stephan: fully.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="slist"><h3>1.3 Segment-List</h3>
d87ca60c58 2008-05-15   stephan: <img src="delta2.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The segment-list of a delta describes how to create the target from
d87ca60c58 2008-05-15   stephan: the original by a combination of inserting literal byte-sequences and
d87ca60c58 2008-05-15   stephan: copying ranges of bytes from the original. This is there the
d87ca60c58 2008-05-15   stephan: compression takes place, by encoding the large common parts of
d87ca60c58 2008-05-15   stephan: original and target in small copy instructions.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The target is constructed from beginning to end, with the data
d87ca60c58 2008-05-15   stephan: generated by each instruction appended after the data of all previous
d87ca60c58 2008-05-15   stephan: instructions, with no gaps.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="insertlit"><h4>1.3.1 Insert Literal</h4>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>A literal is specified by two elements, the size of the literal in
d87ca60c58 2008-05-15   stephan: bytes, and the bytes of the literal itself.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <img src="delta4.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: <p>The length is written first, followed by a colon character (ASCII
d87ca60c58 2008-05-15   stephan: 0x3a), followed by the bytes of the literal.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="copyrange"><h4>1.3.2 Copy Range</h4>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>A range to copy is specified by two numbers, the offset of the
d87ca60c58 2008-05-15   stephan: first byte in the original to copy, and the size of the range, in
d87ca60c58 2008-05-15   stephan: bytes. The size zero is special, its usage indicates that the range
d87ca60c58 2008-05-15   stephan: extends to the end of the original.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <img src="delta3.gif" align="left" hspace="10">
d87ca60c58 2008-05-15   stephan: <p>The length is written first, followed by an "at" character (ASCII
d87ca60c58 2008-05-15   stephan: 0x40), then the offset, followed by a comma (ASCII 0x2c).</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="intcoding"><h2>2.0 Encoding of integers</h2>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: The format currently handles only 32 bit integer numbers. They are
d87ca60c58 2008-05-15   stephan: written base-64 encoded, MSB first, and without leading
d87ca60c58 2008-05-15   stephan: "0"-characters, except if they are significant (i.e. 0 => "0").
d87ca60c58 2008-05-15   stephan: </p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: The base-64 coding is described in
d87ca60c58 2008-05-15   stephan: <a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>.
d87ca60c58 2008-05-15   stephan: </p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="examples"><h2>3.0 Examples</h2>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="examplesint"><h3>3.1 Integer encoding</h3>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <table border=1>
d87ca60c58 2008-05-15   stephan: <tr>
d87ca60c58 2008-05-15   stephan: <th>Value</th>
d87ca60c58 2008-05-15   stephan: <th>Encoding</th>
d87ca60c58 2008-05-15   stephan: </tr>
d87ca60c58 2008-05-15   stephan: <tr>
d87ca60c58 2008-05-15   stephan: <td>0</td>
d87ca60c58 2008-05-15   stephan: <td>0</td>
d87ca60c58 2008-05-15   stephan: </tr>
d87ca60c58 2008-05-15   stephan: <tr>
d87ca60c58 2008-05-15   stephan: <td>6246</td>
d87ca60c58 2008-05-15   stephan: <td>1Xb</td>
d87ca60c58 2008-05-15   stephan: </tr>
d87ca60c58 2008-05-15   stephan: <tr>
d87ca60c58 2008-05-15   stephan: <td>-1101438770</td>
d87ca60c58 2008-05-15   stephan: <td>2zMM3E</td>
d87ca60c58 2008-05-15   stephan: </tr>
d87ca60c58 2008-05-15   stephan: </table>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="examplesdelta"><h3>3.2 Delta encoding</h3>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>An example of a delta using the specified encoding is:</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <table border=1><tr><td><pre>
d87ca60c58 2008-05-15   stephan: 1Xb
d87ca60c58 2008-05-15   stephan: 4E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre>
d87ca60c58 2008-05-15   stephan: </td></tr></table>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>This can be taken apart into the following parts:</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <table border=1>
d87ca60c58 2008-05-15   stephan: <tr><th>What  </th> <th>Encoding         </th><th>Meaning </th><th>Details</th></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>Header</td> <td>1Xb              </td><td>Size    </td><td> 6246	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>S-List</td> <td>4E@0,	         </td><td>Copy    </td><td> 270 @ 0	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>2:th	         </td><td>Literal </td><td> 2 'th'	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>FN@4C,	         </td><td>Copy    </td><td> 983 @ 268	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>6:scenda         </td><td>Literal </td><td> 6 'scenda'	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>1B@Jd,	         </td><td>Copy    </td><td> 75 @ 1256	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>6:scenda         </td><td>Literal </td><td> 6 'scenda'	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>5x@Kt,	         </td><td>Copy    </td><td> 380 @ 1336	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>6:pieces	 </td><td>Literal </td><td> 6 'pieces'	     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>79@Qt,	         </td><td>Copy    </td><td> 457 @ 1720     </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>F: Example: eskil</td><td>Literal </td><td> 15 ' Example: eskil'</td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>&nbsp;</td> <td>~E@Y0,           </td><td>Copy    </td><td>  4046 @ 2176        </td></tr>
d87ca60c58 2008-05-15   stephan: <tr><td>Trailer</td><td>2zMM3E           </td><td>Ckecksum</td><td> -1101438770         </td></tr>
d87ca60c58 2008-05-15   stephan: </table>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>The unified diff behind the above delta is</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <table border=1><tr><td><pre>
d87ca60c58 2008-05-15   stephan: bluepeak:(761) ~/Projects/Tcl/Fossil/Devel/devel > diff -u ../DELTA/old ../DELTA/new
d87ca60c58 2008-05-15   stephan: --- ../DELTA/old        2007-08-23 21:14:40.000000000 -0700
d87ca60c58 2008-05-15   stephan: +++ ../DELTA/new        2007-08-23 21:14:33.000000000 -0700
d87ca60c58 2008-05-15   stephan: @@ -5,7 +5,7 @@
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:   *  If the server does not have write permission on the database
d87ca60c58 2008-05-15   stephan:      file, or on the directory containing the database file (and
d87ca60c58 2008-05-15   stephan: -    it is thus unable to update database because it cannot create
d87ca60c58 2008-05-15   stephan: +    it is thus unable to update the database because it cannot create
d87ca60c58 2008-05-15   stephan:      a rollback journal) then it currently fails silently on a push.
d87ca60c58 2008-05-15   stephan:      It needs to return a helpful error.
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: @@ -27,8 +27,8 @@
d87ca60c58 2008-05-15   stephan:   *  Additional information displayed for the "vinfo" page:
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:       +  All leaves of this version that are not included in the
d87ca60c58 2008-05-15   stephan: -        decendant list.  With date, user, comment, and hyperlink.
d87ca60c58 2008-05-15   stephan: -        Leaves in the decendant table should be marked as such.
d87ca60c58 2008-05-15   stephan: +        descendant list.  With date, user, comment, and hyperlink.
d87ca60c58 2008-05-15   stephan: +        Leaves in the descendant table should be marked as such.
d87ca60c58 2008-05-15   stephan:          See the compute_leaves() function to see how to find all
d87ca60c58 2008-05-15   stephan:          leaves.
d87ca60c58 2008-05-15   stephan:       +  Add file diff links to the file change list.
d87ca60c58 2008-05-15   stephan: @@ -37,7 +37,7 @@
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:   *  The /xfer handler (for push, pull, and clone) does not do
d87ca60c58 2008-05-15   stephan:      delta compression.  This results in excess bandwidth usage.
d87ca60c58 2008-05-15   stephan: -    There are some code in xfer.c that are sketches of ideas on
d87ca60c58 2008-05-15   stephan: +    There are some pieces in xfer.c that are sketches of ideas on
d87ca60c58 2008-05-15   stephan:      how to do delta compression, but nothing has been implemented.
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:   *  Enhancements to the diff and tkdiff commands in the cli.
d87ca60c58 2008-05-15   stephan: @@ -45,7 +45,7 @@
d87ca60c58 2008-05-15   stephan:      single file.  Allow diffs against any two arbitrary versions,
d87ca60c58 2008-05-15   stephan:      not just diffs against the current check-out.  Allow
d87ca60c58 2008-05-15   stephan:      configuration options to replace tkdiff with some other
d87ca60c58 2008-05-15   stephan: -    visual differ of the users choice.
d87ca60c58 2008-05-15   stephan: +    visual differ of the users choice. Example: eskil.
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan:   *  Ticketing interface (expand this bullet)
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: </pre></td></tr></table>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <a name="notes"><h2>Notes</h2>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <ul>
d87ca60c58 2008-05-15   stephan: <li>Pure text files generate a pure text delta.
d87ca60c58 2008-05-15   stephan: </li>
d87ca60c58 2008-05-15   stephan: <li>Binary files generate a delta that may contain some binary data.
d87ca60c58 2008-05-15   stephan: </li>
d87ca60c58 2008-05-15   stephan: <li>Instead of putting special instructions for general compression
d87ca60c58 2008-05-15   stephan: into the delta-format itself, specifically the segment-list, like
d87ca60c58 2008-05-15   stephan: run-length encoding of literals, etc. it was considered to be much
d87ca60c58 2008-05-15   stephan: more sensible to keep the various concern separate and use a general
d87ca60c58 2008-05-15   stephan: compression library, like <a href="http://www.zlib.net">zlib</a>, to
d87ca60c58 2008-05-15   stephan: compress the full delta after its generation.
d87ca60c58 2008-05-15   stephan: </li>
d87ca60c58 2008-05-15   stephan: </ul>