d87ca60c58 2008-05-15 stephan: <h1 align="center"> d87ca60c58 2008-05-15 stephan: Fossil Delta Format d87ca60c58 2008-05-15 stephan: </h1> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>A key component for the efficient storage of multiple revisions of d87ca60c58 2008-05-15 stephan: a file in fossil repositories is the use of delta-compression, i.e. to d87ca60c58 2008-05-15 stephan: store only the changes between revisions instead of the whole d87ca60c58 2008-05-15 stephan: file.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>This document describes the format used to encode such changes, d87ca60c58 2008-05-15 stephan: also known as "delta". It is targeted at developers working on either d87ca60c58 2008-05-15 stephan: <a href="index.html">fossil</a> itself, or on tools compatible with d87ca60c58 2008-05-15 stephan: it.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="structure"><h2>1.0 Structure</h2> d87ca60c58 2008-05-15 stephan: <img src="delta1.gif" align="left" hspace="10"> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>A delta consists of three parts, a "header", a "trailer", and a d87ca60c58 2008-05-15 stephan: "segment-list" between them.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>Both header and trailer provide information about the target d87ca60c58 2008-05-15 stephan: helping the decoder, and the segment-list describes how the target can d87ca60c58 2008-05-15 stephan: be constructed from the original.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="header"><h3>1.1 Header</h3> d87ca60c58 2008-05-15 stephan: <img src="delta6.gif" align="left" hspace="10"> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>The header consists of a single number followed by a newline d87ca60c58 2008-05-15 stephan: character (ASCII 0x0a). The number is the length of the target in d87ca60c58 2008-05-15 stephan: bytes.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>This means that, given a delta, the decoder can compute the size of d87ca60c58 2008-05-15 stephan: the target (and allocate any necessary memory based on that) by simply d87ca60c58 2008-05-15 stephan: reading the first line of the delta and decoding the number found d87ca60c58 2008-05-15 stephan: there. In other words, before it has to decode everything else.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="trailer"><h3>1.2 Trailer</h3> d87ca60c58 2008-05-15 stephan: <img src="delta5.gif" align="left" hspace="10"> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>The trailer consists of a single number followed by a semicolon (ASCII d87ca60c58 2008-05-15 stephan: 0x3b). This number is a checksum of the target and can be used by a d87ca60c58 2008-05-15 stephan: decoder to verify that the delta applied correctly, reconstructing the d87ca60c58 2008-05-15 stephan: target from the original.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>The checksum is computed by treating the target as a series of d87ca60c58 2008-05-15 stephan: 32-bit integer numbers (MSB first), and summing these up, modulo d87ca60c58 2008-05-15 stephan: 2^32-1. A target whose length is not a multiple of 4 is padded with d87ca60c58 2008-05-15 stephan: 0-bytes (ASCII 0x00) at the end.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>By putting this information at the end of the delta a decoder has d87ca60c58 2008-05-15 stephan: it available immediately after the target has been reconstructed d87ca60c58 2008-05-15 stephan: fully.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="slist"><h3>1.3 Segment-List</h3> d87ca60c58 2008-05-15 stephan: <img src="delta2.gif" align="left" hspace="10"> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>The segment-list of a delta describes how to create the target from d87ca60c58 2008-05-15 stephan: the original by a combination of inserting literal byte-sequences and d87ca60c58 2008-05-15 stephan: copying ranges of bytes from the original. This is there the d87ca60c58 2008-05-15 stephan: compression takes place, by encoding the large common parts of d87ca60c58 2008-05-15 stephan: original and target in small copy instructions.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>The target is constructed from beginning to end, with the data d87ca60c58 2008-05-15 stephan: generated by each instruction appended after the data of all previous d87ca60c58 2008-05-15 stephan: instructions, with no gaps.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="insertlit"><h4>1.3.1 Insert Literal</h4> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>A literal is specified by two elements, the size of the literal in d87ca60c58 2008-05-15 stephan: bytes, and the bytes of the literal itself.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <img src="delta4.gif" align="left" hspace="10"> d87ca60c58 2008-05-15 stephan: <p>The length is written first, followed by a colon character (ASCII d87ca60c58 2008-05-15 stephan: 0x3a), followed by the bytes of the literal.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="copyrange"><h4>1.3.2 Copy Range</h4> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>A range to copy is specified by two numbers, the offset of the d87ca60c58 2008-05-15 stephan: first byte in the original to copy, and the size of the range, in d87ca60c58 2008-05-15 stephan: bytes. The size zero is special, its usage indicates that the range d87ca60c58 2008-05-15 stephan: extends to the end of the original.</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <img src="delta3.gif" align="left" hspace="10"> d87ca60c58 2008-05-15 stephan: <p>The length is written first, followed by an "at" character (ASCII d87ca60c58 2008-05-15 stephan: 0x40), then the offset, followed by a comma (ASCII 0x2c).</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="intcoding"><h2>2.0 Encoding of integers</h2> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p> d87ca60c58 2008-05-15 stephan: The format currently handles only 32 bit integer numbers. They are d87ca60c58 2008-05-15 stephan: written base-64 encoded, MSB first, and without leading d87ca60c58 2008-05-15 stephan: "0"-characters, except if they are significant (i.e. 0 => "0"). d87ca60c58 2008-05-15 stephan: </p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p> d87ca60c58 2008-05-15 stephan: The base-64 coding is described in d87ca60c58 2008-05-15 stephan: <a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>. d87ca60c58 2008-05-15 stephan: </p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="examples"><h2>3.0 Examples</h2> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="examplesint"><h3>3.1 Integer encoding</h3> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <table border=1> d87ca60c58 2008-05-15 stephan: <tr> d87ca60c58 2008-05-15 stephan: <th>Value</th> d87ca60c58 2008-05-15 stephan: <th>Encoding</th> d87ca60c58 2008-05-15 stephan: </tr> d87ca60c58 2008-05-15 stephan: <tr> d87ca60c58 2008-05-15 stephan: <td>0</td> d87ca60c58 2008-05-15 stephan: <td>0</td> d87ca60c58 2008-05-15 stephan: </tr> d87ca60c58 2008-05-15 stephan: <tr> d87ca60c58 2008-05-15 stephan: <td>6246</td> d87ca60c58 2008-05-15 stephan: <td>1Xb</td> d87ca60c58 2008-05-15 stephan: </tr> d87ca60c58 2008-05-15 stephan: <tr> d87ca60c58 2008-05-15 stephan: <td>-1101438770</td> d87ca60c58 2008-05-15 stephan: <td>2zMM3E</td> d87ca60c58 2008-05-15 stephan: </tr> d87ca60c58 2008-05-15 stephan: </table> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="examplesdelta"><h3>3.2 Delta encoding</h3> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>An example of a delta using the specified encoding is:</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <table border=1><tr><td><pre> d87ca60c58 2008-05-15 stephan: 1Xb d87ca60c58 2008-05-15 stephan: 4E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre> d87ca60c58 2008-05-15 stephan: </td></tr></table> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>This can be taken apart into the following parts:</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <table border=1> d87ca60c58 2008-05-15 stephan: <tr><th>What </th> <th>Encoding </th><th>Meaning </th><th>Details</th></tr> d87ca60c58 2008-05-15 stephan: <tr><td>Header</td> <td>1Xb </td><td>Size </td><td> 6246 </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td>S-List</td> <td>4E@0, </td><td>Copy </td><td> 270 @ 0 </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>2:th </td><td>Literal </td><td> 2 'th' </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>FN@4C, </td><td>Copy </td><td> 983 @ 268 </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>6:scenda </td><td>Literal </td><td> 6 'scenda' </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>1B@Jd, </td><td>Copy </td><td> 75 @ 1256 </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>6:scenda </td><td>Literal </td><td> 6 'scenda' </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>5x@Kt, </td><td>Copy </td><td> 380 @ 1336 </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>6:pieces </td><td>Literal </td><td> 6 'pieces' </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>79@Qt, </td><td>Copy </td><td> 457 @ 1720 </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>F: Example: eskil</td><td>Literal </td><td> 15 ' Example: eskil'</td></tr> d87ca60c58 2008-05-15 stephan: <tr><td> </td> <td>~E@Y0, </td><td>Copy </td><td> 4046 @ 2176 </td></tr> d87ca60c58 2008-05-15 stephan: <tr><td>Trailer</td><td>2zMM3E </td><td>Ckecksum</td><td> -1101438770 </td></tr> d87ca60c58 2008-05-15 stephan: </table> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <p>The unified diff behind the above delta is</p> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <table border=1><tr><td><pre> d87ca60c58 2008-05-15 stephan: bluepeak:(761) ~/Projects/Tcl/Fossil/Devel/devel > diff -u ../DELTA/old ../DELTA/new d87ca60c58 2008-05-15 stephan: --- ../DELTA/old 2007-08-23 21:14:40.000000000 -0700 d87ca60c58 2008-05-15 stephan: +++ ../DELTA/new 2007-08-23 21:14:33.000000000 -0700 d87ca60c58 2008-05-15 stephan: @@ -5,7 +5,7 @@ d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: * If the server does not have write permission on the database d87ca60c58 2008-05-15 stephan: file, or on the directory containing the database file (and d87ca60c58 2008-05-15 stephan: - it is thus unable to update database because it cannot create d87ca60c58 2008-05-15 stephan: + it is thus unable to update the database because it cannot create d87ca60c58 2008-05-15 stephan: a rollback journal) then it currently fails silently on a push. d87ca60c58 2008-05-15 stephan: It needs to return a helpful error. d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: @@ -27,8 +27,8 @@ d87ca60c58 2008-05-15 stephan: * Additional information displayed for the "vinfo" page: d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: + All leaves of this version that are not included in the d87ca60c58 2008-05-15 stephan: - decendant list. With date, user, comment, and hyperlink. d87ca60c58 2008-05-15 stephan: - Leaves in the decendant table should be marked as such. d87ca60c58 2008-05-15 stephan: + descendant list. With date, user, comment, and hyperlink. d87ca60c58 2008-05-15 stephan: + Leaves in the descendant table should be marked as such. d87ca60c58 2008-05-15 stephan: See the compute_leaves() function to see how to find all d87ca60c58 2008-05-15 stephan: leaves. d87ca60c58 2008-05-15 stephan: + Add file diff links to the file change list. d87ca60c58 2008-05-15 stephan: @@ -37,7 +37,7 @@ d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: * The /xfer handler (for push, pull, and clone) does not do d87ca60c58 2008-05-15 stephan: delta compression. This results in excess bandwidth usage. d87ca60c58 2008-05-15 stephan: - There are some code in xfer.c that are sketches of ideas on d87ca60c58 2008-05-15 stephan: + There are some pieces in xfer.c that are sketches of ideas on d87ca60c58 2008-05-15 stephan: how to do delta compression, but nothing has been implemented. d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: * Enhancements to the diff and tkdiff commands in the cli. d87ca60c58 2008-05-15 stephan: @@ -45,7 +45,7 @@ d87ca60c58 2008-05-15 stephan: single file. Allow diffs against any two arbitrary versions, d87ca60c58 2008-05-15 stephan: not just diffs against the current check-out. Allow d87ca60c58 2008-05-15 stephan: configuration options to replace tkdiff with some other d87ca60c58 2008-05-15 stephan: - visual differ of the users choice. d87ca60c58 2008-05-15 stephan: + visual differ of the users choice. Example: eskil. d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: * Ticketing interface (expand this bullet) d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: </pre></td></tr></table> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <a name="notes"><h2>Notes</h2> d87ca60c58 2008-05-15 stephan: d87ca60c58 2008-05-15 stephan: <ul> d87ca60c58 2008-05-15 stephan: <li>Pure text files generate a pure text delta. d87ca60c58 2008-05-15 stephan: </li> d87ca60c58 2008-05-15 stephan: <li>Binary files generate a delta that may contain some binary data. d87ca60c58 2008-05-15 stephan: </li> d87ca60c58 2008-05-15 stephan: <li>Instead of putting special instructions for general compression d87ca60c58 2008-05-15 stephan: into the delta-format itself, specifically the segment-list, like d87ca60c58 2008-05-15 stephan: run-length encoding of literals, etc. it was considered to be much d87ca60c58 2008-05-15 stephan: more sensible to keep the various concern separate and use a general d87ca60c58 2008-05-15 stephan: compression library, like <a href="http://www.zlib.net">zlib</a>, to d87ca60c58 2008-05-15 stephan: compress the full delta after its generation. d87ca60c58 2008-05-15 stephan: </li> d87ca60c58 2008-05-15 stephan: </ul>