d87ca60c58 2008-05-15   stephan: <h1 align="center">
d87ca60c58 2008-05-15   stephan: Fossil Repository Integrity Self-Checks
d87ca60c58 2008-05-15   stephan: </h1>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: Even though fossil is a relatively new project and still contains
d87ca60c58 2008-05-15   stephan: many bugs, it is designed with features to give it a high level
d87ca60c58 2008-05-15   stephan: of integrity so that you can have confidence that you will not
d87ca60c58 2008-05-15   stephan: lose your files.  This note describes the defensive measures that
d87ca60c58 2008-05-15   stephan: fossil uses to help prevent file loss due to bugs.
d87ca60c58 2008-05-15   stephan: </p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p><i>Follow-up as of 2007-11-24:</i>
d87ca60c58 2008-05-15   stephan: Fossil has been hosting itself and several other projects for
d87ca60c58 2008-05-15   stephan: months now.  Many bugs have been encountered.  But, thanks in large
d87ca60c58 2008-05-15   stephan: part to the defensive measures described here, no data has been
d87ca60c58 2008-05-15   stephan: lost.  The integrity checks are doing their job well.</p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <h2>Atomic Check-ins With Rollback</h2>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: The fossil repository is an
d87ca60c58 2008-05-15   stephan: <a href="http://www.sqlite.org/">SQLite version 3</a> database file.
d87ca60c58 2008-05-15   stephan: SQLite is mature and stable and has been in widespread use for many
d87ca60c58 2008-05-15   stephan: years, so we have little worry that it might cause repository
d87ca60c58 2008-05-15   stephan: corruption.  SQLite
d87ca60c58 2008-05-15   stephan: databases do not become corrupt even if a program crash, system crash, or power
d87ca60c58 2008-05-15   stephan: failure occurs in the middle of an update.  If a crash
d87ca60c58 2008-05-15   stephan: does occur in the middle of a change, then all the changes are rolled
d87ca60c58 2008-05-15   stephan: back the next time that the database is accessed.
d87ca60c58 2008-05-15   stephan: </p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: A check-in operation in fossil makes many changes to the repository
d87ca60c58 2008-05-15   stephan: database.  But all these changes happen within a single transaction.
d87ca60c58 2008-05-15   stephan: If something goes wrong in the middle of the commit, then the transaction
d87ca60c58 2008-05-15   stephan: is rolled back and the database is unchanged.
d87ca60c58 2008-05-15   stephan: </p>
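<p>
The all-or-nothing behavior described above can be illustrated with a
minimal sketch using Python's built-in <tt>sqlite3</tt> module.  The
<tt>blob</tt> table, the <tt>checkin()</tt> function, and the simulated
failure are assumptions made for illustration only; they are not fossil's
actual schema or code.
</p>

```python
import sqlite3


def open_repo():
    """Create an in-memory database with a hypothetical 'blob' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blob(name TEXT PRIMARY KEY, content BLOB)")
    conn.commit()
    return conn


def checkin(conn, files, fail_midway=False):
    """Insert every file inside one transaction.

    Returns True if the transaction committed, False if it rolled back.
    """
    try:
        # 'with conn' commits if the body succeeds and rolls back
        # if the body raises an exception.
        with conn:
            for i, (name, content) in enumerate(files.items()):
                if fail_midway and i == 1:
                    raise RuntimeError("simulated failure mid check-in")
                conn.execute(
                    "INSERT INTO blob(name, content) VALUES (?, ?)",
                    (name, content),
                )
        return True
    except RuntimeError:
        return False


def count_blobs(conn):
    """Return the number of rows currently stored in the blob table."""
    return conn.execute("SELECT count(*) FROM blob").fetchone()[0]


if __name__ == "__main__":
    repo = open_repo()
    files = {"a.txt": b"one", "b.txt": b"two"}
    print(checkin(repo, files, fail_midway=True), count_blobs(repo))
    print(checkin(repo, files), count_blobs(repo))
```

<p>
In the failing run the first file has already been inserted when the error
is raised, yet the table is empty afterwards because the whole transaction
is rolled back.
</p>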
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <h2>Verification Of Delta Encodings Prior To Transaction Commit</h2>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: The content files that make up the global state of a fossil repository
d87ca60c58 2008-05-15   stephan: are stored in the repository as a tree.  The leaves of the tree are
d87ca60c58 2008-05-15   stephan: stored as zlib-compressed BLOBs.  Interior nodes are deltas from their
d87ca60c58 2008-05-15   stephan: descendants.  A lot of encoding is going on.  There is
d87ca60c58 2008-05-15   stephan: zlib compression, which is relatively well tested but still might
d87ca60c58 2008-05-15   stephan: cause corruption if used improperly.  And there is the relatively
d87ca60c58 2008-05-15   stephan: new delta-encoding mechanism designed expressly for fossil.  We want
d87ca60c58 2008-05-15   stephan: to make sure that bugs in these encoding mechanisms do not lead to
d87ca60c58 2008-05-15   stephan: loss of data.
d87ca60c58 2008-05-15   stephan: </p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: To increase our confidence that everything in the repository is
d87ca60c58 2008-05-15   stephan: recoverable, fossil makes sure it can extract an exact replica
d87ca60c58 2008-05-15   stephan: of every content file that it changes just prior to transaction
d87ca60c58 2008-05-15   stephan: commit.  During the course of a check-in (or other repository
d87ca60c58 2008-05-15   stephan: operation), many different files
d87ca60c58 2008-05-15   stephan: in the repository might be modified.  Some files are simply
d87ca60c58 2008-05-15   stephan: compressed.  Other files are delta encoded and then compressed.
d87ca60c58 2008-05-15   stephan: While all this is going on, fossil makes a record of every file
d87ca60c58 2008-05-15   stephan: that is encoded and the SHA1 hash of the original content of that
d87ca60c58 2008-05-15   stephan: file.  Then, just before transaction commit, fossil re-extracts
d87ca60c58 2008-05-15   stephan: the original content of all files that were written, computes
d87ca60c58 2008-05-15   stephan: the SHA1 hash again, and verifies that the hashes match.
d87ca60c58 2008-05-15   stephan: If anything does not match up, an error
d87ca60c58 2008-05-15   stephan: message is printed and the transaction rolls back.
d87ca60c58 2008-05-15   stephan: </p>
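<p>
The verify-before-commit idea can be sketched as follows, again in Python.
This is a simplified illustration under stated assumptions: it uses only
zlib compression (no delta encoding), and the <tt>blob</tt> table,
<tt>store_verified()</tt>, and the pluggable <tt>encode</tt> parameter are
hypothetical names, not part of fossil.
</p>

```python
import hashlib
import sqlite3
import zlib


class VerifyError(Exception):
    """Raised when re-extracted content does not match the original."""


def open_repo():
    """Create an in-memory database with a hypothetical 'blob' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blob(name TEXT PRIMARY KEY, content BLOB)")
    conn.commit()
    return conn


def store_verified(conn, files, encode=zlib.compress):
    """Store encoded files, then re-extract and verify before committing.

    'encode' is pluggable only so that a buggy encoder can be simulated.
    Returns True on commit, False if verification failed and rolled back.
    """
    pending = {}  # file name -> SHA1 of the original content
    try:
        with conn:  # commit on success, roll back on exception
            for name, content in files.items():
                pending[name] = hashlib.sha1(content).hexdigest()
                conn.execute(
                    "INSERT OR REPLACE INTO blob(name, content) VALUES (?, ?)",
                    (name, encode(content)),
                )
            # Just before commit: re-extract everything that was written
            # and compare its hash with the one recorded earlier.
            for name, expected in pending.items():
                row = conn.execute(
                    "SELECT content FROM blob WHERE name = ?", (name,)
                ).fetchone()
                actual = hashlib.sha1(zlib.decompress(row[0])).hexdigest()
                if actual != expected:
                    raise VerifyError(f"hash mismatch for {name}")
        return True
    except (VerifyError, zlib.error) as err:
        print(f"verification failed, transaction rolled back: {err}")
        return False


def buggy_encode(data):
    """Simulated encoder bug: silently drops the last byte."""
    return zlib.compress(data[:-1])


if __name__ == "__main__":
    repo = open_repo()
    store_verified(repo, {"a.txt": b"hello"})
    store_verified(repo, {"b.txt": b"world"}, encode=buggy_encode)
```

<p>
Because the check runs inside the same transaction as the writes, a faulty
encoder leaves the previously committed content untouched.
</p>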
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: So, in other words, fossil always checks to make sure it can
d87ca60c58 2008-05-15   stephan: re-extract a file before it commits a change to that file.
d87ca60c58 2008-05-15   stephan: Hence bugs in fossil are unlikely to corrupt the repository in
d87ca60c58 2008-05-15   stephan: a way that prevents us from extracting historical versions of
d87ca60c58 2008-05-15   stephan: files.
d87ca60c58 2008-05-15   stephan: </p>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <h2>Checksum Over All Files In A Baseline</h2>
d87ca60c58 2008-05-15   stephan: 
d87ca60c58 2008-05-15   stephan: <p>
d87ca60c58 2008-05-15   stephan: Manifest artifacts that define a baseline have two fields that
d87ca60c58 2008-05-15   stephan: record MD5 hashes: the R-card covers all files in the manifest
d87ca60c58 2008-05-15   stephan: and the Z-card covers the manifest itself.  Prior to any check-in
d87ca60c58 2008-05-15   stephan: commit, these checksums are verified to ensure that the baseline
d87ca60c58 2008-05-15   stephan: checked in agrees exactly with what is on disk.  Similarly,
d87ca60c58 2008-05-15   stephan: the repository checksum is verified after a checkout to make
d87ca60c58 2008-05-15   stephan: sure that the entire repository was checked out correctly.
d87ca60c58 2008-05-15   stephan: Note that these added checks use a different hash (MD5 instead
d87ca60c58 2008-05-15   stephan: of SHA1) in order to avoid common-mode failures in the hash
d87ca60c58 2008-05-15   stephan: algorithm implementation.
d87ca60c58 2008-05-15   stephan: </p>
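<p>
A single checksum over all files in a baseline might be computed along the
lines of the sketch below.  The byte layout used here (name, size, and
content of each file, in sorted name order) is an assumption chosen for
illustration and should not be read as fossil's exact checksum format;
<tt>aggregate_md5()</tt> and <tt>verify_baseline()</tt> are hypothetical
helper names.
</p>

```python
import hashlib


def aggregate_md5(files):
    """Return one MD5 hex digest covering the names and contents of all files.

    'files' maps file names to their content as bytes.  Names are processed
    in sorted order so the result does not depend on dictionary order.
    """
    h = hashlib.md5()
    for name in sorted(files):
        content = files[name]
        # Assumed layout: "<name> <size>\n" followed by the raw content.
        h.update(f"{name} {len(content)}\n".encode("utf-8"))
        h.update(content)
    return h.hexdigest()


def verify_baseline(files, recorded_checksum):
    """Check that a set of files matches a previously recorded checksum."""
    return aggregate_md5(files) == recorded_checksum


if __name__ == "__main__":
    on_disk = {"x.c": b"int x;", "y.c": b"int y;"}
    recorded = aggregate_md5(on_disk)
    print(verify_baseline(on_disk, recorded))
    on_disk["y.c"] = b"int Y;"  # a one-byte change to a single file
    print(verify_baseline(on_disk, recorded))
```

<p>
Any difference in any file, including an added, missing, or renamed file,
changes the aggregate digest, so one comparison covers the whole baseline.
</p>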