<title>Fossil Repository Integrity Self-Checks</title>

<h1 align="center">Fossil Repository Integrity Self-Checks</h1>

Fossil is designed with features to give it a high level
of integrity so that users can have confidence that content will
never be mangled or lost by Fossil.
This note describes the defensive measures that
Fossil uses to help prevent information loss due to bugs.

Fossil has been hosting itself and many other projects for
years now.  Many bugs have been encountered.  But, thanks in large
part to the defensive measures described here, no data has been
lost.  The integrity checks are doing their job well.

<h2>Atomic Check-ins With Rollback</h2>

The Fossil repository is an
<a href="http://www.sqlite.org/">SQLite version 3</a> database file.
SQLite is very mature and stable and has been in widespread use for many
years, so we are confident it will not cause repository
corruption.  SQLite
databases do not become corrupt even if a program crash, system crash, or power
failure occurs in the middle of an update.  If some kind of crash
does occur in the middle of a change, then all the changes are rolled
back the next time that the database is accessed.

A check-in operation in Fossil makes many changes to the repository
database.  But all these changes happen within a single transaction.
If something goes wrong in the middle of the commit, then the transaction
is rolled back and the database is unchanged.
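
The all-or-nothing behavior of a check-in can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the blob and checkin tables and the make_repo() and commit() helpers are hypothetical and do not reflect Fossil's actual schema or C code. A simulated failure partway through the second check-in causes every change made since BEGIN to be discarded.

```python
import sqlite3


def make_repo() -> sqlite3.Connection:
    """Create an in-memory stand-in for a repository (hypothetical schema)."""
    # isolation_level=None puts the connection in autocommit mode so that the
    # explicit BEGIN/COMMIT/ROLLBACK statements below control the transaction.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE blob(uuid TEXT PRIMARY KEY, content BLOB)")
    db.execute("CREATE TABLE checkin(id INTEGER PRIMARY KEY, comment TEXT)")
    return db


def commit(db: sqlite3.Connection, comment: str, files: dict,
           fail_midway: bool = False) -> None:
    """Store all files plus the check-in record inside one transaction."""
    db.execute("BEGIN")
    try:
        for uuid, content in files.items():
            db.execute("INSERT INTO blob(uuid, content) VALUES (?, ?)",
                       (uuid, content))
            if fail_midway:
                # Simulate a bug after some rows have already been written.
                raise RuntimeError("simulated failure during check-in")
        db.execute("INSERT INTO checkin(comment) VALUES (?)", (comment,))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")  # discard every change made since BEGIN
        raise


if __name__ == "__main__":
    db = make_repo()
    commit(db, "first check-in", {"a1": b"hello", "b2": b"world"})
    try:
        commit(db, "broken check-in", {"c3": b"x", "d4": b"y"}, fail_midway=True)
    except RuntimeError:
        pass
    # Only the two blobs from the first check-in remain.
    print(db.execute("SELECT count(*) FROM blob").fetchone()[0])  # prints 2
```

This only shows an in-process rollback. Recovery after an actual program crash or power failure is handled by SQLite's own journaling the next time the database is opened, which the sketch does not simulate.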

<h2>Verification Of Delta Encodings Prior To Transaction Commit</h2>

The content files that comprise the global state of a Fossil repository
are stored in the repository as a tree.  The leaves of the tree are
stored as zlib-compressed BLOBs.  Interior nodes are deltas from their
descendants.  A lot of encoding is going on.  There is
zlib compression, which is relatively well-tested but still might
cause corruption if used improperly.  And there is the relatively
new delta-encoding mechanism designed expressly for Fossil.  We want
to make sure that bugs in these encoding mechanisms do not lead to
loss of data.
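
A toy model of this storage layout may help. The sketch below assumes a made-up representation rather than Fossil's real tables and delta encoding: the newest version is a zlib-compressed full text (a leaf), and each older version is stored as a list of hypothetical "copy" and "insert" operations against its descendant. Extracting an old version means walking down to the leaf and applying deltas on the way back, so a bug in either the compressor or the delta encoder could make history unrecoverable.

```python
import zlib


def apply_delta(source: bytes, ops: list) -> bytes:
    """Apply a hypothetical delta made of ("copy", offset, length) and ("insert", data) ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]  # literal bytes carried inside the delta
    return bytes(out)


def extract(store: dict, rid: int) -> bytes:
    """Rebuild artifact rid by following its delta chain down to a full text."""
    entry = store[rid]
    if entry[0] == "full":
        return zlib.decompress(entry[1])  # leaf: compressed full text
    _, base_rid, ops = entry              # interior node: delta from a descendant
    return apply_delta(extract(store, base_rid), ops)


# Version 3 is the newest; versions 2 and 1 are stored as deltas against newer ones.
store = {
    3: ("full", zlib.compress(b"hello brave new world")),
    2: ("delta", 3, [("copy", 0, 6), ("copy", 12, 9)]),
    1: ("delta", 2, [("copy", 0, 5), ("insert", b", "), ("copy", 10, 5)]),
}

if __name__ == "__main__":
    print(extract(store, 1))  # b'hello, world'
```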

To increase our confidence that everything in the repository is
recoverable, Fossil makes sure it can extract an exact replica
of every content file that it changes just prior to transaction
commit.  During the course of a check-in (or other repository
operation) many different files
in the repository might be modified.  Some files are simply
compressed.  Other files are delta encoded and then compressed.
While all this is going on, Fossil makes a record of every file
that is encoded and the SHA1 hash of the original content of that
file.  Then just before transaction commit, Fossil re-extracts
the original content of all files that were written, computes
the SHA1 hash again, and verifies that the hashes match.
If anything does not match up, an error
message is printed and the transaction rolls back.
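
The verify-before-commit step can be sketched as follows. This is a minimal illustration assuming a toy in-memory store: the Repo class, its method names, and the dictionary-snapshot rollback are hypothetical stand-ins for Fossil's SQLite tables and transactions, and only zlib compression (no delta encoding) is modeled.

```python
import hashlib
import zlib


class Repo:
    """Toy repository that re-extracts and re-hashes every write before commit."""

    def __init__(self) -> None:
        self.store = {}     # name -> zlib-compressed content
        self.pending = {}   # name -> SHA1 of the original content written this transaction
        self.snapshot = {}  # state to restore if verification fails

    def begin(self) -> None:
        self.snapshot = dict(self.store)
        self.pending = {}

    def write(self, name: str, content: bytes) -> None:
        self.store[name] = zlib.compress(content)
        # Record the hash of the ORIGINAL content, before any encoding.
        self.pending[name] = hashlib.sha1(content).hexdigest()

    def extract(self, name: str) -> bytes:
        return zlib.decompress(self.store[name])

    def commit(self) -> None:
        for name, expected in self.pending.items():
            try:
                ok = hashlib.sha1(self.extract(name)).hexdigest() == expected
            except zlib.error:
                ok = False  # undecodable data counts as a mismatch
            if not ok:
                self.store = self.snapshot  # roll back the whole transaction
                self.pending = {}
                raise ValueError(f"cannot re-extract {name!r} exactly; rolled back")
        self.pending = {}


if __name__ == "__main__":
    repo = Repo()
    repo.begin(); repo.write("a.txt", b"hello"); repo.commit()
    repo.begin(); repo.write("b.txt", b"data")
    repo.store["b.txt"] = zlib.compress(b"dat")  # simulate an encoder bug
    try:
        repo.commit()
    except ValueError as err:
        print(err)
    print(sorted(repo.store))  # ['a.txt']
```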

In other words, Fossil always checks to make sure it can
re-extract a file before it commits a change to that file.
Hence bugs in Fossil are unlikely to corrupt the repository in
a way that prevents us from extracting historical versions of
files.

<h2>Checksum Over All Files In A Check-in</h2>

Manifest artifacts that define a check-in have two fields that
record MD5 hashes: the R-card holds a hash over all files in the
check-in, and the Z-card holds a hash of the manifest itself.  Prior to any check-in
commit, these checksums are verified to ensure that what is
checked in agrees exactly with what is on disk.  Similarly,
the repository checksum is verified after a checkout to make
sure that the entire repository was checked out correctly.
Note that these added checks use a different hash (MD5 instead
of SHA1) in order to avoid common-mode failures in the hash
algorithm implementation.
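
As a rough illustration of a whole-check-in checksum, the sketch below feeds every file's name, size, and content to MD5 in sorted name order. The exact byte layout (name, a space, the decimal size, a newline, then the content) is an assumption modeled loosely on the R-card, and tree_checksum() is a hypothetical helper; consult the file format documentation for the precise rule. The essential property is that the value computed over the files being committed must equal the value computed over the files on disk.

```python
import hashlib


def tree_checksum(files: dict) -> str:
    """MD5 over all files of a check-in, visited in sorted name order.

    Per file: name, space, decimal size, newline, then raw content
    (an illustrative layout, not a specification).
    """
    md5 = hashlib.md5()
    for name in sorted(files):
        content = files[name]
        md5.update(f"{name} {len(content)}\n".encode("utf-8"))
        md5.update(content)
    return md5.hexdigest()


if __name__ == "__main__":
    checked_in = {"README": b"hi\n", "src/main.c": b"int main(void){return 0;}\n"}
    on_disk = dict(checked_in)
    r_card = tree_checksum(checked_in)
    print(tree_checksum(on_disk) == r_card)  # True: checkout matches
    on_disk["README"] = b"ho\n"
    print(tree_checksum(on_disk) == r_card)  # False: a file differs
```

Sorting the names makes the result independent of the order in which files happen to be enumerated on disk.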

<h2>Checksums On Control Artifacts And Deltas</h2>

Every [./fileformat.wiki | control artifact] in a Fossil repository
contains a "Z-card" bearing an MD5 checksum over the rest of the
artifact.  Any mismatch causes the control artifact to be ignored.
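
A sketch of Z-card handling follows. It assumes the Z-card is the final line of the artifact, written as "Z", a space, and 32 hex digits, and that the MD5 covers every byte before that line; the helper names add_z_card() and z_card_ok() are hypothetical. Artifacts that fail the check are simply skipped, mirroring the "ignored" behavior described above.

```python
import hashlib


def add_z_card(body: str) -> str:
    """Append a Z-card whose MD5 covers all of body (which must end in a newline)."""
    return body + "Z " + hashlib.md5(body.encode("utf-8")).hexdigest() + "\n"


def z_card_ok(artifact: str) -> bool:
    """True if the last line is a Z-card matching the MD5 of everything before it."""
    if not artifact.endswith("\n"):
        return False
    head, sep, last = artifact[:-1].rpartition("\n")
    if not last.startswith("Z "):
        return False
    covered = head + sep  # all text preceding the Z-card line
    return hashlib.md5(covered.encode("utf-8")).hexdigest() == last[2:]


if __name__ == "__main__":
    good = add_z_card("C first-check-in\nD 2009-08-28T12:00:00\n")
    bad = good.replace("first", "forged")
    usable = [a for a in (good, bad) if z_card_ok(a)]  # mismatches are ignored
    print(len(usable))  # 1
```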

The [./delta_format.wiki | file delta format] includes a 32-bit
checksum of the target file.  Whenever a file is reconstructed from
a delta, that checksum is verified to make sure the reconstruction
was done correctly.
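
The sketch below shows the idea of checking a reconstruction against a checksum carried with the delta. It assumes a 32-bit checksum computed by summing the target as big-endian 32-bit words modulo 2^32, with a short final word padded with zero bytes; this is believed to follow the delta format document but should be confirmed against it. The list-of-operations delta and the apply_delta_checked() helper are hypothetical and much simpler than Fossil's encoded delta format.

```python
def checksum32(data: bytes) -> int:
    """Sum of big-endian 32-bit words modulo 2**32; a short tail is zero-padded."""
    padded = data + b"\x00" * (-len(data) % 4)
    total = 0
    for i in range(0, len(padded), 4):
        total += int.from_bytes(padded[i:i + 4], "big")
    return total & 0xFFFFFFFF


def apply_delta_checked(source: bytes, ops: list, expected: int) -> bytes:
    """Apply hypothetical ("copy", offset, length) / ("insert", data) ops, then verify."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    result = bytes(out)
    if checksum32(result) != expected:
        raise ValueError("delta reconstruction does not match its checksum")
    return result


if __name__ == "__main__":
    source = b"hello brave new world"
    target = b"hello new world"
    delta = [("copy", 0, 6), ("copy", 12, 9)]
    print(apply_delta_checked(source, delta, checksum32(target)))  # b'hello new world'
```

A 32-bit sum is cheap to compute on every reconstruction; it is meant to catch encoder and decoder bugs, not deliberate tampering.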

<h2>Reliability Versus Performance</h2>

Some version control systems make a big deal out of being "high performance"
or the "fastest version control system".  Fossil makes no such claims and has
no such ambition.  Indeed, profiling indicates that Fossil bears a
substantial performance cost for
doing all of the checksumming and verification outlined above.
Fossil takes the philosophy of the
<a href="http://en.wikipedia.org/wiki/The_Tortoise_and_the_Hare">tortoise</a>:
reliability is more important than raw speed.  The developers of
Fossil see no merit in getting the wrong answer quickly.

Fossil may not be the fastest versioning system, but it is "fast enough".
Fossil runs quickly enough to stay out of the developers' way.
Most operations complete in under a second.