dbda8d6ce9 2007-07-21       drh: <html>
<head>
<title>Fossil Repository Integrity Self-Checks</title>
</head>
<body bgcolor="white">
<h1 align="center">
Fossil Repository Integrity Self-Checks
</h1>

<p>
Even though fossil is a relatively new project and still contains
many bugs, it is designed to provide a high level of integrity, so
that you can be confident you will not lose your files.  This note
describes the defensive measures that fossil uses to help prevent
file loss due to bugs.
</p>

<h2>Atomic Check-ins With Rollback</h2>

<p>
The fossil repository is an
<a href="http://www.sqlite.org/">SQLite</a> database file.  SQLite
is mature and stable and has been in widespread use for many
years, so we have little worry that it might cause repository
corruption.  SQLite
databases are not corrupted even if a program crash, system crash, or power
failure occurs in the middle of an update.  If such a crash
does occur in the middle of a change, then all of the partial changes are rolled
back the next time the database is accessed.
</p>
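<p>
The all-or-nothing behavior described above can be observed, in a limited way, from Python's standard <code>sqlite3</code> module. This is an illustrative sketch only: the table name <code>blob</code> and the helper <code>abandoned_update_is_discarded</code> are hypothetical, and closing a connection without committing merely stands in for a crash. A real power failure is recovered through SQLite's journal on the next access, which this sketch does not exercise.
</p>

```python
import os
import sqlite3
import tempfile


def abandoned_update_is_discarded(db_path: str) -> int:
    """Start an update, drop the connection before COMMIT, return the row count afterwards."""
    # Create the table and one committed row.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS blob (content BLOB)")
    conn.execute("INSERT INTO blob VALUES (?)", (b"committed",))
    conn.commit()

    # Begin a second update but never commit it; the sqlite3 module opens
    # the transaction implicitly before the INSERT.
    conn.execute("INSERT INTO blob VALUES (?)", (b"half-written",))
    conn.close()  # closing without commit() discards the pending change

    # A fresh connection sees only the committed row.
    conn = sqlite3.connect(db_path)
    (count,) = conn.execute("SELECT count(*) FROM blob").fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(abandoned_update_is_discarded(os.path.join(tmp, "repo.db")))  # prints 1
```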

<p>
A check-in operation in fossil makes many changes to the repository
database.  But all these changes happen within a single transaction.
If something goes wrong in the middle of the commit, then the transaction
is rolled back and the database is unchanged.
</p>
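<p>
Here is a minimal sketch of an all-or-nothing check-in, assuming a simplified, hypothetical schema (<code>file</code> and <code>checkin</code> tables) rather than fossil's real one. Every write happens inside one transaction; the hypothetical <code>fail_after</code> parameter simulates a bug partway through, and the rows already written disappear when the transaction rolls back.
</p>

```python
import sqlite3


def check_in(conn: sqlite3.Connection, files: dict, fail_after: int = -1) -> None:
    """Write all files of one check-in inside a single transaction.

    fail_after simulates a bug by raising after that many files are written.
    """
    with conn:  # commits if the block finishes, rolls back if an exception escapes
        for i, (name, content) in enumerate(sorted(files.items())):
            if i == fail_after:
                raise RuntimeError("simulated failure in the middle of the commit")
            conn.execute("INSERT INTO file(name, content) VALUES (?, ?)", (name, content))
        conn.execute("INSERT INTO checkin(nfiles) VALUES (?)", (len(files),))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file(name TEXT, content BLOB)")
conn.execute("CREATE TABLE checkin(nfiles INTEGER)")
try:
    check_in(conn, {"a.c": b"int a;", "b.c": b"int b;"}, fail_after=1)
except RuntimeError:
    pass
print(conn.execute("SELECT count(*) FROM file").fetchone()[0])  # prints 0: a.c was rolled back
```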

<h2>Verification Of Delta Encodings Prior To Transaction Commit</h2>

<p>
The content files that comprise the global state of a fossil repository
are stored in the repository as a tree.  The leaves of the tree are
stored as zlib-compressed BLOBs.  Interior nodes are deltas from their
descendants.  There is a lot of encoding going on here.  There is
zlib compression, which is relatively well tested but still might
cause corruption if used improperly.  And there is the relatively
new delta-encoding mechanism designed expressly for fossil.  We want
to make sure that bugs in these encoding mechanisms do not lead to
loss of data.
</p>
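<p>
To make the storage layout concrete, the sketch below stores a newest-version leaf as a zlib-compressed full copy and an older version as a compressed delta from that descendant. The delta format (a prefix length, a suffix length, and the literal middle bytes) is a toy invented for this example and is not fossil's delta encoding; <code>make_delta</code> and <code>apply_delta</code> are hypothetical names.
</p>

```python
import struct
import zlib


def make_delta(source: bytes, target: bytes) -> bytes:
    """Toy delta: reuse the common prefix and suffix of source, store the rest literally."""
    n = min(len(source), len(target))
    p = 0
    while p < n and source[p] == target[p]:
        p += 1
    s = 0
    while s < n - p and source[-1 - s] == target[-1 - s]:
        s += 1
    # 8-byte header (prefix length, suffix length) followed by the literal middle.
    return struct.pack(">II", p, s) + target[p:len(target) - s]


def apply_delta(source: bytes, delta: bytes) -> bytes:
    p, s = struct.unpack(">II", delta[:8])
    return source[:p] + delta[8:] + source[len(source) - s:]


# Leaf (newest version): stored as a compressed full copy.
v2 = b"int main(void){ return 0; }\n"
leaf = zlib.compress(v2)
# Interior node (older version): stored as a compressed delta from its descendant.
v1 = b"int main(void){ return 1; }\n"
interior = zlib.compress(make_delta(v2, v1))

# Recovering v1 requires both decoders (zlib and the delta) to be correct.
assert apply_delta(zlib.decompress(leaf), zlib.decompress(interior)) == v1
```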

<p>
To increase our confidence that everything in the repository is
recoverable, fossil makes sure it can extract an exact replica
of every content file that it changes just prior to transaction
commit.  During the course of a check-in, many different files
in the repository might be modified.  Some files are simply
compressed.  Other files are delta encoded and then compressed.
While all this is going on, fossil makes a record of every file
that is encoded and the MD5 hash of the original content of that
file.  Then, just before transaction commit, fossil re-extracts
the original content of all files that were written, computes
the MD5 checksum again, and verifies that the checksums match.
If anything does not match, an error
message is printed and the transaction rolls back.
</p>
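<p>
The record-then-verify procedure can be sketched as follows. This is a hedged illustration, not fossil's code: the <code>blob(name, data)</code> table and the helpers <code>store</code>, <code>verify_before_commit</code>, and <code>commit_files</code> are hypothetical, and only zlib compression (no delta) is applied. The hypothetical <code>corrupt</code> argument simulates an encoder bug so the rollback path can be seen.
</p>

```python
import hashlib
import sqlite3
import zlib


class VerifyError(Exception):
    """Raised when a file written in this transaction cannot be re-extracted exactly."""


def store(conn: sqlite3.Connection, pending: dict, name: str, content: bytes) -> None:
    # Write the compressed content and remember the MD5 of the *original* bytes.
    conn.execute("INSERT OR REPLACE INTO blob(name, data) VALUES (?, ?)",
                 (name, zlib.compress(content)))
    pending[name] = hashlib.md5(content).hexdigest()


def verify_before_commit(conn: sqlite3.Connection, pending: dict) -> None:
    # Re-extract every file written so far and compare against the recorded MD5.
    for name, expected in pending.items():
        (data,) = conn.execute("SELECT data FROM blob WHERE name = ?", (name,)).fetchone()
        if hashlib.md5(zlib.decompress(data)).hexdigest() != expected:
            raise VerifyError(f"cannot re-extract {name}")


def commit_files(conn: sqlite3.Connection, files: dict, corrupt: str = "") -> None:
    pending = {}
    with conn:  # an exception anywhere in this block rolls everything back
        for name, content in files.items():
            store(conn, pending, name, content)
        if corrupt:  # simulate an encoder bug that damages one stored file
            conn.execute("UPDATE blob SET data = ? WHERE name = ?",
                         (zlib.compress(b"garbage"), corrupt))
        verify_before_commit(conn, pending)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blob(name TEXT PRIMARY KEY, data BLOB)")
commit_files(conn, {"a.txt": b"alpha", "b.txt": b"beta"})  # verifies, then commits
try:
    commit_files(conn, {"c.txt": b"gamma"}, corrupt="c.txt")
except VerifyError as err:
    print("check-in aborted:", err)  # c.txt is rolled back
```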

<p>
In other words, fossil always checks that it can
re-extract a file before it commits a check-in of that file.
Hence bugs in fossil are unlikely to corrupt the repository in
a way that prevents us from extracting historical versions of
files.
</p>

<h2>Checksums On All Files And Versions</h2>

<p>
Repository records of type "file" (records that hold the content
of project files) contain a "cksum" property which records the
MD5 checksum of the content of that file.  So if something goes
wrong in the file extraction process, we will at least know about
it.  This checksum is in addition to the digital signature that
covers the entire header and content of the record.
</p>
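<p>
A sketch of how a per-file "cksum" guards extraction, assuming a hypothetical in-memory record (a Python dict) rather than fossil's actual record format. The helpers <code>make_file_record</code> and <code>extract</code> are illustrative names; extraction here is just zlib decompression, after which the MD5 is compared to the stored property.
</p>

```python
import hashlib
import zlib


def make_file_record(content: bytes) -> dict:
    # Hypothetical record: a type, a "cksum" property, and encoded content.
    return {"type": "file",
            "cksum": hashlib.md5(content).hexdigest(),
            "data": zlib.compress(content)}


def extract(record: dict) -> bytes:
    """Decode a file record, refusing to return content whose MD5 disagrees with cksum."""
    content = zlib.decompress(record["data"])
    if hashlib.md5(content).hexdigest() != record["cksum"]:
        raise ValueError("cksum mismatch: extraction produced the wrong content")
    return content


rec = make_file_record(b"hello, fossil\n")
assert extract(rec) == b"hello, fossil\n"
```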

<p>
Repository records of type "version" contain a "cksum"
property that holds the MD5 checksum of the concatenation of
every file in the entire project.  During a check-in, after
fossil has inserted all changes into the repository, it goes
back and rereads every file out of the repository and recomputes
this global checksum based on the repository content.  It then
computes an MD5 checksum over the files on disk.  If these two
checksums do not match, the check-in fails and rolls back.
Thus if a check-in transaction is successful, we have high
confidence that the content in the repository exactly matches
the content on disk.
</p>
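<p>
The global comparison might look like the sketch below. The text does not say in which order files are concatenated, so sorted-by-name order is an assumption here; <code>tree_md5</code> and <code>confirm_check_in</code> are hypothetical names, and <code>repo_files</code> stands for content already re-read from the repository.
</p>

```python
import hashlib
import tempfile
from pathlib import Path


def tree_md5(files: dict) -> str:
    # MD5 of every file's bytes concatenated, in sorted-name order (assumed order).
    h = hashlib.md5()
    for name in sorted(files):
        h.update(files[name])
    return h.hexdigest()


def confirm_check_in(repo_files: dict, root: Path) -> None:
    # repo_files: content re-read from the repository; root: the working directory.
    on_disk = {name: (root / name).read_bytes() for name in repo_files}
    if tree_md5(repo_files) != tree_md5(on_disk):
        raise RuntimeError("global cksum mismatch: check-in fails and rolls back")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.c").write_bytes(b"int a;\n")
    (root / "b.c").write_bytes(b"int b;\n")
    confirm_check_in({"a.c": b"int a;\n", "b.c": b"int b;\n"}, root)  # passes silently
```

<p>
Because a bare concatenation carries no file names or lengths, <code>{"x": b"ab", "y": b"c"}</code> and <code>{"x": b"a", "y": b"bc"}</code> yield the same global checksum; the per-file "cksum" on each file record is what catches that kind of shift.
</p>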

<p>
Every project file is verified by three separate checksums.
There is an SHA256 checksum used as part of the digital signature
on the file.  There is an MD5 checksum on the content of each
individual file.  And there is a global MD5 checksum over the
entire project source tree.  If any of these cross-checks fails,
the operation fails and an error is displayed.  Taken
together, these cross-checks give us high confidence that the
files you checked out are identical to the files you checked in.
</p>
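<p>
The three cross-checks can be combined as in this sketch. It is illustrative only: the record layout and the helpers <code>make_record</code> and <code>cross_check</code> are hypothetical, the signature check is reduced to comparing the SHA-256 digest a signature would cover (no public-key verification is shown), and the global checksum again assumes sorted-by-name order.
</p>

```python
import hashlib


def make_record(name: str, content: bytes, header: bytes = b"file\n") -> dict:
    # Hypothetical record carrying both per-file digests.
    return {"name": name, "header": header, "content": content,
            "sig_sha256": hashlib.sha256(header + content).hexdigest(),
            "cksum": hashlib.md5(content).hexdigest()}


def cross_check(records: list, version_cksum: str) -> list:
    """Return the names of failed checks; an empty list means all three agree."""
    failures = []
    for r in records:
        # 1. SHA-256 over header + content: the digest a signature would cover.
        if hashlib.sha256(r["header"] + r["content"]).hexdigest() != r["sig_sha256"]:
            failures.append(r["name"] + ": sha256")
        # 2. MD5 over this file's content alone (the file record's cksum).
        if hashlib.md5(r["content"]).hexdigest() != r["cksum"]:
            failures.append(r["name"] + ": md5")
    # 3. MD5 over the whole tree (the version record's cksum), sorted-name order assumed.
    tree = hashlib.md5()
    for r in sorted(records, key=lambda rec: rec["name"]):
        tree.update(r["content"])
    if tree.hexdigest() != version_cksum:
        failures.append("tree: md5")
    return failures


records = [make_record("a.c", b"A"), make_record("b.c", b"B")]
version_cksum = hashlib.md5(b"AB").hexdigest()
print(cross_check(records, version_cksum))  # prints []
```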