Diff

Differences From:

File ci_fossil.txt part of check-in [103c397e4b] - Updated my work list, added first notes about 'cvs import' functionality. by aku on 2007-08-28 03:34:12.

To:

File ci_fossil.txt part of check-in [10062df2fa] - Reworked my notes regarding 'reconstruct' based on my reading of content.c, checkin.c, and manifest.c by aku on 2007-08-28 05:01:23.

@@ -59,42 +59,69 @@
 ----------------------------------
 
 reconstruct
 
-It is currently not clear to me when and how fossil does
-delta-compression. Does it use deltas, or reverse deltas, or a
-combination of both ? And when does it generate the deltas ?
+The core logic for handling content is in the file "content.c", in
+particular the functions 'content_put' and 'content_deltify'. One of
+the main users of these functions is in the file "checkin.c", see the
+function 'commit_cmd'.
+
+The logic is clear. The new or modified files are simply stored
+without delta-compression, using 'content_put'. Should fossil have an
+id for the _previous_ revision of a committed file, it uses
+'content_deltify' to convert the already stored data for that
+revision into a delta with the just-stored new revision as origin.
+
+In other words, fossil produces reverse deltas, with leaf revisions
+stored just zip-compressed (plain) and older revisions using both zip-
+and delta-compression.
+
+Of note is that the underlying logic in 'content_deltify' gives up on
+delta compression if the involved files are not large enough, or if
+the achieved compression factor is not high enough. In that case the
+old revision of the file is left stored plain.
+
+The scheme can thus be called a 'truncated reverse delta'.
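+
+A minimal sketch of this flow, in C. It is not taken from content.c;
+the signatures of 'content_put' and 'content_deltify' are simplified
+assumptions, and 'commit_one_file' is a hypothetical wrapper standing
+in for the relevant part of 'commit_cmd':
+
+  typedef struct Blob Blob;                  /* fossil's byte-buffer type */
+
+  /* Assumed, simplified signatures; the real functions are in content.c. */
+  int  content_put(Blob *pContent);          /* store plain, return record id */
+  void content_deltify(int rid, int srcid);  /* re-store rid as delta of srcid */
+
+  /* Commit of one modified file, as described above: the new revision
+  ** is stored plain, then the previous revision (if any) is turned
+  ** into a reverse delta against it.
+  */
+  static void commit_one_file(Blob *pNew, int ridPrevious){
+    int ridNew = content_put(pNew);          /* new leaf: zip only, no delta */
+    if( ridPrevious ){
+      /* content_deltify() is expected to give up, leaving the old
+      ** revision plain, when the blobs are too small or the delta
+      ** does not save enough space -- the "truncated" part of the
+      ** truncated reverse delta scheme.
+      */
+      content_deltify(ridPrevious, ridNew);
+    }
+  }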
+
+The manifest is created and committed after the modified files. It
+uses the same logic as for the regular files. The new leaf manifest
+is stored plain, and the storage of the parent manifest is changed to
+a delta with the current manifest as origin.
+
+Further note that for a checkin of a merge result only the primary
+parent is treated in that way. The secondary parent, the one merged
+into the current revision, is not touched. I.e. from the storage
+layer's point of view that revision is still a leaf and its data is
+kept stored plain, not delta-compressed.
+
+
+
+Now the "reconstruct" can be done like so:
+
+- Scan the files in the indicated directory, and look for a manifest.
+
+- When a manifest has been found, parse its contents and follow the
+  chain of parent links to locate the root manifest, i.e. the one
+  without a parent (see the sketch after this list).
 
-The trivial solution is that it uses deltas, i.e. the first revision
-of a file is stored without delta compression and all future versions
-are deltas from that, and the delta is generated when the new revision
-is committed. With the obvious disadvantage that newer revisions take
-more and more time to be decompressed as the set of deltas to apply
-grows.
+- Import the files referenced by the root manifest, then the manifest
+  itself. This can be done using a modified form of the 'commit_cmd'
+  which does not have to construct a manifest on its own from vfile,
+  vmerge, etc.
 
-And during xfer it simply sends the deltas as is, making for easy
-integration on the remote side.
+- After that recursively apply the import of the previous step to the
+  children of the root, and so on.
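+
+A rough sketch of the scan and of the search for the root manifest,
+as referenced in the list above. The 'ScannedFile' structure and the
+helper prototypes are assumptions made for illustration; the real
+manifest parsing lives in "manifest.c":
+
+  typedef struct ScannedFile {
+    char *zPath;          /* location of the file on disk */
+    char *zUuid;          /* SHA1 of the file content */
+    char *zParentUuid;    /* primary parent of a manifest, or NULL */
+    int   isManifest;     /* true if the file parsed as a manifest */
+  } ScannedFile;
+
+  /* Hypothetical helpers: read the directory, look a file up by uuid. */
+  ScannedFile *scan_directory(const char *zDir, int *pnFile);
+  ScannedFile *find_by_uuid(ScannedFile *aFile, int nFile, const char *zUuid);
+
+  /* Locate the root manifest: start from any manifest and walk the
+  ** chain of parent links until a manifest without a parent is found.
+  */
+  static ScannedFile *find_root(ScannedFile *aFile, int nFile){
+    int i;
+    for(i=0; i<nFile; i++){
+      if( aFile[i].isManifest ){
+        ScannedFile *p = &aFile[i];
+        while( p->zParentUuid ){
+          p = find_by_uuid(aFile, nFile, p->zParentUuid);
+          if( p==0 ) return 0;   /* a referenced parent is missing */
+        }
+        return p;                /* manifest without a parent == root */
+      }
+    }
+    return 0;                    /* no manifest found at all */
+  }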
 
-However reconstruct, that method sees initially only an unordered
-collection of files, some of which may be manifests, others are data
-files, and if it imports them in a random order it might find that
-file X, which was imported first and therefore has no delta
-compression, is actually somewhere in the middle of a line of
-revisions, and should be delta-compressed, and then it has to find out
-the predecessor and do the compression, etc.
+For an incremental "reconstruct" the collection of files would not be
+a single tree with a root, but a forest, and the roots to look for
+are not manifests without a parent, but manifests whose parent is
+already present in the repository. After one such root has been found
+and processed, the remaining unprocessed files have to be searched
+for further roots, and only when no more roots are found are the
+remaining files considered superfluous.
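+
+A small sketch of that root test, reusing the 'ScannedFile' structure
+from the earlier sketch. 'uuid_in_repository' is a hypothetical
+helper; the actual check would be a query against the repository's
+blob table:
+
+  /* Hypothetical helper: does the repository already hold this uuid? */
+  int uuid_in_repository(const char *zUuid);
+
+  /* In the incremental case a scanned manifest is a root of the
+  ** forest if its parent is not among the scanned files but is
+  ** already present in the repository, rather than having no parent
+  ** at all.
+  */
+  static int is_incremental_root(ScannedFile *p, ScannedFile *aFile, int nFile){
+    if( !p->isManifest || p->zParentUuid==0 ) return 0;
+    if( find_by_uuid(aFile, nFile, p->zParentUuid)!=0 ) return 0;
+    return uuid_in_repository(p->zParentUuid);
+  }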
 
-So depending on how the internal logic of delta-compression is done
-reconstruct might need more logic to help the lower level achieve good
-compression.
+We can use the functions in "manifest.c" for parsing the manifests
+and for following the chain of parents.
 
-Like, in a first pass determine which files are manifests, and read
-enough of them to determine their parent/child structure, and in a
-second pass actually imports them, in topological order, with all
-relevant non-manifest files for a manifest imported as that time
-too. With that the underlying engine would see the files basically in
-the same order as generated by a regular series of commits.
-
-Problems for reconstruct: Files referenced, but not present, and,
-conversely, files present, but not referenced. This can done as part
-of the second pass, aborting when a missing file is encountered, with
-(un)marking of used files, and at the end we know the unused
-files. Could also be a separate pass between first and second.
+Hm. But we have no direct child information. So the above algorithm
+has to be modified: we have to scan all manifests before we start
+importing, and we have to create a reverse index from manifest to
+children, so that the import can be performed from the root to the
+leaves.
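+
+A sketch of that two-pass variant, again with assumed helper names
+and reusing the 'ScannedFile' structure from above: first link every
+manifest to its children via the parent uuids, then import
+recursively from the root so that each parent is stored before its
+children:
+
+  #include <stdlib.h>
+  #include <string.h>
+
+  /* Reverse index node: one per manifest, assumed zero-initialized. */
+  typedef struct ManifestNode {
+    ScannedFile *pManifest;
+    int nChild;
+    struct ManifestNode **aChild;
+  } ManifestNode;
+
+  /* Hypothetical import of one manifest: store the data files it
+  ** references, then the manifest itself (the stripped-down form of
+  ** 'commit_cmd' mentioned above, without vfile/vmerge).
+  */
+  void import_manifest(ScannedFile *pManifest);
+
+  /* Pass 1: build the reverse (parent -> children) index from the
+  ** parent uuids collected while parsing the manifests.
+  */
+  static void link_children(ManifestNode *aNode, int nNode){
+    int i, j;
+    for(i=0; i<nNode; i++){
+      char *zParent = aNode[i].pManifest->zParentUuid;
+      if( zParent==0 ) continue;
+      for(j=0; j<nNode; j++){
+        if( strcmp(aNode[j].pManifest->zUuid, zParent)==0 ){
+          aNode[j].aChild = realloc(aNode[j].aChild,
+                               (aNode[j].nChild+1)*sizeof(ManifestNode*));
+          aNode[j].aChild[aNode[j].nChild++] = &aNode[i];
+          break;
+        }
+      }
+    }
+  }
+
+  /* Pass 2: import from the root towards the leaves. */
+  static void import_tree(ManifestNode *p){
+    int i;
+    import_manifest(p->pManifest);     /* parent before its children */
+    for(i=0; i<p->nChild; i++){
+      import_tree(p->aChild[i]);
+    }
+  }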