Overview
SHA1 Hash: 10062df2fa9b02069f0d3c90dd1d990717a921a9
Date: 2007-08-28 05:01:23
User: aku
Comment: Reworked my notes regarding 'reconstruct' based on my reading of content.c, checkin.c, and manifest.c
Tags And Properties
- branch=trunk inherited from [a28c83647d]
- sym-trunk inherited from [a28c83647d]
Changes
Modified ci_fossil.txt from [ffe422bb7a] to [2064b7fe7f].
@@ -58,43 +58,70 @@
 ----------------------------------
 reconstruct
 
-It is currently not clear to me when and how fossil does
-delta-compression. Does it use deltas, or reverse deltas, or a
-combination of both? And when does it generate the deltas?
+The core logic for handling content is in the file "content.c", in
+particular the functions 'content_put' and 'content_deltify'. One of
+the main users of these functions is in the file "checkin.c", see the
+function 'commit_cmd'.
+
+The logic is clear. New and modified files are simply stored without
+delta-compression, using 'content_put'. And should fossil have an id
+for the _previous_ revision of the committed file, it uses
+'content_deltify' to convert the already stored data for that revision
+into a delta with the just stored new revision as origin.
+
+In other words, fossil produces reverse deltas, with leaf revisions
+stored just zip-compressed (plain) and older revisions using both zip-
+and delta-compression.
+
+Of note is that the underlying logic in 'content_deltify' gives up on
+delta compression if the involved files are not large enough, or if
+the achieved compression factor is not high enough. In that case the
+old revision of the file is left plain.
+
+The scheme can thus be called a 'truncated reverse delta'.
+
+The manifest is created and committed after the modified files. It
+uses the same logic as for the regular files. The new leaf is stored
+plain, and the storage of the parent manifest is changed to a delta
+with the current manifest as origin.
+
+Further note that for a checkin of a merge result only the primary
+parent is handled in that way. The secondary parent, the one merged
+into the current revision, is not touched. I.e. from the storage
+layer's point of view that revision is still a leaf and its data is
+kept stored plain, not delta-compressed.
+
+Now the "reconstruct" can be done like so:
+
+- Scan the files in the indicated directory, and look for a manifest.
+
+- When a manifest has been found, parse its contents and follow the
+  chain of parent links to locate the root manifest (no parent).
-The trivial solution is that it uses deltas, i.e. the first revision
-of a file is stored without delta compression and all future versions
-are deltas from that, and the delta is generated when the new revision
-is committed. With the obvious disadvantage that newer revisions take
-more and more time to be decompressed as the set of deltas to apply
-grows.
+
+- Import the files referenced by the root manifest, then the manifest
+  itself. This can be done using a modified form of 'commit_cmd' which
+  does not have to construct a manifest on its own from vfile, vmerge,
+  etc.
-And during xfer it simply sends the deltas as is, making for easy
-integration on the remote side.
+
+- After that, recursively apply the import of the previous step to the
+  children of the root, and so on.
-However reconstruct, that method sees initially only an unordered
-collection of files, some of which may be manifests, others are data
-files, and if it imports them in a random order it might find that
-file X, which was imported first and therefore has no delta
-compression, is actually somewhere in the middle of a line of
-revisions, and should be delta-compressed, and then it has to find out
-the predecessor and do the compression, etc.
+
+For an incremental "reconstruct" the collection of files would not be
+a single tree with a root, but a forest, and the roots to look for are
+not manifests without a parent, but manifests whose parent is already
+present in the repository. After one such root has been found and
+processed, the unprocessed files have to be searched further for more
+roots, and only when no more roots are found will the remaining files
+be considered superfluous.
-So depending on how the internal logic of delta-compression is done
-reconstruct might need more logic to help the lower level achieve good
-compression.
+
+We can use the functions in "manifest.c" for the parsing and for
+following the parental chain.
-Like, in a first pass determine which files are manifests, and read
-enough of them to determine their parent/child structure, and in a
-second pass actually import them, in topological order, with all
-relevant non-manifest files for a manifest imported at that time
-too. With that the underlying engine would see the files basically in
-the same order as generated by a regular series of commits.
-
-Problems for reconstruct: Files referenced, but not present, and,
-conversely, files present, but not referenced. This can be done as
-part of the second pass, aborting when a missing file is encountered,
-with (un)marking of used files, and at the end we know the unused
-files. Could also be a separate pass between first and second.
+
+Hm. But we have no direct child information. So the above algorithm
+has to be modified: we have to scan all manifests before we start
+importing, and we have to create a reverse index from manifest to
+children, so that we can perform the import from root to leaves.
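The 'truncated reverse delta' behavior described in the new notes can be sketched as a toy model. This is a hypothetical Python simulation, not fossil's C code: the delta format, the storage layout, and the MIN_SIZE/MIN_RATIO thresholds are all simplified stand-ins for what content.c actually does.

```python
# Toy model of the 'truncated reverse delta' scheme described above.
# Thresholds and the delta encoding are invented for illustration.

MIN_SIZE = 50     # assumed: skip delta-compression for small content
MIN_RATIO = 0.75  # assumed: keep the delta only if it shrinks enough

store = {}        # rid -> ("plain", bytes) | ("delta", origin_rid, bytes)

def make_delta(origin: bytes, target: bytes) -> bytes:
    """Stand-in delta encoder: shared-prefix length plus differing tail."""
    n = 0
    while n < min(len(origin), len(target)) and origin[n] == target[n]:
        n += 1
    return str(n).encode() + b":" + target[n:]

def content_get(rid: int) -> bytes:
    """Resolve a revision, applying deltas back toward a plain leaf."""
    rec = store[rid]
    if rec[0] == "plain":
        return rec[1]
    _, origin, delta = rec
    base = content_get(origin)
    n, _, tail = delta.partition(b":")
    return base[: int(n)] + tail

def content_put(rid: int, data: bytes) -> None:
    """A freshly committed revision is always stored plain."""
    store[rid] = ("plain", data)

def content_deltify(old_rid: int, new_rid: int) -> None:
    """Re-store the *previous* revision as a delta against the new one,
    unless the content is too small or the delta saves too little; in
    that case the old revision simply stays plain ("truncated")."""
    old, new = content_get(old_rid), content_get(new_rid)
    if len(old) < MIN_SIZE or len(new) < MIN_SIZE:
        return
    delta = make_delta(new, old)
    if len(delta) > MIN_RATIO * len(old):
        return
    store[old_rid] = ("delta", new_rid, delta)
```

A commit of revision 2 on top of revision 1 would then be `content_put(2, data2)` followed by `content_deltify(1, 2)`: revision 2 stays plain (the leaf), and revision 1 is rewritten as a delta whose origin is revision 2.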
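The root-to-leaves import walk sketched in the bullet points, combined with the reverse index from manifest to children mentioned at the end, might look like this. It is only a sketch under assumed names: `reconstruct_order` is hypothetical, and the real code would parse manifests via manifest.c and import via a modified 'commit_cmd'.

```python
from collections import defaultdict, deque

def reconstruct_order(manifests):
    """manifests maps manifest_id -> parent_id (None for the root).
    Returns an import order that always visits a parent before its
    children, i.e. from the root manifest toward the leaves."""
    # First pass: a manifest only records its parent, so build the
    # reverse index parent -> children before importing anything.
    children = defaultdict(list)
    roots = []
    for mid, parent in manifests.items():
        if parent is None:
            roots.append(mid)
        else:
            children[parent].append(mid)
    # Second pass: walk breadth-first from the root(s); a real import
    # would store each manifest's files here, then the manifest itself.
    order = []
    queue = deque(sorted(roots))
    while queue:
        mid = queue.popleft()
        order.append(mid)
        queue.extend(sorted(children[mid]))
    return order
```

Because each manifest is emitted only after its parent, the storage layer sees the revisions in essentially the same order as a regular series of commits.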
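The incremental variant described above (a forest whose roots attach to manifests already in the repository, with repeated searching until no new root is found) could be sketched like so; `incremental_reconstruct` and its arguments are invented for illustration.

```python
def incremental_reconstruct(new_manifests, repo_ids):
    """new_manifests maps manifest_id -> parent_id for files found on
    disk; repo_ids is the set of manifest ids already in the repository.
    Returns (imported ids in order, leftover superfluous ids)."""
    repo_ids = set(repo_ids)
    pending = dict(new_manifests)
    imported = []
    while True:
        # A root of the forest is a manifest whose parent is known.
        roots = sorted(m for m, p in pending.items() if p in repo_ids)
        if not roots:
            break  # no further root attaches; the rest is superfluous
        for mid in roots:
            imported.append(mid)  # real code: import files + manifest
            repo_ids.add(mid)
            del pending[mid]
    return imported, sorted(pending)
```

Manifests whose ancestry never reaches the repository are left in the second return value, matching the note that only then may the remaining files be considered superfluous.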