File Annotation
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: To perform CVS imports for fossil we need at least the ability to
103c397e4b 2007-08-28       aku: parse CVS files, i.e. RCS files, with slight differences.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: For the general architecture of the import facility we have two major
103c397e4b 2007-08-28       aku: paths to choose between.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: One is to use an external tool which processes a cvs repository and
103c397e4b 2007-08-28       aku: drives fossil through its CLI to insert the found changesets.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: The other is to integrate the whole facility into the fossil binary
103c397e4b 2007-08-28       aku: itself.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: I dislike the second choice. It may be faster, as the implementation
103c397e4b 2007-08-28       aku: can use all internal functionality of fossil to perform the import;
103c397e4b 2007-08-28       aku: however, it will also bloat the binary with functionality not needed
103c397e4b 2007-08-28       aku: most of the time. This becomes especially obvious if more importers
103c397e4b 2007-08-28       aku: are to be written, e.g. for monotone, bazaar, mercurial, bitkeeper,
103c397e4b 2007-08-28       aku: git, SVN, Arc, etc. Keeping all this out of the core fossil binary is
103c397e4b 2007-08-28       aku: IMHO more beneficial in the long term, also from a maintenance point
103c397e4b 2007-08-28       aku: of view, as the tools can evolve separately. This is especially
103c397e4b 2007-08-28       aku: important for CVS, since an importer will have to deal with lots of
103c397e4b 2007-08-28       aku: broken repositories, all different.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: However, nothing speaks against looking for common parts in all
103c397e4b 2007-08-28       aku: possible import tools, and having these in the fossil core, as a
103c397e4b 2007-08-28       aku: general backend all importers may use. Something like that has
103c397e4b 2007-08-28       aku: already been proposed: the deconstruct|reconstruct methods. For us,
103c397e4b 2007-08-28       aku: actually only reconstruct is important. Taking an unordered
103c397e4b 2007-08-28       aku: collection of files (data and manifests), it generates a proper
103c397e4b 2007-08-28       aku: fossil repository. With that method implemented, all import tools
103c397e4b 2007-08-28       aku: only have to generate the necessary collection and then leave the
103c397e4b 2007-08-28       aku: main work of filling the database to fossil itself.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: The disadvantage of this method, however, is that it will gobble up
103c397e4b 2007-08-28       aku: a lot of temporary space in the filesystem to hold all unique
103c397e4b 2007-08-28       aku: revisions of all files in their expanded form.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: It might be worthwhile to consider an extension of 'reconstruct'
103c397e4b 2007-08-28       aku: which is able to incrementally add a set of files to an existing
103c397e4b 2007-08-28       aku: fossil repository already containing revisions. In that case the
103c397e4b 2007-08-28       aku: import tool can be changed to incrementally generate the collection
103c397e4b 2007-08-28       aku: for a particular revision, import it, and iterate over all revisions
103c397e4b 2007-08-28       aku: in the origin repository. This of course also depends on the origin
103c397e4b 2007-08-28       aku: repository itself, and how well it supports such incremental export.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: This also leads to a possible method for performing the import using
103c397e4b 2007-08-28       aku: only existing functionality ('reconstruct' has not been implemented
103c397e4b 2007-08-28       aku: yet). Instead of generating an unordered collection for each
103c397e4b 2007-08-28       aku: revision, generate a properly set up workspace and simply commit it.
103c397e4b 2007-08-28       aku: This will also require use of the rm, add, and update methods, to
103c397e4b 2007-08-28       aku: remove old files, enter new ones, and point the fossil repository to
103c397e4b 2007-08-28       aku: the correct parent revision from which the new revision is derived.
103c397e4b 2007-08-28       aku: 
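The per-revision driver described above might be sketched as follows. This is a dry-run planner only: it builds the command sequence rather than executing it, the subcommand names (update, rm, add, commit) simply follow the text above, and the actual flags of the fossil CLI may differ.

```python
# Sketch: plan the fossil commands needed to commit one imported revision.
# The subcommand names follow the text above; this only *plans* the
# invocations (a dry run), it does not execute them.

def plan_revision_commit(parent, removed, added, comment):
    """Return the list of fossil invocations for one origin revision."""
    cmds = []
    # Point the workspace at the parent the new revision derives from.
    cmds.append(["fossil", "update", parent])
    for path in sorted(removed):       # drop files gone in this revision
        cmds.append(["fossil", "rm", path])
    for path in sorted(added):         # register newly appearing files
        cmds.append(["fossil", "add", path])
    cmds.append(["fossil", "commit", "-m", comment])
    return cmds

cmds = plan_revision_commit("abc123", {"old.c"}, {"new.c"}, "import r42")
```

An importer would run one such plan per origin revision, in history order, so each commit lands on the correct parent.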
103c397e4b 2007-08-28       aku: The relative efficiency (in time) of these incremental methods
103c397e4b 2007-08-28       aku: versus importing a complete collection of files encoding the entire
103c397e4b 2007-08-28       aku: origin repository is, however, not clear.
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: ----------------------------------
103c397e4b 2007-08-28       aku: 
103c397e4b 2007-08-28       aku: reconstruct
103c397e4b 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: The core logic for handling content is in the file "content.c", in
10062df2fa 2007-08-28       aku: particular the functions 'content_put' and 'content_deltify'. One of
10062df2fa 2007-08-28       aku: the main users of these functions is in the file "checkin.c", see the
10062df2fa 2007-08-28       aku: function 'commit_cmd'.
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: The logic is clear. The new modified files are simply stored without
10062df2fa 2007-08-28       aku: delta-compression, using 'content_put'. And should fossil have an id
10062df2fa 2007-08-28       aku: for the _previous_ revision of the committed file, it uses
10062df2fa 2007-08-28       aku: 'content_deltify' to convert the already stored data for that
10062df2fa 2007-08-28       aku: revision into a delta with the just stored new revision as origin.
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: In other words, fossil produces reverse deltas, with leaf revisions
10062df2fa 2007-08-28       aku: stored just zip-compressed (plain) and older revisions using both zip-
10062df2fa 2007-08-28       aku: and delta-compression.
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: Of note is that the underlying logic in 'content_deltify' gives up
10062df2fa 2007-08-28       aku: on delta compression if the involved files are not large enough, or
10062df2fa 2007-08-28       aku: if the achieved compression factor is not high enough. In that case
10062df2fa 2007-08-28       aku: the old revision of the file is left plain.
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: The scheme can thus be called a 'truncated reverse delta'.
10062df2fa 2007-08-28       aku: 
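The truncated reverse-delta scheme can be modeled with a toy content store. The size floor, the ratio threshold, and the prefix-based delta below are invented for illustration and do not reflect fossil's actual heuristics or delta format:

```python
# Toy model of fossil's truncated reverse-delta storage. Blobs are kept
# either plain or as a delta pointing at a *newer* revision. The
# thresholds and the delta encoding are made up for illustration.

MIN_SIZE = 150     # assumed floor: skip deltas for small files
MAX_RATIO = 0.5    # assumed bar: keep delta only if well under original size

store = {}         # id -> ("plain", bytes) or ("delta", origin_id, delta)

def toy_delta(old, new):
    """Stand-in delta: shared prefix length plus the literal tail of old."""
    n = 0
    while n < min(len(old), len(new)) and old[n] == new[n]:
        n += 1
    return (n, old[n:])

def content_put(blob_id, data):
    store[blob_id] = ("plain", data)           # new revisions enter plain

def content_deltify(old_id, new_id):
    """Re-encode the older revision as a delta against the newer one."""
    kind, old = store[old_id]
    if kind != "plain" or len(old) < MIN_SIZE:
        return                                  # too small: leave plain
    _, new = store[new_id]
    delta = toy_delta(old, new)
    if len(delta[1]) > MAX_RATIO * len(old):
        return                                  # poor ratio: leave plain
    store[old_id] = ("delta", new_id, delta)

content_put("r1", b"A" * 200 + b"old tail")
content_put("r2", b"A" * 200 + b"new tail")    # the new leaf, stored plain
content_deltify("r1", "r2")                     # the parent becomes a delta
```

After the two puts and the deltify, the leaf "r2" stays plain while the parent "r1" is stored as a delta with "r2" as origin, mirroring the scheme described above.
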
10062df2fa 2007-08-28       aku: The manifest is created and committed after the modified files. It
10062df2fa 2007-08-28       aku: uses the same logic as for the regular files. The new leaf is stored
10062df2fa 2007-08-28       aku: plain, and storage of the parent manifest is modified to be a delta
10062df2fa 2007-08-28       aku: with the current as origin.
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: Further note that for a checkin of a merge result only the primary
10062df2fa 2007-08-28       aku: parent is modified in that way. The secondary parent, the one merged
10062df2fa 2007-08-28       aku: into the current revision, is not touched. I.e. from the storage
10062df2fa 2007-08-28       aku: layer point of view that revision is still a leaf and its data is
10062df2fa 2007-08-28       aku: kept stored plain, not delta-compressed.
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: Now the "reconstruct" can be done like so:
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: - Scan the files in the indicated directory, and look for a manifest.
10062df2fa 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: - When the manifest has been found, parse its contents and follow
10062df2fa 2007-08-28       aku:   the chain of parent links to locate the root manifest (the one
10062df2fa 2007-08-28       aku:   with no parent).
103c397e4b 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: - Import the files referenced by the root manifest, then the manifest
10062df2fa 2007-08-28       aku:   itself. This can be done using a modified form of the 'commit_cmd'
10062df2fa 2007-08-28       aku:   which does not have to construct a manifest on its own from vfile,
10062df2fa 2007-08-28       aku:   vmerge, etc.
103c397e4b 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: - After that recursively apply the import of the previous step to the
10062df2fa 2007-08-28       aku:   children of the root, and so on.
103c397e4b 2007-08-28       aku: 
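The scan-and-walk in the steps above can be sketched as follows. The one-card-per-line format with a "P" parent card loosely mirrors fossil's manifest syntax, but the parsing here is a deliberate simplification:

```python
# Sketch: find the root manifest by following parent links. Manifests
# are modeled as text artifacts with one card per line; a "P <id>" card
# names the parent, and a manifest with no P card is the root. This
# simplified format only approximates real fossil manifests.

manifests = {
    "m3": "C third checkin\nP m2\n",
    "m2": "C second checkin\nP m1\n",
    "m1": "C initial checkin\n",          # no P card: the root
}

def parent_of(manifest_text):
    """Return the parent id from the P card, or None for a root."""
    for line in manifest_text.splitlines():
        if line.startswith("P "):
            return line.split()[1]
    return None

def find_root(start, manifests):
    """Follow the chain of parent links until a manifest has no parent."""
    current = start
    while (p := parent_of(manifests[current])) is not None:
        current = p
    return current

root = find_root("m3", manifests)          # walks m3 -> m2 -> m1
```

In the real tool this walk would start from whichever manifest the directory scan happened to find first.
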
10062df2fa 2007-08-28       aku: For an incremental "reconstruct" the collection of files would not
10062df2fa 2007-08-28       aku: be a single tree with a root, but a forest, and the roots to look
10062df2fa 2007-08-28       aku: for are not manifests without a parent, but manifests whose parent
10062df2fa 2007-08-28       aku: is already present in the repository. After one such root has been
10062df2fa 2007-08-28       aku: found and processed, the unprocessed files have to be searched
10062df2fa 2007-08-28       aku: further for more roots, and only when no more roots are found are
10062df2fa 2007-08-28       aku: the remaining files considered superfluous.
103c397e4b 2007-08-28       aku: 
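That search loop might look like the following sketch, where each pending manifest is reduced to just its parent id; any manifest whose parent is already known to the repository counts as a root, and whatever survives the loop is superfluous:

```python
# Sketch: incremental root detection in a forest of manifests. A root is
# a manifest whose parent already exists in the repository. Process
# roots repeatedly; manifests left over at the end are superfluous.

def process_forest(pending, in_repo):
    """pending: dict manifest id -> parent id. Returns (imported, leftover)."""
    imported = []
    while True:
        roots = [m for m, parent in pending.items() if parent in in_repo]
        if not roots:
            break                      # no more roots: stop searching
        for m in sorted(roots):        # deterministic order for the sketch
            imported.append(m)
            in_repo.add(m)             # once imported, it can parent others
            del pending[m]
    return imported, set(pending)

pending = {"b": "a", "c": "b", "x": "ghost"}   # "ghost" is unknown
imported, leftover = process_forest(pending, in_repo={"a"})
```

Here "b" becomes importable immediately, which in turn unlocks "c", while "x" hangs off an unknown parent and is left over as superfluous.
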
10062df2fa 2007-08-28       aku: We can use the functions in "manifest.c" for the parsing and following
10062df2fa 2007-08-28       aku: the parental chain.
103c397e4b 2007-08-28       aku: 
10062df2fa 2007-08-28       aku: Hm. But we have no direct child information. So the above algorithm
10062df2fa 2007-08-28       aku: has to be modified: we have to scan all manifests before we start
10062df2fa 2007-08-28       aku: importing, and we have to create a reverse index, from manifest to
10062df2fa 2007-08-28       aku: children, so that we can perform the import from root to leaves.