To perform CVS imports for fossil we need at least the ability to
parse CVS files, i.e. RCS files, with slight differences.

For the general architecture of the import facility we have two major
paths to choose between.

One is to use an external tool which processes a CVS repository and
drives fossil through its CLI to insert the found changesets.

The other is to integrate the whole facility into the fossil binary
itself.

I dislike the second choice. It may be faster, as the implementation
can use all internal functionality of fossil to perform the import,
but it will also bloat the binary with functionality not needed most
of the time. This becomes especially obvious if more importers are to
be written, like for monotone, bazaar, mercurial, bitkeeper, git,
SVN, Arc, etc. Keeping all this out of the core fossil binary is IMHO
more beneficial in the long term, also from a maintenance point of
view: the tools can evolve separately. This is especially important
for CVS, as it will have to deal with lots of broken repositories,
all different.

However, nothing speaks against looking for common parts in all
possible import tools and having these in the fossil core, as a
general backend all importers may use. Something like that has
already been proposed: the deconstruct|reconstruct methods. For us,
actually only reconstruct is important. Taking an unordered
collection of files (data, and manifests) it generates a proper
fossil repository. With that method implemented, all import tools
only have to generate the necessary collection and then leave the
main work of filling the database to fossil itself.

The disadvantage of this method however is that it will gobble up a
lot of temporary space in the filesystem to hold all unique revisions
of all files in their expanded form.

It might be worthwhile to consider an extension of 'reconstruct'
which is able to incrementally add a set of files to an existing
fossil repository already containing revisions. In that case the
import tool can be changed to incrementally generate the collection
for a particular revision, import it, and iterate over all revisions
in the origin repository. This of course also depends on the origin
repository itself, i.e. how well it supports such incremental export.
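To make this concrete, a driver loop for such an incremental mode
could look like the following sketch (Python). This is an assumption
on top of an assumption: 'reconstruct' does not exist yet, the
'--incremental' flag stands in for the proposed extension, and the
'origin' object with its revisions_in_order() and export_revision()
methods is a made-up interface to the source repository.

    # Sketch only: '--incremental' is the proposed extension, not an
    # existing option, and 'origin' is a made-up exporter interface.
    import subprocess
    import tempfile

    def incremental_import(origin, repository):
        # Walk the origin revisions in commit order, so each step
        # refers only to revisions which were already imported.
        for rev in origin.revisions_in_order():
            with tempfile.TemporaryDirectory() as tmp:
                # Write the manifest and data files of this single
                # revision into a scratch directory ...
                origin.export_revision(rev, tmp)
                # ... and hand that small collection over to fossil.
                subprocess.run(
                    ["fossil", "reconstruct", "--incremental",
                     repository, tmp],
                    check=True)

Note that the scratch space needed per step is only one revision's
worth of expanded files, which would avoid the temporary-space
problem of the one-shot variant.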
This incremental scheme also leads to a possible method for
performing the import using only existing functionality
('reconstruct' has not been implemented yet). Instead of generating
an unordered collection for each revision, generate a properly set up
workspace and simply commit it. This will require use of the rm, add
and update methods as well, to remove old and enter new files, and to
point the fossil repository at the correct parent revision from which
the new revision is derived.

The relative efficiency (in time) of these incremental methods versus
importing one complete collection of files encoding the entire origin
repository is however not clear.

----------------------------------

reconstruct

It is currently not clear to me when and how fossil does
delta-compression. Does it use deltas, or reverse deltas, or a
combination of both? And when does it generate the deltas?

The trivial solution is that it uses deltas, i.e. the first revision
of a file is stored without delta compression and all future versions
are deltas from that, with each delta generated when the new revision
is committed. This has the obvious disadvantage that newer revisions
take more and more time to decompress as the set of deltas to apply
grows.

During xfer it would then simply send the deltas as is, making for
easy integration on the remote side.

Reconstruct, however, initially sees only an unordered collection of
files, some of which may be manifests while the others are data
files. If it imports them in random order it might find that file X,
which was imported first and therefore has no delta compression, is
actually somewhere in the middle of a line of revisions and should be
delta-compressed, and then it has to find the predecessor, do the
compression, etc.

So, depending on how the internal logic of the delta-compression is
done, reconstruct might need more logic to help the lower level
achieve good compression.

For example: in a first pass determine which files are manifests, and
read enough of them to determine their parent/child structure; then
in a second pass actually import them, in topological order, with all
relevant non-manifest files for a manifest imported at that time
too. With that the underlying engine would see the files in basically
the same order as generated by a regular series of commits.
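The two passes could be as small as this sketch; is_manifest() and
parents_of() are hypothetical helpers standing in for manifest
parsing code not shown here:

    # Sketch of the two-pass ordering. is_manifest() and parents_of()
    # are hypothetical helpers standing in for the manifest parsing.
    from graphlib import TopologicalSorter

    def import_order(files):
        # Pass 1: find the manifests and record the parent/child
        # structure between them.
        manifests = {f for f in files if is_manifest(f)}
        graph = {m: parents_of(m) for m in manifests}
        # Pass 2: hand the manifests over in topological order, every
        # parent before its children, i.e. basically the order a
        # regular series of commits would have produced. Each manifest
        # would be imported together with the data files it references.
        return list(TopologicalSorter(graph).static_order())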
Problems for reconstruct: files referenced, but not present, and,
conversely, files present, but not referenced. Detecting this can be
done as part of the second pass: abort when a missing file is
encountered, mark each file as it gets used, and at the end the
unmarked files are the unused ones. It could also be a separate pass
between the first and the second.
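Expressed as set arithmetic the check is small; references_of() is a
hypothetical helper which would read the file references out of a
manifest:

    # Sketch of the consistency check between the manifests and the
    # collection. references_of() is a hypothetical helper returning
    # the data files a manifest points to.
    def check_collection(manifests, present):
        referenced = set()
        for m in manifests:
            referenced.update(references_of(m))
        missing = referenced - present   # referenced, but not present
        unused  = present - referenced   # present, but not referenced
        if missing:
            raise RuntimeError("collection incomplete: %r"
                               % sorted(missing))
        return unused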