To perform CVS imports for fossil we need at least the ability to
parse CVS files, i.e. RCS files, with slight differences.

For the general architecture of the import facility we have two major
paths to choose between.

One is to use an external tool which processes a CVS repository and
drives fossil through its CLI to insert the found changesets.

The other is to integrate the whole facility into the fossil binary
itself.

I dislike the second choice. It may be faster, as the implementation
can use all internal functionality of fossil to perform the import,
but it will also bloat the binary with functionality not needed most
of the time. This becomes especially obvious if more importers are to
be written, like for monotone, bazaar, mercurial, bitkeeper, git,
SVN, Arc, etc. Keeping all this out of the core fossil binary is IMHO
more beneficial in the long term, also from a maintenance point of
view: the tools can evolve separately. This is especially important
for CVS, as it will have to deal with lots of broken repositories,
all different.

However, nothing speaks against looking for common parts in all
possible import tools and having these in the fossil core, as a
general backend all importers may use. Something like that has
already been proposed: the deconstruct|reconstruct methods. For us,
actually only reconstruct is important. Taking an unordered
collection of files (data, and manifests) it generates a proper
fossil repository. With that method implemented, all import tools
only have to generate the necessary collection and then leave the
main work of filling the database to fossil itself.

The disadvantage of this method however is that it will gobble up a
lot of temporary space in the filesystem to hold all unique revisions
of all files in their expanded form.

It might be worthwhile to consider an extension of 'reconstruct'
which is able to incrementally add a set of files to an existing
fossil repository already containing revisions. In that case the
import tool can be changed to incrementally generate the collection
for a particular revision, import it, and iterate over all revisions
in the origin repository. This of course also depends on the origin
repository itself, i.e. how well it supports such incremental export.
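To make this concrete, a driver loop for such an incremental mode
could look like the following sketch (Python). This is an assumption
on top of an assumption: 'reconstruct' does not exist yet, the
'--incremental' flag stands in for the proposed extension, and the
'origin' object with its revisions_in_order() and export_revision()
methods is a made-up interface to the source repository.

    # Sketch only: '--incremental' is the proposed extension, not an
    # existing option, and 'origin' is a made-up exporter interface.
    import subprocess
    import tempfile

    def incremental_import(origin, repository):
        # Walk the origin revisions in commit order, so each step
        # refers only to revisions which were already imported.
        for rev in origin.revisions_in_order():
            with tempfile.TemporaryDirectory() as tmp:
                # Write the manifest and data files of this single
                # revision into a scratch directory ...
                origin.export_revision(rev, tmp)
                # ... and hand that small collection over to fossil.
                subprocess.run(
                    ["fossil", "reconstruct", "--incremental",
                     repository, tmp],
                    check=True)

Note that the scratch space needed per step is only one revision's
worth of expanded files, which would avoid the temporary-space
problem of the one-shot variant.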
This incremental scheme also leads to a possible method for
performing the import using only existing functionality
('reconstruct' has not been implemented yet). Instead of generating
an unordered collection for each revision, generate a properly set up
workspace and simply commit it. This will require use of the rm, add
and update methods as well, to remove old and enter new files, and to
point the fossil repository at the correct parent revision from which
the new revision is derived.

The relative efficiency (in time) of these incremental methods versus
importing one complete collection of files encoding the entire origin
repository is however not clear.

----------------------------------

reconstruct

It is currently not clear to me when and how fossil does
delta-compression. Does it use deltas, or reverse deltas, or a
combination of both? And when does it generate the deltas?

The trivial solution is that it uses deltas, i.e. the first revision
of a file is stored without delta compression and all future versions
are deltas from that, with each delta generated when the new revision
is committed. This has the obvious disadvantage that newer revisions
take more and more time to decompress as the set of deltas to apply
grows.

During xfer it would then simply send the deltas as is, making for
easy integration on the remote side.

Reconstruct, however, initially sees only an unordered collection of
files, some of which may be manifests while the others are data
files. If it imports them in random order it might find that file X,
which was imported first and therefore has no delta compression, is
actually somewhere in the middle of a line of revisions and should be
delta-compressed, and then it has to find the predecessor, do the
compression, etc.

So, depending on how the internal logic of the delta-compression is
done, reconstruct might need more logic to help the lower level
achieve good compression.

For example: in a first pass determine which files are manifests, and
read enough of them to determine their parent/child structure; then
in a second pass actually import them, in topological order, with all
relevant non-manifest files for a manifest imported at that time
too. With that the underlying engine would see the files in basically
the same order as generated by a regular series of commits.
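The two passes could be as small as this sketch; is_manifest() and
parents_of() are hypothetical helpers standing in for manifest
parsing code not shown here:

    # Sketch of the two-pass ordering. is_manifest() and parents_of()
    # are hypothetical helpers standing in for the manifest parsing.
    from graphlib import TopologicalSorter

    def import_order(files):
        # Pass 1: find the manifests and record the parent/child
        # structure between them.
        manifests = {f for f in files if is_manifest(f)}
        graph = {m: parents_of(m) for m in manifests}
        # Pass 2: hand the manifests over in topological order, every
        # parent before its children, i.e. basically the order a
        # regular series of commits would have produced. Each manifest
        # would be imported together with the data files it references.
        return list(TopologicalSorter(graph).static_order())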
Problems for reconstruct: files referenced, but not present, and,
conversely, files present, but not referenced. Detecting this can be
done as part of the second pass: abort when a missing file is
encountered, mark each file as it gets used, and at the end the
unmarked files are the unused ones. It could also be a separate pass
between the first and the second.
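Expressed as set arithmetic the check is small; references_of() is a
hypothetical helper which would read the file references out of a
manifest:

    # Sketch of the consistency check between the manifests and the
    # collection. references_of() is a hypothetical helper returning
    # the data files a manifest points to.
    def check_collection(manifests, present):
        referenced = set()
        for m in manifests:
            referenced.update(references_of(m))
        missing = referenced - present   # referenced, but not present
        unused  = present - referenced   # present, but not referenced
        if missing:
            raise RuntimeError("collection incomplete: %r"
                               % sorted(missing))
        return unused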