To perform CVS imports for fossil we need at least the ability to
parse CVS files, i.e. RCS files with slight differences.

For the general architecture of the import facility we have two major
paths to choose between.

One is to use an external tool which processes a CVS repository and
drives fossil through its CLI to insert the found changesets.

The other is to integrate the whole facility into the fossil binary
itself.

I dislike the second choice. It may be faster, as the implementation
can use all internal functionality of fossil to perform the import;
however, it will also bloat the binary with functionality not needed
most of the time. This becomes especially obvious if more importers
are to be written, e.g. for monotone, bazaar, mercurial, bitkeeper,
git, SVN, Arc, etc. Keeping all this out of the core fossil binary is
IMHO more beneficial in the long term, also from a maintenance point
of view, as the tools can evolve separately. This is especially
important for CVS, as its importer will have to deal with lots of
broken repositories, each broken in its own way.

However, nothing speaks against looking for common parts in all
possible import tools, and having these in the fossil core, as a
general backend all importers may use.
Something like that has already been proposed: the
deconstruct|reconstruct methods. For us, actually, only reconstruct is
important. Taking an unordered collection of files (data and
manifests), it generates a proper fossil repository. With that method
implemented, all import tools only have to generate the necessary
collection and can leave the main work of filling the database to
fossil itself.

The disadvantage of this method, however, is that it will gobble up a
lot of temporary space in the filesystem to hold all unique revisions
of all files in their expanded form.

It might be worthwhile to consider an extension of 'reconstruct' which
is able to incrementally add a set of files to an existing fossil
repository already containing revisions. In that case the import tool
can be changed to incrementally generate the collection for a
particular revision, import it, and iterate over all revisions in the
origin repository. This of course also depends on how well the origin
repository itself supports such incremental export.

This also leads to a possible method for performing the import using
only existing functionality ('reconstruct' has not been implemented
yet): instead of generating an unordered collection for each revision,
generate a properly set up workspace and simply commit it.
This will require the use of the rm, add and update methods as well,
to remove old and enter new files, and to point the fossil repository
at the correct parent revision from which the new revision is derived.

The relative efficiency (in time) of these incremental methods versus
importing a complete collection of files encoding the entire origin
repository, however, is not clear.

----------------------------------

reconstruct

The core logic for handling content is in the file "content.c", in
particular the functions 'content_put' and 'content_deltify'. One of
the main users of these functions is in the file "checkin.c"; see the
function 'commit_cmd'.

The logic is clear. Newly modified files are simply stored without
delta-compression, using 'content_put'. Should fossil have an id for
the _previous_ revision of the committed file, it uses
'content_deltify' to convert the already stored data for that revision
into a delta with the just stored new revision as origin.

In other words, fossil produces reverse deltas, with leaf revisions
stored just zip-compressed (plain) and older revisions using both zip-
and delta-compression.

Note that the underlying logic in 'content_deltify' gives up on
delta-compression if the involved files are not large enough, or if
the achieved compression factor is not high enough. In that case the
old revision of the file is left plain.

The scheme can thus be called a 'truncated reverse delta'.

The manifest is created and committed after the modified files. It
uses the same logic as the regular files: the new leaf is stored
plain, and storage of the parent manifest is modified to be a delta
with the current manifest as origin.

Further note that for a checkin of a merge result only the primary
parent is modified in that way. The secondary parent, the one merged
into the current revision, is not touched. I.e. from the storage
layer's point of view that revision is still a leaf, and its data is
kept stored plain, not delta-compressed.

Now the "reconstruct" can be done like so:

- Scan the files in the indicated directory, and look for a manifest.

- When the manifest has been found, parse its contents and follow the
  chain of parent links to locate the root manifest (no parent).

- Import the files referenced by the root manifest, then the manifest
  itself.
  This can be done using a modified form of 'commit_cmd' which does
  not have to construct a manifest on its own from vfile, vmerge, etc.

- After that, recursively apply the import of the previous step to the
  children of the root, and so on.

For an incremental "reconstruct" the collection of files would not be
a single tree with a root, but a forest, and the roots to look for are
not manifests without a parent, but manifests whose parent is already
present in the repository. After one such root has been found and
processed, the unprocessed files have to be searched for more roots,
and only when no more are found are the remaining files considered
superfluous.

We can use the functions in "manifest.c" for parsing and following
the parental chain.

Hm. But we have no direct child information, so the above algorithm
has to be modified: we have to scan all manifests before we start
importing, and create a reverse index from manifest to children, so
that we can perform the import from root to leaves.