Artifact e5b344f6b26dffaaaaa0c4991b39d84e6372b1e9: Wiki page [Import CVS Repositories] by anonymous on 2007-12-04 20:25:42.
Spiritual ancestor: [http://cvs2svn.tigris.org/|cvs2svn].
Similarities:
* Using the same high-level, pass-based architecture.
* Using some of the same specific algorithms (graph traversal).
Differences:
* Not sharing any code (different languages, for one thing: [http://www.python.org/|Python] there, [http://www.tcl.tk/|Tcl] here).
* Persistent state is handled completely differently, with an [http://www.sqlite.org/|sqlite] database holding all state.
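Keeping all inter-pass state in one sqlite database means each pass can read the tables its predecessors wrote, and an interrupted conversion can resume from the last completed pass. A minimal sketch of the idea in Python (the table and column names here are assumptions for illustration, not the importer's actual schema):

```python
import sqlite3

# Illustrative only: this schema is an assumption, not the importer's real tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE file     (fid INTEGER PRIMARY KEY, path TEXT UNIQUE);
    CREATE TABLE revision (rid INTEGER PRIMARY KEY,
                           fid INTEGER REFERENCES file,
                           rev TEXT, date INTEGER);
""")

# A collection pass fills the tables; later passes only query them.
db.execute("INSERT INTO file (path) VALUES (?)", ("src/main.c,v",))
fid = db.execute("SELECT fid FROM file WHERE path = ?",
                 ("src/main.c,v",)).fetchone()[0]
db.execute("INSERT INTO revision (fid, rev, date) VALUES (?, ?, ?)",
           (fid, "1.1", 1196800000))
db.commit()
print(db.execute("SELECT COUNT(*) FROM revision").fetchone()[0])  # → 1
```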
Status:
* Pass CollAr: Collect archives - ok.
* Pass CollRev: Collect revisions, tags, branches (file level) - ok.
* Pass CollSym: Collate symbol (project level) from the file level data - ok.
* Pass FilterSym: Filter symbols, exclude symbols and lines of development - ok.
* Pass InitCsets: Create initial changesets - ok. <b>Memory hog, slow commits.</b>
* Pass CsetDeps: Compute changeset dependencies from revision dependencies - ok.
* Pass BreakRCycle: Break cycles among revision changesets - ok.
* Pass RevTopSort: Topologically sort revision changesets - ok.
* Pass BreakSCycle: Break cycles among symbol changesets - ok.
* Pass BreakACycle: Break cycles over all changesets - <b>May still change the order of revision changesets relative to the result of pass RevTopSort.</b>
* Pass ATopSort: Should be ok.
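The cycle-breaking passes have to run before the topological sorts, because a topological order only exists for an acyclic dependency graph. A small illustrative sketch of such a sort (Kahn's algorithm; the graph representation is an assumption, not the importer's actual data structures):

```python
from collections import deque

def toposort(deps):
    """deps maps each changeset to the set of changesets it depends on.
    Returns a list placing every changeset after all its dependencies,
    or raises ValueError if a dependency cycle remains (i.e. a
    Break*Cycle pass should have run first)."""
    indegree = {n: len(d) for n, d in deps.items()}
    users = {n: set() for n in deps}
    for n, d in deps.items():
        for m in d:
            users[m].add(n)
    ready = deque(n for n, k in indegree.items() if k == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in users[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(deps):
        raise ValueError("dependency cycle among changesets")
    return order

# Changeset B depends on A, C depends on both:
print(toposort({"A": set(), "B": {"A"}, "C": {"A", "B"}}))  # → ['A', 'B', 'C']
```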
Passes to do:
* Put the changeset order from the topological-sort passes and the tree of symbols from the CollSym/FilterSym passes together into a tree of changesets. Note that it might not be a tree if an NTDB (non-trunk default branch) is present.
* Perform the actual import.
Notes regarding the actual import:
<ul>
<li>cvs2svn is either slow or hungry for disk space. The reason: it imports changeset by changeset, so it has to either regenerate the needed revisions of the files on demand over and over, or cache each needed revision from its creation until its last user is gone.
</li>
<li>We can do better if we get help from fossil. We would need commands to perform the following actions:
<ul>
<li> Import a file as blob, return its internal id.
</li>
<li> Deltify a known file with respect to a second known file.
</li>
<li> Generate a manifest file for a list of files (paths, ids), parent manifest references, user, timestamp, log message. Could be signed or not.
</li>
</ul>
With these actions (possibly in combination) we can first import the archive files, needing space only for the revisions of a single file at a time (bounded by the largest file in terms of size and history), with their delta links mirroring the RCS structure. After that we can independently generate, import, and deltify the manifests for the changesets. To finalize, we simply 'rebuild' the repository. This should be fast without needing much temporary disk space either.
</li>
</ul>
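The two-phase strategy above can be sketched as follows. The three helper actions are the hypothetical fossil-side commands described in the list, and the stub store below merely assigns ids and records delta links so the control flow can be exercised; the manifest text format is heavily simplified, not fossil's real card format:

```python
class StubStore:
    """Stand-in for the hypothetical fossil helpers (import blob,
    deltify, generate manifest); real ones would talk to a repository."""
    def __init__(self):
        self.blobs, self.deltas = [], []
    def import_blob(self, data):
        self.blobs.append(data)
        return len(self.blobs)            # 1-based blob id
    def deltify(self, bid, base):
        self.deltas.append((bid, base))   # store bid as delta against base
    def manifest_text(self, files, parent, user, date, log):
        lines = [f"F {p} {i}" for p, i in files]
        if parent:
            lines.append(f"P {parent}")
        return "\n".join(lines + [f"U {user}", f"D {date}", f"C {log}"])

def two_phase_import(archives, changesets, store):
    ids = {}
    # Phase 1: archive files first. Temporary space stays bounded by the
    # largest single file; delta links mirror the RCS structure.
    for path, revisions in archives.items():
        prev = None
        for rev, data in revisions:
            bid = store.import_blob(data)
            ids[path, rev] = bid
            if prev is not None:
                store.deltify(bid, prev)
            prev = bid
    # Phase 2: generate, import, and deltify one manifest per changeset.
    parent, manifests = None, []
    for cs in changesets:
        files = [(p, ids[p, r]) for p, r in cs["files"]]
        mid = store.import_blob(store.manifest_text(
            files, parent, cs["user"], cs["date"], cs["log"]))
        if parent is not None:
            store.deltify(mid, parent)
        manifests.append(mid)
        parent = mid
    return manifests

store = StubStore()
archives = {"a.c": [("1.1", "r11"), ("1.2", "r12")]}
changesets = [
    {"files": [("a.c", "1.1")], "user": "anon", "date": 0, "log": "first"},
    {"files": [("a.c", "1.2")], "user": "anon", "date": 1, "log": "second"},
]
print(two_phase_import(archives, changesets, store))  # → [3, 4]
```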