===============================================================================

First experimental code ...

tools/import-cvs.tcl
tools/lib/rcsparser.tcl

No actual import yet; right now I am only working on getting the csets
right. The code uses CVSROOT/history as its foundation and augments it
with data from the individual RCS files (commit messages).

Statistics of a run ...
	3516 csets.

	1545 breaks on user change
	 558 breaks on file duplicate
	  13 breaks on branch/trunk change
	1402 breaks on commit message change
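A minimal sketch of such a separator, assuming each history entry has already been augmented with its branch and commit message from the RCS file. The `Entry` record, the `split_csets` name and the order in which the four break reasons are checked are hypothetical; the reason labels mirror the `/user` and `/cmsg` markers in the examples further down, and the real script may decide differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record: one file revision from CVSROOT/history, augmented
    # with the branch and commit message taken from the RCS file.
    time: int        # seconds since the epoch
    user: str
    branch: str      # "trunk" or a branch name
    path: str
    revision: str
    message: str


def split_csets(entries):
    """Group entries into csets; return (csets, Counter of break reasons)."""
    csets, reasons, current = [], Counter(), []
    for e in sorted(entries, key=lambda x: x.time):
        reason = None
        if current:
            last = current[-1]
            if e.user != last.user:
                reason = "user"
            elif any(c.path == e.path for c in current):
                reason = "file duplicate"   # a cset touches a file only once
            elif e.branch != last.branch:
                reason = "branch/trunk"
            elif e.message != last.message:
                reason = "cmsg"
        if reason:
            reasons[reason] += 1
            csets.append(current)
            current = []
        current.append(e)
    if current:
        csets.append(current)
    return csets, reasons


if __name__ == "__main__":
    demo = [
        Entry(0, "ericm", "trunk", "ftpd/ChangeLog", "1.4", "fix"),
        Entry(0, "ericm", "trunk", "ftpd/ftpd.tcl", "1.7", "fix"),
        Entry(60, "ericm", "trunk", "aclocal.m4", "1.3", "regen"),
    ]
    csets, reasons = split_csets(demo)
    print(len(csets), dict(reasons))   # 2 {'cmsg': 1}
```

Checking the duplicate-file rule against the whole current cset, rather than only the previous entry, is what keeps one cset from containing two revisions of the same file.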

Time statistics ...
	3297 were processed in <= 1 second (93.77%)
	 217 were processed in 2 seconds to 14 minutes
	   1 was  processed in ~41 minutes
	   1 was  processed in ~22 hours

Time fuzz: the differences between csets range from 0 seconds to 66
days. A statistical analysis is needed to see whether there is an
obvious break. Even so, the times within csets and between csets
overlap a great deal, making time a bad criterion for cset separation,
IMHO.
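One crude way to look for such an obvious break, assuming the gaps between consecutive csets are available in seconds: sort the distinct gap values and report the adjacent pair with the largest jump on a logarithmic scale. The function name is hypothetical, and given the overlap noted above, the jump it finds need not be meaningful.

```python
import math


def largest_log_jump(gaps_seconds):
    """Return the adjacent pair (low, high) of sorted, distinct, positive gap
    values with the largest ratio high/low, or None if there are fewer than two.

    Zero gaps are skipped because log(0) is undefined.
    """
    values = sorted({g for g in gaps_seconds if g > 0})
    if len(values) < 2:
        return None
    return max(zip(values, values[1:]),
               key=lambda pair: math.log(pair[1]) - math.log(pair[0]))


print(largest_log_jump([0, 1, 2, 3, 3600, 7200]))   # (3, 3600)
```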

Leaving that topic, back to the current cset separator ...

It has a problem:
	The history file does not start at the root!

Examples:
	The first three changesets are:

	=============================/user
	M {Wed Nov 22 09:28:49 AM PST 2000} ericm 1.4 tcllib/modules/ftpd/ChangeLog
	M {Wed Nov 22 09:28:49 AM PST 2000} ericm 1.7 tcllib/modules/ftpd/ftpd.tcl
	files: 2
	delta: 0
	range: 0 seconds
	=============================/cmsg
	M {Wed Nov 29 02:14:33 PM PST 2000} ericm 1.3 tcllib/aclocal.m4
	files: 1
	delta:
	range: 0 seconds
	=============================/cmsg
	M {Sun Feb 04 12:28:35 AM PST 2001} ericm 1.9 tcllib/modules/mime/ChangeLog
	M {Sun Feb 04 12:28:35 AM PST 2001} ericm 1.12 tcllib/modules/mime/mime.tcl
	files: 2
	delta: 0
	range: 0 seconds

All three csets modify files which already have several revisions.
The history contains no csets from before these, but the earlier
revisions are present in the RCS files.

I wonder, is SF (SourceForge) perhaps removing old entries from the
history when it grows too large?

This also affects incremental import ... I cannot assume that the
history always grows; it may shrink. So I cannot keep a file offset,
and will have to record the time of the last entry processed, or even
the full entry, to be able to skip ahead to anything not yet known.
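A minimal sketch of that resume logic, assuming the history has already been parsed into `(time, line)` pairs in file order and that the pair processed last was saved by the previous run. The function name is hypothetical. It prefers the exact saved entry, since timestamps alone are ambiguous when several entries share a second, and falls back to the timestamp only when the saved entry has been trimmed away. In both ambiguous cases it errs on the side of re-processing rather than skipping.

```python
def unprocessed(entries, last):
    """Return the entries not handled by the previous run.

    entries: list of (time, line) pairs in history-file order
    last:    the (time, line) pair processed last, or None on the first run
    """
    if last is None:
        return list(entries)
    # Preferred: the saved entry is still present, resume right after it.
    # If an identical line occurs twice, the first occurrence is used.
    if last in entries:
        return entries[entries.index(last) + 1:]
    # Fallback: the history shrank and the saved entry is gone. Keep entries
    # at or after the saved timestamp; equal timestamps may be re-processed.
    return [e for e in entries if e[0] >= last[0]]


history = [(1, "a"), (2, "b"), (3, "c")]
trimmed = [(3, "c"), (4, "d")]              # history shrank at the front
print(unprocessed(history, (2, "b")))       # [(3, 'c')]
print(unprocessed(trimmed, (2, "b")))       # [(3, 'c'), (4, 'd')]
```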

I might have to implement the algorithm outlined below, matching the
revision trees of the individual RCS files to each other to form the
global tree of revisions. Maybe the history can help with the matchup,
for the parts where we do have it.

Wait. This might be easier ... Take the delta information from the RCS
files and generate a fake history ... Actually, this might even allow
us to create a complete history ... No, not quite: the merge entries
which the actual history may contain will be missing. These we can mix
in from the actual history, as far as we have it.
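A minimal sketch of the fake history, using a hypothetical tuple layout `(time, kind, user, revision, path)`. The kind letters follow the record types of `cvs history` as I understand them: A for an added file, M for a modified file, and G and C for merges during an update (C when conflicts were found). Treating revision 1.1 as the revision that added the file is an assumption that ignores, for example, files revived from the Attic.

```python
# Hypothetical record layout: (time, kind, user, revision, path).
# kind follows the `cvs history` record letters: A = added, M = modified,
# G / C = merge on update (C: with conflicts).
MERGE_KINDS = {"G", "C"}


def fake_history(rcs_deltas, real_history):
    """Build a history from RCS delta data and mix in the real merge records.

    rcs_deltas:   {path: [(time, author, revision), ...]} from the RCS files
    real_history: list of records parsed from CVSROOT/history
    """
    fake = [
        # Assumption: revision 1.1 is the one that added the file.
        (time, "A" if rev == "1.1" else "M", author, rev, path)
        for path, deltas in rcs_deltas.items()
        for (time, author, rev) in deltas
    ]
    # Merges leave no trace in the RCS deltas, so take them from the real
    # history for whatever period it still covers.
    merges = [rec for rec in real_history if rec[1] in MERGE_KINDS]
    return sorted(fake + merges, key=lambda rec: rec[0])


deltas = {"ftpd.tcl": [(10, "ericm", "1.1"), (30, "ericm", "1.2")]}
real = [(12, "O", "aku", "", "ftpd.tcl"), (20, "G", "aku", "1.1", "ftpd.tcl")]
for rec in fake_history(deltas, real):
    print(rec)
```

Running the existing cset separator over this output and over the real history would then show where the two disagree.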

Still, let's try that: generate a fake history, run this script on it,
and see whether and where the results differ.

===============================================================================


Notes about the CVS import, regarding CVS itself.

- Problem: CVS does not really track changesets, but only individual
  revisions of files. To recover changesets it is necessary to look at
  the author, branch, and timestamp information, and at the commit
  messages. Even so, this is only a heuristic, not foolproof.

  Existing tool: cvsps.

  It processes the output of 'cvs log' to recover changesets. Problem:
  it sees only a linear list of revisions and does not see branch
  points etc., so it cannot use the tree structure to help in making
  its decisions.

- Problem: CVS does not track merge points at all. Recovering them
  through heuristics, i.e. looking for keywords in commit messages
  which might indicate that one branch was merged into another, is
  brittle at best.
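To make the brittleness concrete, a minimal sketch of such a keyword heuristic. The pattern is a hypothetical choice, not taken from any existing tool. It flags a message as a possible merge whenever it contains a form of the word "merge", which also fires on messages that merely talk about merging, and it cannot see merges whose message says nothing.

```python
import re

# Hypothetical pattern: "merge", "merged", "merges", "merging" as whole words.
MERGE_WORDS = re.compile(r"\bmerg(?:e|ed|es|ing)\b", re.IGNORECASE)


def looks_like_merge(message):
    """Guess whether a commit message describes a merge (heuristic only)."""
    return MERGE_WORDS.search(message) is not None


for msg in ["Merged fixes from the 1.3 branch",   # true positive
            "Do not merge this yet",              # false positive
            "Synced with trunk"]:                 # missed, if it was a merge
    print(looks_like_merge(msg), msg)
```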


Ideas regarding an algorithm to recover changesets.

Key feature: it uses the per-file revision trees to help uncover the
underlying changesets and the global revision tree G.

The per-file revision tree for a file X is in essence the global
revision tree with all nodes not pertaining to X removed from it.
Conversely, this allows us to build up the global revision tree from
the per-file trees by matching their nodes to each other and extending
it.
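The relationship in the first sentence can be written down directly. A minimal sketch, assuming the global tree is given as a parent map and that each node records the set of files it changes (both representations are hypothetical): the per-file tree of X keeps the nodes that change X and links each of them to its nearest ancestor that also changes X. Going the other way, from per-file trees to G, is the hard part.

```python
def project(parent, changes, x):
    """Derive the per-file revision tree of file x from the global tree.

    parent:  {node: parent node, or None for the root}
    changes: {node: set of files changed by that node}
    Returns {node: parent node within x's tree, or None}.
    """
    tree = {}
    for node, files in changes.items():
        if x not in files:
            continue                      # node does not pertain to x
        anc = parent[node]
        while anc is not None and x not in changes[anc]:
            anc = parent[anc]             # skip ancestors that leave x alone
        tree[node] = anc
    return tree


parent = {"r1": None, "r2": "r1", "r3": "r2", "r4": "r2"}
changes = {"r1": {"a", "b"}, "r2": {"b"}, "r3": {"a"}, "r4": {"a", "b"}}
print(project(parent, changes, "a"))   # {'r1': None, 'r3': 'r1', 'r4': 'r1'}
```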

Start with the per-file revision tree of a single file as the initial
approximation of the global tree. Each node of this tree refers to the
corresponding revision of that file, and through it to the file
itself. At each step the global tree contains the nodes for a finite
set of files, and every node in the tree refers to a revision of every
file in the set, making the mapping total.
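A minimal sketch of that node layout, assuming each global node stores a plain mapping from file name to revision and the tree is given as a map from each node to its children (both hypothetical). The helper performs the parent-to-child copying that the last step of the procedure below relies on to keep the mapping total: a file absent from a child's mapping inherits the parent's revision, meaning it did not change there.

```python
def propagate(children, refs, root):
    """Copy file references from each parent forward to its children.

    children: {node: [child, ...]}
    refs:     {node: {file: revision}}; updated in place and returned
    A child is only pushed after it has received its parent's references,
    so references travel down the whole tree.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            for path, rev in refs[node].items():
                refs[child].setdefault(path, rev)   # keep the child's own revision
            stack.append(child)
    return refs


children = {"r1": ["r2"], "r2": ["r3"]}
refs = {"r1": {"a": "1.1", "b": "1.1"}, "r2": {"b": "1.2"}, "r3": {"a": "1.2"}}
propagate(children, refs, "r1")
print(refs["r3"])   # {'a': '1.2', 'b': '1.2'}
```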

To add a file X to the tree, take its per-file revision tree R and
perform the following actions:

- For each node N in R, use the tuple <author, branch, commit message>
  to identify the set of nodes in G which may match N. Among these, use
  the timestamp to pick the node nearest in time.

- This process will leave some nodes of R unmapped. If there are
  unmapped nodes which have no mapped neighbours, we have to abort.

  Otherwise take the nodes which have mapped neighbours. Trace the
  edges and see which of these nodes are connected in the local
  tree. Then look at the identified neighbours and trace their
  connections.

  If two global nodes are directly connected, but their local
  counterparts are connected through multiple edges, insert global
  nodes for the intermediate local nodes and map them together. This
  expands the global tree to hold the revisions added by the new file.

  If both sides have multi-edge connections, abort. This looks like a
  merge of two different branches, but CVS has no such merges ...
  Wait ... instead, sort the nodes by time and fit the new nodes in
  between the other nodes according to their timestamps. This case
  arises from overlapping or alternating changes to this file and to
  others.

  A last possibility is that a node is only connected to a mapped
  parent. This may be a new branch, or again an alternating change on
  the given line. Symbols on the revisions will help to map this.

- We now have an extended global tree which incorporates the revisions
  of the new file. However, the new nodes refer only to the new file,
  and the old nodes may not refer to the new file at all. This has to
  be fixed, as all nodes have to refer to all files.

  Run over the tree and look at each parent/child pair. If a file is
  referenced in the parent but not in the child, copy the reference to
  the file revision from the parent forward to the child. This signals
  that the file did not change in the given revision.

- After all files have been integrated in this manner, we have a global
  revision tree capturing all changesets, including the unchanged
  files per changeset.
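The matching in the first step can be sketched as follows, assuming a hypothetical `Node` record that carries author, branch, commit message and timestamp. Candidates must agree on all of author, branch and message; the timestamp only chooses among them. The sketch does not stop two local nodes from claiming the same global node, and it leaves out the handling of unmapped nodes in the later steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node layout, for illustration only.
    author: str
    branch: str
    message: str
    time: int   # seconds since the epoch


def best_match(n: Node, global_nodes: List[Node]) -> Optional[Node]:
    """Return the node of G matching n, or None to leave n unmapped."""
    key = (n.author, n.branch, n.message)
    candidates = [g for g in global_nodes
                  if (g.author, g.branch, g.message) == key]
    if not candidates:
        return None
    # Among candidates with equal <author, branch, message>, nearest in time wins.
    return min(candidates, key=lambda g: abs(g.time - n.time))


g = [Node("aku", "trunk", "fix", 100),
     Node("aku", "trunk", "fix", 500),
     Node("bob", "trunk", "fix", 120)]
print(best_match(Node("aku", "trunk", "fix", 130), g))   # the node at time 100
```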


This algorithm has to be refined to also take Attic/ files into
account.