103c397e4b 2007-08-28       aku: 
Notes about the CVS import, regarding CVS itself.

- Problem: CVS does not track changesets, only the revisions of
  individual files. Recovering changesets requires looking at the
  author, branch, timestamp, and commit message of each revision. Even
  so, this is only a heuristic, not foolproof.

  Existing tool: cvsps.

  It processes the output of 'cvs log' to recover changesets. Problem:
  it sees only a linear list of revisions and does not see branch
  points, etc., so it cannot use the tree structure to help make its
  decisions.

- Problem: CVS does not track merge points at all. Recovering them
  through heuristics, i.e. by looking for keywords in commit messages
  which might indicate that one branch was merged into another, is
  brittle at best.
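
The heuristic from the first item can be sketched roughly as follows. This is not cvsps's actual code, only a minimal illustration under assumptions: the `Revision` record, its field names, and the 300-second window are hypothetical. Revisions sharing author, branch, and commit message are grouped into one candidate changeset as long as consecutive revisions are at most `window` seconds apart and no file appears twice in the same changeset.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class Revision:
    # One file revision as reported by 'cvs log' (hypothetical record type).
    path: str
    rev: str
    author: str
    branch: str
    message: str
    timestamp: int  # seconds since the epoch

def group_changesets(revisions, window=300):
    """Group file revisions into candidate changesets (heuristic only)."""
    def key(r):
        return (r.author, r.branch, r.message)

    changesets = []
    # Sort by key, then by time, so that revisions with the same key are
    # adjacent and in chronological order.
    ordered = sorted(revisions, key=lambda r: (key(r), r.timestamp))
    for _, group in groupby(ordered, key=key):
        current = []
        for r in group:
            too_late = current and r.timestamp - current[-1].timestamp > window
            same_file = any(c.path == r.path for c in current)
            # Start a new changeset on a time gap, or when a file repeats.
            if too_late or same_file:
                changesets.append(current)
                current = []
            current.append(r)
        changesets.append(current)
    # Report changesets in the order of their first revision.
    changesets.sort(key=lambda cs: cs[0].timestamp)
    return changesets
```

Because this sees only a flat list of revisions, it shares the weakness noted for cvsps: nothing here consults branch points or the per-file trees.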


Ideas regarding an algorithm to recover changesets.

Key feature: the algorithm uses the per-file revision trees to help
uncover the underlying changesets and the global revision tree G.

The per-file revision tree for a file X is in essence the global
revision tree with all nodes not pertaining to X (i.e. the changesets
which do not modify X) removed from it. Conversely, this allows us to
build up the global revision tree from the per-file trees by matching
their nodes to each other and extending it where needed.
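
The projection from G to a per-file tree can be made concrete with a small sketch. It assumes a hypothetical `Node` type whose `changed` set names the files modified in that changeset. Nodes which do not modify X are dropped, and each remaining node is reattached to its nearest remaining ancestor so that the result is still a tree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # A node of the global revision tree G (hypothetical representation).
    name: str
    parent: Optional["Node"]
    changed: frozenset  # files modified in this changeset

def project(nodes, path):
    """Return the per-file tree of `path` as a child -> parent name map.

    Nodes that do not touch `path` are removed; each remaining node is
    reattached to its nearest ancestor that does touch `path`.
    """
    result = {}
    for node in nodes:
        if path not in node.changed:
            continue
        anc = node.parent
        while anc is not None and path not in anc.changed:
            anc = anc.parent
        result[node.name] = anc.name if anc is not None else None
    return result
```

Running the construction in reverse, i.e. recovering G from several such projections, is what the rest of these notes are about.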

Start with the per-file revision tree of a single file as the initial
approximation of the global tree. Each node of this tree refers to the
revision of the file it belongs to, and through that to the file
itself. At each step the global tree contains the nodes for a finite
set of files, and every node in the tree refers to a revision of each
file in the set, making the mapping total.
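
A possible representation of the initial approximation, assuming (hypothetically) that the per-file tree is given as a map from revision number to parent revision number. Each global node carries a `revs` map from file to revision, and `is_total` checks the invariant that every node refers to a revision of every file in the set. All names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(eq=False)
class GNode:
    # Node of the global tree G (hypothetical representation): `revs` maps
    # each integrated file to the revision this node refers to.
    revs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["GNode"] = None
    children: List["GNode"] = field(default_factory=list)

def initial_tree(path, file_tree):
    """Seed G from the per-file tree of a single file `path`.

    `file_tree` maps each revision number to its parent revision number
    (None for the root revision).
    """
    nodes = {rev: GNode(revs={path: rev}) for rev in file_tree}
    for rev, parent_rev in file_tree.items():
        if parent_rev is not None:
            nodes[rev].parent = nodes[parent_rev]
            nodes[parent_rev].children.append(nodes[rev])
    return list(nodes.values())

def is_total(nodes, files):
    """True if every node refers to a revision of every file in `files`."""
    return all(f in node.revs for node in nodes for f in files)
```

With a single file the mapping is trivially total; the steps below have to preserve that property as more files are added.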

To add a file X to the tree, take its per-file revision tree R and
perform the following actions:

- For each node N in R, use the tuple <author, branch, commit message>
  to identify the set of nodes in G which may match N. Among these, use
  the timestamp to pick the node nearest in time.

- This process will leave some nodes in R unmapped. If there are
  unmapped nodes which have no mapped neighbours, we have to abort.

  Otherwise take the unmapped nodes which have mapped neighbours. Trace
  the edges and see which of these nodes are connected in the local
  tree. Then look at the identified neighbours and trace their
  connections in the global tree.

  If two global nodes are directly connected, but the corresponding
  local nodes are connected via multiple edges, insert global nodes
  mapping to the intermediate local nodes and map them to each other.
  This expands the global tree to hold the revisions added by the new
  file.

  If both sides have multi-edge connections, this at first looks like a
  merge of two different branches, which CVS does not have. Instead,
  sort the nodes by time and fit the new nodes in between the existing
  ones, per their timestamps: the changes to the new file overlap or
  alternate with the changes to the other files.

  A last possibility is that a node is connected only to a mapped
  parent. This may be a new branch, or again an alternating change on
  the given line. The symbols (tags and branch names) on the revisions
  will help to map this.

- We now have an extended global tree which incorporates the revisions
  of the new file. However, the new nodes refer only to the new file,
  and old nodes which were not matched do not refer to the new file.
  This has to be fixed, as all nodes have to refer to all files.

  Run over the tree and look at each parent/child pair. If a file is
  referenced in the parent but not in the child, copy the reference to
  the file revision on the parent forward to the child. This signals
  that the file did not change in the given revision. Process parents
  before their children, so that a reference is carried down a whole
  chain of revisions in which the file did not change.

- After all files have been integrated in this manner, we have a global
  revision tree capturing all changesets, including the unchanged files
  of each changeset.
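
The matching step of the first item, together with the abort check of the second, might look like this. It is a sketch under assumptions: the `Meta` record and its fields are hypothetical, and nodes are identified by name. It does not stop two local nodes from being mapped onto the same global node, which a real implementation would presumably have to prevent.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Meta:
    # Commit metadata of a node (hypothetical record type).
    author: str
    branch: str
    message: str
    time: int

def match_nodes(local, global_):
    """Map each node of the per-file tree R onto a node of G.

    `local` and `global_` map node names to Meta. Candidates must agree on
    <author, branch, message>; among them the one nearest in time wins.
    Nodes without any candidate map to None.
    """
    index = defaultdict(list)
    for name, meta in global_.items():
        index[(meta.author, meta.branch, meta.message)].append(name)
    mapping = {}
    for name, meta in local.items():
        cands = index.get((meta.author, meta.branch, meta.message), [])
        mapping[name] = min(cands, key=lambda g: abs(global_[g].time - meta.time),
                            default=None)
    return mapping

def isolated_unmapped(mapping, parent):
    """Return unmapped local nodes without a mapped neighbour (abort case).

    `parent` maps each local node to its parent in R (None for the root).
    """
    neighbours = defaultdict(set)
    for child, par in parent.items():
        if par is not None:
            neighbours[child].add(par)
            neighbours[par].add(child)
    return [n for n, g in mapping.items()
            if g is None and not any(mapping.get(m) is not None for m in neighbours[n])]
```

A non-empty result from `isolated_unmapped` corresponds to the case where the notes say to abort.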
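
For the case where both sides have multi-edge connections, the timestamp-based fitting can be sketched as a merge of two chronologically ordered chains lying between the same pair of mapped endpoints. The function names and the (timestamp, name) pair representation are hypothetical; the standard library's `heapq.merge` does the actual interleaving.

```python
import heapq

def interleave(global_chain, local_chain):
    """Merge two chronologically ordered lists of (timestamp, name) pairs.

    Both lists hold the nodes strictly between the same two mapped
    endpoints; the result lists all their names, oldest first.
    """
    merged = heapq.merge(global_chain, local_chain, key=lambda pair: pair[0])
    return [name for _, name in merged]

def splice(start, end, global_chain, local_chain):
    """Return the new child -> parent edges for the path start .. end in G."""
    path = [start] + interleave(global_chain, local_chain) + [end]
    return {child: parent for parent, child in zip(path, path[1:])}
```

Using a merge rather than a full sort relies on each chain already being in time order, which holds for a single line of development in either tree.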
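
The forward propagation of unchanged file references from the third item could be written as a breadth-first walk from the root. Visiting parents before children means a file unchanged over several consecutive revisions is carried down the whole chain in one pass. The `children` and `revs` maps are an assumed representation, not taken from any existing tool.

```python
from collections import deque

def propagate(root, children, revs):
    """Copy unchanged file references from each parent to its children.

    `children` maps a node to its child nodes; `revs` maps a node to its
    {file: revision} dict and is updated in place. Nodes are visited
    breadth-first from `root`, so a parent is complete before its
    children are filled in.
    """
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            for name, rev in revs[node].items():
                # File not referenced by the child: it did not change there,
                # so the child inherits the parent's revision.
                revs[child].setdefault(name, rev)
            queue.append(child)
```

`setdefault` leaves existing references alone, so a revision recorded on the child itself always wins over the inherited one.
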

This algorithm has to be refined to also take Attic/ files into
account.