Diff

Differences From:

File cvs2fossil.txt part of check-in [be2f99e6a4] - Merge with aku's branch. by drh on 2008-02-13 14:44:50. Also file cvs2fossil.txt part of check-in [6d5de5f1c1] - Tuned the handling of the vendor branch in case we have multiple different symbols representing it. The import pass now effectively merges these symbols into a single line of development. by aku on 2008-02-13 04:57:43.

To:

File cvs2fossil.txt part of check-in [27ed4f7dc3] - Extended pass InitCsets and underlying code with more log output geared towards memory introspection, and added markers for special locations. Extended my notes with general observations from the first test runs over my example CVS repositories. by aku on 2008-02-16 06:46:41.

@@ -9,8 +9,54 @@
 
 *	We have to look into the pass 'InitCsets' and hunt for the
 	cause of the large amount of memory it is gobbling up.
 
+	Results from the first look using the new memory tracking
+	subsystem:
+
+	(1) The general architecture (workflow) is a bit wasteful. All
+	    changesets are generated and kept in memory before getting
+	    persisted. This means that allocated memory piles up over
+	    time, with later changesets pushing the boundaries. This
+	    is made worse by the fact that some preliminary changesets
+	    seem to require a lot of temporary memory while getting
+	    broken down into the actual ones. InitializeBreakState
+	    seems to be the culprit here. Its memory usage is possibly
+	    quadratic in the number of items in the changeset.
+
+	(2) A number of small inefficiencies, like 'state eval' always
+	    pulling the whole result into memory before processing it
+	    with 'foreach', which here means potentially large lists.
+
+	(3) We maintain an in-memory map from tagged items to their
+	    changesets. While this map is needed later, in the sorting
+	    passes, during creation it is wasted space, and also
+	    wasted time spent maintaining it while changesets are
+	    created and broken.
+
+	Changes:
+
+	(a) Re-architect to create, break, and persist changesets one
+	    by one, completely releasing all associated in-memory data
+	    before going to the next. This should be low-hanging fruit
+	    with high impact: we already have all the necessary
+	    operations, just not in that order. That alone should keep
+	    the pile from forming, making the spikes of (2) more
+	    manageable.
+
+	(b) Look into the smaller problems described in (2), and
+	    especially (3). These should still be low-hanging fruit,
+	    although of lesser effect than (a). For (3), disable the
+	    map and its maintenance during construction, and put it
+	    into a separate command, to be used when loading the
+	    created changesets at the end.
+
+	(c) With larger effect, but more difficult to achieve, go into
+	    command 'InitializeBreakState' and the preceding
+	    'internalsuccessors', and re-architect them. Definitely not
+	    low-hanging fruit. Possibly also something we can skip if
+	    doing (a) has a large enough effect.
+
 *	Look at the dependencies on external packages and consider
 	which of them can be moved into the importer, either as a
 	simple utility command, or wholesale.
 
@@ -38,5 +84,5 @@
 	snit
 		In toto
 
 	sqlite3
-		In tota
+		In toto
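A minimal sketch of the reordering proposed in (a), contrasting the current batch workflow described in (1) with a streaming one. It is written in Python rather than the importer's own Tcl, and everything in it is hypothetical: the item tuples, the grouping key, the rule used to break a changeset (split when a file reappears), and the two-table SQLite schema are stand-ins, not cvs2fossil code. The returned "peak" counts how many items sit in broken changesets at the same time, a crude proxy for memory.

```python
import sqlite3
from itertools import groupby

# Hypothetical items: (commit_key, file, revision), already sorted by key.
ITEMS = [
    ("c1", "a.c", "1.1"),
    ("c1", "b.c", "1.1"),
    ("c1", "a.c", "1.2"),
    ("c2", "a.c", "1.3"),
    ("c2", "c.c", "1.1"),
]


def new_db():
    """In-memory stand-in for the importer's state database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE changeset (cid INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE csitem (cid INTEGER, file TEXT, rev TEXT)")
    return conn


def preliminary_changesets(items):
    """Lazily yield one preliminary changeset per commit key."""
    for _, group in groupby(items, key=lambda item: item[0]):
        yield list(group)


def break_changeset(cs):
    """Hypothetical break rule: start a new part when a file reappears."""
    parts, current, seen = [], [], set()
    for item in cs:
        if item[1] in seen:
            parts.append(current)
            current, seen = [], set()
        current.append(item)
        seen.add(item[1])
    if current:
        parts.append(current)
    return parts


def persist(conn, part):
    """Write one final changeset and its items to the database."""
    cid = conn.execute("INSERT INTO changeset DEFAULT VALUES").lastrowid
    conn.executemany(
        "INSERT INTO csitem (cid, file, rev) VALUES (?, ?, ?)",
        [(cid, f, r) for _, f, r in part],
    )


def import_batch(conn, items):
    """Current workflow, per (1): break everything, then persist."""
    all_parts = []
    for prelim in preliminary_changesets(items):
        all_parts.extend(break_changeset(prelim))
    peak = sum(len(p) for p in all_parts)  # every changeset is alive here
    for part in all_parts:
        persist(conn, part)
    return peak


def import_streaming(conn, items):
    """Proposed workflow, per (a): create, break, persist, release."""
    peak = 0
    for prelim in preliminary_changesets(items):
        parts = break_changeset(prelim)
        peak = max(peak, sum(len(p) for p in parts))
        for part in parts:
            persist(conn, part)
        # Drop our references before the next changeset is built, so
        # only one preliminary changeset's data is reachable at a time.
        del parts, prelim
    return peak


if __name__ == "__main__":
    print(import_batch(new_db(), ITEMS), import_streaming(new_db(), ITEMS))  # 5 3
```

With this sample data the batch version holds all 5 items at once and the streaming version at most 3, while both write the same 3 changesets. On a real repository the gap grows with the number of changesets, which is the pile-up described in (1).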
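Item (2) is about materializing a whole query result before looping over it. Tcl's sqlite3 binding already offers the streaming alternative: `db eval $sql { ... }` runs the script once per result row instead of returning one list. Whether the importer's 'state eval' wrapper exposes that form is an assumption not confirmed by the notes. The Python sketch below shows the same contrast with the standard `sqlite3` module, where `fetchall()` builds the complete list and iterating the cursor steps through rows one at a time; the `item` table and its `size` column are hypothetical.

```python
import sqlite3


def make_db(n):
    """Hypothetical table with n rows to scan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (iid INTEGER PRIMARY KEY, size INTEGER)")
    conn.executemany("INSERT INTO item (size) VALUES (?)", ((i,) for i in range(n)))
    return conn


def total_eager(conn):
    # Like 'foreach ... [state eval ...]': the whole result is built
    # as one list before the loop starts.
    rows = conn.execute("SELECT size FROM item").fetchall()
    total = 0
    for (size,) in rows:
        total += size
    return total


def total_streaming(conn):
    # Like the per-row script form: the cursor steps through the result,
    # so only the current row is held on the Python side.
    total = 0
    for (size,) in conn.execute("SELECT size FROM item"):
        total += size
    return total


if __name__ == "__main__":
    db = make_db(1000)
    print(total_eager(db), total_streaming(db))
```

Both functions compute the same result; only the peak size of the data held while looping differs.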
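For (3) and the fix proposed in (b), a minimal sketch of building the item-to-changeset map in a separate step, once, from already persisted changesets, instead of maintaining it during creation and breaking. The `csitem` table, its columns, and `load_item_map` are hypothetical names; per the note, such a command would run when the created changesets are loaded for the later sorting passes.

```python
import sqlite3


def load_item_map(conn):
    """Build the (file, rev) -> changeset id map in one pass.

    Creation and breaking never touch this map; it is built only when
    the later passes actually need it.
    """
    item_map = {}
    for cid, file, rev in conn.execute("SELECT cid, file, rev FROM csitem"):
        item_map[(file, rev)] = cid
    return item_map


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE csitem (cid INTEGER, file TEXT, rev TEXT)")
    conn.executemany(
        "INSERT INTO csitem VALUES (?, ?, ?)",
        [(1, "a.c", "1.1"), (1, "b.c", "1.1"), (2, "a.c", "1.2")],
    )
    print(load_item_map(conn)[("a.c", "1.2")])
```

The design choice is simply to trade incremental bookkeeping for one scan at the point of use, so neither memory nor time is spent on the map while changesets are still being created.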