Differences From:
File
cvs2fossil.txt
part of check-in
[27ed4f7dc3]
- Extended pass InitCsets and underlying code with more log output geared towards memory introspection, and added markers for special locations. Extended my notes with general observations from the first test runs over my example CVS repositories.
by
aku on
2008-02-16 06:46:41.
To:
File
cvs2fossil.txt
part of check-in
[f637d42206]
- Updated my notes regarding memory usage. Converted more locations to incremental query processing via 'state foreachrow', now throughout the importer.
by
aku on
2008-02-24 18:01:40.
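The check-in comment above mentions converting locations to incremental query processing via 'state foreachrow'. The importer itself is written in Tcl; the following is only a minimal sketch of the same idea in Python with the standard sqlite3 module. The table `item`, its columns, and all function names are hypothetical and not taken from the importer.

```python
import sqlite3


def make_db(n):
    """Build a throwaway in-memory table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, cset INTEGER)")
    db.executemany(
        "INSERT INTO item (id, cset) VALUES (?, ?)",
        ((i, i % 10) for i in range(n)),
    )
    db.commit()
    return db


def count_eager(db):
    """Materialize the whole result as a list first, then loop over it.

    This mirrors the pattern the notes describe for 'state eval' + 'foreach'.
    """
    rows = db.execute("SELECT id, cset FROM item").fetchall()
    counts = {}
    for _id, cset in rows:
        counts[cset] = counts.get(cset, 0) + 1
    return counts


def count_incremental(db):
    """Iterate over the cursor, so rows are handled as they are fetched.

    This is the analogue of processing the query row by row.
    """
    counts = {}
    for _id, cset in db.execute("SELECT id, cset FROM item"):
        counts[cset] = counts.get(cset, 0) + 1
    return counts
```

Both functions compute the same per-changeset counts. The difference is that the eager variant holds the full result list in memory at once, while the incremental variant never builds that list.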
@@ -6,56 +6,34 @@
for one CVS repository. I.e. I can, for example, import all of
tcllib, or a single subproject of tcllib, like tklib, but not
multiple sub-projects in one go.
-* We have to look into the pass 'InitCsets' and hunt for the
- cause of the large amount of memory it is gobbling up.
-
- Results from the first look using the new memory tracking
- subsystem:
-
- (1) The general architecture, workflow, is a bit wasteful. All
- changesets are generated and kept in memory before getting
- persisted. This means that allocated memory piles up over
- time, with later changesets pushing the boundaries. This
- is made worse by some of the preliminary changesets, which seem
- to require a lot of temporary memory as part of getting
- broken down into the actual ones. InitializeBreakState
- seems to be the culprit here. Its memory usage is possibly
- quadratic in the number of items in the changeset.
-
- (2) A number of small inefficiencies, like 'state eval' always
- pulling the whole result into memory before processing it
- with 'foreach'. These results are potentially large lists.
-
- (3) We maintain an in-memory map from tagged items to their
- changesets. While this is needed later in the sorting
- passes, during the creation this is wasted space. It also
- wastes time to maintain it during the creation and
- breaking.
-
- Changes:
-
- (a) Re-architect to create, break, and persist changesets one
- by one, completely releasing all associated in-memory data
- before going to the next. Should be low-hanging fruit with
- high impact, as we have all the necessary operations
- already, just not in that order, and that alone should
- already keep the pile from forming, making the spikes of
- (2) more manageable.
-
- (b) Look into the smaller problems described in (2), and
- especially (3). These should still be low-hanging fruit,
- although of lesser effect than (a). For (3) disable the
- map and its maintenance during construction, and put it
- into a separate command, to be used when loading the
- created changesets at the end.
-
- (c) With larger effect, but more difficult to achieve, go into
- command 'InitializeBreakState' and the preceding
- 'internalsuccessors', and rearchitect it. Definitely not a
- low-hanging fruit. Possibly also something we can skip if
- doing (a) had a large enough effect.
+* Consider reworking the breaker- and sort-passes so that they
+ do not need all changesets as objects in memory.
+
+ Current memory consumption after all changesets are loaded, in bytes and in MB:
+
+ bwidget 6971627 6.6
+ cvs-memchan 4634049 4.4
+ cvs-sqlite 45674501 43.6
+ cvs-trf 8781289 8.4
+ faqs 2835116 2.7
+ libtommath 4405066 4.2
+ mclistbox 3350190 3.2
+ newclock 5020460 4.8
+ oocore 4064574 3.9
+ sampleextension 4729932 4.5
+ tclapps 8482135 8.1
+ tclbench 4116887 3.9
+ tcl_bignum 2545192 2.4
+ tclconfig 4105042 3.9
+ tcllib 31707688 30.2
+ tcltutorial 3512048 3.3
+ tcl 109926382 104.8
+ thread 8953139 8.5
+ tklib 13935220 13.3
+ tk 66149870 63.1
+ widget 2625609 2.5
* Look at the dependencies on external packages and consider
which of them can be moved into the importer, either as a
simple utility command, or wholesale.