Artifact 0f9001c9f0d15a71bc48a46351fda01b62d9bbcc
File
cvs2fossil.txt
part of check-in
[27ed4f7dc3]
- Extended pass InitCsets and underlying code with more log output geared towards memory introspection, and added markers for special locations. Extended my notes with general observations from the first test runs over my example CVS repositories.
by
aku on
2008-02-16 06:46:41.
Known problems and areas to work on
===================================
* Not yet able to handle the specification of multiple projects
for one CVS repository. I.e. I can, for example, import all of
tcllib, or a single sub-project of tcllib, like tklib, but not
multiple sub-projects in one go.
* We have to look into the pass 'InitCsets' and hunt for the
cause of the large amount of memory it is gobbling up.
Results from the first look using the new memory tracking
subsystem:
(1) The general architecture (the workflow) is a bit wasteful. All
changesets are generated and kept in memory before getting
persisted. This means that allocated memory piles up over
time, with later changesets pushing the boundaries. This
is made worse by the fact that some of the preliminary
changesets seem to require a lot of temporary memory as part of
getting broken down into the actual ones. InitializeBreakState
seems to be the culprit here. Its memory usage is possibly
quadratic in the number of items in the changeset.
(2) A number of small inefficiencies, like 'state eval' always
pulling the whole result into memory before processing it
with 'foreach'. These results are potentially large lists.
(3) We maintain an in-memory map from tagged items to their
changesets. While this is needed later in the sorting
passes, during the creation it is wasted space. It is also
wasted time to maintain it during the creation and
breaking.
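Observation (2) can be illustrated with a small sketch. The importer itself is written in Tcl; the Python below is only an analogy using the standard sqlite3 module, and the table 'item' with its columns is a hypothetical schema made up for the example. It contrasts fetching the whole result list before looping with letting the cursor hand out one row at a time.

```python
import sqlite3

def make_db(n):
    """Build a throwaway in-memory table with n rows (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, cset INTEGER)")
    db.executemany("INSERT INTO item (id, cset) VALUES (?, ?)",
                   ((i, i % 10) for i in range(n)))
    return db

def count_eager(db):
    """Materialize the full result list first, then loop over it."""
    rows = db.execute("SELECT id, cset FROM item").fetchall()
    total = 0
    for _id, _cset in rows:
        total += 1
    # len(rows) is the number of rows held in memory at the same time.
    return total, len(rows)

def count_streaming(db):
    """Iterate the cursor directly; no full result list is built."""
    total = 0
    for _id, _cset in db.execute("SELECT id, cset FROM item"):
        total += 1
    return total
```

Both functions visit the same rows. Only the eager variant holds the complete result as a list while the loop runs, which is the pattern described for 'state eval' followed by 'foreach'.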
Changes:
(a) Re-architect to create, break, and persist changesets one
by one, completely releasing all associated in-memory data
before going to the next. Should be low-hanging fruit with
high impact, as we have all the necessary operations
already, just not in that order, and that alone should
already keep the pile from forming, making the spikes of
(2) more manageable.
(b) Look into the smaller problems described in (2), and
especially (3). These should still be low-hanging fruit,
although of lesser effect than (a). For (3), disable the
map and its maintenance during construction, and put it
into a separate command, to be used when loading the
created changesets at the end.
(c) With larger effect, but more difficult to achieve, go into
command 'InitializeBreakState' and the preceding
'internalsuccessors', and re-architect them. Definitely not a
low-hanging fruit. Possibly also something we can skip if
doing (a) had a large enough effect.
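A minimal sketch of the reordering proposed in (a), again in Python as a stand-in for the Tcl importer. All names here (preliminary_changesets, break_changeset, the persist callback, the max_items limit) are hypothetical and the breaking logic is a trivial placeholder, not the real algorithm. The point is only the difference in how many broken-down changesets are alive at once.

```python
def preliminary_changesets(n):
    """Yield hypothetical preliminary changesets one at a time."""
    for i in range(n):
        yield {"id": i, "items": list(range(i % 4 + 1))}

def break_changeset(cset, max_items=2):
    """Placeholder for breaking: split into pieces of at most max_items items."""
    items = cset["items"]
    return [{"id": cset["id"], "items": items[k:k + max_items]}
            for k in range(0, len(items), max_items)]

def import_all_at_once(n, persist):
    """Current shape: create and break everything, then persist.

    Returns the peak number of pieces held in memory."""
    pending = []
    for cset in preliminary_changesets(n):
        pending.extend(break_changeset(cset))
    peak = len(pending)
    for piece in pending:
        persist(piece)
    return peak

def import_one_by_one(n, persist):
    """Proposed shape: create, break, and persist one changeset, then move on.

    Returns the peak number of pieces held in memory."""
    peak = 0
    for cset in preliminary_changesets(n):
        pieces = break_changeset(cset)
        peak = max(peak, len(pieces))
        for piece in pieces:
            persist(piece)
        # 'pieces' is rebound on the next iteration, so the old list
        # becomes garbage unless 'persist' kept references to it.
    return peak
```

In this toy setup both variants persist the same pieces in the same order. The first holds all of them before writing anything, while the second never holds more than the pieces of a single preliminary changeset.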
* Look at the dependencies on external packages and consider
which of them can be moved into the importer, either as a
simple utility command, or wholesale.
struct::list
assign, map, reverse, filter
Very few and self-contained commands.
struct::set
size, empty, contains, add, include, exclude,
intersect, subsetof
Most of the core commands.
fileutil
cat, appendToFile, writeFile,
tempfile, stripPath, test
fileutil::traverse
In toto
struct::graph
In toto
snit
In toto
sqlite3
In toto