Artifact 93bae4c111fa5a78ca82d7161801710184f98b3d

File www/concepts.wiki part of check-in [e5aac82dd5] - Fix a typo on the concepts.wiki page. by drh on 2008-05-16 15:58:30.

Fossil Concepts

1.0 Introduction

Fossil is a software configuration management system. Fossil is software that is designed to control and track the development of a software project and to record the history of the project. There are many such systems in use today. Fossil strives to distinguish itself from the others by being extremely simple to set up and operate.

This document is intended as a quick introduction to the concepts behind fossil.

2.0 Composition Of A Project

A software project normally consists of a "source tree". A source tree is a hierarchy of files that are used to generate the end product. The source tree changes over time as the software grows and expands and as features are added and bugs are fixed. A snapshot of the source tree at any point in time is called a "version" or "revision" or a "baseline" of the product. In fossil, we use the name "baseline".

A "repository" is a database that contains copies of all historical versions or baselines for a project. Baselines are normally stored in the repository in a highly space-efficient compressed format (delta encoding). But that is an implementation detail that you, the user, need not worry about. Think of the repository as a safe place where all your old baselines are securely stored away and available for retrieval whenever you need them.
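To make the idea of delta encoding concrete, here is a minimal Python sketch. It assumes a toy delta made of "copy this range of the old version" and "insert these new bytes" operations, computed with the standard library's difflib. This is not fossil's actual delta format; it only shows why a small edit to a large file costs little extra storage.

```python
from difflib import SequenceMatcher

# A toy delta is a list of operations:
#   ("copy", start, end) -> reuse old[start:end]
#   ("insert", data)     -> append literal bytes
def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal insertions."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no operation: those old bytes are simply not copied.
    return ops

def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the newer version from the older one plus a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

old = b"int main(void) {\n  return 0;\n}\n"
new = old.replace(b"return 0", b"return 1")
delta = make_delta(old, new)
print(delta)
```

Only the changed byte is stored literally; everything else is a reference into the older version.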

A repository in fossil is a single file on your disk. This file might be rather large (dozens or hundreds of megabytes for a large or long-running project) but it is nevertheless just a file. You can move it around, rename it, write it out to a memory stick, or do anything else you normally do with files.

Each source tree that is controlled by fossil is associated with a single repository on the local disk drive. You can tie two or more source trees to a single repository if you want (though one tree per repository is the most common configuration). So a single repository can be associated with many source trees, but each source tree is associated with only one repository.

Fossil source trees may not overlap. A fossil source tree is identified by a file named "_FOSSIL_" in the root directory of the source tree. Every file that is a sibling of _FOSSIL_ and every file in every subfolder is considered potentially a part of the source tree. The _FOSSIL_ file contains (among other things) the pathname of the repository with which the source tree is associated. On the other hand, the repository has no record of its source trees. So you are free to delete a source tree or move it around without consequence. But if you move or rename or delete a repository, then any source trees associated with that repository will no longer be able to locate their repository and will stop working.
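Because a source tree is identified only by the _FOSSIL_ file at its root, finding the tree that contains a given directory amounts to walking upward until that file appears. A minimal Python sketch under that assumption follows; the function name is hypothetical, and it does not read or interpret the contents of _FOSSIL_.

```python
from pathlib import Path
from typing import Optional

MARKER = "_FOSSIL_"  # file that marks the root of a source tree

def find_source_tree_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains
    _FOSSIL_, or None if `start` is not inside any source tree."""
    here = Path(start).resolve()
    for directory in (here, *here.parents):
        if (directory / MARKER).is_file():
            return directory
    return None
```

Started from `src/lib` inside a tree rooted at `project/`, the search returns `project/`. Since source trees may not overlap, the nearest marker is the only one that can apply.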

When multiple developers are working on the same project, each developer typically has his or her own local repository and an associated source tree in which to work. Developers share their work by "syncing" the content of their local repositories either directly or through a central server. Changes can "push" from the local repository into a remote repository. Or changes can "pull" from a remote repository into a local repository. Or one can do a "sync" which is a shortcut for doing both a push and a pull at the same time. Fossil also has the concept of "cloning". A "clone" is like a "pull", except that instead of beginning with an existing local repository, a clone begins with nothing and creates a new local repository that is a duplicate of a remote repository.
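Since a repository is, at bottom, a collection of artifacts (see section 2.1), push, pull, sync and clone can be modeled as copying whichever artifacts the other side lacks. The Python sketch below assumes each repository is a dict mapping UUID to artifact content; the function names are illustrative, and HTTP, authentication and fossil's real wire protocol are deliberately left out.

```python
def push(local: dict, remote: dict) -> int:
    """Copy artifacts that `local` has and `remote` lacks; return how many."""
    missing = local.keys() - remote.keys()
    for uuid in missing:
        remote[uuid] = local[uuid]
    return len(missing)

def pull(local: dict, remote: dict) -> int:
    """A pull is a push in the opposite direction."""
    return push(remote, local)

def sync(local: dict, remote: dict) -> int:
    """Shortcut for a push followed by a pull."""
    return push(local, remote) + pull(local, remote)

def clone(remote: dict) -> dict:
    """Begin with an empty repository and pull everything into it."""
    new_repo: dict = {}
    pull(new_repo, remote)
    return new_repo

mine = {"a1": b"x", "a2": b"y"}
theirs = {"a2": b"y", "a3": b"z"}
print(sync(mine, theirs), sorted(mine))
```

Because artifacts are immutable and named by their own hash, copying them in any order, or more than once, is harmless, which is what lets the two sides converge.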

Communication between repositories is via HTTP. Remote repositories are identified by URL. You can also point a web browser at a repository and get human-readable status, history, and tracking information about the project.

2.1 Identification Of Artifacts

A particular version of a particular file is called an "artifact". Each artifact has a universally unique name, which is the SHA1 hash of the content of that file expressed as 40 lower-case hexadecimal characters. Such a hash is referred to as the Universally Unique Identifier or UUID for the artifact. The SHA1 algorithm was designed to provide a highly forgery-resistant identifier for a file. Given any file, it is simple to find the UUID for that file. But given a UUID, it is computationally intractable to generate a file that will have that UUID.

UUIDs look something like this:

6089f0b563a9db0a6d90682fe47fd7161ff867c8
59712614a1b3ccfd84078a37fa5b606e28434326
19dbf73078be9779edd6a0156195e610f81c94f9
b4104959a67175f02d6b415480be22a239f1f077
997c9d6ae03ad114b2b57f04e9eeef17dcb82788
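Computing an artifact's UUID needs nothing more than a SHA1 hash of the file's bytes. A minimal Python sketch using the standard hashlib module (the function name is hypothetical):

```python
import hashlib

def artifact_uuid(content: bytes) -> str:
    """SHA1 of the raw content, as 40 lower-case hexadecimal characters."""
    return hashlib.sha1(content).hexdigest()

v1 = b"Hello, world\n"
v2 = b"Hello, World\n"  # differs from v1 in a single byte

print(artifact_uuid(v1))
print(artifact_uuid(v2))  # an unrelated-looking hash, hence a different artifact
```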

When referring to an artifact using fossil, you can use a unique prefix of the UUID that is four characters or longer. This saves a lot of typing. When displaying UUIDs, fossil will usually show only the first 10 hexadecimal digits, since that is normally enough to uniquely identify an artifact.
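Resolving an abbreviated UUID is a prefix search that must match exactly one artifact. A minimal Python sketch, assuming the known UUIDs are available as a plain list; the function name and the choice of exceptions are illustrative, not fossil's internals.

```python
MIN_PREFIX = 4  # shortest abbreviation accepted, per this document

def resolve_prefix(prefix: str, uuids: list) -> str:
    """Return the single UUID that starts with `prefix`."""
    if len(prefix) < MIN_PREFIX:
        raise ValueError(f"prefix {prefix!r} is shorter than {MIN_PREFIX} characters")
    matches = [u for u in uuids if u.startswith(prefix)]
    if not matches:
        raise LookupError(f"no artifact matches {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"prefix {prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]

known = [
    "6089f0b563a9db0a6d90682fe47fd7161ff867c8",
    "59712614a1b3ccfd84078a37fa5b606e28434326",
    "19dbf73078be9779edd6a0156195e610f81c94f9",
]
print(resolve_prefix("5971", known))
```

An ambiguous prefix is reported as an error rather than guessed, so a longer prefix is needed once the repository grows.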

Changing (or adding or removing) a single byte in a file results in a completely different UUID. And since the UUID is the name of the artifact, making any change to a file results in a new artifact. In this way, artifacts are immutable.

A repository is really just an unordered collection of artifacts. New artifacts can be added to the repository, but existing artifacts can never be removed. Fossil is designed in such a way that it can be handed a set of artifacts in any order and it can figure out the relationship between those artifacts and reconstruct the complete development history of a software project.
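Reconstructing history from artifacts that arrive in arbitrary order works because each baseline records the baselines it was derived from. The Python sketch below assumes those parent links have already been extracted into a dict (names are hypothetical) and orders baselines so that every parent precedes its children, using a topological sort (Kahn's algorithm).

```python
from collections import deque

def history_order(parents: dict) -> list:
    """Order baseline UUIDs so that each parent comes before its children.
    `parents` maps a baseline UUID to the list of UUIDs it derives from."""
    children = {uuid: [] for uuid in parents}
    indegree = {uuid: 0 for uuid in parents}
    for uuid, ps in parents.items():
        for p in ps:
            if p in parents:  # ignore parents that have not been received yet
                children[p].append(uuid)
                indegree[uuid] += 1
    # Sorting makes the result independent of the order artifacts arrived in.
    queue = deque(sorted(u for u, d in indegree.items() if d == 0))
    order = []
    while queue:
        uuid = queue.popleft()
        order.append(uuid)
        for child in sorted(children[uuid]):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return order

parents = {"c3": ["c2"], "c1": [], "c2": ["c1"], "c4": ["c2"]}
print(history_order(parents))
```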

2.2 Manifests

At the root of a source tree is a special file called the "manifest". The manifest is a listing of all other files in that source tree. For each file, the manifest contains the complete UUID of the file and the name of the file as it appears on disk, and thus serves as a mapping from UUID to disk name. The UUID of the manifest is the UUID that identifies a baseline. When you look at a "timeline" of changes in fossil, the UUID associated with each check-in or commit is really just the UUID of the manifest for that baseline.
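The relationship between files, manifest and baseline UUID can be sketched in a few lines of Python. The one-line-per-file layout used here (an "F" tag, the file name, then the file's full UUID) is a simplified assumption for illustration; a real fossil manifest carries more information than this.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def build_manifest(files: dict) -> str:
    """One line per file: name and complete UUID, sorted by name so that
    the same tree always yields the same manifest text."""
    lines = [f"F {name} {sha1_hex(content)}" for name, content in sorted(files.items())]
    return "\n".join(lines) + "\n"

def baseline_uuid(files: dict) -> str:
    """A baseline is named by the UUID of its manifest."""
    return sha1_hex(build_manifest(files).encode("utf-8"))

tree = {"README": b"Demo project\n", "src/main.c": b"int main(void) { return 0; }\n"}
print(build_manifest(tree))
print(baseline_uuid(tree))
```

Changing any file changes that file's UUID, which changes the manifest text, which in turn gives the baseline a new UUID.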

Fossil automatically generates a manifest whenever you "commit" a new baseline. So this is not something that you, the developer, need to worry about. The format of a manifest is intentionally designed to be simple to parse, so that if you want to read and interpret a manifest, either by hand or with a script, that is easy to do. But you will probably never need to do so.

In addition to identifying all files in the baseline, a manifest also contains a check-in comment, the date and time when the baseline was established, who created the baseline, and links to other baselines from which the current baseline is derived. There are also a couple of checksums used to verify the integrity of the baseline. And the whole manifest might be PGP clearsigned.
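Because the format is meant to be easy to parse, a reader needs little more than a loop over lines. The Python sketch below assumes a simplified, hypothetical layout in which each line starts with a one-letter tag (C comment, D date, U user, P parent baselines, F file) and a final Z line holds an MD5 checksum of everything above it. Escaping rules, any additional checksums and PGP signatures are not modeled, and the function names are illustrative.

```python
import hashlib

def seal(body: str) -> str:
    """Append a Z line holding the MD5 of everything before it."""
    return body + "Z " + hashlib.md5(body.encode("utf-8")).hexdigest() + "\n"

def parse_manifest(text: str) -> dict:
    """Verify the trailing checksum, then collect the tagged lines."""
    head, _, z_line = text.rstrip("\n").rpartition("\n")
    body = head + "\n"
    if not z_line.startswith("Z "):
        raise ValueError("missing Z checksum line")
    if hashlib.md5(body.encode("utf-8")).hexdigest() != z_line[2:]:
        raise ValueError("checksum mismatch: manifest was altered")
    info = {"comment": "", "date": "", "user": "", "parents": [], "files": {}}
    for line in head.split("\n"):
        tag, _, rest = line.partition(" ")
        if tag == "C":
            info["comment"] = rest
        elif tag == "D":
            info["date"] = rest
        elif tag == "U":
            info["user"] = rest
        elif tag == "P":
            info["parents"] = rest.split()
        elif tag == "F":
            name, uuid = rest.split()
            info["files"][name] = uuid
    return info

body = (
    "C Fix a typo\n"
    "D 2008-05-16T15:58:30\n"
    "F README 6089f0b563a9db0a6d90682fe47fd7161ff867c8\n"
    "P 59712614a1b3ccfd84078a37fa5b606e28434326\n"
    "U alice\n"
)
print(parse_manifest(seal(body)))
```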

2.3 Key concepts

3.0 Fossil - The Program

Fossil is software. The implementation of fossil is in the form of a single executable named "fossil". To install fossil on your system, all you have to do is obtain a copy of this one executable file (either by downloading a precompiled version or compiling it yourself) and then put that file somewhere on your PATH.

Fossil is completely self-contained. It is not necessary to install any other software in order to use fossil. You do not need CVS, gzip, diff, rsync, Python, Perl, Tcl, Java, Apache, PostgreSQL, MySQL, SQLite, patch, or any similar software on your system in order to use fossil effectively. You will want to have some kind of text editor for entering check-in comments. Fossil will use whatever text editor is identified by your VISUAL environment variable. Fossil will also use GPG to clearsign your manifests if you happen to have it installed, but fossil will skip that step if GPG is missing from your system. You can optionally set up fossil to use external "diff" programs, though a perfectly functional "diff" algorithm is built in and works fine for most people.

To uninstall fossil, simply delete the executable.

To upgrade an older version of fossil to a newer version, just replace the old executable with the new one. You might need to run a one-time command to restructure your repositories after an upgrade. Check the instructions that come with the upgrade for details.

To use fossil, simply type the name of the executable in your shell, followed by one of the various built-in commands and arguments appropriate for that command. For example:

fossil help

In the next section, when we say things like "use the help command" we mean to use the command name "help" as the first token after the name of the fossil executable, as shown above.

4.0 Workflow

  1. Establish a local repository using either the new command to start a new project, or the clone command to make a clone of a repository for an existing project.

  2. Establish one or more source trees by changing your working directory to where you want the root of the source tree to be, then issuing the open command with the name of the repository file as its argument.

  3. Use the update command followed by a UUID to cause your source tree to change to the baseline identified by that UUID. The timeline or leaves commands might help you to identify an appropriate baseline.

  4. Edit the code. Add new files to the source tree using the add command. Omit files from future baselines using the rm command. (Even when you remove files from future baselines, those files continue to exist in historical baselines.) Test your changes.

  5. Create a new baseline using the commit command. You will be prompted for a check-in comment and also for your GPG key if you have GPG installed. The commit copies the edits you have made in your local source tree into your local repository.

  6. Share your changes with others using the push command. Push causes the edits you committed into your local repository to be pushed out into other repositories.

  7. When your coworkers make their own changes, you can pull those changes into your local repository using the pull command. Note that the pull command only pulls the changes into your local repository, not into your local source tree.

  8. After the changes of others are in your local repository, you can move them into your local source tree using update. If you have made parallel changes, you can merge your changes with your coworker's changes by doing an update to your latest baseline, then doing a merge with your coworker's latest baseline. After you verify that the merged code is still functional, you can commit a new baseline that contains both your changes and your coworker's changes and then push the new baseline back to your coworker.

  9. Repeat all of the above until you have generated great software.
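The merge in step 8 depends on knowing where the two lines of development diverged. A minimal Python sketch, assuming parent links are available as a dict (names are hypothetical), finds a common ancestor of your baseline and your coworker's baseline with a breadth-first search. Fossil's own choice of merge base may differ; this only illustrates the idea.

```python
from collections import deque

def ancestors(parents: dict, start: str) -> set:
    """All baselines reachable from `start` by following parent links,
    including `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def common_ancestor(parents: dict, mine: str, theirs: str):
    """Nearest ancestor of `theirs` (by breadth-first search) that is also
    an ancestor of `mine`, or None if the histories never meet."""
    my_history = ancestors(parents, mine)
    queue = deque([theirs])
    seen = {theirs}
    while queue:
        node = queue.popleft()
        if node in my_history:
            return node
        for p in parents.get(node, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return None

parents = {"base": [], "mine1": ["base"], "theirs1": ["base"], "theirs2": ["theirs1"]}
print(common_ancestor(parents, "mine1", "theirs2"))
```

A three-way merge then compares both of your latest baselines against that common ancestor to decide which side changed each region.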

4.1 Variations

The setting command lets you view and modify various operating properties of fossil. Among the available settings is "autosync" mode. When autosync is enabled, pushing and pulling content between your local repository and the central server is largely automated. Whenever you use the update command, fossil first does a pull to see if other users have added new baselines to the central repository. When you commit, fossil also does a pull and issues a warning if your check-in would cause a fork. After a commit, fossil automatically does a push to send your changes up to the central server.
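The commit-time sequence described above (pull, check for a fork, record the check-in, push) can be sketched in Python. The sketch assumes each repository is a dict mapping a baseline UUID to its list of parent UUIDs, and treats "would cause a fork" as committing on top of a baseline that already has a child; the function names are hypothetical.

```python
def copy_missing(src: dict, dst: dict) -> None:
    """Copy baselines that `dst` lacks from `src` (a one-way transfer)."""
    for uuid, parents in src.items():
        dst.setdefault(uuid, parents)

def commit(local: dict, remote: dict, checkout: str, new_uuid: str,
           autosync: bool = True) -> list:
    """Record `new_uuid` as a child of `checkout`; return any warnings."""
    warnings = []
    if autosync:
        copy_missing(remote, local)  # pull before committing
    has_children = {p for ps in local.values() for p in ps}
    if checkout in has_children:  # checkout is no longer a leaf
        warnings.append(f"warning: check-in on {checkout} would cause a fork")
    local[new_uuid] = [checkout]
    if autosync:
        copy_missing(local, remote)  # push after committing
    return warnings

remote = {"base": [], "theirs": ["base"]}
local = {"base": []}
print(commit(local, remote, "base", "mine"))
```

With autosync off, the local repository never hears about the coworker's baseline before the commit, so no warning is raised and the divergence only becomes visible after a later pull.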

With autosync enabled, fossil works like CVS or Subversion. With autosync disabled, fossil works more like Monotone, Git, or Mercurial. The fun thing about fossil is that it will work either way, depending on your needs of the moment. You can freely switch between these operating modes using commands like:

fossil setting autosync off
fossil setting autosync on

For additional information about autosync and other settings, use the help command.

5.0 Setting Up A Fossil Server

With other configuration management software, setting up a server is a lot of work and normally takes time, patience, and a lot of system knowledge. Fossil is designed to avoid this frustration. Setting up a server with fossil is ridiculously easy. You have three options:

  1. Setting up a stand-alone server

    From within your source tree just use the server command and fossil will start listening for incoming requests on TCP port 8080. You can point your web browser at http://localhost:8080/ and begin exploring. Or your coworkers can do pushes or pulls against your server. Use the --port option to the server command to specify a different TCP port. If you do not have a local source tree, use the -R command-line option to specify the repository file.

    A stand-alone server is a great way to set up transient connections between coworkers for doing quick pushes or pulls. But you can also set up a permanent stand-alone server if you prefer. Just make arrangements for fossil to be launched with appropriate arguments after every reboot.

  2. Setting up a CGI server

    If you have a webserver running on your machine already, you can set up fossil to be run from CGI. Simply create an executable script that looks something like this:

    #!/usr/local/bin/fossil
    repository: /home/me/bigproject.fossil
    

    Edit this script to use whatever pathnames are appropriate for your project. Then point your web browser at the script and off you go.

  3. Setting up an inetd server

    If you have inetd or xinetd running on your system, you can set those services up to launch fossil to deal with inbound TCP/IP connections on whatever port you want. Set up inetd or xinetd to launch fossil like this:

    /usr/local/bin/fossil http /home/me/bigproject.fossil
    

    As before, change the filenames to whatever is appropriate for your system. You can have fossil run as any user that has write permission on the repository and on the directory that contains the repository. But it is safer to run fossil as root. When fossil sees that it is running as root, it automatically puts itself into a chroot jail and drops all privileges prior to reading any information from the client. Since fossil is a stand-alone program, you do not need to put anything in the chroot jail with fossil in order for it to do its job.