Wednesday, January 06, 2010

How to plan software as a configuration item

Yesterday I got a question from a colleague about configuration items. In our projects we use a deliverable list to identify which items are being delivered by the project and which are being received from outside the project. To help projects plan their configuration items we have defined a deliverable list template.

The deliverable list template lists documents like various types of requirement specs, design specs, test specs, manuals, training materials, et cetera. It also contains a single entry for Software. Now the question was:

What software deliverables (and receivables) should a project plan for?
Obviously, the deliverable list template does not give many clues. But we know that the software consists of components, some of which are received from other projects or third-party suppliers - and possibly modified by the project - and some of which are created by the project. Then, how do you know which software deliverables the project is going to deliver?

The answer is that it depends on the agreement with the customer of the project. The requirements agreed with the customer state what he will get, in terms of functionality, performance, quality and deliverables. And these deliverables should be listed in the deliverable list. If the requirements don't state the (software) deliverables, then apparently the customer does not care as long as the functional and non-functional requirements are satisfied.
In that case, the architecture or high-level design determines the software deliverables. Moreover, the architecture determines which of the software components are reused from other projects or third-party suppliers and which are created anew.

Okay, now that requirements and architecture/design define which software deliverables and receivables are applicable,
how do we plan in which release they are delivered?
For projects with a single (final) release, it is simple: everything is released and delivered at the end. For projects with multiple releases, like incremental development or iterative development, it is more complicated but still quite simple.

The total work of a project is divided into "work packages", chunks of work that deliver a number of (possibly internal) deliverables. These work packages are assigned to a release, and this determines that the deliverables of those work packages are delivered at that release. More precisely, the deliverables are packed together as a baseline and the baseline is released.

Now in many software development organizations, the work packages are handled just like problem reports and change requests. They are registered in the change control tool/database, and assigned to people and releases (or iterations). We could call them "work items"; the list of work items planned for a release or iteration (in a release plan or iteration plan) determines which software deliverables are released. You could list those deliverables in the deliverable list, but it is more practical and more accurate to generate a report from the change control database listing the work packages, problem reports and change requests assigned to a particular release or iteration.
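Such a report can be sketched in a few lines. The following is only an illustration, assuming the change control database can be exported as a list of records; the WorkItem fields, the identifiers (WP-1, PR-7, CR-3) and the release names are hypothetical and not taken from any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    item_id: str         # hypothetical identifier in the change control database
    kind: str            # "work package", "problem report" or "change request"
    release: str         # release or iteration the item is assigned to
    deliverables: tuple  # deliverables touched by this item


def release_report(items):
    """Group work items by release and collect the deliverables per release."""
    report = defaultdict(lambda: {"items": [], "deliverables": set()})
    for item in items:
        entry = report[item.release]
        entry["items"].append(item.item_id)
        entry["deliverables"].update(item.deliverables)
    return dict(report)


# Hypothetical export of the change control database.
ITEMS = [
    WorkItem("WP-1", "work package", "R1", ("gui",)),
    WorkItem("PR-7", "problem report", "R1", ("webgui",)),
    WorkItem("CR-3", "change request", "R2", ("gui", "manual")),
]

if __name__ == "__main__":
    for release, entry in sorted(release_report(ITEMS).items()):
        print(release, entry["items"], sorted(entry["deliverables"]))
```

The point of the sketch is that the deliverables per release are derived from the assignments, so the report stays accurate when a work item is moved to another release.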

Monday, June 08, 2009

SCM tools are not configuration management tools

The main reason why most so-called SCM tools are not really SCM tools is that they don't support managing software configurations. Making software is more than writing source code and converting it into executable code and data models for databases. A real SCM tool would be able to capture everything that is important for deploying and maintaining the software. This includes requirements, designs, models, sources, tools, infrastructure, knowledge, skills, test scripts, test data, manuals, scripts and other information.

Most SCM tools are able to capture files in a structure and control changes to the files and the structure. New files and new versions of existing files are all merely new files. Typically, the structure is 3-dimensional:

  • Directory (or folder) structure
  • Version (or revision) structure
  • Branching structure
The directory structure may seem to be a 2-dimensional structure (i.e. nested directories and directories next to each other at the same nesting level), but if we consider the pathname + filename as the single identifier for a file, then the directory structure may be considered 1-dimensional. The version structure is 1-dimensional: the successive versions supersede their predecessors. How about parallel versions? Parallel versions are partial contributions to a single successor version. The actual successor is a merge of these partial contributions. Branches look similar to parallel versions, but the essential difference is that parallel versions are partial contributions to a single successor while branches are full contributions (version structures) to alternative successors.

If we look at the implementation of these dimensions, then the simplest implementation is 1-dimensional: all 3 dimensions are projected onto the same implementation, e.g. as directory structure (or path+filename). The "version control tool" could be an ordinary file system. For example:

main/gui/generic/foo-v1.c
main/webgui/unix/foo-v1.c
main/webgui/unix/foo-v2.c
main/webgui/unix/foo-v3.c
main/webgui/winxp/foo-v1.c
main/webgui/winxp/foo-v2.c
R1.0.0.0/gui/generic/foo-v1.c
R1.0.0.0/webgui/unix/foo-v2.c
R1.0.0.0/webgui/winxp/foo-v2.c

The problem with this version control system is that the path+filename changes for every new version. So users and the build process have to do more work to figure out which version they should use.
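To illustrate the extra work, here is a minimal sketch of what a build script would have to do in the 1-dimensional scheme to find the latest version of each file. It assumes the "-vN" suffix convention used in the listing; the function name latest_versions is made up for this example.

```python
import re

# Paths on the "main" line, copied from the listing.
PATHS = [
    "main/gui/generic/foo-v1.c",
    "main/webgui/unix/foo-v1.c",
    "main/webgui/unix/foo-v2.c",
    "main/webgui/unix/foo-v3.c",
    "main/webgui/winxp/foo-v1.c",
    "main/webgui/winxp/foo-v2.c",
]

# Assumed naming convention: <stem>-v<number><extension>
VERSIONED = re.compile(r"^(?P<stem>.+)-v(?P<version>\d+)(?P<ext>\.[^./]+)$")


def latest_versions(paths):
    """Map each logical file name to the path of its highest version."""
    latest = {}
    for path in paths:
        match = VERSIONED.match(path)
        if match is None:
            continue  # not a versioned file name, ignore it
        logical = match.group("stem") + match.group("ext")
        version = int(match.group("version"))
        if version > latest.get(logical, (0, ""))[0]:
            latest[logical] = (version, path)
    return {logical: path for logical, (_, path) in latest.items()}


if __name__ == "__main__":
    for logical, path in sorted(latest_versions(PATHS).items()):
        print(logical, "->", path)
```

Every tool that touches the files needs this kind of logic, and it breaks as soon as somebody deviates from the naming convention.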

The next better implementation would be a 2-dimensional implementation: directory and branching are combined into a single dimension (path+filename), and versioning is the other dimension. Simple version control tools like Subversion work this way. For example:

main/gui/generic/foo.c (versions: 1)
main/webgui/unix/foo.c (versions: 1, 2 and 3)
main/webgui/winxp/foo.c (versions: 1 and 2)
R1.0.0.0/gui/generic/foo.c (versions: 1)
R1.0.0.0/webgui/unix/foo.c (versions: 2)
R1.0.0.0/webgui/winxp/foo.c (versions: 2)

The advantage is that the path+filename remains the same for all versions within a branch. But for different branches, the path+filename is different. And since directories and branches are resolved in the same dimension (the path), it is not possible to distinguish between a directory and a branch. For example, are main and R1.0.0.0 different branches? Are unix and winxp different branches? Are gui and webgui different branches? Or are they different directories within the same branch? So users have to agree on naming conventions to distinguish between branches. The SCM tool only takes care of deciding automatically which version is used.
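The 2-dimensional model can be sketched as a mapping from path to a list of versions. This is only an illustration, not how Subversion stores data; the rule "the first path component is the branch" is an assumed naming convention, which is exactly the kind of agreement the users have to make themselves.

```python
# Path -> versions, copied from the listing.
REPO = {
    "main/gui/generic/foo.c": [1],
    "main/webgui/unix/foo.c": [1, 2, 3],
    "main/webgui/winxp/foo.c": [1, 2],
    "R1.0.0.0/gui/generic/foo.c": [1],
    "R1.0.0.0/webgui/unix/foo.c": [2],
    "R1.0.0.0/webgui/winxp/foo.c": [2],
}


def select(path, version=None):
    """Return the requested version of a path, or the latest one by default."""
    versions = REPO[path]
    if version is None:
        return max(versions)  # the tool decides which version is used
    if version not in versions:
        raise KeyError(f"{path} has no version {version}")
    return version


def split_by_convention(path):
    """Split a path into (branch, rest), ASSUMING the first component is the branch.

    The repository itself cannot tell whether "main" or "unix" is a branch
    or just a directory; only the convention says so.
    """
    branch, _, rest = path.partition("/")
    return branch, rest


if __name__ == "__main__":
    print(select("main/webgui/unix/foo.c"))
    print(split_by_convention("R1.0.0.0/webgui/unix/foo.c"))
```

A team that treats unix and winxp as branches would need a different split_by_convention, while the repository content stays exactly the same.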

One step further is a 3-dimensional solution, where directory (path+filename), version and branch are independent of each other. More advanced version control tools like ClearCase or Synergy are needed. For example:

gui/foo.c (versions: 1 on branch: main)
webgui/foo.c (versions: 1 on branch: main; versions: 2 and 3 on branch: unix; versions: 2 on branch: winxp)

The advantage now is that the path+filename remains the same for all versions and all branches. This simplifies the implementation of an automated build process and the description in design documents and models. But the downside is that the SCM tool needs information to decide which branch a user is working on in order to select the correct version for the user to work on. And the user may be unaware of the branch he is working on - introducing the risk that he is working on the wrong branch.
So on one side SCM makes life easier for the user and the organization (e.g. automation), but on the other side it introduces extra work to reduce the risk of mistakes or to repair them.
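A minimal sketch of the 3-dimensional model: path, branch and version are independent keys, and the tool needs to know the user's branch to select a version. The fallback to main when a file has no versions on the requested branch is an assumption made for this example, not a description of how ClearCase or Synergy actually resolve versions.

```python
# path -> branch -> versions, following the 3-dimensional example.
HISTORY = {
    "gui/foo.c": {"main": [1]},
    "webgui/foo.c": {"main": [1], "unix": [2, 3], "winxp": [2]},
}


def select_version(path, branch, fallback="main"):
    """Return (branch, version) selected for a user working on `branch`.

    Assumption: if the file has no versions on the requested branch,
    the latest version on the fallback branch is used instead.
    """
    branches = HISTORY[path]
    chosen = branch if branch in branches else fallback
    return chosen, max(branches[chosen])


if __name__ == "__main__":
    for user_branch in ("unix", "winxp"):
        for path in sorted(HISTORY):
            print(user_branch, path, select_version(path, user_branch))
```

The path given to the build is the same for every user; only the branch setting differs. That is also where the risk sits: a user with the wrong branch setting silently gets a different version of webgui/foo.c.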

As you can see, I have left out the baseline (R1.0.0.0) from the last example. In the first and second examples (1- and 2-dimensional), the baseline was combined with the directory dimension. In the last example, the baseline could be combined with the branch dimension, but it could also be implemented as a 4th dimension: labeling or tagging.

And this brings me to the point where version control enters the domain of configuration management. An essential feature of configuration management is to identify dependencies. A dependency defines which objects belong together. There are many different dependencies that can be (or need to be) identified, for example:
  • Directory dependency: all files within the same directory tree
  • Branch dependency: all files on the same branch
  • Version dependency: all latest versions
  • Status dependency: all files with the same status (e.g. release R1.0.0.0)
  • Content dependency: all files with compatible content (e.g. requirements-design-code consistency)
The status dependency is typically modeled in a so-called promotion model. All files go through a predefined series of statuses, and files (versions) with similar status and context (e.g. branches, directories) belong together as a configuration. These statuses are for example: working, integration testing, system testing, released. Tools support promotion by branching (e.g. ClearCase/UCM by deliver and rebase, or Subversion by "smart" copying of directories called branching) or by selection rules (e.g. Synergy by reconfigure property templates).
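A promotion model can be sketched as an ordered list of statuses with a single "promote" step. This is a simplified illustration using the four example statuses; the function names and the file names are hypothetical, and real tools implement promotion through branches or selection rules rather than a status field.

```python
# Example statuses, in promotion order.
STATUSES = ["working", "integration testing", "system testing", "released"]


def promote(status):
    """Return the next status in the promotion model."""
    index = STATUSES.index(status)  # raises ValueError for an unknown status
    if index == len(STATUSES) - 1:
        raise ValueError("a released version cannot be promoted further")
    return STATUSES[index + 1]


def configuration(files, status):
    """Select the files that belong together because they share a status."""
    return sorted(name for name, current in files.items() if current == status)


if __name__ == "__main__":
    # Hypothetical file versions and their current status.
    files = {
        "gui/foo.c": "system testing",
        "webgui/foo.c": "system testing",
        "webgui/bar.c": "working",
    }
    print(configuration(files, "system testing"))
    files["webgui/bar.c"] = promote(files["webgui/bar.c"])
    print(files["webgui/bar.c"])
```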

But one of the biggest shortcomings in the SCM tools that I know is the absence of support for content dependencies. How do you identify the impact of a design change on code, requirements and tests? How do you identify the impact of code changes on other code, interfaces and design models? How do you maintain the content information efficiently? How do you know that the content dependency is compromised? How do you know that release 3.2 of product X does work with release 1.2 of the framework, but not with release 1.1 of the framework? How do you know that release 6.1 of product Y cannot work with product X because it does not work with framework 1.2?
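To make the last two questions concrete, here is a sketch of what recorded content dependencies could look like: each product release lists the framework releases it is known to work with, and two products can only be combined if they share at least one. The table is illustrative; in particular, the entry saying that product Y 6.1 works with framework 1.1 is an assumption added to complete the example.

```python
# (product, release) -> framework releases it is known to work with.
COMPATIBLE_FRAMEWORKS = {
    ("X", "3.2"): {"1.2"},   # works with framework 1.2, not with 1.1
    ("Y", "6.1"): {"1.1"},   # ASSUMPTION: works with 1.1 only, not with 1.2
}


def common_frameworks(*releases):
    """Return the framework releases that all given product releases support."""
    supported = [COMPATIBLE_FRAMEWORKS[release] for release in releases]
    return set.intersection(*supported)


def can_combine(*releases):
    """Product releases can be deployed together if they share a framework release."""
    return bool(common_frameworks(*releases))


if __name__ == "__main__":
    print(common_frameworks(("X", "3.2")))
    print(can_combine(("X", "3.2"), ("Y", "6.1")))
```

The hard part is not the lookup but keeping such a table complete and up to date, which is what the file-oriented tools leave entirely to the users.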

Another big shortcoming of SCM tools is that they only support control at file level. They don't control requirements, components in the design model, test cases in a test specification, tool versions (e.g. compilers, IDEs, web services) or hardware versions (e.g. 32-bit architecture). Consequently, many organizations try to capture those items in files, e.g. by creating a requirement specification document that "baselines" a set of requirements. But then again, those individual requirements - although versioned in a requirement management tool - cannot be identified as separate objects in the SCM tool, let alone that dependencies at requirement level can be identified or that individual requirements can be assigned to a baseline.

The only solution that I am aware of that comes close to an "SCM tool" is the Jazz platform, starting with Rational Team Concert, but integrated with the requirements, test and project management applications. Since all information is stored in a composite repository, where information objects are actually identified as objects (not as files), it becomes possible to identify relationships (such as dependencies) between objects (not only files). Yet, I doubt whether it will be capable of identifying dependencies between configurations, e.g. content dependencies between software packages.

Rational ClearCase is not an SCM tool

If Telelogic Synergy is not an SCM tool and Subversion is not an SCM tool, it is easy to conclude that IBM Rational ClearCase is not an SCM tool either. Not even ClearCase/UCM is an SCM tool.