Monday, June 08, 2009

SCM tools are not configuration management tools

The main reason most so-called SCM tools are not really SCM tools is that they don't support managing software configurations. Making software is more than writing source code and converting it into executable code and data models for databases. A real SCM tool would be able to capture everything that is important for deploying and maintaining the software. This includes requirements, designs, models, sources, tools, infrastructure, knowledge, skills, test scripts, test data, manuals, scripts and other information.

Most SCM tools are able to capture files in a structure and control changes to the files and the structure. New files and new versions of existing files are all merely new files. Typically, the structure is 3-dimensional:

  • Directory (or folder) structure
  • Version (or revision) structure
  • Branching structure

The directory structure may seem to be a 2-dimensional structure (i.e. nested directories, and directories next to each other at the same nesting level), but if we consider the pathname + filename as the single identifier for a file, then the directory structure may be considered 1-dimensional. The version structure is 1-dimensional: successive versions supersede their predecessors.

How about parallel versions? Parallel versions are partial contributions to a single successor version. The actual successor is a merge of these partial contributions. Branches look similar to parallel versions, but the essential difference is that parallel versions are partial contributions to a single successor, while branches are full contributions (version structures) to alternative successors.

If we look at the implementation of these dimensions, then the simplest implementation is 1-dimensional: all 3 dimensions are projected onto the same implementation, e.g. as a directory structure (or path+filename). The "version control tool" could be an ordinary file system. For example:

main/gui/generic/foo-v1.c
main/webgui/unix/foo-v1.c
main/webgui/unix/foo-v2.c
main/webgui/unix/foo-v3.c
main/webgui/winxp/foo-v1.c
main/webgui/winxp/foo-v2.c
R1.0.0.0/gui/generic/foo-v1.c
R1.0.0.0/webgui/unix/foo-v2.c
R1.0.0.0/webgui/winxp/foo-v2.c

The problem with this version control system is that the path+filename changes for every new version. So users and the build process have to do more work to figure out which version they should use.
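To make that extra work concrete, here is a minimal Python sketch of what a build script might have to do with the 1-dimensional layout above. The function name latest_versions and the assumption that the version is always encoded as a "-vN" suffix before the extension are mine, inferred only from the example listing.

```python
import re

# File listing from the example above; the version is part of the file name.
FILES = [
    "main/gui/generic/foo-v1.c",
    "main/webgui/unix/foo-v1.c",
    "main/webgui/unix/foo-v2.c",
    "main/webgui/unix/foo-v3.c",
    "main/webgui/winxp/foo-v1.c",
    "main/webgui/winxp/foo-v2.c",
]

# Assumed naming convention: <name>-v<number>.<extension>
VERSIONED = re.compile(r"^(?P<stem>.+)-v(?P<version>\d+)(?P<ext>\.[^./]+)$")


def latest_versions(paths):
    """Map each logical file (path without version suffix) to its newest physical path."""
    newest = {}
    for path in paths:
        match = VERSIONED.match(path)
        if match is None:
            continue  # not a versioned file under the assumed convention
        logical = match.group("stem") + match.group("ext")
        version = int(match.group("version"))
        if logical not in newest or version > newest[logical][0]:
            newest[logical] = (version, path)
    return {logical: path for logical, (_, path) in newest.items()}
```

Every consumer of the files (users, build scripts, documentation) has to repeat this kind of lookup, and it silently breaks as soon as someone deviates from the naming convention.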

The next better implementation would be a 2-dimensional implementation: directory and branching are combined into a single dimension (path+filename), and versioning is the other dimension. Simple version control tools like Subversion work this way. For example:

main/gui/generic/foo.c (versions: 1)
main/webgui/unix/foo.c (versions: 1, 2 and 3)
main/webgui/winxp/foo.c (versions: 1 and 2)
R1.0.0.0/gui/generic/foo.c (versions: 1)
R1.0.0.0/webgui/unix/foo.c (versions: 2)
R1.0.0.0/webgui/winxp/foo.c (versions: 2)

The advantage is that the path+filename remains the same for all versions within a branch. But for different branches, the path+filename is different. And since directories and branches are resolved in the same dimension (the path), it is not possible to distinguish between a directory and a branch. For example, are main and R1.0.0.0 different branches? Are unix and winxp different branches? Are gui and webgui different branches? Or are they different directories within the same branch? So users have to agree on naming conventions to distinguish between branches. The SCM tool only takes care of deciding (automatically) which version is used.
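The following Python sketch illustrates the point about naming conventions. It models the repository as a plain mapping from path to version numbers and can only tell branches apart because of a rule the users agreed on. The rule used here (the first path component is the branch) and the names split_branch and branches_of are assumptions for illustration, not something the tool itself knows.

```python
# Path -> version numbers, as in the example above.
REPOSITORY = {
    "main/gui/generic/foo.c": [1],
    "main/webgui/unix/foo.c": [1, 2, 3],
    "main/webgui/winxp/foo.c": [1, 2],
    "R1.0.0.0/gui/generic/foo.c": [1],
    "R1.0.0.0/webgui/unix/foo.c": [2],
    "R1.0.0.0/webgui/winxp/foo.c": [2],
}


def split_branch(path, branch_depth=1):
    """Split a path into (branch, remainder) using an agreed convention.

    The tool stores only paths; that the first `branch_depth` components
    name a branch is purely a human agreement.
    """
    parts = path.split("/")
    return "/".join(parts[:branch_depth]), "/".join(parts[branch_depth:])


def branches_of(repository, branch_depth=1):
    """List the distinct 'branches' implied by the convention."""
    return sorted({split_branch(path, branch_depth)[0] for path in repository})
```

With the default convention, branches_of(REPOSITORY) yields main and R1.0.0.0. Nothing in the data says whether unix and winxp are also branches; that knowledge lives only in the heads of the users.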

One step further is a 3-dimensional solution, where directory (path+filename), version and branch are independent of each other. This requires more advanced version control tools such as ClearCase or Synergy. For example:

gui/foo.c (versions: 1 on branch: main)
webgui/foo.c (versions: 1 on branch: main; versions: 2 and 3 on branch: unix; versions: 2 on branch: winxp)

The advantage now is that the path+filename remains the same for all versions and all branches. This simplifies the implementation of an automated build process and the descriptions in design documents and models. But the downside is that the SCM tool needs to know which branch a user is working on in order to select the correct version for that user. And the user may be unaware of the branch he is working on, which introduces the risk that he is working on the wrong branch.
So on one side SCM makes life easier for the user and the organization (e.g. through automation), but on the other side it introduces extra work to reduce the risk of mistakes or to repair them.
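A small Python sketch of such a selection step, assuming a hypothetical rule of "take the latest version on the user's branch, otherwise fall back to main". The rule, the fallback and the name select_version are illustrative assumptions, not the actual mechanism of ClearCase or Synergy.

```python
# (path, branch) -> version numbers, following the example above.
VERSIONS = {
    ("gui/foo.c", "main"): [1],
    ("webgui/foo.c", "main"): [1],
    ("webgui/foo.c", "unix"): [2, 3],
    ("webgui/foo.c", "winxp"): [2],
}


def select_version(path, branch, fallback="main"):
    """Return (branch, version) for a path, given the branch the user works on.

    Assumed rule: latest version on the requested branch, else latest on
    the fallback branch.
    """
    for candidate in (branch, fallback):
        versions = VERSIONS.get((path, candidate))
        if versions:
            return candidate, max(versions)
    raise KeyError(f"no version of {path!r} on {branch!r} or {fallback!r}")
```

The path stays the same for everyone, but the result depends entirely on the branch argument. A user who passes winxp while believing he is on unix gets version 2 instead of version 3 without any warning, which is exactly the risk described above.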

As you can see, I have left out the baseline (R1.0.0.0) from the last example. In the first and second examples (1- and 2-dimensional), the baseline was combined with the directory dimension. In the last example, the baseline could be combined with the branch dimension, but it could also be implemented as a 4th dimension: labeling or tagging.

And this brings me to the point where version control enters the domain of configuration management. An essential feature of configuration management is to identify dependencies. A dependency defines which objects belong together. There are many different dependencies that can be (or need to be) identified, for example:
  • Directory dependency: all files within the same directory tree
  • Branch dependency: all files on the same branch
  • Version dependency: all latest versions
  • Status dependency: all files with the same status (e.g. release R1.0.0.0)
  • Content dependency: all files with compatible content (e.g. requirements-design-code consistency)

The status dependency is typically modeled in a so-called promotion model. All files go through a predefined series of statuses, and files (versions) with similar status and context (e.g. branches, directories) belong together as a configuration. Examples of these statuses are: working, integration testing, system testing, released. Tools support promotion by branching (e.g. ClearCase/UCM by deliver and rebase, or Subversion by "smart" copying of directories called branching) or by selection rules (e.g. Synergy by reconfigure property templates).
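As a rough illustration of a promotion model, the Python sketch below uses the four statuses named above. The functions promote and configuration, and the sample file versions, are hypothetical; real tools implement promotion through branching or selection rules rather than a simple status field.

```python
# Promotion levels in the order named in the text.
PROMOTION_LEVELS = ["working", "integration testing", "system testing", "released"]


def promote(status):
    """Return the next promotion level after `status`."""
    index = PROMOTION_LEVELS.index(status)  # raises ValueError for unknown statuses
    if index == len(PROMOTION_LEVELS) - 1:
        raise ValueError("already at the final promotion level")
    return PROMOTION_LEVELS[index + 1]


def configuration(statuses, status):
    """Select the file versions that share a status and so belong together."""
    return sorted(name for name, current in statuses.items() if current == status)


# Hypothetical sample data: file version -> current status.
SAMPLE = {
    "gui/foo.c@1": "released",
    "webgui/foo.c@2": "released",
    "webgui/foo.c@3": "working",
}
```

Here configuration(SAMPLE, "released") selects the two released file versions as one configuration, while webgui/foo.c@3 stays outside it until it has been promoted through every level.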

But one of the biggest shortcomings in the SCM tools that I know is the absence of support for content dependencies. How do you identify the impact of a design change on code, requirements and tests? How do you identify the impact of code changes on other code, interfaces and design models? How do you maintain the content information efficiently? How do you know that a content dependency is compromised? How do you know that release 3.2 of product X does work with release 1.2 of the framework, but not with release 1.1 of the framework? How do you know that release 6.1 of product Y cannot work with product X because it does not work with framework 1.2?
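To show what recording such content dependencies could look like, here is a minimal Python sketch based on the product X / product Y example. The table layout and function names are hypothetical, and the entry saying that product Y 6.1 works with framework 1.1 is an assumption added for illustration; the text only states that it does not work with framework 1.2.

```python
# (product, release) -> framework releases it is known to work with.
WORKS_WITH_FRAMEWORK = {
    ("X", "3.2"): {"1.2"},  # from the text: works with 1.2, not with 1.1
    ("Y", "6.1"): {"1.1"},  # assumption: the text only says "not with 1.2"
}


def shared_framework_releases(first, second):
    """Framework releases on which both product releases are known to work."""
    return WORKS_WITH_FRAMEWORK[first] & WORKS_WITH_FRAMEWORK[second]


def can_be_combined(first, second):
    """Two product releases can be combined only if they share a framework release."""
    return bool(shared_framework_releases(first, second))
```

With this data, can_be_combined(("X", "3.2"), ("Y", "6.1")) is false because the two releases have no framework release in common. The hard part is not the lookup but keeping such a table complete and correct, which is the kind of support the tools do not offer.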

Another big shortcoming of SCM tools is that they only support control at file level. They don't control requirements, components in the design model, test cases in a test specification, tool versions (e.g. compilers, IDEs, webservices) or hardware versions (e.g. 32-bit architecture). Consequently, many organizations try to capture those items in files, e.g. by creating a requirement specification document that "baselines" a set of requirements. But then again, those individual requirements - although versioned in a requirement management tool - cannot be identified as separate objects in the SCM tool, let alone that dependencies at requirement level can be identified or that individual requirements can be assigned to a baseline.

The only solution that I am aware of that comes close to an "SCM tool" is the Jazz platform, starting with Rational Team Concert, but integrated with the requirements, test and project management applications. Since all information is stored in a composite repository, where information objects are actually identified as objects (not as files), it becomes possible to identify relationships (such as dependencies) between objects (not only files). Yet, I doubt whether it will be capable of identifying dependencies between configurations, e.g. content dependencies between software packages.

6 comments:

Anonymous said...

Hi Steve

Thanks for the insight. I am just curious to know what you think are the best tools currently on the market for managing SCM data.

Unknown said...

In my opinion, the RTC (Rational Team Concert) from IBM Rational on the Jazz platform is currently the most advanced and most comprehensive SCM solution.

However, I am still curious how RTC handles multiple components, especially when different parts of the same system use different versions of the same component.

Project Management Software said...

Some of the tools in the project management software are applicable to several business models. They are the driving force that makes online project management software successful. It's important to know everything associated with your project whether it is a logistical problem like supplies and money, or whether it is a time tracking issue.

Nathaniel @ project management test said...

Great post! You have thoroughly explained SCM tools and their importance.

Glad I have found this blog. Thanks for sharing!

johnson said...

A real SCM tool would be able to capture everything that is important for deploying and maintaining the software.

Project Management Software

Prajwala Perumbudur said...

Hi Frank,

Your post is really great.
Maybe my question is not very relevant, but is there any SCM tool that allows you to draw a dependency between any 2 CIs within a system? For example, changes in a requirements doc will affect the test plan. If I make changes to the requirements doc, it should flag the test plan as impacted. At least the tool should list the dependent artifacts.