Sunday, January 25, 2009

Build Process Integration

Introduction

This post isn't going to be Erlang- or language-oriented at all. One of my other hobbies revolves around the build process and build process tools. Over the last few years I have spent a lot of time thinking about how to improve them, make the build process more transparent, and so on. I believe I have a way to do that. Unfortunately, it would take the cooperation of build tool implementors to get it off the ground, or else new implementations of the existing tools. In any case, let me describe what I am talking about.

Many of you may be familiar with a product called Trac. This is an open source project management tool. Actually, it calls itself an 'enhanced wiki and issue tracking system for software development projects', but it includes facilities for project management, source control artifact integration, authorization and authentication, etc. It's a very interesting and reasonably complete tool. However, one single feature makes this product especially interesting: the ability to link between the various kinds of information that Trac keeps track of. Usually this linking is done with very simple, easy to remember micro-formats. For example, you can link a ticket to a changeset by using the 'changeset:' notation in the ticket. Trac understands this format and will produce a clickable link wherever the ticket is rendered. This type of linking works between any of the artifacts stored in Trac. The other features are really just there to support that single killer feature. Unfortunately, you only get to use this feature while you are in Trac.
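To make the micro-format idea concrete, here is a minimal Python sketch of rendering 'kind:identifier' notation as clickable links. It is not Trac's implementation: the regular expression, the `render_links` function, and the changeset base URL are hypothetical, and the issue URL simply reuses the example URL used later in this post.

```python
from __future__ import annotations

import re
from html import escape

# Hypothetical mapping from link kind to the base URL of the system that owns it.
# The changeset URL is made up for illustration; the issue URL follows this post's example.
BASES = {
    "ticket": "http://issues.com/api/issue/",
    "changeset": "http://scm.example.com/changeset/",
}

# Matches "kind:identifier", e.g. "ticket:13" or "changeset:42".
LINK = re.compile(r"\b(ticket|changeset):(\w+)")


def render_links(text: str) -> str:
    """HTML-escape the text, then turn each recognised link into an anchor."""

    def to_anchor(m: re.Match[str]) -> str:
        kind, ident = m.group(1), m.group(2)
        return f'<a href="{BASES[kind]}{ident}">{kind}:{ident}</a>'

    return LINK.sub(to_anchor, escape(text))


print(render_links("Fixed by changeset:42, see ticket:13."))
```

Note that the table of link kinds lives inside the renderer itself: only the kinds of artifact that the renderer already knows about can ever be linked.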

That's a problem. Because this linking only works with information owned by a Trac system, Trac is forced into an 'own the world' mentality. That is, if the designers want to give you the ability to link between a wiki page and an issue, the product must implement both a wiki and an issue tracking system. If it wants to give you the ability to link to artifacts in a build system, it must provide a build system. This is true of any artifact that Trac, or systems like Trac, want to support. This forces the implementors to spread their time and effort over a range of products instead of sticking to a single product and getting it right. It also means that a consumer of the product can't swap out parts of it for something he may like better. This is why Trac implements a wiki, a project management system, and a source control display system, even though quite good open source systems already address each of these spaces. Finally, it means that if the developers of Trac or similar systems want to add linking to a new type of artifact, they must implement that artifact in their own system.

A Better Approach

There is a better approach, although it will take a bit of effort to see it realized. Fundamentally, this better approach is to let each system focus on its own purpose and to provide some system-agnostic way to tie their artifacts together. That is, some other system should exist that understands how artifacts are related to one another. This system would then allow users to create 'links', or relationships, and query those relationships at will.

We can do this by using REST-based services (other approaches probably work as well) and following REST semantics. By vending an artifact at a specific, unchanging URL we can provide a universally unique identifier that we can use for linking. For example, let's say we were building a Trac-like system. We put our issue artifacts in an issue tracking service at a specific URL, say 'http://issues.com/api/issue/'. We also put our projects at a specific URL, say 'http://projects.com/api/project/'. Both of these APIs vend data according to some predetermined format. With these in place it becomes very easy to link the two kinds of artifact together. This implicitly assumes a couple of things. First and foremost, that consumable formats exist for issues and projects. Second, that there exists some means of resolving these relationships; basically, that you have somewhere to record the fact that these artifacts are related. I think a separate type of system should be set up to manage these relationships. For now I will call it a relationship management service. With these facts established, if we wanted to associate issue 13 with project Foo, we would just create a relationship between http://issues.com/api/issue/13 and http://projects.com/api/project/foo in our relationship management service. With this service in place, no individual system would need to understand linking or store link information. Only clients that wanted to consume the information would need to understand linking.
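A relationship management service can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not a design for a real service: the `RelationshipStore` class and its `relate`/`related` methods are hypothetical names, and a real service would persist its links and expose them over REST rather than as Python calls.

```python
from __future__ import annotations

from collections import defaultdict


class RelationshipStore:
    """Minimal in-memory relationship management service (hypothetical sketch).

    Artifacts are identified only by their URIs. The store never fetches or
    interprets them, so the systems that own the artifacts know nothing about links.
    """

    def __init__(self) -> None:
        # URI -> set of URIs it is related to.
        self._links = defaultdict(set)

    def relate(self, a: str, b: str) -> None:
        """Record an undirected relationship between two artifact URIs."""
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, uri: str) -> set[str]:
        """Return a copy of the URIs directly related to `uri` (empty if none)."""
        return set(self._links.get(uri, ()))


store = RelationshipStore()
store.relate("http://issues.com/api/issue/13", "http://projects.com/api/project/foo")
print(store.related("http://projects.com/api/project/foo"))
```

Relationships here are untyped and undirected for simplicity; a real service would probably want to label them (for example 'belongs to' or 'blocks'), which would only change what is stored per pair of URIs.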

There are a few advantages here and some disadvantages. The biggest advantage is that systems don't need to understand how linking with other systems occurs. We can drop any system we want into the mix and get reasonable linking semantics. The second big advantage is that we can manipulate the links in one location, traversing the entire graph of links without jumping around to each service that stores information. The main disadvantage is that we have yet another service that we must manage.
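Traversing the whole graph from one place amounts to an ordinary breadth-first search over the stored URI pairs. The sketch below assumes the links have already been fetched from a relationship management service; the `LINKS` data, the wiki URL, and the helper names are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Hypothetical links as recorded by a relationship management service:
# each pair is an undirected relationship between two artifact URIs.
LINKS = [
    ("http://issues.com/api/issue/13", "http://projects.com/api/project/foo"),
    ("http://projects.com/api/project/foo", "http://wiki.example.com/api/page/FooDesign"),
    ("http://issues.com/api/issue/14", "http://projects.com/api/project/bar"),
]


def build_graph(links: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an adjacency map from URI pairs."""
    graph: dict[str, set[str]] = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Every URI connected to `start`, found without contacting the owning services."""
    seen = {start}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        for nxt in graph.get(uri, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


for uri in sorted(reachable(build_graph(LINKS), "http://issues.com/api/issue/13")):
    print(uri)
```

Only the URIs are visited; a client would contact the owning service only when it actually wants to display one of the artifacts it found.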

How To Do It

Of course, actually accomplishing the task of building these relationships is not so simple in practice. Several prerequisites have to be met before we can get started.

  • Systems must vend their data in some general, consumable, system-agnostic way.

  • Systems must actually vend the artifacts that other systems may find interesting.

  • There must be a uniform, unchanging way to resolve artifacts so that they can be linked to.
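Here is a sketch of what the first and third prerequisites might look like from a single system's side, assuming JSON as the system-agnostic format. The field names 'self' and 'type', and the functions `vend_issue` and `consume`, are hypothetical conventions chosen for illustration, not any existing standard.

```python
from __future__ import annotations

import json

# Fields every vended artifact is assumed to carry (a hypothetical convention).
REQUIRED = {"self", "type"}


def vend_issue(issue_id: int, title: str, status: str) -> str:
    """Serialize an issue as self-describing JSON.

    'self' is the artifact's stable URI; 'type' tells consumers what it is.
    """
    return json.dumps({
        "self": f"http://issues.com/api/issue/{issue_id}",
        "type": "issue",
        "title": title,
        "status": status,
    })


def consume(document: str) -> dict:
    """Parse a vended artifact and check it carries the fields linking relies on."""
    data = json.loads(document)
    if not isinstance(data, dict):
        raise ValueError("artifact must be a JSON object")
    missing = REQUIRED - set(data)
    if missing:
        raise ValueError(f"artifact is missing fields: {sorted(missing)}")
    return data


issue = consume(vend_issue(13, "Crash on save", "open"))
print(issue["self"], issue["type"])
```

Because every artifact names its own URI, a consumer can hand that URI straight to a relationship management service without knowing anything else about the issue tracker.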


Each of these prerequisites is more complex than it may first appear. For example, the first prerequisite says that systems must vend their artifacts in some system-agnostic way. In reality, however, you want them to vend their artifacts in as simple a way as possible. You also want them to vend their data with either some type of schema or self-describing data. Finally, you want all of your systems to vend data in a similar way to reduce the complexity of linking and consumption. Similar issues exist with the second prerequisite. Artifacts that should be exposed may be sub-artifacts of previously exposed artifacts. For example, in a project management system, projects are probably an exposed artifact, but so are milestones, which are lower in the hierarchy than projects. Exposing these in a consistent, consumable way will require some forethought. Finally, and most importantly, artifacts must be resolvable for linking to have any meaning at all. That means that each artifact must have a consistent and unchanging URI that can be used to identify and consume it. Designing such an API will take significant forethought on the part of the system implementors.
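One way to expose sub-artifacts consistently is to nest their URIs under the parent's URI. This sketch assumes, hypothetically, that milestones live at '.../project/<name>/milestone/<name>' under the project URL used earlier; the helper functions are illustrative only.

```python
from __future__ import annotations

from urllib.parse import quote, unquote

# Base URL from this post's example; the nested milestone layout is an assumption.
BASE = "http://projects.com/api/project/"


def project_uri(project: str) -> str:
    """Percent-encode the name fully so it always occupies exactly one path segment."""
    return BASE + quote(project, safe="")


def milestone_uri(project: str, milestone: str) -> str:
    """A milestone's URI nests under its project's URI, mirroring the hierarchy."""
    return f"{project_uri(project)}/milestone/{quote(milestone, safe='')}"


def parse_milestone_uri(uri: str) -> tuple[str, str]:
    """Recover (project, milestone) from a URI produced by milestone_uri."""
    if not uri.startswith(BASE):
        raise ValueError(f"not a project URI: {uri}")
    parts = uri[len(BASE):].split("/")
    if len(parts) != 3 or parts[1] != "milestone":
        raise ValueError(f"not a milestone URI: {uri}")
    return unquote(parts[0]), unquote(parts[2])


uri = milestone_uri("foo", "Release 1.0")
print(uri)
print(parse_milestone_uri(uri))
```

One caveat with this choice: if a project or milestone can be renamed, a name-based URI changes and every stored relationship that points at it breaks, so a system would probably want an immutable identifier in the path instead.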

What we are essentially talking about is evolving a set of tools from a broad array of isolated stacks into a set of components that can plug into a generic, flexible component architecture. 'Plug into' in this case just means forming consumable relationships between artifacts vended by each component. This is very similar to the way the familiar World Wide Web is designed. In our case, we are linking artifacts instead of documents and storing the links outside of the artifacts, but the design philosophy is very similar.

To that end, we need to design a set of principles to inform the creation and management of components, and of the relationships between the artifacts those components vend. This should not be some heavy-handed attempt to mandate interoperability. It should be a set of general guidelines about what types of services must be vended by a system. Relationship management should occur within a dedicated system as well.

This approach would free product developers to concentrate on a specific feature or set of features. It would also allow individuals or entities to consume artifacts as they wish and to form relationships between arbitrary artifacts. This is an important meme that I think should be propagated along with the REST approach to web services.