Wednesday, December 19, 2007

Erlware Progress Update

I have had very little time to post of late. The other maintainers and I have been coding feverishly to get the erlware system to version 1.0 by the end of January. This has taken up a huge amount of my time, leaving me very little time to blog. It has been so long now that I feel bad. So I decided that I needed to spend a bit of time talking about how development in our new model is going.

All I can say is wow! This Open Development Model has proven to be a huge boon to the project. Not only has it greatly increased user participation, it has also encouraged quite a bit more design process. Rather spontaneously, the maintainers have started to submit design specs when they are preparing to make major changes. These design specs are usually one-pagers that give a high level overview of the problem and describe the nature of the fix the developer is implementing. It lets the community and the other maintainers know about, and comment on, what's upcoming. It also makes the developer think more closely about the changes he is planning to make to the system. All in all it doesn't take a large amount of time and it helps increase the quality of our offering. Since each design spec goes to the mailing list it fits right in with the low level patch model. I like it.

Another major development is that we have moved our website to
the Open Development Model. Originally, we used DokuWiki for our website. DokuWiki is a great product, but it's a wiki, with the attendant problems a wiki has. In our case, we had kept wiki access pretty limited; only the core maintainers actually had access to it. This made it hard for the community to fix issues and expand the docs. What we really wanted was to combine the easy editing of the wiki with our somewhat more careful, patch based approach to changes. We spent some time thinking about the problem and decided that it would be best if our site was updatable, via git, just like the rest of our projects. However, we really didn't want to hand edit HTML and do all the manual work involved; we liked the readable, usable wiki syntax. We needed some mix of wiki and static source files. We went to google with low expectations and were pleasantly surprised by the number of good offerings in various languages. Eventually we settled on webgen, a nice little Ruby framework for static site generation. It supports various markups, including our choice, Markdown. What we ended up with is an open site with very nice wiki-ish syntax that is easy to extend and change via a model that our contributors are familiar with.

Once a change is in our canonical git repo, getting the changes onto the site is completely automated. A cronjob on our server pulls the site from the repository and runs the generator, then runs rsync to push the changes into the area they are served from. This is a very recent change, so we don't actually know yet if it will accomplish our purpose of more contributions to the site. One unexpected side benefit is that our site is now really fast. All the work happens at generation time, so the server just needs to serve static HTML files.

Thursday, October 25, 2007

Erlware and the Git Development Model

Introduction

Recently the core committers for Erlware, mostly Martin and I, decided to migrate Erlware's source control system from the previous Google hosted subversion repository to self-hosted git. Linus' Tech Talk on Git convinced us that this was a good idea. This isn't just a source control change, but a methodology change. It changes the way code gets from its source (the Erlware community) into Erlware itself. The main things we want to get out of this migration are:

  1. Greater visibility into the code base for participants
  2. Increased code quality
  3. Faster, cleaner incremental development

It's yet to be seen whether we will achieve our third goal, but we have already achieved the first two, which is encouraging.

We have had a few false starts. That is just to be expected. When we first started, we took the easy default and just replaced the central subversion repository with a git repository. We then treated that git repository as if it were just another subversion repo. That makes for a nasty commit history and it doesn't leverage git's development model very well at all. So I started searching for information about how to run a large project in git in a truly distributed fashion. Let me tell you, there just isn't much information out there about organizing a project around git. It's just not there! Some people are working on that, I hear. In any case, I finally pulled aside a friend (hey Scott!) who was familiar with git and asked him how this whole distributed development thing is supposed to work. Much to my amazement he actually knew! He could even explain it to a dim bulb like me!

What follows describes how we are implementing git in Erlware. It is based on what Scott told me, lurking on the git mailing list, and widely varied sources out there on the interweb.

The Development Model

The model we are using is actually dirt-simple. Each person has one or more personal git repositories. This is where they do their hacking and keep track of their commit history. There is one canonical repository. However, this canonical repository never has commits pushed to it directly. To get some change into the canonical repository (and into the project) you have to send a patch representing your change to the erlware-dev@googlegroups.com mailing list. This means that every single change is pushed out and viewed by everyone on the mailing list. If we foster the community well, people will give good feedback and code reviews. Hopefully, we can build a community that is as engaged as the community that surrounds git itself.

In any case, once the patch arrives on the list, gets commented upon, changed as required, etc, it will be applied to the canonical repository and become part of Erlware. Think about this for a second: every change to Erlware goes through the Erlware dev list as a patch. Every member of the community has a chance to comment on, critique and discuss it. The direction of the project is obvious to anyone who has access to the mailing list, which is anyone that wants access. It makes it extraordinarily easy for anyone to contribute to the project without ever having to code. They can just subscribe to the list and provide their knowledge and insight on the code passing through to the implementers of the patch. Of course, the committers have the final say over what actually makes it into the canonical repository. However, the community has a huge amount of leeway in making sure that the code is correct and of the highest quality.


You may think that having to submit a patch would surely slow down development on the project. However, you would be wrong. Each developer has his own personal repository with which he can do anything he wants. We encourage developers, and require committers, to make their repository publicly available, so that developers and anyone working with them can have easy access to each other's code. Their development velocity can be anything they are comfortable with. When they finally have something that they think is ready for commit to the canonical repository, they can create patches and submit them. Of course, they will need to spend some time actually refactoring their changes into a nice set of small interrelated patches. This, though, is time well spent and gives them one final chance to refactor their code.

Overall, this should be a huge boost to our productivity as a project and our transparency as project leaders.

The Nuts and Bolts

Actually getting this entire thing set up isn't a trivial project. You need to have some public place to put your git repo (http://git.erlware.org/git or git://git.erlware.org/git in our case). You need to set up that repo with, at the very least, an http server fronting it. You should probably also set up the git-daemon. It makes cloning a git repository very, very fast, much faster than cloning a repository over http. git-daemon doesn't really have any idea of permissions or users though, so you should set this up for read access only. You do that by making sure that the user git-daemon runs as doesn't have authority to write to any of the files in the repo.

Setting Up a Git Repo


These instructions work whether you are setting up your own repository or the canonical repository.

1) Create a directory for your repos. You can just use mkdir for this, though if you are going to have multiple submitters you probably want to set the setgid bit on the directory so new files keep the shared group.


$> mkdir repo_dir
$> chown me:group_that_every_commiter_is_in repo_dir
$> chmod g+s repo_dir
$> cd repo_dir
$> mkdir project_git_dir
$> cd project_git_dir
$> git --bare init

This is going to act as a public server, so we want to enable a bit of index generation. To do that we simply make the post-update hook executable. Git will do the right thing with that.

$> chmod a+x hooks/post-update

If you look in post-update you will see the command 'git-update-server-info'. This command allows git to update the indices that it needs when cloning over a 'dumb' protocol like http.

Point your http server at the repo and you are done! If you want to do some fancy stuff, like sending an email on every commit, then you need to copy the post-receive-email script from the contrib/hooks directory of the git source tarball to hooks/post-receive in your git repo. Then make that file executable. You need to fiddle with the config a bit, but the instructions for how to do that are in the file itself so there isn't any need to repeat them here.

$> cp contrib/hooks/post-receive-email ./hooks/post-receive
$> chmod a+x ./hooks/post-receive

Working as a Member of The Project

Working with a git project means that there are three commands that you are going to be using quite a lot. These are git-format-patch, git-send-email, and git-am. Getting good with these commands will let you interact well with Erlware and any other git based project.

git-format-patch takes some command line options to detail which commits you want to turn into patches and then writes the appropriate patches out as a set of files, one patch per commit. It's dirt simple, though you will need to learn how to structure your commits in a reasonable way. This isn't hard, but it is a bit involved, so I will save that process for another post. In any case, you will learn how to do it pretty quickly once you start supplying patches.

git-send-email lets you take the patches created by git-format-patch and send them to a specified email address. The process is very well documented in the man page, so there really isn't any need to go into much detail. One note though: if you are a gmail user and don't already have sendmail or procmail set up to use gmail, I suggest you use msmtp. There are detailed instructions on how to do it on the GitTips page of the GitWiki. If you are as blunt as I am, following the instructions there will save you a huge amount of time.

Finally, git-am allows you to take patches from the mailing list and apply them to your local git repository. It understands both mbox format and raw patches generated by git-format-patch. This makes it pretty damn useful. You can set up a pull from your mail server to a local mbox and then just run git-am on it. You can also just pull the patches manually and run git-am on those. The whole process is really well thought out and very, very simple.

The only real problem that I have had so far is that git-send-email doesn't let you prepend any information to the subject line of the sent email. They all end up with [PATCH] . This would be fine for most projects but we run several projects under Erlware and it would be really nice if we could do something so that the subject looked something like [PROJECT][PATCH] . Anything that would let us indicate the project would be great. When and if I get some time I intend to remedy this and send a patch back to the git community.


Conclusion


That's it. No doubt I have missed a huge number of things. Hopefully, the commenters will be nitpicky and point them out so I can fix the problems. I am really excited about this new model and expect it to do some really awesome things for our project. If it does nothing but spread knowledge about the codebase to all our committers I will be overjoyed.

Saturday, October 6, 2007

New faxien

So it's been a little while since I posted. We have been working hard and heavy on several changes to the erlware repo formats. One of the big time sinks in this project has been writing a bootstrapper/OTP release launcher for faxien. Let me break out the two before I go into details.

The Repository
A few months ago we made some changes to the repository to reduce the complexity and reduce the total number of release versions we had to keep around. Originally we organized the OTP apps in our repository around the erts version number: major, minor and patch. This worked, but it meant we had a lot of copies of the same apps lying around. We had hoped that we would be able to organize it around the erts major and minor version instead; this would have saved a huge amount of space, so we made some changes in support of that. Well, after we made those changes we found out that the patch version number wasn't as unimportant as we thought. In fact, among other things, it seems that the OTP guys feel free to modify the wire protocol in patch versions. They also change the magic version numbers in the C lib so that it won't communicate with an erts whose patch version is different from the one it was compiled for. What this means for us is that we had to go back to supporting major, minor and patch versions. We also changed the way release packages are stored in the repo. Not much, but enough to require some code changes. All this took more time than we would have liked.

The Faxien Bootstrapper
The other problem we had is around matching erts versions with the releases that faxien pulls down. In the past we used whatever version of erts/erlang was available on the local box. This caused problems if faxien was built for a version of erts greater than the one present on the box. Of course, this would be a problem for any and every bit of erlang code that faxien pulls down as well. So we had to come up with a way to pull down an erts to run on as well as the code to run. This meant that we had to come up with some way to bootstrap the system. After much thought and quite a few experiments we decided to write a minimal bootstrapper in OCaml. What this means is that folks can download a small binary that will pull down the required erts version, faxien and all its dependencies. The bootstrapper will then launch faxien to complete the install process.

With this approach the user doesn't need to have erlang on their system at all. They just pull down 'faxien' and it pulls down everything that's required. That's pretty cool. It also only pulls what's needed, so instead of going out and getting 20 megs of erlang distribution they get what they need and just what they need.

We put in a few other nifty features to make command line OTP releases easier and to make cross platform OTP launch scripts easier to write. However, I will talk about those in a different post.

Tuesday, July 10, 2007

Sinan 0.8.4 Alpha is Out

The latest version of Sinan is out. This version adds support for spawning a new Erlang node with a shell for the current project. This makes debugging and exploratory programming much, much easier. It also spawns the node with the sname 'sinan_shell' so that the new shell will interact well with distel. This is a change that a lot of people have asked for and I am pleased to be able to integrate it.

I am still having issues with the analyze task that I am working through. However, I hope to have it fixed and working shortly.

Monday, June 25, 2007

Distributed Bug Tracking - Again

This is a bit of a clarification and expansion on a previous topic of this blog. To recap, what I was talking about was a distributed issue tracking system making use of an underlying distributed version control system for its versioning, but augmented by command line tools that support necessary issue tracking features, like searching, merging, etc. Distributed issue tracking is a very new thing and has a few hurdles to overcome. I am going to talk about what these hurdles are and offer some ideas on how to overcome them.

In the comments of the last blog post on this topic Alex and I had a reasonably long conversation around merging. He ended up posting his thoughts here. Alex has some interesting and useful ideas, though we differ in some specifics.

Merging and Discovery

In the last article I used the term 'merging' in an ambiguous way. I used it to refer both to merging two issues into one another and to finding duplicate issues. For the rest of this article I am going to refer to merging as merging two issues and discovery as finding issue duplications. This should reduce the ambiguity a bit.

Merging Multiple Changes To The Same Issue

Merging is actually a pretty straightforward concept. I think you can treat issues the same way you treat a source file when a merge conflict occurs. By automatically merging what you can and allowing the user to resolve conflicts manually you get reasonable merge behavior with a high probability of a correct result. There is some overhead for the user but, as with source changes, it shouldn't be onerous.

Merging Two Issues Into A Single Issue

This problem is slightly more complex, but it's really just an extension of the last topic we talked about. In this case we just apply that merge algorithm to two disparate issues instead of two versions of a single issue. There may be some ambiguity around which issue becomes the canonical issue and how to merge the history of the two, however, these problems are mostly solved in distributed version control systems and those solutions would work just fine in this instance as well.

Discovery

Discovery is by far the most complex issue here, and it's a problem that occurs in any issue tracking system. Unfortunately, in a distributed issue tracking system the problem has the potential to be much, much worse than in an issue tracking system with a central repository. This is due to the fact that each and every user has his own canonical version of the issue repository. For example: User Y sees a bug in the system and enters Issue X to describe it, and User Z sees the exact same bug at around the same time and enters Issue W to describe it. Because Users Y and Z both have canonical versions of the issue repository and have yet to sync their repositories, there is no way for either user to detect that an issue has already been created for that bug. So when they replicate, suddenly there are duplicate issues in both repositories.

In more normal issue tracking systems this can be mitigated to some extent by encouraging your users to search for existing bugs first and having people familiar with the issue repository review new issues as they are entered. However, this approach won't work with a distributed issue tracking system, because each user has a private canonical set until he syncs with some other user. I believe that this will be one of the fundamental problems that plague new distributed issue tracking systems for some time.

There are ways to mitigate this. There are very good document similarity algorithms out there, and applying them to this problem wouldn't be too difficult. Unfortunately, the text associated with issues tends to be very short, which doesn't give these similarity algorithms much room to work. There are ways we can mitigate that problem, though.

First, we can reduce the total document corpus by using attributes of the issue to subset the issues for similarity searching. For example, we might only search for similarities within issues that have a specific component tag. Generalizing this statement, we can use any semantic properties of the issue to subset the issues that we need to process for the similarity search.

Second, we need to give the user a fast and easy way to run through the output of a similarity search and approve or disapprove the merge. This should be something that allows the user to view the issues side by side and hit a single button or key combination to approve or disapprove the merge, then the next set pops up. This cycle would allow the user to quickly move through all possible matches. If this worked well we might be able to loosen the similarity constraints a bit to allow for more matches.

Hopefully, a combination of approaches will make the discovery issue more tractable.

Referencing

The third and last major problem (there are undoubtedly others that I can't think of right now) is simple referencing. There needs to be a way to reference an issue regardless of which repository it is in or where it was created. The easiest way to do that would be to make use of simple UUIDs for issue identifiers. They are a bit unwieldy, but their inherent uniqueness makes them usable for our purposes. We can reduce the level of pain in using UUIDs manually by allowing the user to specify just the unique part of a UUID in the tools that support this distributed issue tracking system. I think Monotone, and maybe git, allow something similar for change set identifiers.

Tuesday, May 29, 2007

The Shape Of Your Mind

In several cultures throughout history, like the Karen Paduang of Southeast Asia, the ancient Han of China and the ancient native tribes of the Paracas region of Peru, various body modification practices were quite common. In fact, among the members of these cultures unusual body shapes were (and in some cases still are) considered very beautiful. Parents went to great lengths to achieve these shapes, using mechanical devices to mold an infant's growth into a particular shape. For example, among the Karen Paduang this procedure resulted in women with very, very long necks. In the case of the Han it was women with very tiny, practically unusable feet. For the natives of the Paracas region of Peru it was large cone shaped skulls. Although grotesque by Western standards, an individual subjected to these procedures was considered significantly more beautiful than a person shaped along more natural lines. However, if you removed these people from their culture and time and cast them into many modern societies, they would be considered very, very strange at the least and grotesque at worst.

Fascinating, you say, but what does this have to do with computer science and, more specifically, languages? In many cases, the communities that form around languages are very similar to the insular communities that created these practices. To take this analogy one step further, in many ways programming languages act quite a lot like the devices used to shape the skulls of infants in Paracas, the feet of Han women or the necks of Karen Paduang women. In the case of these languages, instead of shaping the skull they tend to shape the way we think about problems, the way we form ideas and the way those ideas are applied to a particular problem. For example, if you have only ever written code in early versions of Fortran (Fortran 77 and before) then you probably don't even know recursion exists. Likewise, if you have only ever coded in Haskell you probably know very little about imperative-style loops.

If I might take a bit of my own history as an example: very early in my forays into coding I came across a problem requiring a search through a directory structure. I quickly came up with an imperative-loop-based solution that did the job. Unfortunately, the code was ugly and (if I may borrow a term from the refactoring community) it didn't quite smell right. I didn't know what the right solution was, but I knew I didn't have it. At that time the public internet was a newfangled thing, but I had already found it to be useful for gathering information. So I searched for solutions that others had found to similar problems. Eventually, I came across a piece of code that used recursion to walk the tree instead of imperative looping. It took me a little while to get my mind around this new concept, having never been exposed to non-imperative code. However, once I did, I realized that the recursive solution was a much cleaner and more natural solution for this problem. Recursion isn't always more natural than imperative iteration, but in this case it was. That made me think about this type of problem in a completely different light. It added another tool to my toolbox. If I had been exposed to functional languages before this point, it would have saved me a great deal of time and trouble.
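
That original exploration wasn't in Erlang, but just to make the contrast concrete, here is roughly what the recursive shape of that solution looks like in Erlang (a minimal sketch, not the code I wrote back then):

-module(walk).
-export([walk/1]).

%% Return every regular file under Dir by recursing into subdirectories.
walk(Dir) ->
    {ok, Entries} = file:list_dir(Dir),
    lists:flatmap(
      fun(Entry) ->
              Path = filename:join(Dir, Entry),
              case filelib:is_dir(Path) of
                  true  -> walk(Path);  %% a directory: recurse into it
                  false -> [Path]       %% a plain file: keep it
              end
      end, Entries).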

This is a minor example, but it does help prove a point: what a language allows and does not allow affects how you think about problems. This is an extremely important realization, because it means that, with certain caveats, the more programming languages you know the more insight you may have into the solution of a particular coding problem. So, if you accept that languages tend to mold your way of thinking about problems, then you can easily see how languages can be compared to the body modification devices we spoke of earlier. If programming languages can be compared to these devices, then we, the users of programming languages, can be compared to the subjects that undergo modification.

Granted, the comparison is not exact. As engineers, we don't start out with a nice round head; we have to work to achieve it using the same tools that provided the initial distortion. To elaborate, we all start out learning a single language, and that language shapes our 'mind', so to speak. This initial language pushes our mind out in one direction, perhaps upward. So after we learn a single language most of us are walking around with a big cone shaped mind.

Unfortunately, many of us never go on to learn any other languages. We keep our big cone shaped mind for the entirety of our career. That may not be a bad thing. If you are solidly embedded in a particular 'language culture' then cone-shaped minds are probably considered quite beautiful. In fact, you may be considered some type of elder because of the cone-iness of your mind. We tend to call these people Gurus and they are deserving of some respect.

These cone-shaped minds are probably not considered all that beautiful outside of their specific 'language culture'. C gurus aren't going to be very useful in the Scheme community and Scheme gurus aren't going to be very useful to the C community. That's bad, because both languages have ideas and features that are generally useful to understand. Those cone minds that keep to a single language throughout their entire career never realize their full potential. That's unfortunate, because a programmer with a well-shaped mind is generally more efficient and better able to find the most elegant solution to a problem. In fact, he ceases to be a programmer and becomes an Engineer. If he is diligent and studies hard he may even become a Good Engineer.

Now, about this time you are probably thinking to yourself, 'I don't really like the idea of walking around with a big cone-shaped mind.' If that's the case, great! Fortunately, unlike the natives of Paracas, you can do something about the way your algorithmic 'mind' is shaped. How do you go about reshaping your mind? Well, it's not a simple process; you basically need to force your mind into a new shape using the devices that warped your mind in the first place. You must learn more, and distinctly different, languages. Each language forces your mind to grow in a different direction. Learn enough languages and your mind will have a nice round shape.

So how many languages do you have to learn, and which languages are best? There is no fixed number. I usually suggest five as a minimum and recommend ten or fifteen. That may sound like a lot, but after the first couple of languages picking up new ones becomes much easier. In that regard it's a little like picking up a new spoken language. For example, if you know Spanish then Portuguese isn't all that hard. If you know Spanish and Portuguese, then Italian is pretty simple. If you know Spanish, Portuguese and Italian, then picking up French is a snap. This goes on and on. In the case of programming languages there are a small number of additional rules you need to apply to get the most out of this process, but the overall process is the same. The additional rules are listed below.

1. Each language must come from a different family of languages. See this history of languages information.

2. All three major paradigms (Procedural, Object Oriented, and Functional) must be covered.

3. At least two minor paradigms (Concurrent and Logic/Declarative) must be covered.

4. Both static typing and dynamic typing need to be represented.

If you have never heard of functional programming and don't have a clue what procedural means, don't worry. I will help you out a bit by providing a list of languages broken down by family and paradigm at the end of this little missive. As you learn new languages you will soon have a good idea of the different families of languages and the paradigms they represent. Before you know it you will know quite a few different languages, you'll be able to think about problems from many different angles, and your mind will have a nice round shape. You will be able to think clearly about a programming problem in any number of ways, instead of the small number of ways your previous cone-shaped mind allowed.

Dave Thomas of Pragmatic Programmers (not Wendy's) fame came up with the idea of learning a language each year. They started a group to accomplish this back in 2002. Unfortunately it seems to have started and stopped all in that same year.


The Languages


A language breakdown by family is available on the 'History of Programming Languages' information.

As for the breakdown by type, I am not going to try to do this for every language available. So I am just going to give you a list of ten or fifteen programming languages broken down according to the rules I provided previously. This should give you enough of a group to pick five that interest you.

Descriptions are arranged as follows ([paradigms], Typing, Family). If family doesn't exist assume the language is in its own family.

Erlang ([Functional, Concurrent, Distributed], Dynamic Typing)
Forth or Postscript (Dynamic Typing)
Mercury ([Logic, Declarative], Dynamic Typing)
Prolog ([Logic, Declarative], Dynamic Typing)
Mozart-Oz ([Functional, Procedural, Object Oriented, Logic, Distributed, Concurrent], Dynamic Typing)
Lisp ([Functional, Procedural, Object Oriented, Logic], Dynamic Typing, Lisp)
Scheme ([Functional, Object Oriented, Logic], Dynamic Typing, Lisp)
Ada ([Procedural, Object Oriented, Concurrent], Static Typing, Pascal) (Another resource)
Python ([Procedural, Object Oriented, Functional], Dynamic Typing)
Haskell ([Functional, Lazy], Static Typing)
Lua ([Procedural, Object Oriented], Dynamic Typing)
Ruby ([Object Oriented], Dynamic Typing, Smalltalk)
Smalltalk ([Object Oriented], Dynamic Typing, Smalltalk)
SML ([Functional], Static Typing, SML)
Ocaml ([Functional], Static Typing, SML)
Clean ([Functional], Static Typing)
D ([Procedural, Object Oriented], Static Typing, Algol)

I didn't include the languages that are common (C, C++, Perl, Java) because there is a good chance you already know them. They also don't count for the purposes of this exercise (C, C++, and Java are all part of the Algol family and would only count once anyway). Feel free to choose other languages that you may be aware of and find interesting. This list is only a 'Getting Started' list.

I strongly suggest that you learn a Lisp dialect and a Forth. These two languages are very good at shaping your mind, and the two specifically tend to force the shape of your mind in opposite directions. It's a somewhat painful process, but well worth the quick results. At the very least make sure that one of these languages is included on your list.

Footnotes

Thanks to a comment by Vince I have found that I am not the only one thinking along these lines, not that I actually thought I was. In the linguistics community there seems to be a hypothesis called the Sapir–Whorf hypothesis that describes something similar. Also, Kenneth Iverson gave his Turing Award lecture around the same topic. It was called "Notation as a Tool of Thought". Unfortunately, I can't seem to find a good link for it right now.

Friday, May 18, 2007

Off Line Development

After numerous requests I have finally implemented off-line development. You no longer have to be connected to build your projects. In the past the dependency analysis task ran on every build, and if it thought that there was a chance the dependencies had changed it connected to the repository to check them. This no longer happens. There is now a check_depends task that makes sure dependency analysis has been run at some point in the past and then checks whether the dependencies need to be updated. If they do, it asks the user whether he wants to update them (by connecting to the server). If the answer is affirmative the update occurs; if not, the build continues with the existing dependencies. The user may also resolve dependencies at any time by running the depends task directly. This approach gives users much, much more control over when and how dependencies are resolved. It also allows the user to control when and how the build system connects to the repository. I hope I have done this in such a way as to not add any additional burden to the user.

Wednesday, May 2, 2007

Distributed Bug Tracking

I came across a project called DisTract last week. DisTract is basically a distributed bug tracking system. Specifically, it's a file based bug tracking system whose directory structure sits inside a repository managed by a distributed version control system. It uses this distributed version control system to manage distribution. This system was preceded by another, somewhat bit-rotted, system called Bugs Everywhere.

I really like the idea of distributed bug tracking. It fits in really well with distributed version control and overall distributed development. Being able to create new branches for your bug system along with its source, and merge those branches back into mainline while keeping history, is a very, very powerful thing. However, this approach has a fundamental, maybe intractable, problem when it comes to merging. In a version control system merging is relatively simple. File identifiers (basically the path in the workspace) are fixed, so there is no ambiguity if the same file is changed on two different boxes; it's just a matter of merging the contents of that file. Distributed bug tracking still has the problem of merging changed bugs, but it also has an additional problem: there is no way to relate a bug created by one person to a different bug created by another person. This is a problem that every bug tracking system has to some extent. In a distributed bug tracking system it's even worse, because the bugs created by another person won't be visible until you get a push or do a pull from that other person's repository. There may be a way to solve the problem using some of the modern document similarity algorithms. However, considering the small amount of text usually supplied with a bug report, this is probably unworkable. I don't have a solution to this yet, but I may play around with some ideas and use some information from some of the larger public repositories to run some tests.

Monday, April 23, 2007

Erlang and the Web

I have been spending a lot of time thinking about leveraging Erlang's concurrency features and OTP in web applications. Right now I don't believe that any of the available frameworks do that very well. Erlyweb tries to be 'rails' for Erlang without really leveraging the features that make Erlang great. Yaws is more a web server than a web app server. It tries to make some amends here by providing things like appmods and yapps, but they feel bolted on and they don't really leverage OTP at all.

This has been an open issue in my mind for some time. I created the tercio project as a starting point to solve this problem back in November or December of last year. It languished for a while, partly because I had yet to figure out an elegant solution. Well, I think that I finally have. It's the logical conclusion of current web development trends. I am surprised that no one has thought of it yet. The idea is two fold.

1) Let the client side handle the client side
2) Let the server side handle the server side

The general idea is that you will let the client side handle all client side rendering. The only server side participation in this is serving up files. In turn, the server will handle all server side (business) logic. The two should really have very little knowledge of one another. Hmm, seems a bit too simplistic doesn't it? I thought so, before I realized that the client side in this continuum already has a perfectly good language on which to base things. That language is JavaScript. I can hear the groans of dismay already. I emitted those very same groans back in the late '90s through the mid '00s, and I wouldn't have even considered this two years ago. However, the landscape has changed a lot in two short years. Ajax has gained prominence, and libraries like Prototype have been created. It's just a whole different world. We are already in good shape for the server side with OTP.

So what are the mechanics of making this happen? First we need to get away from generating HTML on the server side. To do that we need to make it easy to generate it on the client side. Manually creating DOM objects really isn't the right way to go. I think we can do this with a library called JavascriptTemplates. This provides a reasonable templating language on top of JavaScript. Since each snippet of template will be small, this should be a reasonably efficient way to go. The second problem is how we remove the client side knowledge from the server. I think tercio already does a good job here. It provides a javascript<->erlang bridge. With this, JavaScript can send and receive messages to processes that have registered interest in client side messages on the backend. It's webserver agnostic, so by providing a small shim you can make it work with any webserver you want. I am building a small startup on top of this framework. I am quite sure there will be a lot of issues to work out. However, I think the fundamental principles are solid and should provide for the right web development experience in Erlang.

Tercio isn't ready for prime time, or even late night, yet. It should be soon though, so keep your eyes posted here.

Sunday, April 8, 2007

Build Flavors

I had quite a few requests to support different types of builds within the same project. Usually the request centered around being able to do 'development' and 'release' builds. In these two cases, development would enable debugging information and unit tests while release would strip them out. Providing static development and release build flavors wouldn't really solve the underlying problem, which is the need to parameterize the build process. I ended up solving this by adding a 'flavors' option to the build config and having the tasks take arguments. Together these two features should allow a pretty wide range of build customizations.

Following is the flavors entry in the default build config. It's pretty self-explanatory: 'default_flavor' indicates which flavor should be used when the user doesn't specify one, and 'flavors' is assigned the build flavor definitions. Within each flavor you assign an argument string to each build task.


default_flavor: development,

flavors : {

development : {
build : "+debug_info -W1"
},

release : {
build : "-DNOTEST=1 -W1"
}


},


Right now only the build and test tasks take arguments. However, over time the other tasks that need arguments will take them as well. On a side note, I have created a module that tries to parse out erlc-style arguments.
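
To give a feel for what that module has to do, here is a hypothetical sketch of turning a flavor argument string into a list of compile options. This is not the actual module; the real thing has to cover more of erlc's flags than this:

%% Hypothetical sketch: parse a string like "+debug_info -DNOTEST=1 -Iinclude"
%% into a list of compile options.
parse_args(ArgString) ->
    lists:map(fun parse_arg/1, string:tokens(ArgString, " ")).

%% +term is handed to the compiler as an Erlang term, e.g. +debug_info
parse_arg("+" ++ Opt) ->
    {ok, Tokens, _} = erl_scan:string(Opt ++ "."),
    {ok, Term} = erl_parse:parse_term(Tokens),
    Term;
%% -Dname or -Dname=value defines a macro (the value is kept as a string here)
parse_arg("-D" ++ Def) ->
    case string:tokens(Def, "=") of
        [Name]        -> {d, list_to_atom(Name)};
        [Name, Value] -> {d, list_to_atom(Name), Value}
    end;
%% -Idir adds an include directory
parse_arg("-I" ++ Dir) ->
    {i, Dir};
%% anything else (like -W1) is left for the caller to deal with
parse_arg(Other) ->
    {unhandled, Other}.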

Saturday, March 31, 2007

Busy Times

I have had a bit of a perfect storm this week. I decided to upgrade my Ubuntu to Feisty. Although the upgrade went very nicely, it took a couple of days to get the configuration the way I like it. It always does; I tend to tweak my setup to no end. Well, after all that, the OTP folks went and released a new version of Erlang, so it took a few days to get the repository updated. At last, though, things have settled down and I am able to get back into sinan development. So I should be releasing a couple of new versions of the alpha over the next few days.

On a side note, Ubuntu Feisty just rocks. It has, by far, the best wireless support that I have seen in any distribution. That alone makes it worth the upgrade. If you haven't already done it I suggest that you do the upgrade.

Sunday, March 18, 2007

Full OTP Migration

I just finished migrating sinan to a full OTP application. It already followed all of the OTP principles; it just wasn't set up to run as an application. I ran into a few issues that encouraged me to do the conversion. It wasn't actually difficult; as I said, I already followed OTP for the most part. It was mostly a matter of turning the tasks into gen_servers and figuring out a way for them to work together in a meaningful way. For now there isn't much difference for the user, but it sets things up so that I can rapidly iterate on the current outstanding issues.

The hardest part of all of this was getting the error_logger logging set up right. A lot more loggers than I suspected are involved from the get-go. Kernel sets up a very primitive error_logger to start. It also sets up a slightly better tty logger. Then sasl sets up its own set of loggers. Figuring out where all this was coming from, getting rid of the loggers and adding the custom logger was more difficult than I expected. In the end I got it done via a combination of configs for kernel and sasl and actually removing the primitive logger via the gen_event api.
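
For anyone curious, the handler juggling itself is just gen_event calls against the registered error_logger manager, along these lines (sin_log_handler is a stand-in name for the custom handler, not the real module):

%% Drop the default tty handlers installed by kernel and sasl, then
%% add our own gen_event handler in their place.
swap_loggers() ->
    gen_event:delete_handler(error_logger, error_logger_tty_h, []),
    gen_event:delete_handler(error_logger, sasl_report_tty_h, []),
    ok = gen_event:add_handler(error_logger, sin_log_handler, []).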

In any case, I should be able to start knocking out my open issues. Thanks to the beta testers for providing the feedback.

Thursday, March 15, 2007

More on Configuration

The first, lightly tested, version of fconf is out. At the moment it's undocumented, but that should change pretty quickly. It's an OTP app that supports multiple simultaneous configurations and merging configs together. It supports everything that I talked about in my last post. Sinan has already been ported over to fconf from its built in configuration system.

Monday, March 12, 2007

Configurations

OTP applications have their own configuration mechanism in the app config structure. However, this doesn't always suit every application's needs. I currently have two distinct config mechanisms, one for sinan and one for tercio. They share a lot of similarities and I suspect other applications have similar needs. To that end I have split the config system out into its own project, fconf, with the intention of using it in both projects. There isn't much out there yet, but I should have the config subsystem pulled out of sinan and refactored in the next day or so. There are a few features that I want to support.


  • Reloadable config files (maybe auto reloaded)

  • Config file syntax agnostic; use the syntax that is right for your user community

  • Good, well defined override semantics



Of course, it all needs to be robust and scalable as well. The existing sinan config subsystem supplies most of this. It still needs to be converted over to a gen_server, given a supervision tree, and turned into an OTP application.
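
As an example of the override semantics I have in mind, here is a sketch (not necessarily what fconf will end up doing): values from the overriding config win, and nested sections are merged recursively.

%% A section is a list of {Key, Value} pairs. Nested sections merge;
%% for leaf values the override simply replaces the base value.
merge(Base, Override) ->
    lists:foldl(
      fun({Key, OverVal}, Acc) ->
              NewVal = case lists:keyfind(Key, 1, Acc) of
                           {Key, BaseVal} ->
                               case is_section(BaseVal) andalso is_section(OverVal) of
                                   true  -> merge(BaseVal, OverVal);
                                   false -> OverVal
                               end;
                           false ->
                               OverVal
                       end,
              lists:keystore(Key, 1, Acc, {Key, NewVal})
      end, Base, Override).

is_section(L) when is_list(L) ->
    lists:all(fun(E) -> is_tuple(E) andalso tuple_size(E) == 2 end, L);
is_section(_) ->
    false.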

Thursday, March 8, 2007

First Version Out to Beta Testers!

I just sent the first really usable version of sinan out to the beta testers who registered their interest. I expect there to be plenty of issues. However, the fact that the product is out and being used is great. I can't wait to see the feedback, both positive and negative, that should come in. I will try to relay some of the relevant bits here on this blog.

Saturday, March 3, 2007

Dependencies Again

Just when you think you are finished, something pops up and bites you. As I was getting ready to do the release to my beta customers I noticed that some of my unit tests in the dependency code no longer passed. I had made some changes the week before to alter conflict reporting, and I guess I forgot to run the tests. Well, I jumped in and started debugging. An hour or so later, after playing whack-a-mole with bugs, I realized that the core algorithm was wrong. Leaky edge cases are almost always a symptom of an underlying fault in the logic, and in almost all of these cases about the only thing you can do is throw out the existing implementation and start fresh.

Well, in this case, I wrestled with creating a new solution with little or no luck. Eventually, I went to lunch with some friends to talk out the problem. We, or more specifically Scott Parish, realized that at its core this is a backtracking problem very similar to the kind Prolog was designed to handle. Fortunately, he had just spent the last month living and breathing a similar backtracking problem and volunteered to code up a solution. He took the core of his solution from chapters 11 and 12 of PAIP. It seems that Erlang, due to its immutability, makes for a very good platform for these types of problems. In any case, later that night he sent me the solution, which turned out to be a cut-down, special-purpose Prolog interpreter. Even then it was a complete, fast solution that occupied no more than a hundred lines of code. I am working on getting it integrated back into the build system as I write this. It's amazing how simple and concise an elegant solution can be.

Wednesday, February 28, 2007

Open Beta for Sinan

So we have moved right along and we are very close to release. Before we do a general release we decided to do something of a small beta. We are trying to get together a group of solid people who know Erlang and are interested in using the system. They shouldn't mind going to a bit of extra trouble to provide us with useful information. They also need to be able to put up with potential issues that might arise from using the new build system. If you want to participate, either post a comment to this blog letting me know or send an email to me or Martin Logan. You can find both of our email addresses in the erlang-questions archives. Once you do that we will provide you with some nice tarballs and pointers to the documentation.

Tuesday, February 20, 2007

Windows Support

So far support for Windows has been, at most, an afterthought for me. I own only one rather old Windows box that I use for the occasional game. For that reason, Windows just isn't a big priority. However, after a discussion with Martin Logan we came to the conclusion that sinan will need to support Windows from the initial release. Fortunately, I have used the filename and filelib modules throughout the implementation. That should remove any path name issues between the two systems. Unfortunately, I use symlinks pretty heavily throughout the build. I have also used two very Unix-specific OS commands as part of development, uname and tar. At first I thought that replacing these commands would pose a big problem. That proved not to be the case, as the stdlib and kernel applications provide for my needs, though that functionality is pretty well hidden.

To figure out what platform and architecture I am on I use uname. I need this to pull down the correct version of binary dependencies from the repository. I didn't see any real replacement for this command until I finally ran across a reference to erlang:system_info(system_architecture). This should solve my needs pretty well. Unfortunately, it only works in R9 and above. So, as you would expect, it's a trade-off. If I use this I can make the system work on Windows but not on pre-R9 systems. I think the right choice is to make it work on Windows. Hopefully a pre-R9 solution will present itself.
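
For reference, this is what it looks like from the shell; the exact string obviously depends on the box, this is just an example value:

1> erlang:system_info(system_architecture).
"x86_64-unknown-linux-gnu"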

I also thought tar would present more of a problem, as tar isn't a Windows command. There is an erl_tar module in stdlib, but it claims to support only Sun's version of tar. However, on examination of the docs for both erl_tar and gnutar this seems not to be the case. The format followed by erl_tar is IEEE Std 1003.1. This is an extended version of tar called ustar, and it's a POSIX standard. It seems that modern versions of gnutar support IEEE Std 1003.1 as well, as you can see here (check the references at the bottom of the page). So the erlang docs seem either inaccurate or out of date, fortunately for me.
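
So packing and unpacking can stay inside Erlang, roughly like this (the file names are made up for the example):

%% Pack up a release directory, then unpack it somewhere else; the
%% 'compressed' option takes care of the gzip part without gnutar.
ok = erl_tar:create("myapp-0.1.0.tar.gz", ["myapp-0.1.0"], [compressed]),
ok = erl_tar:extract("myapp-0.1.0.tar.gz", [compressed, {cwd, "_build/tmp"}]).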

Symlinks are a bit of a harder problem. The need for symlinks is greatly reduced now that I don't have to build up tar-able structures. However, I still use them to build up a deployable version of the OTP apps with sources in the _build directory. I think that the only solution to this problem is to copy the sources and related files into the binary structures instead of symlinking them in. It will add a bit of overhead, but I think that's worth it.
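
Replacing the symlinks is then just a recursive copy, something like this sketch (not the actual sinan code):

%% Copy the tree rooted at From into To, creating directories as needed.
copy_tree(From, To) ->
    {ok, Entries} = file:list_dir(From),
    lists:foreach(
      fun(Entry) ->
              Src = filename:join(From, Entry),
              Dst = filename:join(To, Entry),
              case filelib:is_dir(Src) of
                  true ->
                      copy_tree(Src, Dst);
                  false ->
                      ok = filelib:ensure_dir(Dst),
                      {ok, _} = file:copy(Src, Dst)
              end
      end, Entries).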

I do have one final problem; this one just occurred to me. I have written a nice little shell script to kick off a build. This shell script uses a bit of magic (borrowed from firefox) to follow any symlinks back to its deployed area, where it gathers the paths for sinan. It then kicks off erl with all the proper paths to the sinan code. Not being a Windows person, I have no idea how to handle this on Windows boxes. Hopefully, some Erlang-oriented Windows guy will step forward and help me out with this.

Wednesday, February 14, 2007

Sinan Documentation

The documentation for Sinan goes apace. I have finished all of the user level documentation and have started the developer documentation.

Why worry about developer documentation? Sinan is a very pluggable system. Most of the current functionality that ships with the system is composed of tasks that are plugged into the engine. Third party tasks will use the exact same mechanism and be first class parts of the system right along with them. I think this is pretty darn important and I want to support it right out of the box. That's why I am spending a bit of extra time and getting the developer documentation out with the user docs.

Monday, February 5, 2007

Code Complete!

I finished up the last few tasks over the weekend. This means that the coding is pretty much complete at this point. I still need to document the internals a bit more, so third party tasks can be written. I also need to spend some time writing high level user documentation so that getting up to speed on the system is straightforward. All this will probably take me a week or so. Hopefully, I can then do a true release.

On a side note, one of the things that I ended up doing was adding code coverage metrics to the unit test runs. The cover module made this possible. Unfortunately, it only provides the number of runs for each line of executable code, without any information about global coverage percentages. When I get some time I am going to poke around in the system and see if I can come up with a way to get the executable line count from a module. If I can get that information then providing high level project statistics won't be so bad.
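
The cover workflow itself is pretty small. Driving it by hand looks roughly like this, using one of the sinan modules as an example:

cover:start(),
{ok, Mod} = cover:compile_module(sin_discover),
%% ... run the unit tests against the cover-compiled module here ...
{ok, PerLine} = cover:analyse(Mod, calls, line).
%% PerLine is a list of {{Module, LineNumber}, NumberOfCalls} tuples;
%% those are the per-line run counts mentioned above.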

Wednesday, January 31, 2007

More Progress!

I resurrected the test, edoc, clean, and release tasks tonight. Everything works as it should with the exception of edoc. There seems to be a conflict between edoc and xmerl in the newest version of erlang (R11B-3). I hope to figure out what the problem is and resolve it soon.

On a side note, I came across the cover module in the tools application of Erlang. Since I was working on the unit tests task at the time, it seemed like a good idea to add this functionality to the build system, either via a code coverage task or integration into the existing unit tests task. The only real issue is that all the dependent modules need to be recompiled with special coverage information to make cover work. This adds quite a bit of complexity to the feature. In any case, it's on my list of new tasks to add.

I added one additional task, an analyze task. It seems the dialyzer folks finally made it controllable from Erlang code. This allowed me to integrate dialyzer into sinan. Now running dialyzer will be as simple as running a build task. Hopefully, if dialyzer is easier to use it will get used more often.

Building Again!

I finally finished refactoring the build task tonight. For the first time the latest version of the build system is actually building source. It took a huge amount of effort to get the dependent tasks to this point. The engine, discover, depends, the repo puller, all of it was just preparation for the builder. In any case, I am very pleased.

For the moment it just builds *.erl and *.yrl, but eventually I want to support all of the Erlang compilables. I suspect that refactoring the rest of the tasks (test, release, tar, clean, etc) won't take much longer. Hopefully, I will be able to do an alpha release at the end of this week. I certainly hope so in any case.

Friday, January 26, 2007

Topological Sort Joy

I finally modified sinan to make use of all the dependency analysis code I wrote for ewrepo. One thing that I needed to do was produce a list, in dependency order, of the graph of dependencies among the internal project applications. This is essential if compile time things like parse transforms are going to work correctly. I wrestled with how to do this correctly for quite a bit before I realized that it's a simple topological sort. I should really have realized this immediately, but I didn't. Fortunately, this isn't a difficult algorithm, especially since I didn't even need to implement it. Joe Armstrong wrote a topo sort for his ermake project and made it available in the contribs section of erlang.org. Once I found that, it didn't take too long to integrate it into sinan.
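
Joe's code is what sinan uses, but for the curious the same ordering can also be had from the digraph modules that ship with OTP. A rough equivalent:

%% Deps is a list of {App, AppsItDependsOn} pairs. Returns the apps in an
%% order where every app comes after the apps it depends on.
topo_order(Deps) ->
    G = digraph:new(),
    lists:foreach(
      fun({App, DepList}) ->
              digraph:add_vertex(G, App),
              lists:foreach(fun(Dep) ->
                                    digraph:add_vertex(G, Dep),
                                    digraph:add_edge(G, Dep, App)
                            end, DepList)
      end, Deps),
    Result = digraph_utils:topsort(G),
    digraph:delete(G),
    case Result of
        false -> erlang:error(circular_dependency);
        Order -> Order
    end.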

Once I got to thinking about this dependency problem I realized that there was at least one more area where a topological sort was needed: the run order of the task list. Each task may have any number of tasks that it depends on. Right now my code would just run each dependent task multiple times. That's a bug; fortunately it won't take more than a few minutes to integrate the topo sort into this area as well.

Tuesday, January 23, 2007

Dependency Checking Complete!!

I finally finished up both the shallow and the deep dependency resolution for sinan. This took a bit of time and effort to conceptualize, abstract and implement.

Now, given a list of apps and their dependencies (along with the URL(s) of the repo), like this:

[{app1, "0.1", [{edoc, 'LATEST'},
{syntax_tools, "1.5.0"}]}


It correctly returns a list of all the dependencies required for the project, like this:


[{stdlib, "1.14.2",
[compiler,edoc,syntax_tools]},
{syntax_tools,"1.5.0",
[app1,edoc]},
{compiler,"4.4.2",
[edoc]},
{kernel,"2.11.2",
[compiler,stdlib,edoc]},
{edoc,"0.6.9",
[app1]},
{app1,"0.1",[]}]


If it runs into a version conflict it reports what the conflict is and what applications are causing it.

In the output above, each entry is a dependency, its version, and a list of the apps that depend on that app.

Friday, January 19, 2007

Released: Ktuo 0.1.1

I just released the first alpha version of my JSON Parser/Encoder Ktuo. I use it in several projects with no problems at all. However, this is still an alpha release, so be aware that you may run into issues. The download area for Ktuo is here. A bit of documentation for the project is here.

I know the question of why another JSON parser is going to come up, so I am just going to go ahead and address it now. The existing JSON parser is still available and still useful. It makes elegant use of continuations to handle cases where it may not have a full JSON expression available. This makes it very flexible and pretty darn interesting. However, these extra features also add significant complexity and no small amount of additional resource usage. While working on my Tercio project I realized that I didn't need these features and I didn't want to pay the cost of the complexity and resource usage overhead. So I wrote my own. Eventually, I realized it was useful in and of itself and created a project space for it.

Wednesday, January 17, 2007

Finished Up Dependency Checking

I finished up the shallow dependency checking. Now I just need to integrate with the repository and repository metadata to do deep dependency analysis and package pulling.

More Dependencies

Most people think that Erlang has just runtime dependencies. For the most part this is true. However, Erlang also has two types of compile time dependencies. The first is the ".hrl" files that are included into a module. The second type is ".erl" files that implement a parse transform. Both of these types of files represent compile time dependencies. For the most part it's simple enough to make sure the include and code path information is set. However, it's much more difficult to make sure that if one of these files changes, everything that depends on it is rebuilt as well. I think that the easiest way to handle this would be to simply look at the AST of a file during the compilation process and extract the includes and the parse transform information from that AST. Unfortunately, it means knowing much more about a dependent OTP application than the build system currently knows. It would definitely need to happen after the high level runtime dependency mapping takes place. I think, for now, I will stick with just handling the runtime dependencies and making sure that the build time dependencies have the right path information available. Once the system gets out in the wild and I start getting feedback I may modify it to be a little smarter about the compile time dependencies.
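
If I do end up going down that road, the form scan itself is not much code. A rough sketch (epp expands the include directives, so included files show up as 'file' attributes in the forms):

%% Pull compile time dependencies out of a source file: the .hrl files it
%% includes and any parse transforms it asks for. IncludePaths is the list
%% of include directories the file would normally be compiled with.
%% Note: this only catches -compile({parse_transform, M}) written as a
%% tuple; the list form of the attribute would need another clause.
compile_deps(ErlFile, IncludePaths) ->
    {ok, Forms} = epp:parse_file(ErlFile, IncludePaths, []),
    Includes = [Path || {attribute, _, file, {Path, _}} <- Forms,
                        filename:extension(Path) == ".hrl"],
    Transforms = [Mod || {attribute, _, compile, {parse_transform, Mod}} <- Forms],
    {lists:usort(Includes), lists:usort(Transforms)}.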

Tuesday, January 16, 2007

Dependency Detection

I am working on dependency detection for erlware (sinan and the erlware build system) today. It's going fairly well, but I think I can significantly reduce the complexity. In any case, I will put this new code in ewrepo as soon as we get that code out somewhere.

Monday, January 15, 2007

Per App Build Config

I have added the ability to configure the build directives for an application on a per app basis. This should make it much easier to do custom builds. Of course, for those apps that don't need per app info it can be left off. This is a big deal and one of the bigger requests I have gotten.

Config Refactoring Complete!

I just finished refactoring the config support in sinan using the new ktuo package. It works really well and makes the config much more readable.

Original config with erlang terms

[{repositories, ["http://repo.metadrift.net/repo"]},
{build_dir, "_build"},
{ignore_dirs, ["_", "."]},
{ignore_apps, []},
{default_task, build},
{tasks, [{discover,
[{handler, sin_discover}]},
{verify, [{depends, [discover]},
{handler, sin_verify}]},
{build, [{depends, [verify]},
{handler, sin_erl_builder}]},
{doc, [{depends, [build]},
{handler, sin_edoc}]},
{tar, [{depends, [build]},
{handler, sin_tar}]}]}].

new, more readable, config

repositories : ["http://repo.metadrift.net/repo"],

build_dir : _build,

ignore_dirs : ["_",
"."],

ignore_apps : [],

default_task : build,

tasks : {

discover : {
handler : sin_discover
},

verify : {
depends : [discover],
handler : sin_verify
},

build : {
depends : [verify],
handler : sin_erl_builder
},

doc : {
depends : [build],
handler : sin_edoc
},

tar : {
depends : [build],
handler : sin_tar
}

}


Well, I think it's more readable and more maintainable with the nested namespaces.

Thursday, January 11, 2007

Ktuo Error Reporting

I have vastly increased the usefulness of error reporting in ktuo. Before, it just failed with badmatch if an error occurred. Now it gives a useful error message along with the line and column number of the problem. Since I am using the library as a config parsing lib in a couple of places, it made sense for it to report errors usefully.

Wednesday, January 10, 2007

JSON Parser

In the course of developing tercio I built a fast little JSON parser/encoder. Well, I decided to use it in sinan as well, so I moved it to its own project. The project, ktuo, is hosted in the usual place. It outputs strict JSON, but it parses a small superset of JSON: it allows bare single words as atoms in the JSON string, as an alternative to a single word enclosed in quotes. This allows the lib to be used as a config parser and makes the config much more readable. In any case, the library is usable and available now.

Sinan Build System

I have been working on a build system for Erlang for some time. I have a working version that I have used, but it has a couple of shortcomings that render it difficult to use. I have done some significant work towards version two, and I decided to make the source available via Google Code hosting anyway. There is a large amount of proprietary software and systems in the community right now, and I wanted to be in the open from the start.

The source is here.

It's named after Ḳoca Mi‘mār Sinān Āġā, one of the great architects and builders of the Ottoman Empire. I thought it appropriate to name a build system after him.

Friday, January 5, 2007

JsPkg Discarded, Package Support in Place

I decided not to use jspkg. The code there needs a massive refactoring, and it approaches the problem from a viewpoint that is very different from my own. Instead, I wrote a simple packager that doesn't do loading. So for now I have good namespacing support and stubs for package loading. Once I get a better understanding of the semantics of the applications built on this platform I will start filling out the loading stubs.

Monday, January 1, 2007

Into the Client

Well, I have moved into client side code. To save a little time I am building the client side code on top of jspkg. It needs quite a bit of refactoring, as well as some conversion to work on top of Prototype. However, it should fill my needs nicely. Once the package stuff is in place I can build up a nice messaging layer.