version control

Subversion has changelists

Several times a week I recommend that someone actually go and read The Subversion Book. Obviously, not enough people do it, myself included. So sometimes I have to fish out a tasty bit from that book to get people interested. Today is just such a day.

Have you heard about Subversion changelists? If you haven’t, chances are you aren’t utilizing your time properly. Here is a brief introduction.

Subversion 1.5 brings a new changelists feature that adds yet another method to the mix. Changelists are basically arbitrary labels (currently at most one per file) applied to working copy files for the express purpose of associating multiple files together. Users of many of Google’s software offerings are familiar with this concept already. For example, Gmail doesn’t provide the traditional folders-based email organization mechanism. In Gmail, you apply arbitrary labels to emails, and multiple emails can be said to be part of the same group if they happen to share a particular label. Viewing only a group of similarly labeled emails then becomes a simple user interface trick. Many other Web 2.0 sites have similar mechanisms—consider the “tags” used by sites such as YouTube and Flickr, “categories” applied to blog posts, and so on. Folks understand today that organization of data is critical, but that how that data is organized needs to be a flexible concept. The old files-and-folders paradigm is too rigid for some applications.

As wonderful as they are, changelists do have some limitations.

Subversion’s changelist feature is a handy tool for grouping working copy files, but it does have a few limitations. Changelists are artifacts of a particular working copy, which means that changelist assignments cannot be propagated to the repository or otherwise shared with other users. Changelists can be assigned only to files—Subversion doesn’t currently support the use of changelists with directories. Finally, you can have at most one changelist assignment on a given working copy file. Here is where the blog post category and photo service tag analogies break down—if you find yourself needing to assign a file to multiple changelists, you’re out of luck.
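To make the idea concrete, here is a rough sketch of a changelist session. The changelist name and file paths are made up for illustration; the subcommands and the --changelist option are the standard ones that appeared in Subversion 1.5. The function is meant to be run from inside a working copy that has local edits to those files.

```shell
# Sketch only: "bugfix-login" and the src/ paths are hypothetical names.
changelist_demo() {
    # Label two modified files as members of the "bugfix-login" changelist.
    svn changelist bugfix-login src/login.c src/session.c &&
    # Show the status of only the files carrying that label.
    svn status --changelist bugfix-login &&
    # Commit just those files; other local edits stay uncommitted.
    svn commit --changelist bugfix-login -m "Fix login session handling"
}
```

By default the changelist label is dropped from a file once it is committed. To take a file out of a changelist without committing, use svn changelist --remove followed by the file name.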

But even with these limitations, changelists are extremely handy. So I urge you once again to read the book. Or just the changelists section.

Subversion is not dead

Git is on the rise right now, especially in Open Source Software development circles. Some have even gone as far as predicting the death of Subversion. As much as I appreciate git (here is a link for you, if you don’t) and what it is doing for Open Source Software, I have to agree with Brandon Savage:

Corporate America needs a centralized version control system. Subversion still offers this: Subversion centralizes the repository and simply checks out a working copy (versus Git, which gives you a complete repository). Corporate America still needs to have canonical version numbers, and the ability to see the progress of a product over time as a single line – not a bunch of branches and independent repositories.

And this is true not only for corporate America.

SugarCRM deployment efforts

Since we started working on SugarCRM in the office, one of the hardest tasks we have had was solving the deployment issue. On one hand, SugarCRM comes with some really nice GUI tools, such as Studio and Module Builder. On the other hand, the system is large and complex and should be developed and tested in a separate, non-production environment.

We’ve spent a lot of effort over the last couple of months trying to solve the puzzle. The problem is that there is a tricky combination of file updates and database changes, some of which can simply be copied over while others have to be executed from within the administration section on the destination machine.

So, what we did first was a complete separation of environments. Each developer had his own machine on which he could install and configure as many instances of SugarCRM as he saw fit. Also, each developer had a separate branch in Subversion, so that he could work on his own stuff without being afraid of running into conflicts with anyone.

After that, we created a development server with a checkout of the common trunk. For extra insurance, we did the checkout as a system user who does not have any write permissions in the repository. This way, even if someone accidentally tries to commit from the development server, we can be sure that the commit fails.

Now, each developer had to merge his changes into trunk, and then test them on the development server. This procedure is very similar to the production deployment and consists of two parts. The first part is updating all the relevant files (a bit more on this in a moment) with svn update. The second part is logging into SugarCRM and doing Admin -> Repair -> Quick Repair and Rebuild.

The graphical tools that come with SugarCRM are powerful, but a bit confusing. The biggest confusion for me was (and maybe still is) between Module Builder and Studio. Studio can be used to customize the core modules that ship with SugarCRM. The results of these customizations are stored in the custom/modules directory and, once loaded into the database, can be observed in the _cstm tables (for example, accounts_cstm). This is where new custom fields and things like that go.

Module Builder is a tool which can help you customize existing modules or build your own new ones. The confusion arises because both tools can be used to do the same things. But with Module Builder you’d be working closer to the core system and modifying “original” functionality. The results of the Module Builder work go into the modules/ directory, and changes in the database take place in the original tables. One thing to remember, though, is that you’ll need to push the Save & Deploy button every time you finish making changes in Module Builder. This is like compiling and building a module. If you forget this step, your module will stay behind in its source form somewhere around the custom/modulebuilder directory.

Another thing to keep in mind is the silliness of one machine trying to figure out another. Subversion will often have trouble working out the changes since the last commit, and the trouble is often caused by the large amount of code that SugarCRM generates automatically. In most of these problematic cases Subversion will merge the changes anyway, and this often results in a broken system. I’ve found at least two reasons behind this. First, the small context that Subversion uses (3 lines or so) sometimes confuses it and leads it to the wrong place in the file to do the merging. Second, the automatically generated stuff from SugarCRM is rather messy: unnecessary reordering, and mixed (DOS and UNIX) line endings in a single file. These problems mostly affect vardef files (vardef.php and anything *def.php) and language dictionaries (anything with *en_us*php, or whatever your locale is). The solution we are using at the moment is simple, although a bit heavy on manual work: instead of merging the changes and checking them every time, we simply remove the old versions of the files and add the new ones in two separate commits. This way Subversion treats the files as completely different ones and really removes and re-creates them instead of trying to merge.
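The two ideas above can be sketched in shell. This is only an illustration under my own assumptions, not the exact commands we run: the function names are invented, and the new version of the file is assumed to be saved somewhere outside the working copy. The first function reports whether a file mixes DOS and UNIX line endings; the second replaces a tracked file using two separate commits.

```shell
# Succeeds if the file has both CRLF-terminated and LF-only lines.
has_mixed_eol() {
    cr=$(printf '\r')
    total=$(grep -c '' "$1" || true)        # all lines
    crlf=$(grep -c "${cr}\$" "$1" || true)  # lines ending in a carriage return
    [ "$crlf" -gt 0 ] && [ "$crlf" -lt "$total" ]
}

# $1 = tracked file in the working copy
# $2 = new version of that file, kept outside the working copy (assumption)
replace_in_two_commits() {
    target=$1
    new_version=$2
    # First commit: remove the old file, even if it has local modifications.
    svn delete --force "$target" &&
    svn commit -m "Remove old $target" "$target" &&
    # Second commit: add the new version as a brand new file.
    cp "$new_version" "$target" &&
    svn add "$target" &&
    svn commit -m "Add regenerated $target" "$target"
}
```

A call such as has_mixed_eol custom/modules/Accounts/vardefs.php && echo mixed (the path is just an example) is a quick way to spot the files that are likely to merge badly.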

We follow exactly the same procedure now to deploy to the production server. We just merge code from trunk/ to branches/stable, commit, then update the files on the production server, and then do the Quick Repair and Rebuild.
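As a rough sketch, the Subversion side of that procedure could look like the functions below. The function names, the repository URL and the working copy paths are placeholders of mine, and the plain svn merge form assumes Subversion 1.5 or later, where merge tracking works out which trunk revisions still need merging.

```shell
# $1 = repository root URL (placeholder, e.g. https://svn.example.com/sugarcrm)
# $2 = local working copy of branches/stable
merge_trunk_into_stable() {
    repo_url=$1
    stable_wc=$2
    # Start from an up-to-date working copy of the stable branch.
    svn update "$stable_wc" &&
    # Pull in everything from trunk that has not been merged yet.
    svn merge "$repo_url/trunk" "$stable_wc" &&
    svn commit -m "Merge trunk into branches/stable" "$stable_wc"
}

# $1 = checkout of branches/stable on the production server
update_production() {
    svn update "$1"
    # Quick Repair and Rebuild is still a manual step in the SugarCRM admin UI.
}
```

It is worth reviewing the merge result (svn status, svn diff) before the commit, especially for the generated files mentioned above.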

The thing about Quick Repair and Rebuild is that it takes the updated definitions of your forms and layouts and rebuilds the compiled templates. It also compares the structure of the database with the updated definitions in the files and, if needed, updates the database schema too. Sometimes you’d get an error about a missing table (usually a custom table with the _cstm suffix). In that case, just create an empty table manually and put a couple of standard fields in it, like id_c, date_modified, and date_entered. After that, field modifications should be OK. In case you run into a problem with updates to several fields at once, make sure that SugarCRM has put a semicolon (;) at the end of each SQL statement that it shows you in the popup window. For some weird reason, sometimes it just works, and sometimes it tries to execute several queries without separating them from one another.

So far the setup seems to be working for us just fine, but I’m sure that we’ll have a few changes here and there. I’ll let you know once we find any better way of doing things. In the meantime, here are some links that might help your development efforts:

Finding the tree version in the working directory

When using Gnu Arch, once in a while I need to verify that I am in the correct working directory. With long names, patches, and all those branches it is not always that obvious. The shortest way to find the version of the tree in the current working directory is:

tla logs -rf | head -1

Telling Gnu Arch the truth

Yet another problem (and solution) that I’ve stumbled across while using Gnu Arch. We have two branches in our archive: program--vendor--0.1 and program--local--0.1. The vendor’s version has all the source files in SomeDirectory, while our local version has all the source files in somedir. Except for the name and a few local changes, these two directories are practically identical.

But when we were creating branches and importing code, we weren’t very careful and ended up with these directories and files having different arch IDs. This makes comparing two source trees close to impossible, as arch thinks that directory SomeDirectory was removed together with all its content and directory somedir was added together with a bunch of files.

Telling Arch the truth is very simple. Basically, all the =id and *.id files under all the .arch-ids/ directories in one source tree need to be copied to the corresponding places in the other source tree. After that, run tla commit.

In order to minimize the pain of manual labour, I wrote a tiny Perl script to find all the needed files and copy them appropriately. On the command line, just specify the two directories which you know are the same, but which arch considers different. If any of the files weren’t copied, you’ll get their names in a warning. When the script finishes, you’ll get the total count of copied files.

The script is here: fix_arch_ids.pl
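For those who just want the idea, here is a minimal shell sketch of the same approach. It is not the Perl script itself: the function name and the exact wording of the messages are my own, and it assumes both directories have the same internal layout, including the .arch-ids/ directories.

```shell
# $1 = directory whose ids are correct, $2 = directory that should receive them
copy_arch_ids() {
    src=${1%/}
    dst=${2%/}
    copied=0
    list=$(mktemp) || return 1
    # Collect every =id and *.id file that lives under a .arch-ids directory.
    find "$src" -type f -path '*/.arch-ids/*' \( -name '=id' -o -name '*.id' \) > "$list"
    while IFS= read -r id_file; do
        rel=${id_file#"$src"/}              # path relative to the source tree
        target_dir=$dst/$(dirname "$rel")
        # Copy only when the matching .arch-ids directory exists in the other tree.
        if [ -d "$target_dir" ] && cp "$id_file" "$dst/$rel"; then
            copied=$((copied + 1))
        else
            echo "warning: could not copy $rel" >&2
        fi
    done < "$list"
    rm -f "$list"
    echo "$copied files copied"
}
```

With the directories from the example above it would be called as copy_arch_ids SomeDirectory somedir, followed by tla commit in the receiving tree.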