Source Code Control for Beginners

Update

My introduction to source code control came at University. The name of the program was “update”. It took an “update deck” which described lines to remove, by line number, and lines of code to insert.
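To make the idea concrete, here is a minimal sketch of applying a change deck with awk. The deck syntax (D to delete a line, I to insert after a line) and the file names prog.txt, deck.txt, and prog.new are invented for illustration; this is not the real syntax of the "update" program.

```shell
set -e
work=$(mktemp -d)                 # scratch directory so nothing real is touched
cd "$work"

# The "installed version" of a tiny program.
cat > prog.txt <<'EOF'
read x
print y
stop
EOF

# A made-up change deck: "D n" deletes line n, "I n text" inserts text after line n.
cat > deck.txt <<'EOF'
D 2
I 2 print x
I 3 end
EOF

awk '
  NR == FNR {                                   # first file: load the deck
    if ($1 == "D") del[$2] = 1
    if ($1 == "I") { n = $2; sub(/^I [0-9]+ /, ""); ins[n] = $0 }
    next
  }
  !(FNR in del) { print }                       # keep lines not marked for removal
  FNR in ins    { print ins[FNR] }              # then add any line inserted after it
' deck.txt prog.txt > prog.new

cat prog.new
```

Because the deck names original line numbers, a reviewer can read it next to the installed file and see exactly what goes away and what arrives.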

This format allowed us to inspect the code that was actually being changed, as well as the surrounding code. Every line of code I wrote for the Systems Group that was installed went through three levels of code review and QA testing before going live in the system.

Having those change decks helped in the review process. As a side note, the author’s initials were attached as a note to the right of every line of code we modified. Easy stuff.

After a change deck was accepted, it became part of the “installed version” of the software.

One of the powerful features of working with change decks is that two (or more) people could be working on the same piece of code and, unless their changes overlapped, their decks could be applied independently.

RCS

When I left University, I started working with the BRL CAD project. This introduced me to the RCS system.

RCS was something like “update” but not quite. And you didn’t think in terms of “change decks”. That was handled behind the scenes.

You had a directory (folder) in which you had your code. You also had hidden files that stored the RCS history of the code.

By default, files were stored read-only. You could read them, you could compile from them, but you could not modify them.

To modify a file, you needed to first check out the file. When you checked out a file, it was “locked” to you and nobody else was allowed to modify the file.

You made the changes you wanted to the checked out files, then you tested. When you were happy that your code worked, you checked in the file you had checked out.

This is great when modifying a single file, but if you are modifying more than one file to accomplish your fix or enhancement, you have to check in each file in a separate operation.

There was no linkage between the files to indicate that all the changed files needed to be processed as a gestalt.

When you were ready to make a release, you had to do some magic to mark each file as being part of that particular tag. Then, at a later time, you could check out that entire tree and work on it as if it was the day of the release.

RCS did magic behind the scenes to figure out the “delta” between the checked out code and the original. This was equivalent to the “update deck” I was used to from University Days.

To work in a collaborative methodology, you would have a single “working directory” with everybody on the team having read/write privileges to the directory. If you were working across multiple machines, each machine had to use the same shared directory via a network file system. (NFS at the time)

At one point, I was working on BRL CAD on my home machine. I did not have enough space on the drive to copy the entire RCS tree to my local drive, so I was using NFS over a 28.8k dial-up modem.

Compile times ran about 3 days. And if anybody changed one of the “big” include files, I would have to start the build over again.

If you were working on a copy of the source code, you would extract a patch file from RCS to submit back to the master RCS directory.

It felt easy at the time, but it wasn’t as easy as it seemed. We just didn’t know what we didn’t know.

CVS

CVS was the first major paradigm change in source code control for us. The basic use was the same as with RCS, but they had changed the layout.

You now had an explicit directory, CVS, which contained the history files. When you checked out files, the lock was done in the CVS directory.

In addition, you could check out the files read-only (no lock) from the CVS directories onto a remote system, then check out with a lock, edit on the remote system, and check in your changes.

This was a game changer. We no longer required a network file system.

Unfortunately, we had some of the same issues as we had with RCS. The main one being that only one person could check out/lock a file at a time. With team members working nearly 24 hours per day, it was a pain when the morning dude wasn’t available at 2237 to release a lock.

SVN

SVN solved most of the known problems with CVS. It had the concept of a remote repository, and it allowed multiple people to work on the same file at the same time. It also had better branch and tag capabilities.

All in all, it was a vast improvement.

The two primary weaknesses were no gestalt for files and very slow check out of branches and tags away from the main trunk.

I remember using SVN. I had to use it just a couple of weeks ago. I don’t think I ever fell in love with it. It was a step-wise improvement over CVS.

git

Git is my favorite source control system. I understand that there is another SCS, but I can’t recall its name at this point. I’ve not used it.

Git changed the paradigm we use for changing the repository. Whereas all the previously discussed SCS’s work on a file by file basis, git works on a “commit” basis.

Even if you are working in a collaborative environment, you work on your personal repository (repo). We will get to collaborative environments shortly.

In the simplest form, you create a “working directory” which you populate with your code. That could be a book, a program, an application, or a web page. It doesn’t matter. Git doesn’t care what the files contain, only that they be text files.

Git can work with binary files, but that is not our focus.

Once you have your initial contents, you create your repo with git init. With this magic command, git creates all the required files to track the history of your project.
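A minimal sketch of that first step, run in a throwaway directory. The directory and the file name chap1.md are placeholders, not anything git requires.

```shell
set -e
book=$(mktemp -d)                 # hypothetical scratch directory for the book
cd "$book"
printf 'Chapter one draft.\n' > chap1.md

git init -q                       # creates the hidden .git directory that holds the history
ls -d .git
git status --short                # chap1.md is listed as untracked ("??")
```

Nothing is tracked yet at this point. git init only sets up the bookkeeping; files enter the history once they are added and committed.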

Let’s say you are working on a book. You have placed each chapter of the book in a separate file. One of your characters is named Cillary Hlinton. Your editor tells you that the name is just too close to a real person, and he would rather not be sued. He asks you to change the character’s name.

Under update, RCS, CVS and SVN, you would check out individual files, change the name to “Billy Boy” and then check in your changes. When you have made all the changes, you are happy.

The issue is that Chapter One is on revision 44, Chapter Two is on revision 37, and Chapter Three is on revision 48. How do you figure out the revision from just before you made the changes?

With git, you do not check out files and lock them. Instead, all files are ready for you to modify. You just edit the files and change the name.

Now you have chapters one, two, and three that have been modified. You group them into a single commit by adding them to the staging area: git add chap1.md chap2.md chap3.md

You can do this with one git add or several, in one session or multiple sessions. At some point you will be satisfied with your collection of changed files.

At that point, you commit the changes. You will be required to supply a message.
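Here is a sketch of the whole rename as one commit, assuming a scratch repository with three one-line chapters. The author identity and the commit messages are placeholders; the sed loop stands in for whatever editor you would really use.

```shell
set -e
book=$(mktemp -d)
cd "$book"
git init -q
git config user.name "Example Author"          # placeholder identity for this repo only
git config user.email "author@example.com"

# Three chapters that all mention the character.
for n in 1 2 3; do
  printf 'Cillary Hlinton appears in chapter %s.\n' "$n" > "chap$n.md"
done
git add chap1.md chap2.md chap3.md
git commit -q -m "First draft of chapters one to three"

# Rename the character in every chapter.
for f in chap1.md chap2.md chap3.md; do
  sed 's/Cillary Hlinton/Billy Boy/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

git add chap1.md chap2.md         # staging can happen in several steps
git add chap3.md
git commit -q -m "Rename Cillary Hlinton to Billy Boy"

git show --stat --oneline HEAD    # one commit that lists all three files
```

The message passed with -m is the required commit message. Leaving it off makes git open an editor and ask for one.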

Each of the following circles represents a commit.

Before Name change
After the name change

If we want to see the version before the name change, we can check out commit 4. When we do, all the files are changed back to the version before adding your name changes.

This makes it easy to find one particular point where the state of the book is one way and in the next commit, all the changes have taken place across the entire book.
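A sketch of stepping back to the commit before the rename. Git names commits by hash rather than by a small number like the 4 in the picture, so this example saves the hash in a hypothetical variable called before. The repository, identity, and file contents are placeholders.

```shell
set -e
book=$(mktemp -d)
cd "$book"
git -c init.defaultBranch=master init -q       # name the first branch master, as in the article
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Cillary Hlinton arrives.\n' > chap1.md
printf 'Cillary Hlinton departs.\n' > chap2.md
git add chap1.md chap2.md
git commit -q -m "Before the name change"
before=$(git rev-parse HEAD)                   # remember the ID of this commit

sed 's/Cillary Hlinton/Billy Boy/' chap1.md > t && mv t chap1.md
sed 's/Cillary Hlinton/Billy Boy/' chap2.md > t && mv t chap2.md
git commit -q -a -m "After the name change"

git log --oneline                 # both commits, newest first
git checkout -q "$before"         # every file returns to its earlier state
old_text=$(cat chap1.md chap2.md)
git checkout -q master            # and back to the latest commit
new_text=$(cat chap1.md chap2.md)
```

Both chapters flip together in each direction, which is the point of working by commit instead of by file.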

The other major improvement that git brought was fast branches.

Branches

Here we see two branches added to the repository. The first, “HEAD”, is special. Strictly speaking it is not a branch but a pointer to the commit (usually by way of the current branch) associated with the working directory. It is manipulated implicitly, by commands such as checkout and commit, instead of explicitly.

“master” was the default branch name until “rrracist” was applied to it, so some repos now use “main” instead of “master”.

This ability to create branches rapidly allows us to make and destroy branches at will.
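A small sketch of making and destroying a branch, assuming a scratch repository with a single commit. The branch name scratch is hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q       # name the first branch master, as in the article
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Chapter one.\n' > chap1.md
git add chap1.md
git commit -q -m "Initial commit"

git branch scratch                # create a branch at the current commit
git checkout -q scratch           # switch the working directory to it
git branch                        # lists master and scratch, with * on the current one
git checkout -q master
git branch -d scratch             # delete it again; it held no commits of its own
```

Creating the branch copies no files. Git only records a new name pointing at an existing commit, which is why it is so quick.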

We are going to create a new branch, “editor” for our editor to work on. Meanwhile, you are continuing work on chapter four.

Editor and Master branches

And here is where git shows another of its powers, the merge. With the ‘master’ branch checked out, we merge the editor branch, fixing all the little grammar and spelling errors. git checkout master; git merge editor
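The following sketch plays out that scenario in a scratch repository: the editor fixes chapter one on the editor branch while chapter four is written on master, and then the editor branch is merged into master. File contents, identity, and commit messages are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q       # name the first branch master, as in the article
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Chapter one, with a speling error.\n' > chap1.md
git add chap1.md
git commit -q -m "Chapter one"

git checkout -q -b editor                      # the editor works on a separate branch
printf 'Chapter one, with a spelling error fixed.\n' > chap1.md
git commit -q -a -m "Fix spelling in chapter one"

git checkout -q master                         # meanwhile the author keeps writing
printf 'Chapter four.\n' > chap4.md
git add chap4.md
git commit -q -m "Add chapter four"

git merge -q --no-edit editor                  # bring the editor fixes into master
git log --oneline --graph                      # shows the two lines of work joining
```

The two branches touched different files here, so git can combine them without asking. Overlapping edits to the same lines would stop the merge and ask you to resolve the conflict.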

After Merge

With this merge completed, the master branch contains all the work done in the editor branch, but the editor branch does not have any of the new work done on master. To synchronize the editor branch with the master branch we do git checkout editor; git merge master.
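A simplified sketch of that synchronization. It assumes the editor branch has no unmerged work of its own, which is the situation once master has absorbed it, so git can simply move editor forward to where master is (a "fast-forward"). Names and contents are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q       # name the first branch master, as in the article
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Chapter one.\n' > chap1.md
git add chap1.md
git commit -q -m "Chapter one"
git branch editor                      # editor branch starts at this commit

printf 'Chapter four.\n' > chap4.md    # new work lands on master only
git add chap4.md
git commit -q -m "Add chapter four"

git checkout -q editor
ls                                     # chap4.md is absent on editor
git merge -q master                    # fast-forward editor up to master
ls                                     # chap4.md is now present
```

After the merge both branch names point at the same commit.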

After merging master into editor branches

If there is no more editing to be done, it is acceptable to delete the editor branch. No code will be lost.
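One way to see why nothing is lost: git branch -d refuses to delete a branch whose commits have not been merged into the branch you are on. The sketch below, in a scratch repository with placeholder names, tries the delete before and after the merge and records the first outcome in a hypothetical variable called refused.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q       # name the first branch master, as in the article
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Chapter one.\n' > chap1.md
git add chap1.md
git commit -q -m "Chapter one"

git checkout -q -b editor
printf 'Chapter one, edited.\n' > chap1.md
git commit -q -a -m "Editor corrections"
git checkout -q master

# -d refuses while editor holds a commit that master lacks.
if git branch -d editor 2>/dev/null; then refused=no; else refused=yes; fi
echo "delete refused before merge: $refused"

git merge -q editor               # fast-forward: master had no new commits
git branch -d editor              # now succeeds; the work lives on in master
```

The capital -D option forces the delete regardless, so it is the one to be careful with.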

Because the ability to branch and merge is so quick and powerful, it is normal procedure to start a new branch for each issue being addressed in a project. When the issue is resolved, the new code is merged into master or discarded.

Remote Repositories

Is a tale for another time.

Conclusion

If you can use a source code control system to track your work and changes, do so. It makes life so much easier in the long term.


Comments

2 responses to “Source Code Control for Beginners”

  1. pkoning

    Another version control system similar to git but less common is Mercurial (hg). The Python project uses it, I’m not sure what the historical reason is. Perhaps they started long enough ago that git wasn’t yet comfortable to use.

    “Update” as in the CDC mainframe program? I haven’t really used that but I do use (still today) Modify, which is basically the same thing.

    A really important feature of version control systems is that you can see the entire history (who, what, when, why) and also that you can at any time easily restore the state of any file to any earlier point in its history.

  2. Slow Joe Crow

    Could the other SCM product have been Subversion aka SVN? It was common in the early oughts before git, and I had some learning experiences moving code from Perforce into Subversion so it could be accessed without the cost of a Perforce license. Then everything went to git or a commercial front end for git.