Source Control

Introduction

Source Control Management (SCM) tools are essential for modern software development. They allow developers to track changes to their codebase, collaborate with others, and maintain a history of their work. Sometimes called a Version Control System (VCS) or Revision Control System (RCS), these tools are critical for allowing multiple developers to work on the same codebase without overwriting each other's work.

First developed in the 1970s, source control tools have evolved significantly over the years. Today, Git is by far the most popular source control tool, used by millions of developers worldwide.

Types of Version Control Systems

There are three main types of version control systems:

Local Version Control

Local version control systems store all versions and change history on the local machine. This approach is simple and straightforward but lacks the collaboration features of centralized or distributed systems. They are mostly used for personal projects or small teams.

Centralized Version Control

Centralized version control systems such as Subversion (SVN), CVS, and Perforce store the codebase on a central server. Developers check out files from the server, make changes, and commit them back. This approach allows multiple developers to work on the same codebase but can lead to conflicts and bottlenecks. Having a single source of truth also means you have a single point of failure.

Distributed Version Control

In a distributed version control system like Git, every developer has a complete copy of the codebase, and the change history, on their local machine. This approach allows developers to work offline, commit changes locally, and push them to a remote repository when they are ready. Distributed version control systems are more flexible and scalable than centralized systems and are better suited for modern software development workflows.

Benefits of Source Control

Some sort of version control system is used at virtually every company developing software, for three main reasons:

  1. History: Any changes made by developers will be visible in the change history years later. This makes it simple to go back to previous versions in order to analyze bugs.
  2. Collaboration: Branching allows developers to work independently, without interfering with each other’s work and without the need for complex coordination. Merging then brings the developers’ work together and provides tools for managing conflicts.
  3. Traceability: Each change can be connected to project management software and can easily be annotated with a message describing the purpose of the change. This can be essential for regulatory compliance in many industries.

Git

Git is the single most common version control system. It has a strong feature set, a reliable workflow, and is supported by more third-party tools than any other version control system. It is a distributed version control system, which means that every developer has a complete copy of the codebase and its history on their local machine.

Terminology

Before we dive into the details of Git, let's cover some basic terminology:

  • Repository: A repository, or repo, is a collection of files and folders that make up a project. It contains the entire history of the project, including all the changes made to the files.
  • Commit: A commit is a snapshot of the state of the repository at a point in time. It is identified by a hash, which is a unique identifier computed from the commit's contents. You can think about each commit as a node in a linked list, with each commit pointing to its parent commit. Traversing the list will show a history of all the changes made to the repository. If a commit has multiple parents, it is called a merge commit, so the full history is strictly a directed acyclic graph rather than a simple list.
  • Branch: A branch is a lightweight movable pointer to one of the commits. Most commonly, a branch gets created in order to work on some feature or bug fix. Once the work is complete, the branch can be merged back into the main branch. Git repositories typically have a default branch, usually named main (master in older repositories), which gets created when you initialize a repository. Which branches get created, how they are named, and when they get merged is dictated by the team's branching strategy.
  • Merge: Merging is the process of combining the changes from one branch into another. When the two branches have diverged, Git creates a new merge commit that contains both branches' changes. If the target branch has no new commits of its own, Git can instead simply move its pointer forward (a fast-forward merge). If there are conflicts between the two branches, you will need to resolve them before the merge can be completed.
  • Pull Request: A pull request is a request to merge one branch into another. It is a common practice to create a pull request when you want to merge a feature branch into the main branch. Pull requests are a way to review the changes before they are merged and to discuss any potential issues.
  • Conflict: A conflict occurs when two branches have made changes to the same part of a file. Git cannot automatically resolve the conflict, and you will need to manually resolve it by editing the file and choosing which changes to keep.
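  • HEAD: HEAD is a reference to the commit you currently have checked out. Normally it points to the branch you are on, which in turn points to that branch's latest commit. When you switch branches, HEAD gets updated to point to the new branch. If you check out a specific commit instead of a branch, HEAD points directly at that commit (a detached HEAD).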
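
To make the terms above concrete, here is a minimal sketch that builds a throwaway repository with two commits and inspects how commits, branches, and HEAD relate. The file name, commit messages, and identity are placeholders chosen for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of local defaults
git config user.name "Example Dev"       # placeholder identity for this throwaway repo
git config user.email "dev@example.com"

echo "one" > notes.txt && git add notes.txt && git commit -q -m "First commit"
first_commit=$(git rev-parse HEAD)
echo "two" >> notes.txt && git add notes.txt && git commit -q -m "Second commit"

parent=$(git log -1 --format=%P)         # parent hash of the newest commit
head_ref=$(git symbolic-ref HEAD)        # the branch that HEAD points to
branch_tip=$(git rev-parse main)         # the commit that the branch points to

git log --oneline                        # walks the parent pointers back from HEAD
```

Here `parent` equals `first_commit`, `head_ref` is `refs/heads/main`, and `branch_tip` is the newest commit: HEAD points to the branch, and the branch points to a commit.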

Git Basics

Git does not automatically track every change that happens in a working directory. Instead, you need to explicitly tell Git which changes you want to track. This is done by staging the changes using the git add command. Once you have staged the changes, you can commit them using the git commit command.
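
As a small illustration of staging, the sketch below creates two files in a throwaway repository but stages and commits only one of them. The file names are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"       # placeholder identity for this throwaway repo
git config user.email "dev@example.com"

echo "draft" > a.txt
echo "scratch" > b.txt
git status --short                       # both files are listed as untracked (??)

git add a.txt                            # stage only a.txt
git status --short                       # a.txt is now staged (A); b.txt is still untracked
git commit -q -m "Add a.txt"

committed=$(git ls-tree --name-only HEAD)                 # files recorded in the commit
untracked=$(git ls-files --others --exclude-standard)     # files Git still is not tracking
```

Only `a.txt` ends up in the commit; `b.txt` stays in the working directory until it is explicitly added.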

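Files in Git are not actually stored by their names. Instead, the contents of each file are stored as an object identified by a hash of those contents, and file names are recorded separately in the directory listing of each commit. Because a renamed or moved file keeps the same content hash, Git can detect renames and follow a file's history across them.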
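
The following sketch, using hypothetical file names in a throwaway repository, shows that the hash depends only on a file's contents and that history can be followed across a rename.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"       # placeholder identity for this throwaway repo
git config user.email "dev@example.com"

printf 'hello\n' > first.txt
cp first.txt copy.txt
hash_first=$(git hash-object first.txt)
hash_copy=$(git hash-object copy.txt)    # same contents, so the same hash

git add first.txt && git commit -q -m "Add first.txt"
git mv first.txt renamed.txt && git commit -q -m "Rename first.txt"

hash_renamed=$(git rev-parse HEAD:renamed.txt)   # the stored object is unchanged by the rename
# --follow asks git log to keep going past the rename
history_len=$(git log --follow --format=%h -- renamed.txt | wc -l | tr -d ' ')
```

All three hashes are identical, and `git log --follow` reports both commits for `renamed.txt`, including the one made under the old name.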

Basic Commands

Here are some basic Git commands that you will use frequently:

  • git clone <url>: Clones a remote repository to your local machine.
  • git add <file>: Stages changes in a file for the next commit.
  • git commit -m "message": Commits the staged changes with a message.
  • git push: Uploads your local commits to a remote repository.
  • git pull: Fetches changes from a remote repository and merges them into the current branch.
  • git status: Shows the status of the working directory.
  • git log: Shows the commit history.
  • git branch: Shows the list of branches.
  • git checkout -b <branch>: Creates a new branch and switches to it. In Git 2.23 and later, git switch -c <branch> does the same.

Typical Workflow

Here is a typical workflow when working with Git:

  1. Clone the repository to your local machine using git clone.
  2. Create a new branch to work on a feature or bug fix using git checkout -b <branch>.
  3. Make changes to the files in the working directory.
  4. Stage the changes using git add.
  5. Commit the changes using git commit. Always include a meaningful commit message.
  6. Push the changes to the remote repository using git push.
  7. Create a pull request to merge the changes into the main branch.
  8. Review the changes in the pull request and discuss any issues.
  9. Merge the changes into the main branch.
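
The first six steps above can be sketched end to end as follows. This is an illustration under stated assumptions: a bare repository on the local disk stands in for the hosted remote, and the branch name, file name, and identity are hypothetical.

```shell
set -e
work=$(mktemp -d)

# Setup: create a local bare repository to play the role of the remote.
git init -q "$work/seed" && cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev" && git config user.email "dev@example.com"
echo "# Project" > README.md && git add README.md && git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"

# 1. Clone the repository.
git clone -q "$work/remote.git" "$work/project" && cd "$work/project"
git config user.name "Example Dev" && git config user.email "dev@example.com"

# 2. Create a branch for the work.
git checkout -q -b feature/greeting

# 3-5. Make a change, stage it, and commit it with a meaningful message.
echo "Hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting file"

# 6. Push the branch; -u records the upstream so later pushes need no arguments.
git push -q -u origin feature/greeting

# 7-9. Opening, reviewing, and merging the pull request happen on the hosting platform.
```

Pull requests are a feature of hosting platforms rather than of Git itself, which is why the last three steps appear only as a comment.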

Branching Strategies

Creating branches is a lightweight operation in Git, but without a clear branching strategy, things can quickly get out of hand. There are several branching strategies that teams can adopt, each with its own pros and cons. Your choice should depend on the size of your team, the complexity of your project, and your release cadence, with release cadence being the most important factor. More frequent releases will require a more streamlined branching strategy.

GitFlow

The GitFlow branching strategy is one of the earliest branching strategies and is still widely used. It works best for projects with a longer release cycle, such as enterprise software. The main branches in GitFlow are main and develop. The main branch contains the production-ready code, while the develop branch contains the latest development code. Feature branches are created off the develop branch and are merged back into the develop branch once the feature is complete. When a release is being prepared, a release branch is created off the develop branch, and only bug fixes are made on this branch. Once the release is ready, the release branch is merged into the main branch, and its changes are merged back into the develop branch.

The main branch always contains the code being used by customers.
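
A rough sketch of one GitFlow cycle in a throwaway repository might look like the following. The branch names `feature/login` and `release/1.1` and the tag `v1.1` are hypothetical, and `--no-ff` is used here only so that each merge is visible as its own commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity for this throwaway repo
git config user.email "dev@example.com"
echo "v1" > app.txt && git add app.txt && git commit -q -m "Initial release"

git checkout -q -b develop                   # long-lived integration branch
git checkout -q -b feature/login develop     # feature branches start from develop
echo "login" >> app.txt && git commit -q -am "Add login"
git checkout -q develop
git merge -q --no-ff -m "Merge feature/login" feature/login

git checkout -q -b release/1.1 develop       # stabilize the release on its own branch
echo "fix" >> app.txt && git commit -q -am "Fix release bug"

git checkout -q main
git merge -q --no-ff -m "Release 1.1" release/1.1
git tag v1.1                                 # mark the released version
git checkout -q develop
git merge -q --no-ff -m "Merge release/1.1 back into develop" release/1.1
```

After this sequence, main holds the tagged release and develop contains the release fix, so the next feature branch starts from up-to-date code.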

This approach has a number of advantages that make it a good choice for some organizations:

  • Easy parallel development without needing to worry about impacting production code.
  • The main branch is always in a stable release state since no development is ever merged directly into it.
  • Developers have clear guidelines on how to work with branches.
  • By using tags, it can easily handle multiple versions of the production code.

However, GitFlow can be overly complex for smaller teams or projects with a faster release cycle. It can also lead to long-lived feature branches, which can be difficult to merge back into the develop branch.

Development teams that have embraced continuous integration and continuous deployment should generally avoid GitFlow. Even the creator of GitFlow, Vincent Driessen, has since recommended that teams practicing continuous delivery adopt a simpler workflow, such as GitHub Flow, instead.

GitHub Flow

GitHub Flow is a simpler alternative to the GitFlow model that works well for small teams that don't need to manage multiple releases simultaneously. This strategy skips the develop branch and uses only the main branch. Developers create feature branches off the main branch, make changes, and create pull requests to merge the changes back into the main branch. Sometimes these feature branches will live for days or weeks, but they should be merged back into main as soon as possible.

This approach is more streamlined than GitFlow and works well for teams that are continuously deploying to production. It is also easier to understand and implement than GitFlow.

Not having a develop branch can be a disadvantage if you need to maintain multiple versions of the codebase. It can also lead to conflicts if multiple developers are working on the same area of the codebase. However, for small teams that are continuously deploying to production, GitHub Flow is a good choice.

Trunk-Based Development

Trunk-Based Development is an even simpler branching strategy that works well for teams that are continuously deploying to production. In this model, all developers work off the main branch. Feature branches are created, but they are short-lived and are merged back into the main branch within a day. This approach requires a high level of discipline from the team to avoid conflicts and ensure that the main branch is always in a deployable state.

The idea is to encourage small, frequent changes to the codebase, which can be deployed to production quickly, eliminating the need for long-lived feature branches. Techniques like feature toggles can help to hide incomplete features from users until they are ready.
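
As a minimal sketch of a feature toggle, the function below switches between old and new behavior based on a variable. `FEATURE_NEW_GREETING` and `greet` are hypothetical names invented for this example; real projects often use a configuration service or a dedicated library instead.

```shell
# FEATURE_NEW_GREETING is a hypothetical toggle; it defaults to "off".
greet() {
  if [ "${FEATURE_NEW_GREETING:-off}" = "on" ]; then
    echo "Welcome back!"      # new behavior, merged to main but hidden by default
  else
    echo "Hello"              # existing behavior that users continue to see
  fi
}

default_output=$(greet)
enabled_output=$(FEATURE_NEW_GREETING=on; greet)   # toggle switched on inside a subshell only
```

The unfinished code path can live on the main branch and ship to production while staying disabled until the toggle is switched on.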

According to the DORA State of DevOps reports, teams that practice trunk-based development tend to have higher throughput and stability than teams that use other branching strategies. This is because trunk-based development encourages small, frequent changes, which are easier to review and test.

I've found trunk-based development to be extremely effective, but it works best with more experienced developers who understand the importance of keeping the main branch in a deployable state.

Choosing a Branching Strategy

There is no such thing as the perfect branching strategy. The best strategy for your team will depend on the number of developers, their experience, the type of project, and many other factors. In practice, you will rarely get to be the one to choose the branching strategy. Instead, you will need to adapt to the strategy that is already in place.

Best Practices

There are a few best practices that you should follow to get the most out of Git:

  • Commit Often: Commit your changes often and in small, logical chunks. This makes it easier to review your changes and to roll back if something goes wrong. You can always squash your commits before merging them into the main branch to keep the commit history clean. I generally create a commit every time the code is in a stable state, even if the feature is not complete. This gives me a fallback point if I need to revert the changes.
  • Write Meaningful Commit Messages: Write clear, concise commit messages that explain what changes were made and why. Including this description will make the codebase easier to understand for future developers.
  • Sync Regularly: Pull changes from the remote repository regularly to avoid conflicts. This is especially important if you are working on a long-lived feature branch. In projects with multiple developers, it is easy to fall behind the main branch and end up with a lot of conflicts when you try to merge your changes.
  • Use Branches: Use branches to work on new features or bug fixes. This keeps your changes isolated from the main branch until they are ready to be merged. It also makes it easier to review your changes and to revert them if necessary. You should never commit directly to the main branch.
  • Review Changes: Review your changes before committing them. This will help you catch any mistakes or issues before they are merged into the main branch. You can use tools like git diff to see the changes you have made.
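
Two of the practices above, reviewing changes and squashing small commits, can be sketched as follows. This uses `git merge --squash` as one possible way to squash; the branch and file names are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity for this throwaway repo
git config user.email "dev@example.com"
echo "line 1" > doc.txt && git add doc.txt && git commit -q -m "Add doc"

git checkout -q -b feature/docs
echo "line 2" >> doc.txt
git diff                                 # review unstaged changes before staging
git add doc.txt
git diff --staged                        # review staged changes before committing
git commit -q -m "Add line 2"
echo "line 3" >> doc.txt && git commit -q -am "Add line 3"

git checkout -q main
git merge -q --squash feature/docs       # stage the combined changes without committing
git commit -q -m "Extend doc with lines 2 and 3"
```

The feature branch keeps its small fallback commits, while main receives the same content as a single commit.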

Rebasing vs Merging

During normal development, a feature branch is created off the main branch. As new commits land on the main branch, the feature branch falls behind, and as work continues on the feature branch, the two branches diverge further. There are two ways to bring the branches back in sync: merging and rebasing.

Merging

Merging is the default way to combine changes from one branch into another. When the two branches have diverged, Git creates a new merge commit that has both branches' changes. If there are conflicts between the two branches, you will need to resolve them before the merge can be completed.

You can merge a branch into another branch using the git merge command. For example, to merge any changes on the main branch into a feature branch, you would run:

git checkout feature-branch
git merge main

Merging is a safe way to combine changes from one branch into another, but it can lead to a messy commit history with lots of merge commits.
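
To see what such a merge produces, the sketch below sets up two diverged branches in a throwaway repository and merges main into the feature branch. The file names are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity for this throwaway repo
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base commit"

git checkout -q -b feature-branch
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Feature work"
git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "Main work"

git checkout -q feature-branch
git merge -q -m "Merge main into feature-branch" main

parents=$(git log -1 --format=%P)        # two hashes: the merge commit has two parents
git log --oneline --graph                # the graph shows the two lines of history joining
```

The resulting merge commit records both parents, which is what keeps the true history but also what makes the graph branch and rejoin.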

Rebasing

Rebasing is an alternative to merging that creates a linear history by moving the commits from one branch onto the tip of another. When you rebase a branch onto another branch, Git replays the commits from the first branch on top of the second branch, creating new commits with new hashes. This creates a linear history with no merge commits.

You can rebase a branch onto another branch using the git rebase command. For example, to rebase a feature branch onto the main branch, you would run:

git checkout feature-branch
git rebase main

Rebasing is a powerful tool that can help you create a clean, linear history, but it can also be dangerous if not used correctly. Rebasing rewrites the commit history, so you should only rebase branches that have not been shared with others. If you rebase a branch that has been shared with others, you will need to force push the changes, which replaces the commits that other developers may have already based their work on. As a rule, only rebase a branch if you are the only one working on it.
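
The same diverged setup can be used to sketch a rebase. The example runs in a throwaway repository with placeholder file names, and it has no remote, so the force push appears only as a comment.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity for this throwaway repo
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base commit"

git checkout -q -b feature-branch
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Feature work"
git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "Main work"

git checkout -q feature-branch
old_tip=$(git rev-parse HEAD)
git rebase -q main                       # replay "Feature work" on top of main
new_tip=$(git rev-parse HEAD)            # a different hash: the commit was re-created

git log --oneline --graph                # a single straight line, no merge commit

# If the branch had already been pushed, updating the remote would need a force push.
# git push --force-with-lease is a safer variant: it refuses to overwrite remote work
# that you have not yet fetched.
```

Because `new_tip` differs from `old_tip`, anyone who based work on the old commit would now be building on history that the branch no longer contains.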

Conclusion

Source control management is a critical part of the software development process at virtually every company. It allows developers to track changes to their codebase, collaborate with others, and maintain a history of their work. Git is the most popular version control system and is used by millions of developers worldwide. By following best practices and choosing the right branching strategy, you can get the most out of Git and improve your team's development process.

Even when not working with a team, using Git can be beneficial. It allows you to track changes to your codebase, experiment with new features, and roll back changes if something goes wrong. By following best practices and using Git effectively, you can become a more productive and efficient developer.

Image Credits

Version control icons created by Uniconlabs - Flaticon