Git Gud with Git
Just like Mr. Krabs protects his recipe from the evil Plankton, we developers must safeguard the fruits of our labor: the source code. Git protects the project from accidents, human error, and unintended consequences.
Today, Git is by far the most widely used version control system. It runs on nearly any platform, stores all kinds of projects, and is used in commercial and open-source work alike. If you plan to land a job in this industry, Git is a must.
In this post I'll explain the main concepts behind Git, and the step-by-step of the most common operations, using a visual interface called Fork.
In the next post, I'll cover working in collaboration using Git and my goal is to get you good w/ git (that's a mouthful, right?)
Version Control
In your life, you have probably left a commented block of code that you were never quite sure if you would need again, but decided to leave it just in case. Or even better, created files with some variation of: "Final", "Final(2)", "PleaseGodLetThisBeTheLastOne".
Version control is the way out of these problems. Without it, modern software development would be chaos.
A version control system lets you keep track of the changes in a software project and gives you everything that comes with that: comparing, combining, and undoing changes, and much more. All the while it keeps a safe backup of your repository, that is, the project and its history of changes.
The version control system is also a great documentation tool, providing a view of the entire history of changes of the repository (also referred to as repo), with granular information of who made the changes, when they were made, and what exactly they were.
And finally, it is a great tool for collaboration, allowing developers to work together, often in the same file, and merge their changes when needed. Each feature / developer can have their project at different points in the history of changes, and work independently of the others.
Git
There are many different version control systems, but as I said, Git is the most used of them all.
Git is a mature, actively maintained open source project originally developed in 2005 by Linus Torvalds, the famous creator of the Linux kernel. Git offers a rich feature set and a reliable workflow, performs well on both small and large projects, and works with many existing systems and protocols.
The main problem with Git (and other version control systems) is the steep learning curve, but recent graphical interfaces make it much more approachable.
Some other version control systems offer features on par with Git, such as Perforce and Plastic SCM, but these are commercial products.
History
Tracking the changes of the project is the cornerstone of any version control system, and every other feature builds on it. The changes are stored as what we call "Commits", which are basically snapshots of the state of the project at that point in time.
Git can restore your project to any commit in its history, but it doesn't need a full copy of the project for each one: files that did not change since the previous commit are stored only once and reused, and the stored data is compressed. This keeps repositories lightweight while still letting us restore any version and compare the changes between versions.
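One way to see why the history stays lightweight is to ask two commits which stored object they use for each file. This is only a minimal sketch: the temporary folder, the file names and the "Example Dev" identity are placeholders I made up for the demo.

```shell
# Throwaway repository in a temporary folder (all names are placeholders).
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# First commit: two files.
echo "player code" > Player.cs
echo "enemy code" > Enemy.cs
git add Player.cs Enemy.cs
git commit -q -m "Add player and enemy"

# Second commit: only Player.cs changes.
echo "player can jump" >> Player.cs
git commit -q -am "Let the player jump"

# Ask each commit which stored object it uses for each file.
enemy_before=$(git rev-parse HEAD~1:Enemy.cs)
enemy_after=$(git rev-parse HEAD:Enemy.cs)
player_before=$(git rev-parse HEAD~1:Player.cs)
player_after=$(git rev-parse HEAD:Player.cs)

echo "Enemy.cs : $enemy_before -> $enemy_after"     # same id: stored once, shared by both commits
echo "Player.cs: $player_before -> $player_after"   # different id: new content was stored

# Compare the two commits, as a history viewer would.
git --no-pager diff --stat HEAD~1 HEAD
```

The unchanged file keeps the same id in both commits, so the second commit costs almost nothing for it.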
The History of Commits can be inspected, letting developers easily see what the latest commits were, who to talk to about a specific feature, what changes a commit introduced, and so on.
This is an example repository of the project we teach in the C# Scripting Fundamentals Course. Each commit is a line entry in that list, containing a message, author, id, and date. *This example doesn't cover collaborative work*
Just by looking through the commits, you can get a good idea of what is going on in the repository, right? The history is a great documentation tool, but just like any tool, you need to wield it properly.
If in every commit you make, you implement 3 new features, solve 10 bugs and do a backflip, summarizing it all into a short message will be simply impossible. Not to mention that you won't be able to revert any of those bugs / features separately. Keep the commits small!
Creating good commit messages is also really important. Some companies like to tie the messages to some external task tracker, like the "Milestone X" used, or Trello / Jira / … card ids; some like to enforce that the messages be in a specific verb tense, but the rule of thumb is: You need to have an idea of what happened in that commit from the message.
Creating Commits
The repository file changes go through 3 steps in order to become a commit: Working Directory, Staging Area, and finally a commit.
The Working directory or Working copy refers to the active state of your project, that is, all the scripts, assets, files that make up the project in the current version. But another way to look at this is, the working copy is the sum of the latest commit version, plus all the changes you are making on top of it. So, as you work on your project normally, the working directory is being changed as you go, and all these new changes are available to create the next commit.
However, sometimes you don't want to commit every modification you made, maybe some were just for tests, or you want to make the commits a bit more granular. This is what the staging area is for: From the new changes in the working copy, you select which ones you want to group together to create a commit. Here the idea is to make sure that all changes you select are correct, and it gives you the possibility to separate the current changes into multiple commits.
With the files selected in the staging area, add a commit message and an optional description, and you've created a new commit! \o/
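On the command line, the three steps look roughly like this. It is a sketch under my own assumptions: a throwaway repository in a temporary folder, made-up file names, and a placeholder identity.

```shell
# Throwaway repository with one starting commit (names are placeholders).
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# 1. Working directory: edit a tracked file and create a scratch file.
echo "second draft" >> notes.txt
echo "just testing" > experiment.txt
git status --short   # notes.txt shows as modified, experiment.txt as untracked

# 2. Staging area: pick only the change that belongs in the next commit.
git add notes.txt
git status --short   # notes.txt is now staged, experiment.txt is still untracked

# 3. Commit: turn the staged changes into a new snapshot.
git commit -q -m "Revise notes"
git status --short   # only experiment.txt is left over
```

The scratch file never entered the staging area, so it stays out of the history until you decide otherwise.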
Local vs Server
Git is what we call a Distributed Version Control System, meaning that everyone working in that repository doesn't just check out the latest snapshot of the files; rather, they fully mirror the repository, including its entire history. Every clone is a full backup of all repository data.
To keep everyone up to date, the repositories must synchronize with each other through a server, more commonly called the "Remote Repository". Every developer sends the commits they created to the server (push), and receives back what the other developers worked on (pull).
GitHub
Although I am referring to the Remote repository as a Server, any computer can act as that central repository. Still, there are services that take care of this for us, like GitHub. GitHub is primarily a host for your remote repositories, plus a bunch of other features, like keeping multiple backups of your repository in case some catastrophe happens.
GitHub has a free tier that should cover personal use and even some medium-sized projects. There are some alternatives, like Bitbucket and GitLab.
Workflow
When working with git repositories you fall into a constant cycle:
Make sure you have the latest changes (sync with the remote repository)
Work normally on your project, creating new scripts, changing layouts, …
Select (Stage) some changes, and create a new commit with them
Send your new commits to the remote repository
Repeat
In the next section we will go over how to do each of these steps.
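For the curious, here is roughly what that cycle looks like on the command line. Treat it as a sketch: instead of GitHub it uses a bare repository in a local temporary folder as a stand-in for the remote, and the file names and identity are placeholders.

```shell
# Setup: a local bare repository stands in for the remote (an assumption of this demo).
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git clone -q remote.git project   # Git warns that the repository is empty; that is expected
cd project
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin HEAD        # first push also links the local branch to the remote one

# The cycle itself:
git pull -q                                     # 1. make sure you have the latest changes
echo "speed = 5" > settings.txt                 # 2. work normally on your project
git add settings.txt                            # 3. stage some changes...
git commit -q -m "Add movement speed setting"   #    ...and create a new commit with them
git push -q                                     # 4. send your new commits to the remote
```

After the last step, the local branch and its remote counterpart point at the same commit again.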
How To
Git started out as command-line-only software, but there are now many user interfaces that simplify its usage a lot. I'll focus only on how to perform the main steps using the visual tool Fork (for Windows), with GitHub as our repo host. Any other Git interface, like GitKraken (paid), Sourcetree and GitHub Desktop, will have a similar workflow, since they all use Git underneath.
Creating a Repository
Creating a repository is super simple: just log in to your GitHub account, press the green "new" button in the left side-bar, and you will be directed to this page:
The first and second steps are pretty straightforward, but the third one requires some explanation:
The README file is nothing but an introduction page that usually contains what that repository is for, who is working on it, how to use it, and so on. This will just create a template file for you to fill in later.
The ".gitignore" is a file that lists the files in your project that you don't want to track in the repository. Usually these are automatically generated files, build artifacts, confidential data or something like it, and having them in the repository is useless or inadvisable. You can choose one from a template list, and you can modify it later.
Finally the Licence tells others what they can and can't do with your code. This is indispensable if you are creating a public or open source project. Again, you can choose one from a template list, and you can modify it later.
You are not obligated to have any of these files in your project, but they are pretty common.
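To get a feel for what a ".gitignore" does, here is a tiny sketch. The two patterns and the file names are examples I picked for illustration, not one of the GitHub templates.

```shell
# Throwaway repository (names are placeholders).
work=$(mktemp -d)
cd "$work"
git init -q

# A tiny .gitignore: one pattern per line, lines starting with # are comments.
cat > .gitignore <<'EOF'
# build output folder
build/
# log files anywhere in the project
*.log
EOF

# Some files that should be ignored, and one that should not.
mkdir build
echo "binary" > build/game.exe
echo "debug output" > debug.log
echo "source" > main.cs

git status --short                             # lists only .gitignore and main.cs
git check-ignore -v build/game.exe debug.log   # shows which pattern matched each ignored file
```

`git check-ignore -v` is handy when a file is unexpectedly missing from your changes list and you want to know which rule is hiding it.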
Cloning a Repository
With the repository created, you'll be able to copy the repo url by clicking on the green "Code" button.
With the url in hand, open the Fork interface, and click on "File > Clone...", and fill in the following popup:
This will clone (or "Download") the repository to your machine, at the selected path, creating a new folder with the desired name, that will contain the entire history of that repo. For now, since we are cloning a newly created repository, it will only have a single commit with the requested initial files (README, .gitignore and/or licence).
Check out the git clone command line documentation for more information.
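The command line equivalent is `git clone <url> <folder>`. In the sketch below, a small local repository plays the role of the GitHub one so the example runs without a network; with GitHub, the first argument would be the url copied from the "Code" button. All names are placeholders.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the hosted repository: a local repo with a single initial commit.
git init -q hosted
git -C hosted config user.name "Example Dev"
git -C hosted config user.email "dev@example.com"
echo "# My Project" > hosted/README.md
git -C hosted add README.md
git -C hosted commit -q -m "Initial commit"

# Clone it into a new folder called my-project.
git clone -q "$work/hosted" my-project

cd my-project
git --no-pager log --oneline   # the single initial commit came along with the files
git remote -v                  # "origin" remembers where the clone came from
```

The remote is named "origin" by default, which is where the "origin/…" branch names come from.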
Creating a Commit
With the repository on your local machine, you can start working on your project as you normally would. Every file you create, modify, delete or rename inside the repo folder is automatically detected by Git and can be added to a commit.
With Fork opened on your repository, select the "Local Changes" tab at the left panel, to get this view:
The "Unstaged" area (#1) shows every changed file that is not yet selected for the next commit, while the "Staged" area (#2) shows the selected ones. As you can see, every file can be tracked, no matter its type (images, scripts, 3D models, …).
Fork uses a color code to quickly tell you what happened to each file:
Green: New untracked files
Yellow: Modified files
Red: Deleted files
Purple: Renamed files
To move a file from "Unstaged" to "Staged", or vice versa, just select the file and click "Stage" or "Unstage" respectively. Double clicking also does the trick.
The third area shows you in detail what was changed in the selected file. In the previous example, we renamed a method and "changed its code" a bit. Here red means that a line was removed, while green means that it was added.
This area changes depending on what kind of file you have selected. Text files will tell you line by line what was changed, and images will show a "side-by-side" of the versions, but some binary files (like a .fbx, .mp3, ...) have no preview due to their nature.
You can preview the changes of the files in both the unstaged and staged areas, and it is really important to do so before creating the commit. This way you are sure of what is being included, and that everything is correct.
With the desired files staged, all that is left is to input the commit message (#4) and press the commit button (#5).
To learn more about staging, unstaging and committing, check out the git add, git reset and git commit command line documentation, respectively.
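If you want to try those three commands yourself, this sketch stages two files, unstages one of them, previews the result and commits. The repository lives in a temporary folder, and the file names and identity are placeholders.

```shell
# Throwaway repository with one starting commit.
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "int score;" > Score.cs
git add Score.cs
git commit -q -m "Add score"

# Two unrelated changes land in the working directory.
echo "int lives;" > Lives.cs
echo "// TODO: test this" >> Score.cs

git add Lives.cs Score.cs       # stage both files (the "Stage" button)
git reset -q Score.cs           # unstage one of them (the "Unstage" button); the edit itself is kept
git --no-pager diff --staged    # preview exactly what the next commit will contain
git commit -q -m "Add lives counter"

git status --short              # Score.cs is still modified, waiting for its own commit
```

Unstaging never throws work away: it only takes the file out of the next commit.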
Repository History
With Fork opened on your repository, select the "All Commits" tab at the left panel, to get this view:
This view will display the entire history of the repository. Each line represents a commit, with the Commit Message (#1), Author (#2) and Commit Date (#3) displayed for each of them.
Notice that there is a "master" tag (#4) with a check mark (✓) on it. This indicates that you are currently on the "master" branch, and that this branch is at the last commit of the repository. Branches are a really important feature for collaborative work, but when working alone, you will probably stay on the same branch during the entire project development.
For now, know that the branch tag tells you where you are in the history of the repository, and that generally you should always be in the last commit. This is the default behavior if you follow the workflow presented in the post.
In the same "All Commits" view, there is this panel to the right:
Its content changes depending on the commit you have selected in the commits list, and gives more detailed information about the commit. In this example we have the second commit "Milestone 8 - Prettify existing UI numbers" selected.
You can change the way the information is displayed through the 3 options at the top (#1), and they all display in some way the commit general information (#2) and what changes were made (#3).
Check out the git log command line documentation for more information.
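The same information is available from the command line. This sketch builds a three-commit history in a temporary folder (file names, messages and the author are made up) and then prints it in a layout similar to the list view: id, author, date and message.

```shell
# Throwaway repository with three small commits.
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "muffins = 0" > game.txt
git add game.txt
git commit -q -m "Add muffin counter"
echo "per_click = 1" >> game.txt
git commit -q -am "Add muffins per click"
echo "per_second = 0" >> game.txt
git commit -q -am "Add muffins per second"

# One line per commit, newest first: short id | author | date | message.
git --no-pager log --format='%h | %an | %ad | %s' --date=short

# Details of one commit, like the panel on the right: info plus the files it touched.
git --no-pager show --stat HEAD~1
```

`HEAD` means "the commit you are currently on", and `HEAD~1` is the one right before it.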
Pushing Commits
Like I said previously, git is a Distributed Version Control System, and so it needs to synchronize all commits between the developers and the remote server, and this is done through pushing and pulling commits.
When you create a commit, it is stored only on your local machine. You can see this state in this picture:
You can see that the "master" branch and the "origin/master" branch are at different commits. That happens because your local "master" branch is at the latest commit, while the master branch of the GitHub repository is still at the previous one. Every branch that starts with "origin/…" belongs to the remote repository; notice that it also bears the GitHub icon.
When the local and remote branches are at the same commit, Fork abbreviates the origin branch name, showing only the GitHub logo.
To push your latest commits to the remote repository, just press the "Push" button (#1), and this popup will appear. It is really uncommon to have to change any of these configurations, so just press the "Push" button (#2) again, and you should see the origin branch move to the same position as your local branch.
Check out the git push command line documentation for more information.
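Here is a command line sketch of that "local branch is ahead of the remote" state and of what pushing does to it. A bare repository in a local temporary folder stands in for GitHub, and the names are placeholders.

```shell
# Setup: local stand-in for the remote, plus a clone with one pushed commit.
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git clone -q remote.git project   # Git warns that the repository is empty; that is expected
cd project
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First commit"
git push -q -u origin HEAD

# A new commit exists only on your machine until you push it.
echo "v2" >> notes.txt
git commit -q -am "Second commit"
ahead_before=$(git rev-list --count '@{u}..HEAD')   # commits the remote branch does not have yet
echo "ahead of the remote by $ahead_before commit(s)"

git push -q
ahead_after=$(git rev-list --count '@{u}..HEAD')
echo "ahead of the remote by $ahead_after commit(s)"
```

`@{u}` is shorthand for the remote branch your current branch is linked to, the "origin/…" one.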
Pulling Commits
You won't ever need to do this if you are working alone on a single PC, but if you ever need to get commits that some other PC added to the repo, you will need to perform a pull.
When you pull, git will perform two steps:
First it "downloads" all new commits from the remote repository. This is called "fetch".
Then it integrates those commits into your local branch. When you have no unpushed commits of your own, this simply moves your local branch forward to match the remote branch.
Fork will automatically fetch new commits from time to time, so you know when another developer has pushed new commits. Because of this, you may end up in this situation:
Here, your local branch "master" is at the "Milestone 8 - Create static NumberPrettify…" commit, while the "origin/master" branch (at the remote repository) is at the latest commit "Milestone 8 - Muffins per second…". Pulling in this case would update your local branch to the latest commit as well.
To pull the latest commits from the remote repository, just press the "Pull" button (#1), and this popup will appear. It is really uncommon to have to change any of these configurations, so just press the "Pull" button (#2) again, and you should see your active local branch (#4) move to the same position as the remote branch (#3).
Check out the git pull command line documentation for more information.
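To make the two steps visible, this sketch runs them separately instead of calling `git pull`. Two clones in a temporary folder play the role of two PCs, and a local bare repository stands in for GitHub; every name here is a placeholder.

```shell
# Setup: a stand-in remote and a first PC that pushes one commit.
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git clone -q remote.git pc-a   # Git warns that the repository is empty; that is expected
git -C pc-a config user.name "Example Dev"
git -C pc-a config user.email "dev@example.com"
echo "v1" > pc-a/notes.txt
git -C pc-a add notes.txt
git -C pc-a commit -q -m "First commit"
git -C pc-a push -q -u origin HEAD

# A second PC clones the repo, then the first PC pushes something new.
git clone -q remote.git pc-b
echo "v2" >> pc-a/notes.txt
git -C pc-a commit -q -am "Second commit"
git -C pc-a push -q

# On the second PC, do by hand what a pull does.
cd pc-b
git fetch -q                                   # step 1: download the new commits
behind=$(git rev-list --count 'HEAD..@{u}')    # commits the local branch is missing
echo "behind the remote by $behind commit(s)"
git merge -q --ff-only '@{u}'                  # step 2: move the local branch forward
```

`--ff-only` tells Git to only move the branch forward and to stop if that is not possible, which is the simple case described above.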
Git With Unity
When using git with Unity, you should take a few precautions:
Creating the Project
When you clone a repository, you need to target an empty or nonexistent folder, and the same is true for creating a Unity project. So, when you are setting up the repository, first clone it somewhere, then create the Unity project as a subfolder.
.gitignore
Make sure to use a Unity-specific .gitignore.
Unity generates A LOT of files that are not necessary to track, and keeping them in the repo will make staging files for a commit a living hell.
Metafiles
For every file and folder in your project, Unity generates a .meta file associated with it, so when you add a new file to the repository, make sure to include its metafile too!
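Both precautions can be tried out in a throwaway folder. The sketch below fakes a Unity project layout by hand (no Unity needed), ignores a few of the folders Unity regenerates on its own, and commits a script together with its metafile. The ignore list is a short illustration, not a complete template, and the project and file names are made up.

```shell
# Throwaway repository with a hand-made, Unity-like subfolder.
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

mkdir -p MyGame/Assets MyGame/Library MyGame/Temp
echo "cache" > MyGame/Library/cache.bin
echo "lock" > MyGame/Temp/lockfile
echo "class Player {}" > MyGame/Assets/Player.cs
echo "guid: 0123" > MyGame/Assets/Player.cs.meta   # placeholder content, not a real metafile

# Ignore generated folders wherever they appear in the repo.
cat > .gitignore <<'EOF'
[Ll]ibrary/
[Tt]emp/
[Ll]ogs/
EOF

git status --short --untracked-files=all   # Library and Temp no longer clutter the list

# Commit the script and its metafile together.
git add .gitignore MyGame/Assets/Player.cs MyGame/Assets/Player.cs.meta
git commit -q -m "Add Player script with its metafile"
```

Without the ignore rules, every generated file under Library and Temp would show up as an unstaged change.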
Large File Storage
Git keeps the history of text files lightweight because successive versions of a text file compress well against each other. Binary files are different: even a small edit tends to change most of the file's bytes, so Git ends up storing close to a full copy of the file for each version of it.
Repository servers, like GitHub, usually have a maximum size for the hosted repos, and Unity projects have a lot of binary files, like images, audio, 3D models and so on, meaning they often exceed the allowed size.
To solve this problem, there is a Git extension called LFS (Large File Storage) that stores every version of those files on a separate server designed for them, leaving only a small pointer to the file in the actual repository.
To use this feature you need to add a .gitattributes file (like this one) to the root directory of the repo (just like the .gitignore), and enable LFS on your machine by opening the console and running the following command: "git lfs install".
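As a rough idea of what goes into that .gitattributes file, the sketch below writes two LFS rules by hand and asks Git which files they apply to. The file types are my own picks for illustration. The install command is left as a comment so the sketch also runs on a machine without the LFS extension.

```shell
# Throwaway repository (names are placeholders).
work=$(mktemp -d)
cd "$work"
git init -q

# git lfs install   # one-time setup per machine; requires the Git LFS extension

# Each line says: files matching this pattern are handled by LFS.
cat > .gitattributes <<'EOF'
*.fbx filter=lfs diff=lfs merge=lfs -text
*.mp3 filter=lfs diff=lfs merge=lfs -text
EOF

touch model.fbx Player.cs

# Ask Git which filter applies to each file.
git check-attr filter -- model.fbx Player.cs
# model.fbx: filter: lfs
# Player.cs: filter: unspecified
```

Scripts and other text files stay in the regular repository; only the matching binary types are routed to LFS.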
Next Steps
Git is a complex tool with lots of features, and learning them all at the same time is just impossible. The steps presented here should have you covered as you start in this journey, and as you get a better hold of the commit / push workflows, you can start learning the collaborative features, like branching, merge, conflict resolution and others.