I’ve got this ridiculous screenshot saved somewhere on an old hard drive of my friend Rohan trying to explain Git to me when I first started programming. He was a CS major who’d been programming for years before college, and I was a chemical engineering major who’d almost failed my MATLAB course, so naturally everything went over my head, right? No. I mean, it did all go over my head, but not naturally. I fully believe Git is understandable at the level you’ll need for day to day work in a few minutes (hence this blog post).

Git is definitely one of those things where you’ll discover new functionality years, or even decades into your career. But it doesn’t need to be as complicated as a lot of people make it when you’re just starting out. You’re learning how to program, how to use the terminal for the first time, how to deal with ridiculous error messages that in a few months will be cake but as of right now are going to ruin your day. The last thing you need is someone explaining to you oh-so-elegantly how the intricacies of the staging area work and why git bisect is the coolest thing since sliced bread.

Here’s the Git you’ll need on a day to day when you’re starting out.

Basic Concepts

Git is a “version control system.” You know how when you would write essays in high school you’d have a bunch of files on your computer like

english/
  essays/
    i_read_the_odyssey.docx
    i_definitely_read_the_odyssey.docx
    i_definitely_read_the_odyssey_new.docx
    i_definitely_read_the_odyssey_revised.docx
    i_definitely_read_the_odyssey_final.docx
    i_totally_read_the_odyssey_final.docx

Well, Git is a program that makes it so you don’t have to do that anymore. It was terrible enough with one file, but in software it’s common for projects to have hundreds or even thousands of files, all of which need to be updated regularly, and any of which can completely break your project. Oh and also in most cases other people will need to be working on them at the same time.

The way that Git does this is by tracking the history of each file over time. Your computer likely already auto-saves files, but this is different. Using Git, you can quickly jump back and forth between the version of the file that you changed today and the version you downloaded when you started the project. Let’s say your code was working a few minutes ago and now you’re getting some weird error message. Git will let you look at all the changes to every file you’ve changed recently.

Git won’t do this automatically. In order to begin this process, you have to “initialize a git repository” which just means telling Git to track what’s happening in a certain folder. You do this with the command git init wherever you’re interested in tracking. So let’s say you had a hot new idea for an app you wanted to write, you’d go to wherever on your computer you make new projects, create a folder for it, and then go into that folder and run git init.

You’ll get a confirmation message of the form

Initialized empty Git repository in /Users/ayyjohn/code/test/.git/

and now you’re ready to get to work!

Basic Commands

In this section we’ll be covering the following commands

git add
git commit
git status

All git commands come in the form of git something where the something is add, status, etc. You can add extra modifiers and arguments to change the behavior, but they’re not always necessary.

Now, let’s resume where we left off above: you’ve created your new app in the new_app folder and run git init, let’s say you also create a main python script, so you have the following structure

new_app
  main.py (you made this)
  .git/ (Git made this)

Git Add

git add is used to tell Git “Hey, I want to track changes to this file or folder”. If you’ve never done this before, it will be “untracked”.

The first thing you’re going to do is create the base save point of your app in git, by running git add . where the . just refers to everything in the current folder.

Notably, git add doesn’t, I repeat, does not save the changes to the file in git. Most code changes touch many files at once, and it’s rare that you’ll want to save changes in one file at a time. If you’ve written a new function in one file and you want to import it somewhere else, you’ll want both the new function and the import saved at the same time, otherwise if you go back to the state where the import exists but the function doesn’t, your code will break. So rather than create save points file by file, Git lets you group changes together in commits where a commit is just a group of changes that go together in your mind.

Git Commit

To do this, you use git commit which says “take everything that I’ve added and create a save point where all of those changes have been made.” git commit by default opens up a code editor where you can write a “commit message” which is essentially a note to anyone in the future about what happened in these changes. Committing your code often helps keep these small and easy to understand. Rather than use that editor, though, you can run git commit -m "some message" which makes your commit message "some message"

The canonical first commit in any new codebase is initial commit. It’s really useful to have a clean initial commit, especially when you’re working with big frameworks like Django that start you out with hundreds of files. Ideally the initial commit should be before you’ve made any real changes, that way by checking the difference between your code and the initial commit you can see everything you’ve done since you started the project, not including automatic set up.

Git Status

Before I close this section out, I want to talk about git status. git status is a handy command for checking to make sure that you’ve git added what you want before committing. You can run it to get a summary of all changes that will be included in the next commit (staged) and won’t be included in the next commit (unstaged). git add is used to “stage” a file for commit.

If I run git status after running git init I’ll see the following

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	main.py

nothing added to commit but untracked files present (use "git add" to track)

and after I run git add . and run git status again I’ll see the following

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   main.py

Great! My file is staged, let’s commit it using git commit -m "initial commit", then see what’s going on with git status once more

On branch main
nothing to commit, working tree clean

Which just means “there have been no changes since the last commit”

Basic Workflow

Awesome, ok that’s a first commit. So now let’s walk through what a typical workflow looks like when you’re working on a project by yourself. Typically the cycle looks something like

  • write or change some code
  • test that change
  • stage some or all of that code for committing using git add
  • (optionally) check to make sure everything is staged the way you want it using git status
  • commit your changes, sometimes in multiple commits using git commit -m
  • repeat

A common thing to do when creating a new python project is to create a virtualenv. Let’s say this is going to be a Flask project. I’ll create a new virtualenv venv/ using python -m venv venv (there are many other ways to do this) and after activating it, install Flask using pip install flask Then I’ll create a requirements.txt file so that future people can install the dependencies they need using pip freeze >> requirements.txt and then toss the bones of a flask app into main.py and rename it to app.py as is common and WOW would you look at all these changes I really should commit some of these.

git status tells me

On branch main
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    main.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	app.py
	requirements.txt
	venv/

no changes added to commit (use "git add" and/or "git commit -a")

so I’ll start by just doing git add app.py and git add main.py. Git is smart enough to detect that this file has just been renamed, so now it tells me

On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	renamed:    main.py -> app.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	requirements.txt
	venv/

so now I’ll commit this with git commit -m "rename main.py to app.py"

and go ahead and make another commit with requirements.txt

git add requirements.txt
git commit -m "create requirements file, add flask as a dependency"

Now the only thing left uncommitted is venv, but you know what? I don’t actually want to save that. It’s a huge folder containing all of the dependencies that I installed, and if I share this project I’d rather just have people install the dependencies themselves using requirements.txt But I also don’t want to have this in my git status all the time

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	venv/

Introducing the gitignore

.gitignore is a file you can create that will tell git “don’t track this”. It doesn’t exist by default, so you can create it using touch .gitignore, and go ahead and add venv/ to it by simply putting venv/ at the top of it.

# .gitignore
venv/

Now when we do git status again,

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore

nothing added to commit but untracked files present (use "git add" to track)

which shows that venv/ is being ignored by git. We’ve still got to track the gitignore, though, so we’ll do that using git add .gitignore and git commit -m "add gitignore, ignore venv"

one last git status

On branch main
nothing to commit, working tree clean

B E A Utiful.

More Advanced Things (Optional but Encouraged)

The above workflow will get you far. There’s a lot of small improvements that can be made, and a lot more commands to git. For example, git log will show you a list of commits (I use --oneline) to format it like so.

204cddc (HEAD -> main) add gitignore, ignore venv
d7bd579 create requirements file, add flask as a dependency
44013dd rename main.py to app.py
40b1dd1 initial commit

See those numbers at the start? Those are called “commit hashes” and they’re unique identifiers for each of the commits. Remember when I said we could use git to jump around in time? Let’s say the next thing I committed made it so I couldn’t use flask run because I was getting some error. I have a couple of options.

  1. Look at what’s changed between now and then using git show and git diff to see if I can figure out why it broke
  2. Roll back to a commit where I know everything was working using git checkout or git reset and try again

git show by default will show you what changed in the last commit, or you can give it one of those commit hashes to show what happened in a certain commit git show {commithash}

so if I do git show d7bd579 it’ll tell me

commit d7bd57982e11c943a2a5a96c85ffd075429a1030
Author: Alec Johnson <redacted>
Date:   Sun Mar 13 09:04:54 2022 -0700

    create requirements file, add flask as a dependency

diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..0c8d93e
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,6 @@
+click==8.0.4
+Flask==2.0.3
+itsdangerous==2.1.1
+Jinja2==3.0.3
+MarkupSafe==2.1.0
+Werkzeug==2.0.3

which tells me I created requirements.txt and added those items to it. If I add future dependencies, or uninstall some, they’ll be tracked in this file as well so I can tell what libraries I was using at any point in time!

git diff by default will tell you the difference between what you’ve changed and the last commit, but if you give it one commit hash it will tell you the difference between now and that checkpoint. If you give it two commit hashes it will tell you everything that changed between those two checkpoints.

If using those I can’t figure out why things aren’t working, I can try things from option two above. git checkout is the analogy to having all of those save files around. By giving it a commit hash, the folder you’re in will be set to what it was like at that point in time and you can poke around as if you never made the changes since. git checkout main will bring you back where you came from.

Finally, if you really can’t figure out what happened, git reset is an option. Give it a commit hash, and it will reset your directory to that checkpoint. There are two variations, git reset --soft and git reset --hard. be careful with git reset --hard, you cannot recover things that are hard reset. Whereas using git checkout will let you jump between checkpoints, git reset will let you roll back changes made since then. A --soft reset will un-commit those things but leave the changes around for you to mess with and re-commit if you like. A --hard reset will remove the commits and change the files back to the way they were at the checkpoint you give it, so again, be very careful with this.

Branches and Collaboration

I think teaching too much about branches in an intro to git is a mistake. I may have already included too much jargon in this post and that’s without mentioning branches. Simply put, branches are a way for you to work on one version of a codebase while someone works on a different version. Git has tools that let you intelligently combine the changes once each of you has finished your work, but it’s not magical. If both of you change the same part of the same file, Git is going to ask you to manually specify which changes to keep. For projects where you’re working by yourself, working on the main branch is usually fine.

And lastly, in the beginning there’s a lot of confusion between Git, the tool, and Github, the website. Git is the version control system that exists on your computer. Github is a social networking site for sharing code. It’s got a ton of amazing functionality that allows people working on the same project to review other people’s code and approve or reject changes to open source projects where people will be collaborating from around the world. Instead of creating a brand new project on your computer, it’s common to work collaboratively on something someone else has already started. That’s where git clone comes in. git clone will let you copy the entire git history and all files of a project and do all of the awesomeness that you learned above. You can do this with pretty much any public repository on GitHub, and it’s a great way to learn about tools you currently use. You can even git clone the cPython codebase.

Just remember, once it’s in git, it’s there forever (unless you hard reset) so if you accidentally commit, say, your password, just deleting it isn’t enough! Someone can go to the commit where it was there and steal it.