======================= An introduction to Git ======================= Acknowledgements ---------------- This bootcamp session is heavily inspired by the `Software Carpentry Git lesson `__ (CC-BY-4.0). The initial motivation section is inspired by `The Git Parable `__. Motivation ---------- Even if you haven't used version control software before, you have probably encountered two types of version control: 1. Linear time-based snapshots. This can either be manually done (e.g. you append a date to documents and copy files when you want to make a new version), or automatically done (e.g. you are storing documents in Dropbox, and it stores time snapshots for the last 30 days automatically). 2. Online, live multi-person editing, like in Overleaf and Google Drive. These techniques are useful, but have limitations. Time-based snapshots often fail if you want to have multiple people editing simultaneously without massive headaches, and online live editing only works if you have permanent internet access, intermediate states are meaningful (e.g. saving mid-sentence is fine for manuscripts, but would throw a syntax error for code), and you don't actually care about saving a detailed history. For us, having robust version control without these limitations makes our coding work easier to share, easier to reproduce (how do you regenerate a plot made with a previous code version you didn't know you needed?), and easier to do shared work. In the CS community, this is a solved problem; the field has coalesced around using the version control system called ``git``. There are competing version control systems, including Subversion, Mercurial, and Fossil, but ``git`` has emerged as the clear winner with the emergence of supporting webservices Github, Sourceforge, GitLab, and others. Unfortunately for us, ``git`` is also by far the most powerful and complicated version control system, so before jumping into ``git`` commands, let's consider an example: Git is a simple, but extremely powerful system. Most people try to teach Git by demonstrating a few dozen commands and then yelling “tadaaaaa.” I believe this method is flawed. Such a treatment may leave you with the ability to use Git to perform simple tasks, but the Git commands will still feel like magical incantations. Doing anything out of the ordinary will be terrifying. Until you understand the concepts upon which Git is built, you’ll feel like a stranger in a foreign land. -- Tom Preston-Werner, `The Git Parable `__ Say you are trying to do version control for files related to a paper manuscript. You have some figures, and a main ``.docx`` .. code-block:: :class: box-spacing-override . ├─ figure1.ai ├─ figure2.ai └─ manuscript.docx How should you track the version history of this? First create a new directory/folder, which is called ``working`` here, as it is the directory you would be actively working on/editing. .. code-block:: :class: box-spacing-override . └─ working ├─ figure1.ai ├─ figure2.ai └─ manuscript.docx After some hacking in Illustrator, you're happy with the way the figures look. To save this version, you just copy the entire working folder to a ``snapshot-01`` folder and promise yourself that you won't further edit what is inside the snapshot folders. .. code-block:: :class: box-spacing-override . │─ snapshot-01 │ ├─ figure1.ai │ ├─ figure2.ai │ └─ manuscript.docx └─ working ├─ figure1.ai ├─ figure2.ai └─ manuscript.docx After you do some more work, (some directories are shown collapsed, without showing content), such as adding new figures, you might have lots of snapshots: .. code-block:: :class: box-spacing-override . │─ snapshot-01 │─ snapshot-02 │─ snapshot-03 │─ snapshot-04 │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ manuscript.docx └─ working ├─ figure1.ai ├─ figure2.ai ├─ figure3.ai └─ manuscript.docx How are you supposed to remember what we changed in each of these? To help yourself, you decide to add a new file, ``message.txt`` to each snapshot created. Inside the message, you decide to add the date, our name, and some free-form message that describes what happened in that version snapshot. A ``message.txt`` might look like: :: Author: Christopher Johnstone Date: Jan 11 2021 Made Figure 3 300% more aesthetic! The snapshots now look like the following. Note that we add the ``message.txt`` after we copy the working folder to a new snapshot folder. .. code-block:: :class: box-spacing-override . │─ snapshot-01 │─ snapshot-02 │─ snapshot-03 │─ snapshot-04 │─ snapshot-05 │ ├─ message.txt │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ manuscript.docx └─ working ├─ figure1.ai ├─ figure2.ai ├─ figure3.ai └─ manuscript.docx After some more work, you end up revising the manuscript while also trying out a new revision of ``figure1.ai``. You want to save your updated manuscript version, but the figure isn't quite ready yet. How do we save the manuscript edits into a snapshot while leaving the in-progress figure edits out of the snapshot? There isn't an obvious way to do this with this folder setup, so you decide to make another special directory called ``stage`` (as in a stage to set and later snapshot). With the edits marked with a ``*``, we initially have: .. code-block:: :class: box-spacing-override . │─ snapshot-01 │─ snapshot-02 │─ snapshot-03 │─ snapshot-04 │─ snapshot-05 │ ├─ message.txt │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ manuscript.docx │─ stage │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ manuscript.docx └─ working ├─ *figure1.ai ├─ figure2.ai ├─ figure3.ai └─ *manuscript.docx Now instead of directly copying the ``working`` folder into a snapshot, we first copy the changes we want into ``stage``, while leaving files we are still actively editing in ``working``. Copying the modified manuscript to ``stage``: .. code-block:: :class: box-spacing-override . │─ snapshot-01 │─ snapshot-02 │─ snapshot-03 │─ snapshot-04 │─ snapshot-05 │ ├─ message.txt │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ manuscript.docx │─ stage │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ *manuscript.docx └─ working ├─ *figure1.ai ├─ figure2.ai ├─ figure3.ai └─ *manuscript.docx then we make a new snapshot from ``stage``: .. code-block:: :class: box-spacing-override . │─ snapshot-01 │─ snapshot-02 │─ snapshot-03 │─ snapshot-04 │─ snapshot-05 │ ├─ message.txt │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ manuscript.docx │─ snapshot-06 │ ├─ message.txt │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ *manuscript.docx │─ stage │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ *manuscript.docx └─ working ├─ *figure1.ai ├─ figure2.ai ├─ figure3.ai └─ *manuscript.docx Finally, this whole setup seems to be working well, and you want to go share your work with fellow lab-mates. While you could just copy the entire folder and share created snapshots, this would be inherently linear; only one person could work on this at a certain time! To solve these issues, among others, you decide to assign arbitrary, unique names to snapshots, such as ``2a8aba``. With these arbitrary names, it's hard to tell what order you made the snapshots in, so you also add a ``parent`` field. .. code-block:: :class: box-spacing-override . │─ snapshots │ ├─ c3612a │ │─ 86ac2d │ │─ acd748 │ │─ fb9742 │ │─ 12d276 │ │ ├─ message.txt │ │ ├─ figure1.ai │ │ ├─ figure2.ai │ │ ├─ figure3.ai │ │ └─ manuscript.docx │ └─ 2a8aba │ ├─ message.txt │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ *manuscript.docx │─ stage │ ├─ figure1.ai │ ├─ figure2.ai │ ├─ figure3.ai │ └─ *manuscript.docx └─ working ├─ *figure1.ai ├─ figure2.ai ├─ figure3.ai └─ *manuscript.docx After this renaming, the most recent ``message.txt`` (in ``snapshot-06``, renamed to ``2a8aba``) could read: :: Snapshot: 2a8aba Parent: 12d276 Author: Christopher Johnstone Date: Jan 12 2021 Updated the manuscript with method details. Now this gets confusing now; what was the last snapshot we took? To fix that, let's start a file, conveniently called ``branches.txt``. Its contents could be: :: last: 2a8aba That way, when we are ready to make a snapshot, we use the ``last`` pointer and set that as the parent. Once we have the snapshot name, we write over the entry in ``last``. What happens if someone else happens to also be working on these files? This is fine; it just means that each person will have their own ``last`` pointer. We can extend our pointer file such that everyone can **simultaneously edit** and keep track of where they are in history by having a different 'pointer' in the branches file! Summary ******* This example has hopefully indicated a couple useful properties of a version control system: * Taking **snapshots** instead of file-level versions ensures that every file is tracked in its\ correct context. * Having multiple people work on the same project, or working on different parts of a project naturally leads to a system where we have a **non-linear, branched** history. * It is useful to have a separation between the files we are ready to snapshot (e.g. the **staged changes**) and files we are not ready to snapshot (**unstaged** or **untracked**). Keeping this in mind, we're ready to jump into Git; our little toy system is actually pretty representative of how Git keeps track of things. Git by example -------------- We will start you off learning Git by editing the very repository we are working on! Prereqs ******* You should have a graphical Git tool installed, such as GitHub Desktop. You can also use the built-in Git tools in something like VS Code. You also should have **command line git** installed; there are certain functions that are really only possible through the command line interface, even though much of it can be done through the graphical tools. Then, you should check that your Git global settings are set properly. Start off by seeing what the editor, name, and email settings you currently have. An example of the output is as follows: .. code-block:: console $ git config core.editor vim $ git config user.name Christopher Johnstone $ git config user.email meson800@gmail.com If your name and email are not set properly, set them with the following commands. Set your email equal to one that you have connected to your GitHub, so your commits are properly attributed: .. code-block:: console $ git config --global user.name "Your Name" $ git config --global user.email "your_email@invalid.com" Also, for the purposes of this bootcamp session, you should set your editor to a command line editor. Unless you know how to use ``vim``, this should probably be set to ``nano``: .. code-block:: console $ git config --global core.editor "nano -w" Finally, we'll be editing this documentation! To do so, make sure that you have Python installed and the following packages installed: .. code-block:: console $ pip install sphinx sphinx-rtd-theme sphinx-last-updated-by-git Basic Git terms --------------- Thinking back to our example earlier, we had the concept of wanting to track everything within a folder, and we took various snapshots of this folder. In Git, we call the entire set of files and folders that we are interested in a **repository**, or **repo** for short. Typically, repositories are going to include everything related to a single project. This could mean a set of analysis scripts for the same project, an entire R package that is being developed, or more exotic things like paper and figure drafts. On a brief note, Git is a version control system for **plain text files**, which is a broad category of files that are readable by humans as well as machines. While Git does still work on other so-called **binary files** (like pictures, PDFs, docx's, and so on), Git is inefficent at versioning these types of files and you lose a lot of the power of Git, like seamless merging of simultaneous changes. That being said, if it is plain text, such as all code, ``.txt`` files, LaTeX files, it could be a good fit for Git. In our analogy, we had the **working tree**, e.g. the folder we are actively editing next to folders for the **staging area** and the various snapshots we took. In Git, the files that make up the staging area and history are stored within the subdirectory ``.git``. By default, this folder is hidden, though you can show it if you want. The rest of the files in the repository folder form the **working tree**. .. image:: img/git_summary.png :align: center Downloading or starting a repository ------------------------------------ If you wanted to start a new fresh repository, you can start a new repository by calling ``git init``: .. code-block:: console $ mkdir a_new_repo $ cd a_new_repo $ git init Initialized empty Git repository in ~/a_new_repo/.git If you start a fresh repository, you must add and commit some content before trying to push to a remote like Github; otherwise there is nothing to push! You can also do this via a graphical interface. .. image:: img/git_new_repo.png :align: center The graphical interface gives the additional option to start with some content pre-generated, in particular a **copyright license** and a **.gitignore** file. We will cover the ignore file later. If you choose one of these options, you can immediately push to a remote, because there is content to push! For the purposes of this demo, we will be modifying the protocols website! If a repository already exists somewhere else, you can **clone** the repository to make a local version of it. To clone a repository, we have to know where it is located. There's many possible ways to access another remote repository; it could be on a shared drive location, or accessible on a server. By far the most common location for remotes is on websites like Github. To find the URL, we can look on the website and click the green code button: .. image:: img/github_urls.png :align: center If you have added an SSH key to your Github account, you can use the ``ssh`` URL: ``git@github.com:GallowayLabMIT/protocols.git``. If you haven't added an SSH key, you can use the https URL: ``https://github.com/GallowayLabMIT/protocols.git`` To clone a repository, you use ``git clone``. By default, it will create a new folder equal to the repository name: .. code-block:: console $ ls $ git clone https://example.com/example/example_repo.git Cloning into 'example_repo'... remote: Enumerating objects: 14, done. remote: Counting objects: 100% (14/14), done. remote: Compressing objects: 100% (11/11), done. Receie: Total 1144 (delta 3), reused 12 (delta 3), pack-reused 1130 eceiving objects: 100% (1144/1144) ving objects: 100% (1144/1144), 6.53 MiB | 16.39 MiB/s, done. Resolving deltas: 100% (562/562), done. $ ls example_repo If the remote repository is stored on Github, you can also use Github Desktop to clone a repository: .. image:: img/github_desktop_clone.png :align: center .. admonition:: Exercise 1. Create a new repository using the command line. What is the contents of the hidden ``.git`` folder? 2. Create a new repository using a graphical tool like Github Desktop 3. Delete both of these repositories. 4. Clone this protocols repo, using either the terminal or a graphical tool. .. raw:: html
Show/hide answer 1. The ``.git`` folder contains several text files encoding the current branch among other information, with several folders that store the history. .. code-block:: console $ ls .git branches config description HEAD hooks info objects refs 2. Completed with Github Desktop. 3. Completed with delition through the file manager or with ``rm`` in the terminal. 4. Unless you have SSH, the following is the correct command: .. code-block:: console $ git clone https://github.com/GallowayLabMIT/protocols.git Cloning into 'protocols'... remote: Enumerating objects: 14, done. remote: Counting objects: 100% (14/14), done. remote: Compressing objects: 100% (11/11), done. Receie: Total 1144 (delta 3), reused 12 (delta 3), pack-reused 1130 eceiving objects: 100% (1144/1144) ving objects: 100% (1144/1144), 6.53 MiB | 16.39 MiB/s, done. Resolving deltas: 100% (562/562), done. .. raw:: html
Basic history terms and git status: What's happening? ----------------------------------------------------- Now that we've cloned the protocols repository, we can start poking around. Whenever you are feeling lost, you should use the ``git status`` command. The output is generally more concise than the output of many of the graphical tools. In this case, if we run the status command we will get: .. code-block:: console $ git status On branch latest Your branch is up to date with 'origin/latest'. nothing to commit, working tree clean What does this mean? Let's take a look at an example history graph: .. figure:: img/git_history.png :align: center :width: 80% An example history is shown. Each commit has a random name (in gray) that is assigned by Git. Shown with arrows are the various **branches**, which can be thought of as movable commit names. Shown in green is the currently checked out branch; staged changes will be committed on top of this branch. This is a pretty picture; how do we actually understand what is happening in the repository we just checked out? For this, we'd like to actually view the history. We view the history using either graphical tools or using the ``git log`` command. Let's play around with the command line first! When you type ``git log``, you will be in an interactive browser, use the arrow keys to scroll up and down, and press ``q`` to exit. By default, ``git log`` prints out the entire author and commit message of the previous commits: .. code-block:: console $ git log commit 51313751defcf0798fce66c517abd0a20dd1feb1 (HEAD -> latest, origin/latest) Author: Christopher Johnstone Date: Mon Feb 15 23:38:29 2021 -0500 Removed assumptions about working directory from build script commit 5031d35ad03e53a25ab748c57dea4cbf31b8b33b Author: katiegal Date: Mon Feb 15 23:10:30 2021 -0500 N3 table alignment commit 756662c5c9f16a7adca2849863d07123a0a2815d Author: katiegal Date: Mon Feb 15 23:08:01 2021 -0500 N3 table formatting If we want a more concise view, we can tell Git this with the ``--oneline`` option: .. code-block:: console $ git log --oneline 5131375 (HEAD -> latest, origin/latest) Removed assumptions about working directory from build script 5031d35 N3 table alignment 756662c N3 table formatting 5aa87ef formating of N3 74434ec Formatting repsox table 76541e4 typos fixed in repsox protocol fa70197 Formatted RepSox info 77ac72b added RepSox concentrations e0a749e Merge branch 'master' into latest a8b29e4 Updating awesome RNA vs DNA image! 70075b9 Added BSA requirement for neurotrophics 0e96845 Removed settings.json from tracking and .vscode Here, we see that we have a short name for each commit (the first 7 characters of the full name), and just the commit message, with no author attribution. To see the whole branching layout, we can view the log with the ``--graph`` option. If we combine these two options, we can get a short graph view! .. code-block:: console $ git log --graph --oneline * 5131375 (HEAD -> latest, origin/latest) Removed assumptions about working directory from build script * 5031d35 N3 table alignment * 756662c N3 table formatting * 5aa87ef formating of N3 * 74434ec Formatting repsox table * 76541e4 typos fixed in repsox protocol * fa70197 Formatted RepSox info * 77ac72b added RepSox concentrations * e0a749e Merge branch 'master' into latest |\ | * a8b29e4 Updating awesome RNA vs DNA image! | * 70075b9 Added BSA requirement for neurotrophics * | 0e96845 Removed settings.json from tracking and .vscode * | e9ca614 Fixed formatting * | 14a7bad Use currently running python interpreter to run sphinx * | 667a652 (origin/tech_wip) Added details on no-spaces rationale * | 9a3dda0 Switched ISO date to explicitly link to xkcd * | 55bb3eb Updated shell and software tips and tricks with shell extras * | f65c1e5 Added helper script run info In this example graph view, we see that two branches came together at commit ``e0a749e``. We can use various GUI programs to get the same result! In Github Desktop, you can switch to the history tab to see a linearized history (e.g. like the default ``git log``): .. image:: img/ghd_history.png :align: center You can also run ``gitk`` in a terminal to bring up a different graphical view which shows the entire tree: .. image:: img/gitk.png :align: center In summary, ``git log`` and ``git status`` and their GUI equivalents should be your go-to tool to find out what is happening in history. If you are ever feeling confused, take a look at the log! .. admonition:: Exercise Do the following with the ``git log`` command or one of the graphical tools. 1. Find a "fork" in the history graph, where two different commit branches came together, other than the example given. Which commit did the merge occur at? 2. What was the second commit in the protocols repository? 3. Find the embedded emails of two contributors to the protocols repo. .. raw:: html
Show/hide answer 1. One example is this one, shown in graph form with Gitk: .. image:: img/gitk_branch.png :align: center 2. The second commit in the repo is the following: .. code-block:: commit 5a9a7f55fb53333ef9c642d019d9af67aa64a2a4 Author: Christopher Johnstone Date: Fri Feb 14 10:43:13 2020 -0500 Updated navigation system to use a protocols dropdown 3. Answers will vary. You should use the full git log view to find these. .. raw:: html
Branching out, adding, and checking out files --------------------------------------------- Now that we know how to view history, let's start making some history! We'll start by adding a convenient name to the commit we are on, so we don't have to refer to these snapshots/commits by their full name. We view branches and create new branches with the ``git branch`` command. Typing ``git branch`` by itself returns all of the current branches, with a star next to the currently active branch: .. code-block:: console $ git branch * latest tech_wip gh_pages Since we haven't done anything to this repository yet, we start off in the **default branch**. For the protocols repository, the default branch is ``latest``: this naming derives from a relatively common naming scheme for documentation where the most recent branch is called ``latest``. Other default branches you may see include ``master`` and ``main``; the default moving forward on our repos is ``main``. .. admonition:: Sidenote Historically, the default branch name has been called ``master``. In the summer of 2020, the Software Freedom Conservancy `recommended `__ that the branch name ``master`` be replaced with ``main`` because of its possible link to master-slave. Strictly speaking, the etymology is likely referring to **master recording** or **mastering**, both audiovisual production terms, but ultimately changing branch names is a minor change that can be made with major benefits for inclusivity in our community. I wanted to conclude that, at the end of the day, it doesn't matter where the name comes from (something that was touched upon a number of times in the thread). The fact that it has bad connotations, or inspires dread for individuals and whole communities, is reason enough to change it. -- Bastien Nocera, https://mail.gnome.org/archives/desktop-devel-list/2020-June/msg00023.html If you want to create a new branch, you do so by adding a name after the branch command: .. code-block:: console $ git branch * latest $ git branch tutorial_cj $ git branch * latest tutorial_cj Note that creating a new branch **does not switch to it**. To actually switch branches, we need to tell Git to update the directory we actually see (the **working tree**). We do this with the **checkout** command: .. code-block:: console $ git branch * latest tutorial_cj $ git checkout tutorial_cj $ git branch latest * tutorial_cj There are GUI equivalents of these commands. In Github Desktop, you get here by pushing on the branch button: .. image:: img/ghd_branches.png :align: center Inside VS Code, you can access this functionality with the drop down menu inside the source control tab: .. image:: img/vsc_branches.png :align: center .. admonition:: Exercise To do some of the later merging steps, it helps to find a partner for this step! If you don't have a partner, you can clone the repository into a separate folder to simulate multiple peoplele. 1. Have one of the partners create a new branch, deciding on a specific name between the two (suggestion: ``tutorial_xx_yy`` for initials ``xx`` and ``yy``). 2. For future setup, have the partner that created the branch run: .. code-block:: console $ git push --set-upstream origin tutorial_xx_yy Enumerating objects: 28, done. Counting objects: 100% (28/28), done. Delta compression using up to 8 threads Compressing objects: 100% (19/19), done. Writing objects: 100% (22/22), 2.03 KiB | 346.00 KiB/s, done. Total 22 (delta 13), reused 1 (delta 0), pack-reused 0 remote: Resolving deltas: 100% (13/13), completed with 5 local objects. remote: remote: Create a pull request for 'tutorial_xx_yy' on GitHub by visiting: remote: https://github.com/GallowayLabMIT/protocols/pull/new/tutorial_xx_yy remote: To https://github.com/GallowayLabMIT/protocols.git * [new branch] tutorial_xx_yy -> tutorial_xx_yy Branch 'tutorial_xx_yy' set up to track remote branch 'tutorial_xx_yy' from 'origin'. 3. Again, for future setup, have the other partner pull this update and checkout the same branch: .. code-block:: console $ git pull $ git checkout tutorial_xx_yy 3. With both a terminal and another GUI tool, repeatedly checkout/switch between ``latest`` and the new branch you created. Now how do we start making changes? Remember that when we make changes in the (working) directory, we have to explicitly tell Git which changes we would like to snapshot. We inform Git of which changes we want(e.g. we *stage* the changes) using the ``git add`` command. .. note:: Git reports changes in files in two ways. If a file has already been included in a previous snapshot, then changes to that file are marked as changes. If a file has *not* been included in a previous snapshot, then it appears as an **untracked file**. Even though these are two different types of changes, we use the same command, ``git add`` to add both of them. Assume that we are tracking the file ``test.txt`` (e.g. we have ``git add``'ed it in the past, but not ``new.txt``. Then the status command will look like: .. code-block:: console $ git status On branch latest Your branch is up to date with 'origin/latest'. Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git restore ..." to discard changes in working directory) modified: test.txt Untracked files: (use "git add ..." to include in what will be committed) new.txt $ git add test.txt new.txt $ git status On branch latest Your branch is up to date with 'origin/latest'. Changes to be committed: (use "git restore --staged ..." to unstage) modified: test.txt new file: new.txt We can do the same thing in the GUI tools. In Github Desktop, changes are staged automatically: .. image:: img/ghd_staging.png :align: center VS Code has a clearer interface. In the source control tab, the new changes appear under the "Changes" column. By hovering over files, you can use the plus icon to add/stage the changes. .. image:: img/vscode_staging.png :align: center How do we remove our changes from being staged? In Github desktop, you can uncheck changes to unstage them. In VS Code, hovering over a staged change will give a minus button that unstages: .. image:: img/vscode_unstaging.png :align: center If we want to do this from the terminal, we can follow the suggestion given by ``git status``: .. code-block:: console $ git restore --staged test.txt Note that unstaging doesn't actually remove the changes, they are still in your working tree! It just removes it from the set of changes you want to snapshot. .. admonition:: Exercise 1. Edit this file (``docs/bootcamp/iap/git_intro.rst``), adding your name below: :: Christopher Johnstone, Another name? 2. Add a new protocols file. It could be something random, or something the lab needs. 3. Using the terminal, **stage** these changes, e.g. adding them to the list of changes to snapshot. 4. Using a graphical tool, unstage and then re-stage these changes. Snapshotting your work ----------------------- Now that we've staged changes to be snapshotted, we need to actually perform the snapshot. We do this with the ``git commit`` command, or through one of the tools. Let's start with the terminal! To make a new commit, you can just type ``git commit``: .. code-block:: console $ git commit [latest 2cdfda4] Edited an in-progress version of git_intro, up to commiting 2 files changed, 491 insertions(+), 9 deletions(-) When you run ``commit`` in this way, it will open up a text editor (which should be Nano, following the prereqs) and ask you to type in a message. After you are done, you can save and exit from the editor, and the commit completes. You will see a short summary, including the branch name, the short version of the commit name (the random string), the first line of your commit message, and a summary of how many files and line-based changes were made. If you have a short commit message already in mind, you can skip the editor step by directly adding a message after the ``-m`` or ``--message`` flag: .. code-block:: console $ git commit -m "Here's a short commit message!" [latest 73ca1a9] Here's a short commit message! 1 file changed, 100 insertions(+) In the various GUI tools, there is often a box to type in your commit message. For example, in VS Code, you can type in a message in the message box, adding newlines as required with enter, submitting the commit by pushing control-enter: In Github Desktop, if you made a change to a single file, it will auto-suggest a default commit message. **Don't use this default message!** The suggested commit message is not descriptive: .. image::img/ghd_default_commit_message.png :align: center Think of your commit messages as comments to either future you or other collaborators for your overall changes. We wouldn't write code comments like: .. code-block:: python pi = 3.1415 # Sets the pi variable equal to the 5-digit truncated value of pi so don't write commit comments like "Updated such and such file"; instead try to add the **why** you made this commit. .. admonition:: Exercise 1. Commit your changes, adding a relevant commit message using your interface of choice. What is the name of your newly created commit? 2. After commiting all of your changes, try committing again. What happens? 3. Make some more changes in a file, such as this one. Add and commit these changes with another interface. .. raw:: html
Show/hide answer 1. This could be done with: .. code-block:: console $ git commit -m "Your commit message here" The output of this command will include the commit name! 2. When committing with no new changes, you will get a (nice) error message: .. code-block:: console $ git commit -m "Nothing here?" On branch latest nothing to commit, working tree clean 3. Answers will vary. .. raw:: html
Ignoring files -------------- What happens to files we **don't** want to track version history on? Why would we not want to track history? A good example of files that you don't want to track is temporary files. On MacOS, browsing to a folder using Finder creates hidden ``.DS_Store`` files, which store information like the x-y position of files within the Finder window. Programming languages also generate temporary files that do not need to be tracked by Git. For example, Python will sometimes compile ``.pyc`` or ``__pycache__`` files that speed un script execution. These files, however, do not need to be tracked as they are derived from the "real" source code. Similarly, we do **not** want to track the website or PDF output of this repository in source control, given that we build these outputs from the source code anyway. Git will not track these until you add them, but they will always annoyingly sit in the ``git status`` output unless you tell Git to ignore these files. We do this by adding files we want to ignore to a special hidden file, ``.gitignore``. Each line in this file is matched against a file. If it appears on the ignore list, Git pretends that it doesn't exist, making for nice clean history! For example, the current ``.gitignore`` for this repository is as follows: :: .DS_Store # vscode ignores .vscode # Specific build gitignore output/ # Python gitignore # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ Because we use Sphinx/Python to build the website, we have several Python-specific temporary files to ignore, as well as the output folder. When creating a new repository on Github, you are given the option to start off with a language-specific ``.gitignore``. These are generally a pretty good starting place; you can always add more ignore patterns as you go along! Recovering history: reverting commits and file-level checkouts --------------------------------------------------------------- Git is a wonderful tool for storing history, so you may be wondering how we go back to certain states. In a typical Git workflow, we do not typically reset the entire branch state back to some previous commit. It is possible to do this, but it is a **destructive operation** that removes history from the tree (e.g. removes some commit nodes), so Git only lets you do it after you override several warnings. Instead, Git provides utilities to **revert entire commits** and restore specific files from previous versions. In either case, we simply add to the history (e.g. add more commits) that simply reverse previous changes. To reverse the changes introduced by a commit, you can give a commit name after ``git revert``. As an example, consider the following terminal example introducing, and then reverting a commit: .. code-block:: console $ git add test.txt $ git commit -m "Just testing" [tutorial_cj 57e1f4e] Just testing $ git revert 57e1f4e [tutorial_cj a0afdee] Revert "Just testing" 1 file changed, 1 addition(+) $ git log commit a0afdeee60ac8acb9be8066abb2a95fde70a2df8 (HEAD -> tutorial_cj) Author: Christopher Johnstone Date: Thu Feb 18 22:24:27 2021 -0500 Revert "Just testing" This reverts commit 57e1f4ea06f25a7c05c6d03fcc297efa458bc8e4. commit 57e1f4ea06f25a7c05c6d03fcc297efa458bc8e4 Author: Christopher Johnstone Date: Thu Feb 18 22:24:21 2021 -0500 Just testing Visually, what has happened is that we have gone from this tree: .. image:: img/git_pre_revert.png :align: center to this one: .. image:: img/git_post_revert.png :align: center If you'd like to use a GUI tool to revert changes, you can do so! Going to the History tab in Github Desktop and right-clicking on a commit gives you the ability to revert it: .. image:: img/ghd_revert.png :align: center How do we revert single file changes? We can use the same **checkout** command that we used to switch between branches, but this time limit it to specific files. If we specify a commit name to checkout from and a list of files, Git will fetch the state of those files from the previous commit. In the previous example, if we wanted to .. code-block:: console $ git checkout 57e1f4 test.txt Updated 1 path from 57e1f4e Most GUI tools do not let you do specific-file checkouts; you have to use the terminal for this. .. admonition:: Exercise 1. Create some change to a file in this repository; just really mash the keyboard, add this change, and commit it (New commit #1) 2. Make a second change to **another location** in the same file, add, and commit it. (New commit #2). 3. Using ``git log`` or any other tool, find name of New commit #1, and revert it. 4. Use a selective ``git checkout`` to checkout the version of the file prior to New commit #1 (e.g. before you edited it for this exercise). 5. Commit this change. 6. You've now made four commits. What does your ``git log`` look like? .. raw:: html
Show/hide answer 1. Say you edit the ``test.txt`` file. Then: .. code-block:: console $ git add test.txt $ git commit -m "Some keyboard mashing" [latest e033746] Keyboard mashing 1 file changed, 1 insertion(+) 2. After making a second change: .. code-block:: console $ git add test.txt $ git commit -m "A second change" [latest 2b2e177] A second change 1 file changed, 1 insertion(+) 3. We could look at our history, but we also see the commit name (``e033746``) after our first commit. We revert this with: .. code-block:: console $ git revert e033746 [latest ecd301c] Revert "Keyboard mashing" 1 file changed, 1 deletion(-) 4. Looking in our ``git log`` output, we see that the commit prior to "Keyboard mashing" is commit ``65c5e3f``. We checkout ``test.txt`` from there and commit: .. code-block:: console $ git checkout 65c5e3f test.txt Updated 1 path from 171189d $ git commit -m "Restored file back to its original version" [latest 2da20a3] Restored file back to its original version 1 file changed, 1 deletion(-) 5. The ``git log`` looks like: .. code-block:: console $ git log commit 2da20a30521b41bbdf973e560c7549e1be3a0a4b (HEAD -> latest) Author: Christopher Johnstone Date: Thu Feb 18 22:55:45 2021 -0500 Restored file back to its original version commit ecd301c798d162fc547661f897de277789b09d4f Author: Christopher Johnstone Date: Thu Feb 18 22:52:53 2021 -0500 Revert "Keyboard mashing" This reverts commit e03374683b76f9f2a3d70bafdb6ff257af90b0e4. commit 2b2e1778e4ce0df4c31a351b54c9e11e52c15678 Author: Christopher Johnstone Date: Thu Feb 18 22:51:00 2021 -0500 A second change commit e03374683b76f9f2a3d70bafdb6ff257af90b0e4 Author: Christopher Johnstone Date: Thu Feb 18 22:49:12 2021 -0500 Keyboard mashing The ``.git`` folder contains several text files encoding the current branch among other information, with several folders that store the history. .. raw:: html
Working with others: remotes and inevitable merge conflicts ----------------------------------------------------------- At this point, how do we share our wonderful work with others? This normally involves transferring our history with some central server. Because Git has immutable history and we tend to only add new commits to the history tree, integrating our changes with others is easy; the hard part is **merging** multiple people's changes together. Taking a quick "big-picture" break, in Git, you can connect to other versions of your repo through something called **remotes**. Any repository you want to interact with that is not your local version is called a **remote**. If you initalize a new repository, it starts off with no remotes; it doesn't know about anything else! In this case, we **cloned** the protocols repo, so we do have a remote; the location on Github! By default, the first remote is called ``origin``, as it is the origin which you cloned from. You can see details about your remotes using ``git remote``. Here, we'll add a ``-v`` flag to give more **verbose** output: .. code-block:: console $ git remote -v origin https://github.com/GallowayLabMIT/protocols.git (fetch) origin https://github.com/GallowayLabMIT/protocols.git (push) We see that Git has two (identical) URLs, one for pushing changes (e.g. local to remote) and one URL for receiving changes (remote to local). In certain weird circumstances you might want these to be different URLs, but generally it's ok that they are the same thing. To share your work with others, you **push** your branch *to the remote*. To get work from others, you **pull** the remote branch to yourself. What happens if two people have both added their own local commits to a branch? Let's find out! .. admonition:: Exercise One partner should push the shared branch first: .. code-block:: console $ git push Enumerating objects: 28, done. Counting objects: 100% (28/28), done. Delta compression using up to 8 threads Compressing objects: 100% (19/19), done. Writing objects: 100% (22/22), 2.03 KiB | 346.00 KiB/s, done. Total 22 (delta 13), reused 1 (delta 0), pack-reused 0 remote: Resolving deltas: 100% (13/13), completed with 5 local objects. * [update] tutorial_xx_yy -> tutorial_xx_yy What happens when the other person now tries to push? .. code-block:: console $ git push To https://github.com/GallowayLabMIT/protocols.git ! [rejected] tutorial_xx_yy -> tutorial_xx_yy (fetch first) error: failed to push some refs to 'https://github.com/GallowayLabMIT/protocols.git' hint: Updates were rejected because the remote contains work that you do hint: not have locally. This is usually caused by another repository pushing hint: to the same ref. You may want to first integrate the remote changes hint: (e.g., 'git pull ...') before pushing again. hint: See the 'Note about fast-forwards' in 'git push --help' for details. To explain this message, let's look at how the history changed. Initially, the branch on Github was where it was initalized at the beginning, with each person having local commits: .. image:: img/git_pre_push.png :align: center The first person's push (A) set the remote branch ``origin/tutorial_xx_yy`` equal to **their** version of ``tutorial_xx_yy``: .. image:: img/git_post_push.png :align: center Then, when the second person (B) went to push, Git rejected the push because setting ``origin/tutorial_xx_yy`` to the location of ``B/tutorial_xx_yy`` would erase A's changes! Git only allows so-called "fast-forward" pushes, where the branch pointer moves down the tree without backtracking. We solve this by having person B **pull** the changes. Starting a ``git pull`` will attempt to create a **merge commit** that has two parents: the old ``B/tutorial_xx_yy`` and ``origin/tutorial_xx_yy``: .. image:: img/git_merge_attempt.png :align: center Git is very smart about merging changes; it is capable of automatically merging relatively complicated scenarios, such as one person moving a function within a file and another person editing within the same function. If this was implemented like "Track Changes" in Word, everything would explode; the move would be marked as a deletion/insertion, which would conflict with the other edits. Luckily, Git tracks this in a sane way. However, sometimes Git cannot automatically merge changes. This most often happens when two commits edit the exact same lines within a file. When this is a case, you will get a scary-looking message when you pull: .. code-block:: console $ git pull Auto-merging docs/bootcamp/iap/git_intro.rst CONFLICT (content): Merge conflict in docs/bootcamp/iap/git_intro.rst Automatic merge failed; fix conflicts and then commit the result. This long error message is just Git informing you that it could not automatically merge. If we run ``git status``, we will get more information on what to do: .. code-block:: console $ git status $ git status On branch tutorial_xx_yy You have unmerged paths. (fix conflicts and run "git commit") (use "git merge --abort" to abort the merge) Unmerged paths: (use "git add ..." to mark resolution) both modified: docs/bootcamp/iap/git_intro.rst no changes added to commit (use "git add" and/or "git commit -a") It tells us that we have to **1)** fix the merge conflicts, **2)**, use ``git add`` to tell Git that we fixed the conflicts, and **3)**, run ``git commit`` to finish the merge. How do we know where the conflicts are? In each file listed as conflicted, Git added **conflict markers**, which are ``<<<<<<<``, ``=======``, and ``>>>>>>>``. For example, a likely conflict you will get is in the name editing portion. This means that the file now looks like: :: 1. Edit this file (``docs/bootcamp/iap/git_intro.rst``), adding your name below: :: <<<<<< HEAD Christopher Johnstone, again? ====== Christopher Johnstone >>>>>> test1 2. Add a new protocols file. It could be something random, or something the lab needs. 3. Using the terminal, **stage** these changes, e.g. adding them to the list of changes to snapshot. 4. Using a graphical tool, unstage and then re-stage these changes. Between the markers, you can see the two different versions of the file that conflicted. It is up to you, the human, to decide what to do. If you are using VS Code, these markers trigger special syntax highlighting: .. image:: img/vs_code_conflict_resolution.png :align: center If it is indeed the case that you simply want to take one change, deleting the other, you can use the options at the top to do so. **However, this is often not what you want to do**. After doing whatever you think the correct merge is, you should delete the marker lines. After we have resolved the conflict, we can finish the commit: .. code-block:: console $ git add docs/bootcamp/iap/git_intro.rst $ git commit Finally, after we have a successful merge commit, the other partner can push their branch; it is now a fast-forward and is accepted: .. image:: img/git_post_merge.png :align: center .. admonition:: Exercise .. code-block:: console $ git push Enumerating objects: 28, done. Counting objects: 100% (28/28), done. Delta compression using up to 8 threads Compressing objects: 100% (19/19), done. Writing objects: 100% (22/22), 2.03 KiB | 346.00 KiB/s, done. Total 22 (delta 13), reused 1 (delta 0), pack-reused 0 remote: Resolving deltas: 100% (13/13), completed with 5 local objects. * [update] tutorial_xx_yy -> tutorial_xx_yy .. admonition:: Exercise 1. The other partner should run ``git pull``, resolve any merge conflicts, then push. 2. After the other partner has pushed, the first partner should pull in the freshly-merged changes. You're done! Conclusion ---------- Git is much more powerful, but these form the basics. Go and celebrate; you can now collaborate with others with code, while keeping the entire history safe :)