=======================
On Terminals and Shells
=======================

Motivation
==========

Knowing the basics of using a command line interface (CLI) is occasionally helpful. While
a lot of software has nice graphical user interfaces (GUIs), making these interfaces
takes a large amount of work. Software written by a small number of people
(like research groups!) is often only accessible via a CLI.
In many cases, a command line interface can also be a faster way to do certain operations.
Even if you avoid using a terminal in daily computing, a terminal is often the only
way that you can access high performance computing clusters.

The basics
==========

Lots of words get thrown around: CLI, terminals, bash, shells, command prompts.
For our purposes:

* A **terminal** is the program that runs on your computer and
  handles all of the low-level input/output details. It is responsible
  for drawing to the screen, getting keyboard input, handling the clipboard,
  selecting fonts, and so on.
  Common terminals include the programs *Terminal* and *iTerm2* on MacOS, and
  *Windows Terminal*, *Powershell*, and *Command prompt* on Windows (more on this later).
* A **shell** is a command-line program (i.e. instead of interacting with you through a
  graphical interface, it takes what you type in line by line) that lets you take actions like
  modifying files, viewing program output, running other command-line programs, and so on.

Inside a terminal, you run a shell. Somewhat confusingly, the default terminals
on both MacOS and Windows start a default shell automatically, so the shell and terminal appear
to be one and the same.

Common Unix shells (runnable on MacOS and Linux) are the original shell, ``sh``, along
with ``bash`` (common default on Linux), ``zsh`` (recent default on MacOS), and others
like ``csh`` and ``dash``. The two major Windows shells are ``cmd`` (old-school, going
back to DOS) and ``powershell`` (Windows' modern shell).

.. figure:: img/shell_terminal.png
   :width: 80%

   An example of the difference between terminals and shells [#]_

When a shell starts, it displays a **prompt**, showing that it is ready for input. Once you are done
typing in your command, press Enter to run that command.
When you run a command line program, the shell will show the prompt again when it's ready for more input.
In these notes, we will use ``$`` to represent generic prompts (and ``>`` for the default Powershell prompt);
lines without the ``$`` are example output of what you will see when you enter the command.
When typing in commands, you should *not* type in the dollar sign! Your specific shell may display
a more complicated prompt than a dollar sign, such as one showing the directory you are currently in.

An example is:

.. code-block:: console

   $ echo "test"
   test

When a command is specific to one operating system, we'll show it in separate
boxes for each system, and we'll use the Powershell prompt symbol ``>`` for the Powershell examples.

.. code-block:: console
   :class: bash-console

   $ echo "test"
   test

.. code-block:: console
   :class: powershell-console

   > echo "test"
   test

Finally, throughout these notes we will talk about **files** and **directories**. Directories are more
commonly known as **folders**, but **directory** is the common shell terminology, so it will be used here
for consistency.
However, saying folder is totally fine unless someone is being *really* pedantic.

Commands
--------

In a shell, you enter commands to run them. When processing a line, the shell first splits
the line you entered into words at the spaces. Consecutive spaces count as a single separator,
and text inside quotes is kept together as a single "word".
Then, the first word is taken as the command name, and looked up in the list of installed
command-line programs (e.g. by searching the directories listed in ``PATH``). The rest of the line is
given to that program as its **command line arguments**.

By convention, command line arguments that start with dashes are **options**.
These options typically either have long names starting with two dashes,
or a "shorthand" form with a single dash and a single letter. Arguments that don't start with dashes
are typically user-specified values.

For example, in the following commands:

.. code-block:: console

   $ git commit --message "Testing out a commit"
   $ git commit -m "Testing out a commit"

the command ``git`` is called in both cases, and is given the argument list
(``commit``, ``--message``, ``Testing out a commit``) in the first case or
(``commit``, ``-m``, ``Testing out a commit``) in the second. These two calls
are equivalent, because ``-m`` is shorthand for ``--message``.

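
To see how the shell splits a line into arguments, you can ask a command to print back exactly what it received. The sketch below is for ``bash``/``zsh``-style shells on Linux/MacOS. ``show_args`` is a hypothetical helper name made up for this example. It relies only on the standard ``printf`` command, which repeats its format (here, the argument wrapped in brackets) once for every argument.

.. code-block:: bash

   # show_args is a hypothetical helper defined only for this example.
   # printf reuses its format for every argument, so each argument the
   # shell passed in ends up on its own line, wrapped in brackets.
   show_args() {
       printf '[%s]\n' "$@"
   }

   # Extra spaces are ignored, and the quoted text stays one argument:
   show_args commit   --message "Testing out a commit"
   # [commit]
   # [--message]
   # [Testing out a commit]

   # Without quotes, a file name containing a space becomes two arguments:
   show_args add explanation revised.docx
   # [add]
   # [explanation]
   # [revised.docx]

The quotes themselves are consumed by the shell. ``git`` in the example above never sees them, and only receives the already-split words.
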
.. admonition:: Demo

   For the command line commands:

   .. code-block:: console

      $ python -m venv env
      $ git add testing.py module.py "explanation revised.docx"

   what command is called in each case? What arguments are given to
   that command?

   **Answer:**
   In the first example, the program ``python`` is run with arguments (``-m``, ``venv``, ``env``).
   In the second example, the program ``git`` is run with arguments (``add``, ``testing.py``, ``module.py``, ``explanation revised.docx``).

Interface basics
================

While the shell is minimalistic, there are four features that make our
life easier:

1. **Job control**: Pressing Control and C (notated as ``Control-C``, ``Ctrl-C``, or ``^C``) does **not** copy inside
   a terminal. Instead, ``^C`` tells the actively running program to quit. The program is given a chance to
   clean up after itself (i.e. this is like hitting the exit button in a program, not force-closing it).
   On Unix-derived systems (Linux and MacOS), you can additionally pause a running program by pressing ``Ctrl-Z``.
   When you pause the program, you will see a message like ``[1]+ Stopped`` and you will be back at the shell prompt.
   To resume the program, type ``fg`` (to bring the program back to the **f**\ ore\ **g**\ round).

2. **Clipboard**: Because Control-C does not copy, we need some other way of using the clipboard. On Mac, this is easy;
   copy and paste are typically bound to Command-C and Command-V. On Windows and Linux, it depends on what terminal you
   are using. A common arrangement binds copy and paste to the right mouse button. On standard Powershell, you can select
   a region with your mouse and press Enter to copy it to the clipboard. To paste, right-click inside the terminal region.

3. **Tab completion**: Typing out full file names gets tiring, especially when you have long file names or
   deeply nested directory structures. Tab completion saves us: if you have a filename partially written, hitting
   ``Tab`` attempts to auto-fill the rest of the name. If there is a single unique file that matches what
   you have typed so far, that name is filled in. If there are multiple files that might match, then the
   exact behavior differs per shell. Bash and similar shells will typically complete as much of the name as possible,
   but then stop. If you double-tap ``Tab`` in Bash, it will print a list of all possible matching files. In Powershell,
   pressing ``Tab`` cycles between the files that match.

4. **History**: Retyping common commands also gets tiresome. You can access your command line history (i.e. the previous
   lines you have typed) by pressing the up and down arrows.

Starting off at home
====================

When you first open a terminal, your shell will likely start off in your **home directory**, also known in shorthand
as ``~``. Each user has their own home directory. All of the user directories that you are used to accessing through
Windows Explorer or Finder, such as ``Desktop``, ``Downloads``, or ``Documents``, are subdirectories of your home directory.
The actual location of your home directory differs,
but is typically something like ``C:\Users\Username`` on Windows, ``/Users/Username`` on MacOS, and
``/home/Username`` on most Linux distributions.

So that we don't have to type that long path every time, ``~`` is shorthand notation for whatever your home
directory is. That is, the location of your downloads folder could be written either as ``C:\Users\Username\Downloads``
or more simply as ``~\Downloads``.

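
On Linux and MacOS, you can check what ``~`` stands for with ``echo``, which simply prints its arguments back. This is a small sketch for ``bash``/``zsh``-style shells. It also shows the ``HOME`` environment variable, which holds the same path. One gotcha is that the shell only expands a ``~`` at the start of an unquoted word, so a quoted ``~`` is printed literally.

.. code-block:: bash

   # The shell swaps an unquoted leading ~ for your home directory
   # before echo ever sees it.
   echo ~/Downloads        # e.g. /Users/username/Downloads

   # The same path spelled out with the HOME environment variable.
   echo "$HOME/Downloads"  # e.g. /Users/username/Downloads

   # Inside quotes, ~ is left alone and printed literally.
   echo "~/Downloads"      # ~/Downloads
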
.. note::

   You may have noticed earlier that these directory paths have been written differently between the
   two operating systems. In short, due to backwards compatibility, Windows uses the backslash ``\``
   as the path separator (written between directory names), whereas all Unix-derived operating systems,
   including MacOS and Android, use the forward slash ``/`` as the path separator.

   Most of the time you can just use the forward slash without worry, since ``powershell`` on Windows
   will auto-convert forward slashes to backslashes. When programming, however, keep this difference in
   mind and avoid building paths to files by joining names with hard-coded slashes. It may still work, but you
   should ideally use filesystem-aware techniques, like ``os.path`` or ``pathlib`` in Python.

The shell has a current location; imagine it as having a Finder/Explorer window open to some directory
on your computer. This current location is called the (current) **working directory**.
How do we know what directory we are in while using the shell? The first command we will learn is ``pwd``:

.. admonition:: Command: ``pwd``

   ``pwd`` stands for **print working directory**, and does just that; it tells you what your current
   location is, in full detail (i.e. the entire path, not in shorthand). If you ever get lost, just type ``pwd``!
   This output is similar across operating systems; it is a little more verbose in Powershell.

   Example output right after launch, when you are starting in your home directory:

   .. code-block:: console
      :class: bash-console

      $ pwd
      /Users/username

   .. code-block:: console
      :class: powershell-console

      > pwd
      Path
      ----
      C:\Users\Username

If we want to know what is inside the current directory, we can use ``ls``:

.. admonition:: Command: ``ls``

   ``ls`` stands for **list**, and lists every file and directory inside the current working
   directory.
   If you were to run it in your home directory, you might get something like:

   .. code-block:: console

      $ ls
      Desktop    Downloads  Pictures
      Documents  Music      Videos

   If you want to see what is inside one of these directories, ``ls``
   takes command line arguments specifying which directory you'd like to view:

   .. code-block:: console

      $ ls Documents
      10-50  10-40  10-34
      research

   To view **hidden files** (on MacOS/Linux, these are files/directories whose names start with a period;
   on Windows, these are files/directories with the hidden attribute set), we need to pass
   ``ls`` a command line option. The option differs between systems, because in Powershell ``ls`` is an
   alias for the ``Get-ChildItem`` command. On MacOS and Linux, you use ``-a`` (also spelled ``--all`` on Linux)
   to show hidden files as well:

   .. code-block:: console
      :class: bash-console

      $ ls -a
      .        Desktop    Music     Videos
      ..       Documents  Pictures
      .bashrc  Downloads  .profile

   In Powershell, we pass the option ``-Force``:

   .. code-block:: console
      :class: powershell-console

      > ls -Force

          Directory: C:\Users\username

      Mode                 LastWriteTime         Length Name
      ----                 -------------         ------ ----
      d--h--          1/11/2021   7:19 PM                .git
      d-----          1/11/2021   7:19 PM                Desktop
      d-----          1/11/2021   7:19 PM                Documents
      d-----          1/11/2021   7:19 PM                Downloads
      d-----          1/11/2021   7:19 PM                Music
      d-----          1/11/2021   7:19 PM                Pictures
      d-----          1/11/2021   7:19 PM                Videos

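
You can try out hidden files without touching your real home directory. This Linux/MacOS sketch assumes ``mktemp -d`` is available to create an empty scratch directory, and uses ``touch`` to create two empty files in it. The file names are made up for the example.

.. code-block:: bash

   # Make an empty scratch directory and remember its path.
   dir="$(mktemp -d)"

   touch "$dir/visible.txt"   # an ordinary file
   touch "$dir/.hidden"       # starts with a period, so ls hides it

   ls "$dir"                  # lists only visible.txt
   ls -a "$dir"               # also lists ., .. and .hidden
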
Moving away from home
=====================

To change which directory we are in, we can use ``cd``:

.. admonition:: Command: ``cd``

   ``cd`` stands for **change directory**, and switches the current working directory
   to whatever directory you give it. This is the main way that you move around
   the various directories to find files.

   .. code-block:: console

      $ pwd            # Start off in your home directory
      /Users/username
      $ cd Downloads   # move into the Downloads directory
      $ pwd
      /Users/username/Downloads
      $ cd ~           # return to the home directory
      $ pwd
      /Users/username

Relative and absolute paths
===========================

The earlier examples have hinted at the existence of two types of paths, or ways
to reference files.

The first is the **absolute path**; this is what we call specifying
the entire path from the filesystem "root" to the file of interest. On Windows,
this means paths like ``C:\Users\username\Downloads``, where we specify the
drive followed by every path component.
On MacOS and Linux, absolute paths start at the root, which is the special
name given to the path ``/``, so absolute paths look like ``/Users/username/Downloads``.

In contrast, **relative paths** allow you to more concisely reference files and directories,
as the paths are calculated relative to the current working directory.
It is fairly intuitive how this works for going into subdirectories; just specify the
subdirectory name. To be able to reference directories "above" yourself in the tree, we need some
way to reference these parent directories.

Luckily this is standardized: there are two special pseudo-directories accessible everywhere
on the filesystem, the 'current directory' ``.`` and the 'parent directory' ``..``. Changing into the
current directory does nothing, but ``.`` is useful if you want to run scripts in the same directory
as yourself.

When these are passed to a command, they are evaluated starting at the current working directory.
Say that we start off in our downloads directory, ``/Users/username/Downloads``. Then changing directory
to the relative directory ``..`` means going one step "up", to ``/Users/username``:

.. code-block:: console

   $ pwd
   /Users/username/Downloads
   $ cd ..
   $ pwd
   /Users/username

We can go up multiple levels at a time by combining these pseudo-directories. For example,
to go up two directories to ``/Users`` from ``/Users/username/Downloads``, you could just write
``cd ../..``.

You can actually combine absolute and relative paths; the parent directory ``..`` always
goes "up" a directory, effectively cancelling out the directory name to its left when combined in this way.
For example, the paths ``/Users/username`` and ``/Users/username/Desktop/..`` both point to the same directory.

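
The moves described above can be replayed safely inside a throwaway directory tree. This sketch is for Linux/MacOS shells and assumes ``mktemp -d`` is available to create an empty scratch directory. The ``Users/username`` layout inside it is made up to mirror the text, and ``<scratch>`` in the comments stands for whatever path ``mktemp`` picks.

.. code-block:: bash

   # Build a throwaway tree that mirrors the text:
   #   <scratch>/Users/username/Downloads and <scratch>/Users/username/Desktop
   root="$(mktemp -d)"
   mkdir "$root/Users" "$root/Users/username"
   mkdir "$root/Users/username/Downloads" "$root/Users/username/Desktop"

   cd "$root/Users/username/Downloads"    # absolute path
   pwd                                    # <scratch>/Users/username/Downloads

   cd ..                                  # relative: one level up
   pwd                                    # <scratch>/Users/username

   cd Downloads/../Desktop                # into Downloads, back up, then into Desktop
   pwd                                    # <scratch>/Users/username/Desktop

   cd ../..                               # two levels up
   pwd                                    # <scratch>/Users

   cd "$root/Users/username/Desktop/.."   # absolute path with .. inside it
   pwd                                    # <scratch>/Users/username

The scratch tree is left behind in your system's temporary directory. Remove it with ``rm -r "$root"`` once you are done experimenting.
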
.. admonition:: Demo

   If your shell starts in the Downloads directory ``/Users/amanda/Downloads``, which of the following will
   navigate to the directory ``/Users/amanda/data``? ``/Users/amanda`` is your home directory. [#]_

   1. ``cd .``
   2. ``cd /``
   3. ``cd /Users/amanda/data``
   4. ``cd ../../``
   5. ``cd home/data``
   6. ``cd ../data``
   7. ``cd ~/data``

   **Answer:** The 3rd, 6th, and 7th examples will navigate to the proper directory.

   1. ``.`` will stay in the same directory, ``/Users/amanda/Downloads``.
   2. ``/`` is an absolute path to the filesystem root, not the correct directory.
   3. ``/Users/amanda/data`` is the full absolute path to the desired directory, so this works.
   4. ``../../`` evaluates to ``/Users``, the wrong directory.
   5. ``home/data`` will give an error, as it tries to navigate to ``/Users/amanda/Downloads/home/data``.
   6. ``../data`` evaluates to ``/Users/amanda/data``, the correct path.
   7. ``~/data`` also works, as ``~`` expands to ``/Users/amanda``.

File operations
===============

Now that we can navigate around, we can learn file operations. The first is conceptually
the simplest, as it creates a new directory:

.. admonition:: Command: ``mkdir``

   ``mkdir`` stands for **make directory**. It creates a directory
   named after the argument it is given.

   .. code-block:: console

      $ pwd          # Start off in the home directory
      /Users/username
      $ cd test      # try moving into the test directory; it fails!
      cd: test: No such file or directory
      $ mkdir test   # Create the test directory
      $ cd test      # Now the cd succeeds

Importantly, by default ``mkdir`` on Linux/MacOS will not create missing parent directories,
so an error will occur if you try to create nested directories in one
command (e.g. if we want to create the directories ``test/inner_test``
without first creating ``test``, we'll get an error). Powershell
does not have this limitation.

If we do want to create multiple nested directories, we can use the ``-p``
flag (also spelled ``--parents`` on Linux) to tell ``mkdir`` that it is allowed to create parent
directories if they don't exist.

.. code-block:: console
   :class: bash-console

   $ mkdir test/inner_test      # This fails because directory 'test' doesn't exist yet
   mkdir: cannot create directory 'test/inner_test': No such file or directory
   $ mkdir -p test/inner_test   # This works because we add the -p flag

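
When ``mkdir`` fails, it also exits with a non-zero status, which a script can check with ``if``. The sketch below is for Linux/MacOS and runs inside a scratch directory made with ``mktemp -d`` (assumed to be available). It replays the example above and adds one more property of ``-p``: it does not report an error when the directories already exist.

.. code-block:: bash

   # Work inside an empty scratch directory so nothing else is touched.
   cd "$(mktemp -d)"

   # Without -p, the missing parent 'test' makes mkdir fail. It exits with a
   # non-zero status, which `if` treats as false; 2>/dev/null hides the message.
   if mkdir test/inner_test 2>/dev/null; then
       echo "created test/inner_test"
   else
       echo "mkdir failed: 'test' does not exist yet"
   fi

   # With -p, mkdir creates 'test' first and then 'test/inner_test'.
   mkdir -p test/inner_test

   # Running it again is harmless: -p ignores directories that already exist.
   mkdir -p test/inner_test && echo "still fine"
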
Now that we can create directories/folders, how do we actually move files around? Using ``mv``!

.. admonition:: Command: ``mv``

   ``mv`` stands for **move**, and takes at least two arguments.
   We use ``mv`` to both move and rename files (renaming is just moving!).
   The input arguments are ``mv