Foreword

I’ve often heard that it is easier to port a script written in a language such as Ruby or Python than it is to port a shell script. This is probably true, since shell scripts are heavily dependent on the operating system’s environment. However, this implies that porting a shell script is difficult. I’ve found that such difficulties, more often than not, are due to a number of bad (and unfortunately common) practices that people use when writing shell scripts, rather than the language itself. This article aims to remedy this misconception by demystifying a bit of what happens underneath Bash’s hood, and hopefully to teach some proper scripting habits along the way.

There are a number of great resources out there to help write better shell scripts. Some of these include:

As implied above, there are a lot of guides and tutorials out there that teach bad habits that often aren’t corrected. If you’re going to use resources not on this list, you’ll want to do plenty of research and verification on the information they provide, especially if it comes from the following sources:

Expected Knowledge

This article aims to teach how the Bash shell operates on UNIX and UNIX-like systems, including Linux. As a secondary goal, it also aims to teach proper shell scripting techniques from the ground up. I will try to explain as much as I can in this article, but to properly explore certain topics, it will require diving into some deeper technical concepts. If you feel that something can or should be clarified further, please let me know. That being said, coming into this with some basic knowledge in the following topics is a good idea:

  • To be decided

What is (not) a Shell?

It is not uncommon to see variations of phrases like “Linux Commands”, “Terminal Commands”, “Bash Commands”, “Linux Bash Commands”, etc. Technically speaking, these are all incorrect terms for running commands in a shell (for the most part). Before going further, I’d like to define some terms so it becomes more apparent what your shell is, and what your shell isn’t.

  • UNIX - Originally an operating system written first for the PDP-7 and soon after for PDP-11 systems (and one of the inspirations for the C programming language), it is now a standard for how an operating system should be designed. This standard, called the Single UNIX Specification (SUS), provides much of the standardization across popular operating systems today.

  • POSIX - A standard for how to design an operating system that covers a number of topics, including the kernel, what programs should go into the user space, and how those programs should behave. It is part of the larger SUS standard.

  • Kernel - A very low level piece of software that interfaces with the physical hardware and initializes the operating system for more high level software.

  • Syscall - Short for system call, these are provided by the kernel to allow higher level programs to interact with it.

  • Linux - A popular kernel used by many operating systems. While it is possible to have a UNIX-certified Linux, this is usually not the case. Due to this, Linux is often referred to as UNIX-like.

  • GNU - GNU’s Not Unix. An extensive collection of free (as in freedom) software that forms an operating system. It is not uncommon for operating systems with a Linux kernel to have a GNU runtime.

  • Terminal - Originally a physical piece of hardware that sends your input (such as key presses) to the shell, and displays its output (usually on a screen). Nowadays, terminal emulators are software that mimic this functionality, and are usually referred to as terminals. They don’t run or process commands themselves (besides launching the shell). Instead they still just manage user input, and display program output.

  • Shell - A program to allow you to easily interface with and manage a computer. These can be graphical or text-based. Text-based shells (such as Bash) are the programs that actually process and execute your commands.

  • Bash - Bourne Again SHell. A shell developed by the GNU project included by default on a number of operating systems. It will also be the main shell used throughout this tutorial.

A Quick Note about Terminals

The terminals I described above are commonly referred to as dumb terminals. Modern terminal emulators can often provide more features, such as an API that the shell can query, or interpretation of various kinds of output. While these features make terminal emulators smarter than the original terminals, they still do not play any role in processing or executing commands.

Further Reading

While Bash will be the only shell covered in this tutorial, it is good to know about some of the other shells out there.

  • Dash is a POSIX-compliant shell, and is actually the system shell (/bin/sh) for a number of operating systems, including Ubuntu and CRUX.
  • Busybox provides a number of POSIX commands (including its own sh) as a single binary file.
  • Zsh provides many interactive features and a powerful scripting interface.
  • Fish is a slightly unique shell, in that it provides its own take on how to do scripting.
  • Elvish is a shell that attempts to bring modern programming to shell scripting.
  • Oil is another shell that provides more modern programming concepts to scripting.
  • Xonsh is a shell that extends Python.

Login and Interactive Shells

When Bash starts up, it can be in login mode, in interactive mode, in both modes, or even in neither mode. Depending on the options passed, Bash will source from 0 or more files on startup. In shell terms, sourcing a file means to load and evaluate it in the current shell.

Login Shells

Login shells are the slightly more complex of the two. They are usually requested in one of two ways: passing the -l flag, or making the first character of argument 0 a -. Arguments will be described in more detail later, but essentially, the following pseudocode would launch a login shell:

exec("/bin/bash", ["-bash"]);

as would this:

exec("/bin/bash", ["bash", "-l"])

Bash first tries to source /etc/profile, if it exists. Afterwards, it tries to source (in the following order) ~/.bash_profile, ~/.bash_login, and ~/.profile. The shell stops searching if it is able to successfully read from one of those 3 files.
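
As a rough sketch of the two ways to request a login shell, the following starts three fresh Bash processes and asks each one whether it considers itself a login shell. It assumes bash is in your PATH, and relies on the read-only login_shell shell option and on exec -a (a Bash feature that sets argument 0). The --noprofile and --norc options are only there so that no startup files run during the demonstration.

```shell
# Each command starts a new bash and asks it whether it is a login shell.
check='shopt -q login_shell && echo login || echo non-login'

plain=$(bash --noprofile --norc -c "$check")         # ordinary argument 0, no -l
flag=$(bash --noprofile -l -c "$check")              # requested with -l
dash=$(exec -a -bash bash --noprofile -c "$check")   # argument 0 begins with -

echo "plain: $plain"
echo "flag:  $flag"
echo "dash:  $dash"
```

Only the first of the three should report that it is not a login shell.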

Interactive Shells

Interactive mode is fairly easy to grasp. This is the term used when Bash presents a prompt that you can enter commands at. There are a number of ways to have Bash start interactively. Just starting Bash with no arguments will put you in interactive mode. Alternatively, it can be explicitly specified with the -i option. There are a couple of other ways this can be done, but those are the two most common. If Bash is just started as an interactive shell, and not a login shell, it will source ~/.bashrc.
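
If a startup file or script needs to know which mode it is running in, one common approach is to look at the special parameter $-, which holds the shell’s current option flags and contains an i when the shell is interactive. A minimal sketch (the variable name mode is just for illustration):

```shell
# $- holds the current option flags; it contains "i" in an interactive shell.
case $- in
  *i*) mode=interactive ;;
  *)   mode=non-interactive ;;
esac

echo "This shell is $mode"
```

Typed at a prompt, this reports interactive; saved to a file and run as a script, it reports non-interactive.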

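When an interactive login shell exits, or a non-interactive login shell runs the exit builtin, ~/.bash_logout is sourced if it exists. Shells that aren’t login shells don’t read this file.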

Further Reading

Types of Commands

In Bash, there are currently 5 types of commands:

  • Aliases
  • Functions
  • External Commands
  • Builtins
  • Keywords

These will be discussed in more detail later, however let’s take a quick look at each of them.
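
Bash can report which of these types a name belongs to through the type builtin. The sketch below uses type -t, which prints a single word describing the type. The function greet is hypothetical, and the result for ls assumes it isn’t aliased or wrapped in a function on your system.

```shell
shopt -s expand_aliases    # aliases are disabled in non-interactive shells by default
alias ll='ls -lA'          # the alias used as an example in the Aliases section
greet() { echo "hi"; }     # hypothetical function, for illustration only

type -t ll       # alias
type -t greet    # function
type -t ls       # file (meaning an external command)
type -t echo     # builtin
type -t [[       # keyword
```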

Aliases

Aliases are very simple, and can essentially be thought of as macros. They are of the form alias name='value'. This then allows you to use name as a shorthand for value. As an example, a common alias is:

alias ll='ls -lA'

You can then use the command ll, which internally expands to the command ls with the argument -lA. Generally, you’ll want to keep your number of aliases to a minimum, as a majority of them can be better expressed as functions.

Functions

Functions are a powerful feature of Bash. They allow you to define commands in terms of other commands and the Bash scripting language. They can also take command-line arguments (unlike aliases), return exit codes, and set local variables. For example, the above alias can be expressed as a function in the following manner:

ll() {
  ls -lA "$@"
}

While this might not seem better than the alias, the differences will become more apparent as you want more out of your command.
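
To give an idea of what “more” might look like, here is a hypothetical variation of ll that prints a heading for each directory it is given, and falls back to the current directory when it is given none. This is only a sketch of the kind of logic an alias can’t express; it uses a loop and a local variable, both of which are covered later.

```shell
unalias ll 2>/dev/null || true   # in case the ll alias is still defined in this shell

# List each directory given (or the current directory if none is given)
# under its own heading.
ll() {
  local dir
  for dir in "${@:-.}"; do
    printf '== %s ==\n' "$dir"
    ls -lA "$dir"
  done
}

ll /
```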

External Commands

There is not much to be said about external commands. They are programs found on the actual filesystem that Bash searches for and executes. The program ls used above is a good example of an external command.

Builtins

Before discussing builtins (and why they’re important), let’s take a quick look at how a shell executes commands. Every program you launch is loaded into what is called a process image. This is essentially the program’s space in memory. It contains all of the data the program has defined up to that point, as well as the rest of the code for the program. When a program uses the execve syscall to launch a new program, the new program takes over the existing process image (instead of making a new one). We can see this in action with the following C program:

/* exec.c */

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv, char **environ) {
  char *const prog[] = { "/bin/echo", "Hello, world!", NULL };
  printf("Executing a command...\n");
  fflush(stdout); /* execve throws away anything still sitting in the buffer */
  execve(prog[0], prog, environ);
  /* This line is only reached if execve itself fails. */
  printf("You won't see me.\n");
  return 0;
}

Assuming you saved it as exec.c, it can then be compiled with:

cc -o exec exec.c

Our program can then be run as ./exec:

Terminal

./exec
Executing a command...
Hello, world!

We can see that execve has replaced the code in memory for ./exec with the code for /bin/echo. Due to this, the message You won't see me. will never be seen. To create a new process image for /bin/echo to fill, we need to use the fork syscall. This creates an exact copy of the existing process image (called a child process). This can be seen with the following code:

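/* fork.c */

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv, char **environ) {
  char *const prog[] = { "/bin/echo", "Hello, from the child process!", NULL };
  printf("Executing a command...\n");
  fflush(stdout); /* print this before the child's output, even when piped */

  pid_t pid = fork();

  if(pid < 0) {
    /* fork failed, so there is no child process */
    perror("fork");
    return 1;
  } else if(pid > 0) {
    /* parent: wait for the child to finish */
    wait(NULL);
    printf("Hello, from the parent process!\n");
  } else {
    /* child: replace this copy of the program with /bin/echo */
    execve(prog[0], prog, environ);
    perror("execve"); /* only reached if execve failed */
    _exit(127);
  }

  printf("You will see me now.\n");
  return 0;
}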

This program is slightly more complicated, but should show a basic fork attempt. The wait syscall just tells the parent process to wait until the child process has totally finished. Assuming this one was saved as fork.c, it can then be compiled like before:

cc -o fork fork.c

Running it should show all messages as expected:

Terminal

./fork
Executing a command...
Hello, from the child process!
Hello, from the parent process!
You will see me now.

The shell uses a similar flow when executing programs, so that it can continue without interruption.

Returning to builtins, forking is a comparatively expensive operation, as the child starts out as essentially a 1:1 copy of the running program. Builtins (such as echo or test) are literally built into Bash, which allows it to run the command without forking, thus saving time and memory. They are similar to functions in this manner; however, functions are defined in the Bash scripting language, whereas builtins are compiled into the Bash executable (not really, but for the purposes of this article, that is close enough). Builtins follow the same rules for processing as external commands.
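
Speed isn’t the only reason a command might be a builtin. Because a child process is a separate copy, anything it changes about itself is gone once it exits. The sketch below (assuming sh is available in your PATH; the variable names are just for illustration) shows why a command like cd has to run inside the shell itself:

```shell
start=$PWD

sh -c 'cd /'           # child process: changes its own directory, then exits
after_child=$PWD       # our shell hasn't moved

cd /                   # builtin: runs inside our shell
after_builtin=$PWD     # now /

cd "$start"            # go back to where we started
echo "after child:   $after_child"
echo "after builtin: $after_builtin"
```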

Keywords

Keywords are like builtins, in that they are built into Bash. However, Bash recognizes them while it is parsing a command, so they can change how the rest of that command is processed. To see a good example of this, let’s take a look at the [ command. [ is the same as the test command (both of which are builtins), except that it requires a closing ]. Take the following example:

[ 5 > 4 ]

One might assume that it is testing if 5 is greater than 4, which would be true. However, since [ follows normal processing rules, Bash treats > 4 as a redirection (which will also be discussed in more detail later), creating a file named 4, and dumping the output of [ 5 ] into it. It is exactly the same as:

[ 5 ] > 4

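To pass the > to [ as an argument, it needs to be escaped:

[ 5 \> 4 ]

Keep in mind that > compares its operands as strings, not as numbers, so an integer comparison should use -gt instead (as in [ 5 -gt 4 ]). You can instead use the [[ keyword, which operates similarly to [, but doesn’t treat > as a redirection (since it doesn’t follow normal parsing rules). Inside [[, > is still a string comparison.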
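
The differences are easy to check in a scratch directory. In the sketch below (the variable names are only for illustration), note that > compares strings in both [ and [[, so 10 is not “greater” than 9; the integer comparison is -gt.

```shell
cd "$(mktemp -d)" || exit    # work in an empty scratch directory

[ 5 > 4 ]                    # really "[ 5 ]" with its output redirected
ls                           # shows a new, empty file named 4

[ 10 \> 9 ]  && a=true || a=false    # string comparison: "10" sorts before "9"
[[ 10 > 9 ]] && b=true || b=false    # also a string comparison, but no redirection
[ 10 -gt 9 ] && c=true || c=false    # integer comparison

echo "$a $b $c"              # false false true
```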

Further Reading

Filenames and Paths

Filenames

One might think that filenames are a straight-forward topic, however they can be one of the biggest causes of issues, next to quoting. Filenames can contain any character, except for \0 (the NUL byte) and /. This includes spaces, tabs, *, ?, and newlines, all of which are meaningful to Bash. This isn’t considered often enough, and can lead to obscure bugs. For example, it is not uncommon to see the following construct:

for file in $(ls); do
  echo "$file"
done

We won’t delve into this example too much just yet, but it can break in a number of ways. Let’s say you had the files file one.txt, file two.txt, and file *.txt. Let’s see what happens when we run the above code over our files:

Terminal

ls
file one.txt  file two.txt  file *.txt
for file in $(ls); do echo "$file"; done
file
one.txt
file
two.txt
file
file one.txt
file two.txt
file *.txt

Instead of three filenames, the loop received eight words. Bash split the output of ls on whitespace, and then treated the resulting *.txt as a pattern, which expanded to all three filenames.

Paths

Filenames can’t contain a /, as that is the only character that can be used to separate the pieces of a file path. There are two terms used to describe paths: absolute and relative. An absolute path must begin with a /, and gives the exact location of a file or directory. Relative paths must not begin with a /, and describe a path relative to another location (usually the current working directory). As long as they maintain that relation, it doesn’t matter where they are located in the filesystem. For example, if you have an application looking for conf/app.conf, it doesn’t matter if the application looks from /etc/app (as long as /etc/app/conf/app.conf exists) or from /usr/etc/app (as long as /usr/etc/app/conf/app.conf exists).

The Current Working Directory

The current working directory, commonly shortened to cwd, is the directory an application is working in. For example, when you start Bash, it will start you in your home directory by default. This would make your current working directory your home directory, until you move to another one. The current working directory is also referred to as the present working directory. The . character can be used to refer to the cwd, whereas .. can be used to refer to the parent directory. If your cwd is just / (root), then .. is still just /.
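
Here is a small sketch of relative paths, absolute paths, ., and .. working together. The directory layout is made up for the example and lives in a scratch directory created by mktemp, so the exact location will vary between systems.

```shell
scratch=$(mktemp -d)                  # scratch directory; the path differs per system
mkdir -p "$scratch/app/conf"
echo "example" > "$scratch/app/conf/app.conf"

cd "$scratch/app"
cat conf/app.conf                     # relative to the cwd
cat ./conf/../conf/app.conf           # . and .. can appear anywhere in a path
cat "$scratch/app/conf/app.conf"      # absolute: works from any cwd

cd / && cd .. && pwd                  # the parent of / is still /
```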

File Permissions

File permissions are the bread and butter of UNIX security. They control who can access a file, and what they can do with it. Each file has 3 modes associated with it: the owner’s permissions, the group’s permissions, and everyone else’s permissions. There are three values that can describe a set of permissions: the read bit (4), the write bit (2), and the executable bit (1). These values are then totalled together for the mode. For example, let’s say we had a file owned by the user root, and the group wheel. Each mode is set to read and write only. These permissions can be described the following ways:

rw-rw-rw-
42-42-42-
666

For regular files, these permission bits act in the way you would expect. The read bit allows you to read a file, the write bit allows you to write to a file, and the executable bit allows you to execute the file as a program. Directories use the same set of bits, but work a little bit differently. The read bit means you can list files in the directory, the write bit means you can create, rename, and delete files in the directory, and the executable bit lets you move into that directory and access the files inside it.
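
The numeric form is what you would typically hand to the chmod tool. As a quick sketch, the following sets a scratch file to read and write for the owner (4+2), read only for the group (4), and nothing for everyone else (0), then shows how ls -l reports it. The variable names are only for illustration.

```shell
file=$(mktemp)                         # scratch file; the path differs per system
chmod 640 "$file"                      # owner rw- (4+2), group r-- (4), others --- (0)

perms=$(ls -l "$file" | cut -c1-10)    # file type character plus the nine permission characters
echo "$perms"                          # -rw-r-----
```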

The Tilde Character

Before moving on, let’s discuss the ~ character real quick. It typically refers to the current user’s home directory, whereas ~someuser refers to the home directory of someuser. For example:

Terminal

echo ~
/home/uplime
echo ~root
/root

It is also important to know that these aren’t true paths. Instead, they are better thought of as macros. POSIX has very specific behavior for how to expand these macros, which Bash conforms to. This means that the filesystem isn’t actually aware that ~ maps to the home directory:

Terminal

echo '~'
~
ls '~'
ls: cannot access '~': No such file or directory

Since a literal ~ was given to ls (instead of Bash expanding it), it did not try to look in the home directory.

Further Reading

Executing Commands

Finally, we’re at the point of actually working with the shell. Bash can take commands in a variety of forms, however we’ll start with the simplest. This is essentially just a program name or a path to the program.

Specifying the Location to a Program

Bash can take either an absolute or a relative path to a program. If Bash is passed an absolute path, it looks there. Otherwise Bash looks in the specified path relative to the present working directory. For example, if you had a program in ~/bin, and your present working directory was ~, both of the following methods are equivalent for executing the program:

~/bin/program

or:

bin/program

If you were to move into ~/bin, so that it becomes your present working directory, you could then execute the program as:

./program

If the file does not have the executable bit set, the program won’t execute:

Terminal

touch code/does-not-run
code/does-not-run
bash: code/does-not-run: Permission denied

The PATH Environment Variable

If the command is just a name (and not a path to a program), Bash looks through a list of :-separated directories (in order from left to right) for a file with a matching name that has the executable bit set. This list is stored in an environment variable called PATH. If no such file exists in any of the directories, Bash prints the error “command not found”. The default value for PATH compiled into Bash (which your system’s startup files will usually override) is:

/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.

While it isn’t always clear where to put a program, these directories have some common usecases.

  • /usr/local/bin - Programs that are specific to that machine. These tools don’t necessarily make sense across an entire cluster of machines, or they address one-off and esoteric issues.

  • /usr/local/sbin - System management programs and tools specific to that machine.

  • /usr/bin - Non-essential programs available to all users.

  • /usr/sbin - Non-essential programs used for system management, such as network services.

  • /bin - Essential programs available for all users on the system.

  • /sbin - Programs essential for system management.

  • . - Your current working directory. This negates the need to do ./program if . is in your PATH. While it is a default value, most systems ship a startup file that doesn’t include it in PATH.
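
To watch the lookup happen, the sketch below creates a tiny program in a scratch directory and then adds that directory to the front of PATH. The name greet-demo is made up for the example, and it assumes your temporary directory allows files to be executed. type -P is a Bash feature that prints the file a name resolves to through PATH.

```shell
dir=$(mktemp -d)                        # scratch directory; the path differs per system
printf '%s\n' '#!/bin/sh' 'echo "hello from greet-demo"' > "$dir/greet-demo"
chmod +x "$dir/greet-demo"              # without this, Bash would skip the file

greet-demo 2>/dev/null || echo "before: not found (status $?)"

PATH="$dir:$PATH"                       # searched left to right, so $dir is checked first
greet-demo                              # hello from greet-demo
type -P greet-demo                      # prints the full path inside $dir
```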

Some common (external) commands you might find in PATH are:

  • ls - List all files in a directory.

  • cat - Concatenate files (or standard input, if no files are given) and write the result to standard output.

  • clear - Clear your terminal window.

  • cp - Copy a file from one place to another.

  • mv - Move a file from one place to another. This tool is also used to rename a single file.

Further Reading

Program Arguments

In most cases, the commands you’ll be running will take arguments. Arguments are a list of whitespace-separated strings that can be used to configure and provide additional details to a program. In their simplest form, arguments look like:

command arg1 arg2 arg3 arg4

In the example above, there are a total of 5 arguments. The program name (command in this case) is also an argument, commonly referred to as argument 0. This is due to the fact that arrays traditionally start at index 0.

Escaping Arguments

There are certain symbols and characters that have special meaning to Bash. We saw this earlier when discussing the ~ symbol. There are a large number of such symbols that Bash may or may not treat specially, depending on the circumstances. To force Bash to pass any given character to a program as its literal value, it should be escaped. Escaping can be done in one of three ways: using the \ character, enclosing it in single quotes, or enclosing it in double quotes.

The \ (or backslash) escapes the character immediately following it. A \\ evaluates to a literal \. If the last character of the line is a \, it escapes the newline, telling Bash to continue reading the command on the next line, instead of executing it.

In single quotes, Bash treats everything literally until the closing '. This includes \, so it isn’t possible to nest single quotes within single quotes. Bash does have a solution to this, but it will be discussed further in the quoting section.

Double quotes do allow for some escaping, meaning you can nest double quotes. However, Bash only allows certain characters to be escaped within double quotes ($, `, ", \, and the newline); otherwise it treats both the \ and the next character literally.

Hello, world!

Keeping with tradition, let’s do the Bash hello! This will be a short section without much explanation, as we’ve finally laid enough groundwork to see some examples in action.

Terminal

echo Hello, world! # 3 arguments
Hello, world!
echo Hello,\ world! # 2 arguments
Hello, world!
echo "Hello, world!"
Hello, world!
echo 'Hello, world!'
Hello, world!
echo 'Hello,' 'world!' # 3 arguments
Hello, world!
echo \~ '~' ~
~ ~ /home/uplime
echo Hello, \
world!
Hello, world!
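
Since echo joins its arguments with spaces, it hides where one argument ends and the next begins. A small helper makes the boundaries visible. show_args is a made-up function for this sketch; it prints each argument it receives in angle brackets on its own line, not counting argument 0.

```shell
# Hypothetical helper: print each argument in brackets, one per line.
show_args() {
  printf '<%s>\n' "$@"
}

show_args Hello, world!        # <Hello,> and <world!>
show_args Hello,\ world!       # <Hello, world!>
show_args "Hello, world!"      # <Hello, world!>
show_args 'Hello,' 'world!'    # <Hello,> and <world!>
show_args \~ '~'               # <~> and <~>
```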

Optional Parameters (Flags)

While programs are free to interpret arguments in any way they like, there are some common ways these arguments can be formatted so they are interpreted as options. These options allow you to configure the program at its startup. For each of these styles, options that don’t take a value are considered boolean and either enable or disable a feature within the program. Otherwise, the option can be thought of as a key/value pair. Let’s take a look at some of these formats.

-a -b -c value -d"another value"

This style of options is specified by POSIX, and is consequently the only portable style. Generally, when using this style, options can be grouped together as long as none of them, or only the last one, takes a value. For example, we could have written the above style as -ab or even -abc value. Every utility specified by POSIX that takes options uses this style (with one exception).
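
Shell scripts can accept this style of options themselves through the getopts builtin. The following is only a sketch: parse_demo and its options -a, -b, and -c are invented for the example, with -c being the one that takes a value. It shows that the separated, grouped, and attached spellings are all read the same way.

```shell
# Hypothetical function that accepts -a, -b, and -c VALUE.
parse_demo() {
  local OPTIND=1 opt a=off b=off c=unset
  while getopts 'abc:' opt; do       # the : after c means "c takes a value"
    case $opt in
      a) a=on ;;
      b) b=on ;;
      c) c=$OPTARG ;;
      *) return 2 ;;                 # unknown option
    esac
  done
  shift "$((OPTIND - 1))"            # drop the options that were parsed
  echo "a=$a b=$b c=$c remaining: $*"
}

parse_demo -a -b -c value file.txt
parse_demo -abc value file.txt
parse_demo -abcvalue file.txt
```

All three calls print a=on b=on c=value remaining: file.txt.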

--option --other=value --parameter "another value"

This style, commonly called long opts (or long options), is not technically portable, but many programs will still accept options this way. In a lot of cases, programs will also provide an equivalent POSIX-style option. Bash itself takes options in both styles, as do most of the GNU utilities.

-option -other=value -parameter "another value"

This style is not used often, however besides only starting with a single -, it is the same as the previous style. The openssl and gcc tools both accept options in this style.

key=value

This style is also not used often, probably due to the fact that it is difficult to convey arguments in a similar format that shouldn’t be considered as options. The dd tool is the only POSIX utility that takes options in this manner.

Positional Parameters

For optional parameters, the order doesn’t usually matter. A program might be passed the options -a -b or -b -a and still behave in the same manner. Positional parameters, on the other hand, are usually dependent on their position in relation to the other arguments (hence the name). For example, the first non-option argument passed to bash is (by default) considered a script to execute. The following positional parameters are then passed as arguments to that script. This means that bash myscript "some argument" would not behave the same as bash "some argument" myscript. While optional parameters are usually used to configure a program, positional parameters are usually treated as input data.

End of Options

In some cases, you will need to pass a positional parameter to a program that starts with a -. For example, let’s say you wanted to search for the -v flag in help printf. Just passing the flag -v to grep won’t work, as that is a standard grep option:

Terminal

help printf | grep -v
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.

Instead, we could use the -- indicator, which tells the program that there are no more options, only positional parameters (regardless of how they’re formatted):

Terminal

help printf | grep -- -v
printf: printf [-v var] format [arguments]
      -v var  assign the output to shell variable VAR rather than

All POSIX utilities that take options (except for dd again), as well as most tools that take long options, also accept -- as the end of options indicator.

The ARG_MAX Value

On most operating systems, only so many arguments can be passed to a program. This is not a limitation of Bash, or any shell really, but of the kernel itself. This is an important distinction, because it means that an external program (such as echo) may fail due to too many arguments, whereas the builtin equivalent would work fine (since Bash doesn’t have to use execve on the builtin). The exact size of ARG_MAX is dependent on the operating system, and there isn’t always a clean way to obtain it. On my system, it can be found through the getconf tool:

Terminal

getconf ARG_MAX
262144

It is important to note that this number doesn’t mean I can have 262144 separate arguments. Rather, the total length of all arguments combined can’t be longer than 262144 bytes. The environment variables passed to the program are also included in this total.
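
The difference between builtins and external commands can be seen by handing both a single, very large argument. This is a sketch under some assumptions: the 4 MiB size is an arbitrary value that should exceed the limit on most systems (it may not on yours), and /bin/echo is assumed to exist. The variable names are only for illustration.

```shell
limit=$(getconf ARG_MAX)
echo "ARG_MAX: $limit bytes"

big=$(printf '%*s' 4194304 '')    # one argument made of 4 MiB of spaces

# The builtin never goes through execve, so the kernel limit doesn't apply.
if printf '%s' "$big" > /dev/null; then builtin_result=ok; else builtin_result=failed; fi

# The external command does, and will typically fail with "Argument list too long".
if /bin/echo "$big" > /dev/null 2>&1; then external_result=ok; else external_result=failed; fi

echo "builtin printf: $builtin_result"
echo "/bin/echo:      $external_result"
```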

The 0th Argument

As stated before, programs are free to interpret arguments however they like. This is especially true for the 0th argument. Depending on the name, programs might enable or disable options automatically. For example, if Bash’s 0th argument is sh, it will start as a shell that closely resembles POSIX sh.
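
This can be observed with exec -a, a Bash feature that runs a program with a different argument 0. The sketch below (assuming bash is in your PATH) starts Bash once normally and once under the name sh, and asks each whether its posix option is on:

```shell
check='shopt -qo posix && echo "POSIX mode" || echo "Bash mode"'

as_bash=$(bash -c "$check")              # argument 0 is bash
as_sh=$(exec -a sh bash -c "$check")     # same program, but argument 0 is sh

echo "argument 0 is bash: $as_bash"
echo "argument 0 is sh:   $as_sh"
```

The second one should report POSIX mode, even though the exact same executable was run both times.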

Further Reading

Pattern Matching

When working with your shell, there will probably be times when you need to operate on a group of files. It could be as specific as needing to work on all of the .txt files in the /tmp directory, or as general as all files in the current directory. Bash provides a handy feature called filename expansion (more commonly known as globbing), which lets you use patterns, called globs, to match the files you need.

A Quick Note About the Set and Shopt Builtins

The set builtin isn’t directly used in matching patterns, however it will be used a couple of times, so it is worth knowing about. Specified by POSIX, it is used to enable or disable various settings internal to the shell. The main option we’ll be using is set -x, which tells the shell to show the actual command it will be executing (among other things). It is also good to know about the set -f option, which disables most of the pattern matching we’ll be discussing.

The shopt utility similarly is used to enable or disable options as well, however these options are non-POSIX (or in other words, options specific to Bash). We will be discussing several shopt options in this section, and how they affect pattern matching.

Basic Globs

When Bash is in its default mode, with no special options enabled in set or shopt, it has a very basic set of globs. These globs are made up of 3 characters: *, ?, and [. Since Bash is the one expanding these globs into filenames, it doesn’t matter what characters the filenames contain. The shell will store them internally as separate entries. Let’s take another look at the example we saw in the Filenames and Paths section, with globs instead of ls:

Terminal

ls
file one.txt  file two.txt  file *.txt
for file in *.txt; do echo "$file"; done
file one.txt
file two.txt
file *.txt

As you can see, despite these files having special characters in their names, like * or a space, Bash expands the glob into a list that we can iterate over safely. Using set -x, we can see how Bash expands this list behind the scenes:

Terminal

set -x
echo *.txt
+ echo 'file one.txt' 'file two.txt' 'file *.txt'
file one.txt file two.txt file *.txt

The *, or star, matches 0 or more characters until the next literal is matched. So, using the example above, the star matched file one, file two, and file *, and then the literal .txt matched .txt. The ? matches exactly one character, whatever that character is. If we had the files file.txt and file-txt, the pattern *?txt could be used to match both. In such a pattern, * would match file, ? would match . and -, and the literal txt again matches txt.
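
These rules can be tried out safely in a scratch directory. The filenames below are made up for the example, and LC_ALL=C is set as an assumption so that the matches are listed in the same order on every system.

```shell
LC_ALL=C                        # assumption: C locale, for a predictable sort order
cd "$(mktemp -d)" || exit       # work in an empty scratch directory
touch file.txt file-txt txt a.md

echo *txt      # file-txt file.txt txt   (* can match 0 characters)
echo *?txt     # file-txt file.txt       (? needs exactly one character)
echo ?.md      # a.md
echo ????      # a.md                    (the only name that is 4 characters long)
```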

The [, or open bracket, matches any character between it and a closing ], or close bracket. It will match anything within it to exactly 1 corresponding character in a filename. If you wanted to match any of the first 6 characters of the alphabet (lowercase), you could use the pattern [abcdef]. You can also specify an inclusive range of characters using a -. For example, the previous pattern could be rewritten as [a-f]. It is possible to stack ranges and characters as well. The pattern [A-Za-z_] will match any uppercase letter, lowercase letter, or the underscore. If you want to match a literal -, you must either escape it (with a \, single quotes, or double quotes), put it right after the open bracket, or right before the close bracket. To match a literal ], it also must be escaped, or put immediately after the open bracket. You can also have character classes within brackets, which are essentially macros that match a set of characters. Bash supports the following classes, all of which are specified by POSIX except for [:ascii:] and [:word:]:

  • [:alnum:] - Matches any digit, uppercase or lowercase characters.

  • [:alpha:] - Matches any uppercase or lowercase characters.

  • [:ascii:] - Matches any character from the ASCII character set.

  • [:blank:] - Matches the space or tab characters.

  • [:cntrl:] - Matches any control character sequence.

  • [:digit:] - Matches the digits 0 through 9.

  • [:graph:] - Matches any character which has a graphic representation.

  • [:lower:] - Matches any lowercase character.

  • [:print:] - Matches the same character set as [:graph:], as well as space.

  • [:punct:] - Matches the same character set as [:graph:], except for letters and digits.

  • [:space:] - Matches all whitespace characters.

  • [:upper:] - Matches any uppercase character.

  • [:word:] - Matches any lowercase or uppercase character, digit, or the underscore.

  • [:xdigit:] - Matches any hexadecimal digit.

These character classes can be stacked as well, alongside literals and ranges. The following patterns all match the same set of characters (any uppercase or lowercase character, digit, or the underscore):

[A-Za-z0-9_]
[[:upper:]a-z0-9_]
[[:upper:][:lower:]0-9_]
[[:upper:][:lower:][:digit:]_]
[[:alnum:]_]
[[:word:]]
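
Here is a short sketch of bracket patterns in action. As before, the filenames are invented for the example, and LC_ALL=C is an assumption that keeps ranges and sort order the same across systems.

```shell
LC_ALL=C                        # assumption: C locale, so ranges and ordering are predictable
cd "$(mktemp -d)" || exit       # work in an empty scratch directory
touch a1 b2 c3 D4 _5 x-y

echo [abc]*           # a1 b2 c3
echo [a-c]*           # a1 b2 c3
echo [[:upper:]]*     # D4
echo [[:alpha:]_]?    # D4 _5 a1 b2 c3
echo ?[-_]?           # x-y   (a - right after the open bracket is literal)
```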

Patterns will match filenames in the current directory, until either the end of the pattern is reached, or a / is encountered. In the event of a /, Bash will start looking in any directories whose names match up to the /. It then treats the remaining characters as a new pattern, matching against filenames in the matched directory. This repeats for every / encountered, until the end of the pattern is reached. If a pattern ends in a /, then it only matches directories in the ending directory. Let’s take a look at this in action:

Terminal

ls
abc def ghi
ls def
jkl
ls def/jkl
hi.txt
echo */???/*.txt
def/jkl/hi.txt
echo */*/
def/jkl/

In the first pattern above, the first * matched the def directory, the ??? matched the jkl directory (within def), the last * matched hi, and of course .txt matched .txt. Bash only expands a pattern to the paths that match every part of it. So even though the first * could have matched abc, the rest of the pattern didn’t match anything inside it, so abc doesn’t appear in the results.

Dotglob

By default, Bash won’t match files starting with a ., unless the pattern itself specifies files starting with a .. However, in versions of Bash before 5.2 (or with the globskipdots option turned off), this will also include your cwd (.) and parent directory (..) in the results. For example:

Terminal

ls -A
.hidden   .some-file  another.txt document.txt  file.txt
echo *
another.txt document.txt file.txt
echo .*
. .. .hidden .some-file

Bash provides an option to include files starting with a . without explicitly starting your pattern with a ., called dotglob. It can be enabled with the shopt builtin:

shopt -s dotglob

Let’s take a look at a variation of the above example with that turned on:

Terminal

ls -A
.hidden   .some-file  another.txt document.txt  file.txt
shopt -s dotglob
echo *
.hidden .some-file another.txt document.txt file.txt

As can be seen above, your cwd and parent directory aren’t included in the results.
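The same comparison can be scripted. The sketch below assumes a scratch directory created with mktemp and captures each expansion in an array so the two results can be compared side by side; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Recreate the example files in a throwaway directory (left behind in /tmp).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch .hidden .some-file another.txt document.txt file.txt

# Default behavior: names starting with a . are skipped.
default=(*)

# With dotglob, hidden files are included, but . and .. never are.
shopt -s dotglob
with_dotglob=(*)

echo "default: ${default[*]}"
echo "dotglob: ${with_dotglob[*]}"
```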

Nullglob

By default, if Bash can’t match any files to a pattern, it takes the pattern as a literal argument. For example:

Terminal

ls -A
.hidden   .some-file  another.txt document.txt  file.txt
echo not*a?[f]ile
not*a?[f]ile

This isn’t always desired (or even expected) behavior, especially when we get to loops. This can be changed with the nullglob option:

shopt -s nullglob

This option instructs Bash to erase the pattern if no matches occur, instead of treating it as a literal:

Terminal

shopt -s nullglob
echo not*a?[f]ile

echo not*a?[f]ile Hello, world!
Hello, world!

It is important to note that with this option, the unmatched pattern is removed entirely rather than replaced with an empty argument. This means that in the last command of the above example, there were three arguments: echo, Hello,, and world!. We can see this with set -x:

Terminal

set -x
echo not*a?[f]ile Hello, world!
+ echo Hello, 'world!'
Hello, world!

Instead of passing an empty argument, Bash removes it.
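Loops are where this difference tends to matter most. The following sketch uses a for loop (covered later) over a pattern that matches nothing; the *.log pattern and the counter variables are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash
# Throwaway directory containing no .log files (left behind in /tmp).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch another.txt document.txt file.txt

# Without nullglob, the unmatched pattern is passed along literally,
# so the loop body runs once with the pattern itself as the "filename".
without=0
for file in *.log; do
  echo "default saw: $file"
  without=$((without + 1))
done

# With nullglob, the unmatched pattern is removed, so the loop body never runs.
shopt -s nullglob
with=0
for file in *.log; do
  echo "nullglob saw: $file"
  with=$((with + 1))
done

echo "iterations without nullglob: $without"
echo "iterations with nullglob: $with"
```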

Nocaseglob

This is a fairly straightforward option. It can be enabled via:

shopt -s nocaseglob

This option tells Bash to match files in a case insensitive manner:

Terminal

echo [A-D]*
[A-D]*
shopt -s nocaseglob
echo [A-D]*
another.txt document.txt
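As a reproducible sketch of the same session, the following assumes the C locale (so that the range A-D only covers uppercase ASCII letters) and a scratch directory holding the three example files:

```shell
#!/usr/bin/env bash
# Assumption: the C locale, so that [A-D] does not match lowercase letters by itself.
LC_ALL=C

# Throwaway directory (left behind in /tmp).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch another.txt document.txt file.txt

# Case-sensitive: nothing matches, so the pattern is kept as a literal.
sensitive=([A-D]*)

# Case-insensitive: A-D now also matches a-d.
shopt -s nocaseglob
insensitive=([A-D]*)

echo "case-sensitive:   ${sensitive[*]}"
echo "case-insensitive: ${insensitive[*]}"
```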

Failglob

This option causes Bash to print an error if no files match a pattern, and to stop processing the command immediately. It is enabled through the shopt builtin as well:

shopt -s failglob

Let’s see what happens when no files match a pattern:

Terminal

shopt -s failglob
echo not*a?[f]ile
bash: no match: not*a?[f]ile
echo not*a?[f]ile; echo Hello, world!
bash: no match: not*a?[f]ile
echo first line; echo not*a?[f]ile; echo Hello, world!
first line
bash: no match: not*a?[f]ile
echo not*a?[f]ile; echo Hello, \
world!
bash: no match: not*a?[f]ile

As seen above, it doesn’t matter what the line contains. As soon as a pattern is encountered that doesn’t match any files, Bash reports the error and stops processing immediately. This option takes precedence over nullglob.

Extglob

As usual, it is enabled via the shopt builtin:

shopt -s extglob

This option enables several additional kinds of globs for pattern matching. Each of these globs is of the form x(first|second|...). The x specifies the type of glob, and is one of the following characters: ?, *, +, @, or !. Everything between the opening ( and closing ) is a |-separated list of patterns. The five kinds of extended globs are:

  • ?(...) - Matches zero or one occurrence of the listed patterns.
  • *(...) - Matches zero or more occurrences of the listed patterns.
  • +(...) - Matches one or more occurrences of the listed patterns.
  • @(...) - Matches exactly one of the listed patterns.
  • !(...) - Matches anything except one of the listed patterns.

Let’s see some examples of this:

Terminal

shopt -s extglob
echo foo-?(a|b|c)-bar
foo-a-bar foo--bar foo-b-bar foo-c-bar
echo foo-*(a|b|c)-bar
foo-aaaa-bar foo-a-bar foo--bar foo-b-bar foo-c-bar
echo foo-+(a|b|c)-bar
foo-aaaa-bar foo-a-bar foo-b-bar foo-c-bar
echo foo-@(a|b|c)-bar
foo-a-bar foo-b-bar foo-c-bar
echo foo-!(a|b|c)-bar
foo-aaaa-bar foo--bar
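The session above can be reproduced with the sketch below. It assumes the five filenames implied by that output and a scratch directory created with mktemp. It counts matches rather than printing them in order, since the ordering of glob results depends on the locale.

```shell
#!/usr/bin/env bash
# extglob has to be enabled on an earlier line than any pattern that uses it,
# because Bash parses a whole line before running any part of it.
shopt -s extglob

# Throwaway directory (left behind in /tmp).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch foo--bar foo-a-bar foo-aaaa-bar foo-b-bar foo-c-bar

zero_or_one=(foo-?(a|b|c)-bar)
zero_or_more=(foo-*(a|b|c)-bar)
one_or_more=(foo-+(a|b|c)-bar)
exactly_one=(foo-@(a|b|c)-bar)
anything_but=(foo-!(a|b|c)-bar)

printf '?(...) matched %d files\n' "${#zero_or_one[@]}"
printf '*(...) matched %d files\n' "${#zero_or_more[@]}"
printf '+(...) matched %d files\n' "${#one_or_more[@]}"
printf '@(...) matched %d files\n' "${#exactly_one[@]}"
printf '!(...) matched %d files\n' "${#anything_but[@]}"
```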

Globstar

This enables a very handy glob: **. Being a shopt feature, it is enabled the usual way:

shopt -s globstar

By default, ** is treated exactly the same as *. However, with globstar enabled, it will match directories recursively. In the context of filesystems, recursively means to search within directories and any subdirectories within them, and so on and so forth. If we had the following file hierarchy:

folder1/
  folder2/
    hey.txt
    there.txt
  folder3/
    another.txt
  1.txt
folder4/
  doc.txt
folder5/
  folder6/
    list.txt
    folder7/
      movies.txt

We could then list all .txt files with the pattern **/*.txt:

Terminal

shopt -s globstar
echo **/*.txt
folder1/1.txt folder1/folder2/hey.txt folder1/folder2/there.txt folder1/folder3/another.txt folder4/doc.txt folder5/folder6/folder7/movies.txt folder5/folder6/list.txt

As with regular globs, ending a pattern in / will only match directories:

Terminal

shopt -s globstar
echo **/*/
folder1/ folder1/folder2/ folder1/folder3/ folder4/ folder5/ folder5/folder6/ folder5/folder6/folder7/
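To see the difference globstar makes, the hierarchy above can be rebuilt in a scratch directory and the same pattern expanded before and after enabling the option. This is a sketch under those assumptions; the array names are hypothetical.

```shell
#!/usr/bin/env bash
# Rebuild the example hierarchy in a throwaway directory (left behind in /tmp).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p folder1/folder2 folder1/folder3 folder4 folder5/folder6/folder7
touch folder1/folder2/hey.txt folder1/folder2/there.txt \
      folder1/folder3/another.txt folder1/1.txt folder4/doc.txt \
      folder5/folder6/list.txt folder5/folder6/folder7/movies.txt

# Without globstar, ** behaves like *, so only one directory level is searched.
shallow=(**/*.txt)

# With globstar, ** descends through every level of subdirectories.
shopt -s globstar
deep=(**/*.txt)

echo "without globstar: ${#shallow[@]} files"
echo "with globstar:    ${#deep[@]} files"
printf '%s\n' "${deep[@]}"
```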

Further Reading

Controlling Input and Output

Variables

Variables are an extremely useful tool in Bash. However, for a number of reasons that we’ll discuss in this chapter, they are often misused and abused.

The Stringly Typed Language

In programming language theory, a type system is often described as weakly or strongly typed. I’m far from qualified to discuss either, but I am strongly of the opinion that Bash belongs in neither category. This is because Bash has an extremely simplified type system. One could even make the argument that it doesn’t really have one at all. In Bash (and most POSIX-based shells), unlike most programming languages, almost everything is a string. Let’s take a look at the following command:

echo "Hello, world!"

Since Hello, world! was the only text wrapped in ", one might think it is the only string in the command. However, echo is actually a string as well. In Bash, the only role of quotes is to escape characters. They don’t denote or start a string literal like they do in other programming languages. Every argument passed to every command is a string. Even if the argument is all digits, it is still a string as far as Bash is concerned. This means that the following command is equivalent to the previous one:

"echo" "Hello, world!"

As is this one:

echo Hello,\ world!

Because of this, variables in Bash are essentially just strings as well. This should become more apparent as we continue on.
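The equivalence of the three commands above can be checked with a short script. This is a rough sketch that relies on command substitution and variable assignment, both of which are only covered later; the variable names and the 007 value are purely illustrative.

```shell
#!/usr/bin/env bash
# Capture the output of each form of the command.
plain=$(echo "Hello, world!")
quoted_command=$("echo" "Hello, world!")
escaped=$(echo Hello,\ world!)

printf '%s\n' "$plain" "$quoted_command" "$escaped"

# An argument made up of digits is still a string: it has a length,
# and == compares it as text rather than as a number.
number=007
length=${#number}
if [[ $number == 7 ]]; then
  same=yes
else
  same=no
fi
echo "length of $number: $length, equal to 7 as a string: $same"
```

All three commands print the same line, and 007 is reported as three characters long and not equal to 7.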

Assignments

Variable Scope

Environment Variables

Quoting and Word-Splitting

String Interpolation

Positional Parameters

Special Variables

Integers and Math

Further Reading