UNIX Shell Scripting with Bash
Table of Contents
- Foreword
- Recommended Resources
- Expected Knowledge
- What is (not) a Shell?
- Popular Shells
- Login and Interactive Shells
- Types of Commands
- Filenames and Paths
- Executing Commands
- Program Arguments
- Pattern Matching
- Controlling Input and Output
- Variables
Foreword
I’ve often heard that it is easier to port a script from languages such as Ruby or Python than it is a shell script. This is probably true, since shell scripts are heavily dependent on the operating system’s environment. However, this implies that porting a shell script is difficult. I’ve found that such difficulties, more often than not, are due to a number of bad (and unfortunately common) practices that people use when writing shell scripts, rather than the language itself. This article aims to remedy this misconception by demystifying a bit of what happens underneath Bash’s hood, as well as hopefully teach some proper scripting habits.
Recommended Resources
There are a number of great resources out there to help write better shell scripts. Some of these include:
- #bash on freenode (Webchat)
- The Wooledge Wiki
- Bash Guide
- Bash Hackers Wiki
- Bash Reference Manual
- Shellcheck
- Rich’s sh (POSIX shell) tricks
As implied above, there are a lot of guides and tutorials out there that teach bad habits that often aren’t corrected. If you’re going to use resources not on this list, you’ll want to do plenty of research and verification on the information they provide, especially if it comes from the following sources:
Expected Knowledge
This article aims to teach how the Bash shell operates on UNIX and UNIX-like systems, including Linux. As a secondary goal, it also aims to teach proper shell scripting techniques from the ground up. I will try to explain as much as I can in this article, but to properly explore certain topics, it will require diving into some deeper technical concepts. If you feel that something can or should be clarified further, please let me know. That being said, coming into this with some basic knowledge in the following topics is a good idea:
- To be decided
What is (not) a Shell?
It is not uncommon to see variations of phrases like “Linux Commands”, “Terminal Commands”, “Bash Commands”, “Linux Bash Commands”, etc. Technically speaking, these are all incorrect terms for running commands in a shell (for the most part). Before going further, I’d like to define some terms so it becomes more apparent what your shell is, and what your shell isn’t.
- UNIX - Originally an operating system written for PDP-11 systems (and one of the inspirations for the C programming language), it is now a standard for how an operating system should be designed. This standard, called SUS or Single Unix Specification, provides much of the standardization across popular operating systems today.
- POSIX - A standard for how to design an operating system that covers a number of topics, including the kernel, what programs should go into the user space, and how those programs should behave. It is part of the larger SUS standard.
- Kernel - A very low level piece of software that interfaces with the physical hardware and initializes the operating system for more high level software.
- Syscall - Short for system call, these are provided by the kernel to allow higher level programs to interact with it.
- Linux - A popular kernel used by many operating systems. While it is possible to have a UNIX-certified Linux, this is usually not the case. Due to this, Linux is often referred to as UNIX-like.
- GNU - GNU’s Not Unix. An extensive collection of free (as in freedom) software that forms an operating system. It is not uncommon for operating systems with a Linux kernel to have a GNU runtime.
- Terminal - Originally a physical piece of hardware that sends your input (such as key presses) to the shell, and displays its output (usually on a screen). Nowadays, terminal emulators are software that mimic this functionality, and are usually referred to as terminals. They don’t run or process commands themselves (besides launching the shell). Instead they still just manage user input, and display program output.
- Shell - A program to allow you to easily interface with and manage a computer. These can be graphical or text-based. Text-based shells (such as Bash) are the programs that actually process and execute your commands.
- Bash - Bourne Again SHell. A shell developed by the GNU project included by default on a number of operating systems. It will also be the main shell used throughout this tutorial.
A Quick Note about Terminals
The terminals I described above are commonly referred to as dumb terminals. Modern terminal emulators can often provide more features, such as an API that the shell can query, or interpretation of various kinds of output. While these features make terminal emulators smarter than the original terminals, they still do not play any role in processing or executing commands.
Further Reading
- Single UNIX Specification
- UNIX®-Certified Linux-Based Operating Systems
- PDP-11
- Computer terminal
- GNU Bash
- Linux Syscall Reference
Popular Shells
While Bash will be the only shell covered in this tutorial, it is good to know about some of the other shells out there.
- Dash is a POSIX-compliant shell, and is actually the system shell (/bin/sh) for a number of operating systems, including Ubuntu and CRUX.
- Busybox provides a number of POSIX commands (including its own sh) as a single binary file.
- Zsh provides many interactive features and a powerful scripting interface.
- Fish is a slightly unique shell, in that it provides its own take on how to do scripting.
- Elvish is a shell that attempts to bring modern programming to shell scripting.
- Oil is another shell that provides more modern programming concepts to scripting.
- Xonsh is a shell that extends Python.
Login and Interactive Shells
When Bash starts up, it can be in login mode, in interactive mode, in both modes, or even in neither mode. Depending on the options passed, Bash will source from 0 or more files on startup. In shell terms, sourcing a file means to load and evaluate it in the current shell.
Login Shells
Login shells are the slightly more complex of the two. They are usually requested in one of two ways: passing the -l flag, or making the first character of argument 0 a -. Arguments will be described in more detail later, but essentially, the following pseudocode would launch a login shell:
exec("/bin/bash", ["-bash"]);
as would this:
exec("/bin/bash", ["bash", "-l"]);
Bash first tries to source /etc/profile, if it exists. Afterwards, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile (in that order). The shell stops searching as soon as it is able to successfully read from one of those 3 files.
Interactive Shells
Interactive mode is fairly easy to grasp. This is the term used when Bash presents a prompt that you can enter commands at. There are a number of ways to have Bash start interactively. Starting Bash from a terminal with no arguments will put you in interactive mode. Alternatively, it can be explicitly requested with the -i option. There are a couple of other ways this can be done, but those are the two most common. If Bash is started as an interactive shell that is not a login shell, it will source ~/.bashrc.
When a login shell exits, ~/.bash_logout is sourced if it exists. For a non-interactive login shell, this only happens when the exit builtin is used.
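To check which of these modes a running shell is in, something like the following sketch can help. The helper name describe_shell is made up for this example; the login_shell option of shopt and the i flag in $- are provided by Bash itself.

```bash
# describe_shell is a hypothetical helper, not something Bash provides.
describe_shell() {
    local login=no interactive=no
    # login_shell is a read-only shopt option that Bash sets at startup.
    shopt -q login_shell && login=yes
    # $- holds the current option flags; an "i" means interactive.
    [[ $- == *i* ]] && interactive=yes
    printf 'login=%s interactive=%s\n' "$login" "$interactive"
}

describe_shell
```

Run from a script, this would normally report neither mode; pasted into a terminal session, it should at least report interactive=yes.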
Types of Commands
In Bash, there are currently 5 types of commands:
- Aliases
- Functions
- External Commands
- Builtins
- Keywords
These will be discussed in more detail later, however let’s take a quick look at each of them.
Aliases
Aliases are very simple, and can essentially be thought of as macros. They are
of the form alias name='value'
. This then allows you to use name
as a
shorthand for value
. As an example, a common alias is:
alias ll='ls -lA'
You can then use the command ll
, which internally expands to the command ls
with the argument -lA
. Generally, you’ll want to keep your number of aliases
to a minimum, as a majority of them can be better expressed as functions.
Functions
Functions are a powerful feature of Bash. They allow you to define commands in terms of other commands and the Bash scripting language. They can also take command-line arguments (unlike aliases), return exit codes, and set local variables. For example, the above alias can be expressed as a function in the following manner:
ll() {
    ls -lA "$@"
}
While this might not seem better than the alias, the differences will become more apparent as you want more out of your command.
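As a rough illustration of what a function can do that an alias cannot, the sketch below uses an argument, a local variable, and an exit code. The name mkcd is only an example and is not a standard command.

```bash
# mkcd is a hypothetical example: create a directory, then move into it.
mkcd() {
    local dir=$1                  # local: not visible outside the function
    if [[ -z $dir ]]; then
        echo 'usage: mkcd DIRECTORY' >&2
        return 2                  # exit code reported to the caller
    fi
    # -- guards against directory names that start with a dash.
    mkdir -p -- "$dir" && cd -- "$dir"
}
```

Calling mkcd with a path creates that directory (and any missing parents) and changes into it, which an alias could not express since it has no access to its arguments.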
External Commands
There is not much to be said about external commands. They are programs found on
the actual filesystem that Bash searches for and executes. The program ls
used
above is a good example of an external command.
Builtins
Before discussing builtins (and why they’re important), let’s take a quick look at how a shell executes commands. Every program you launch is loaded into what is called a process image. This is essentially the program’s space in memory. It contains all of the data the program has defined up to that point, as well as the code for the program. When a program uses the execve syscall to launch a new program, the new program takes over the existing process image (instead of making a new one). We can see this in action with the following C program:
/* exec.c */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv, char **environ) {
    char *const prog[] = { "/bin/echo", "Hello, world!", NULL };

    printf("Executing a command...\n");
    fflush(stdout); /* write out buffered text before the image is replaced */

    /* On success, execve never returns to this program. */
    execve(prog[0], prog, environ);

    printf("You won't see me.\n");
    return 0;
}
Assuming you saved it as exec.c
, it can then be compiled with:
cc -o exec exec.c
Our program can then be run as ./exec
:
Terminal
$ ./exec
Executing a command...
Hello, world!
We can see that execve
has replaced the code in memory for ./exec
with the
code for /bin/echo
. Due to this, the message You won't see me.
will never be
seen. To create a new process image for /bin/echo
to fill, we need to use the
fork
syscall. This creates an exact copy of the existing process image (called
a child process). This can be seen with the following code:
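/* fork.c */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv, char **environ) {
    char *const prog[] = { "/bin/echo", "Hello, from the child process!", NULL };

    printf("Executing a command...\n");
    fflush(stdout); /* keep buffered text from being copied into the child */

    pid_t pid = fork();
    if(pid < 0) {
        perror("fork"); /* no child process was created */
        return 1;
    } else if(pid > 0) {
        wait(NULL); /* parent: wait for the child to finish */
        printf("Hello, from the parent process!\n");
    } else {
        execve(prog[0], prog, environ); /* child: replace the copied image */
        perror("execve"); /* only reached if execve failed */
        _exit(127);
    }

    printf("You will see me now.\n");
    return 0;
}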
This program is slightly more complicated, but should show a basic fork attempt.
The wait
syscall just tells the parent process to wait until the child process
has totally finished. Assuming this one was saved as fork.c
, it can then be
compiled like before:
cc -o fork fork.c
Running it should show all messages as expected:
Terminal
$ ./fork
Executing a command...
Hello, from the child process!
Hello, from the parent process!
You will see me now.
The shell uses a similar flow when executing programs, so that it can continue without interruption.
Returning to builtins, forking is a relatively expensive operation, as it is essentially a 1:1 copy of the running program. Builtins are literally built into Bash (such as echo or test), which allows it to run the command without forking, thus saving time and memory. They are similar to functions in this manner; however, functions are defined in the Bash scripting language, while builtins are compiled into the Bash executable (not really, but for the purposes of this article, that is close enough). Builtins follow the same rules for processing as external commands.
Keywords
Keywords are like builtins, in that they are built into Bash. However they can
alter and manipulate standard behavior. To see a good example of this, let’s
take a look at the [
command. [
is the same as the test
command (both of
which are builtin), except that it requires a closing ]
. Using the following
example:
[ 5 > 4 ]
One might assume that it is testing if 5 is greater than 4, which would be true.
However, since [
follows normal processing rules, Bash treats > 4
as a
redirection (which will also be discussed in more detail later), creating a file
named 4, and dumping the output of [ 5 ]
into it. It is exactly the same as:
[ 5 ] > 4
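To keep Bash from treating > as a redirection, you would need to escape it:
[ 5 \> 4 ]
Note that \> compares its operands as strings; for a numeric comparison, [ provides the -gt operator. You can instead use the [[ keyword, which operates similarly to [, but doesn’t need the escape (since it doesn’t follow normal parsing rules).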
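One caveat worth illustrating: the > operator in both [ and [[ compares strings, so it can give surprising answers for numbers with different digit counts. The following sketch contrasts it with the numeric operators; the values 10 and 9 are arbitrary examples.

```bash
# String comparison: "10" sorts before "9", because "1" comes before "9".
if [ 10 \> 9 ]; then echo '[ 10 \> 9 ] is true'; else echo '[ 10 \> 9 ] is false'; fi

# Numeric comparison with [ uses -gt (greater than).
if [ 10 -gt 9 ]; then echo '[ 10 -gt 9 ] is true'; fi

# [[ needs no escaping, but > is still a string comparison there.
if [[ 10 > 9 ]]; then echo '[[ 10 > 9 ]] is true'; else echo '[[ 10 > 9 ]] is false'; fi

# Arithmetic evaluation with (( )) compares integers.
if (( 10 > 9 )); then echo '(( 10 > 9 )) is true'; fi
```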
Further Reading
- Aliases
- Shell Functions
- Shell Builtin Commands
- What (really) happens when you type ls -l in the shell
- Process images
- Von Neumann architecture
Filenames and Paths
Filenames
One might think that filenames are a straight-forward topic, however they can be
one of the biggest causes of issues, next to quoting. Filenames can contain any
character, except for \0
(the NUL byte) and /
. This includes spaces, tabs,
*
, ?
, and newlines, all of which are meaningful to Bash. This isn’t
considered often enough, and can lead to obscure bugs. For example, it is not
uncommon to see the following construct:
for file in $(ls); do
    echo "$file"
done
We won’t delve into this example too much just yet, however it can break in a
number of ways. Let’s say you had the files file one.txt
file two.txt
and
file *.txt
. Using the above code, let’s see what happens when we run it over
our files:
Terminal
$ ls
file one.txt
file two.txt
file *.txt
$ for file in $(ls); do echo "$file"; done
file
one.txt
file
two.txt
file
file one.txt
file two.txt
file *.txt
Paths
Filenames can’t contain a / as that is the only character that can be used to indicate the next piece in the filepath. There are two terms used to describe paths: absolute and relative. An absolute path must begin with a /, and gives the exact location of a file or directory. A relative path must not begin with a /, and describes a location relative to another directory (usually the current working directory). As long as they maintain that relation, it doesn’t matter where they are located in the filesystem. For example, if you have an application looking for conf/app.conf, it doesn’t matter if the application looks from /etc/app (as long as /etc/app/conf/app.conf exists) or from /usr/etc/app (as long as /usr/etc/app/conf/app.conf exists).
The Current Working Directory
The current working directory, commonly shortened to cwd, is the directory an application is working in. For example, when you log in and Bash starts, it will usually start you in your home directory. This would make your current working directory your home directory, until you move to another one. The current working directory is also referred to as the present working directory. The . character can be used to refer to the cwd, whereas .. can be used to refer to the parent directory. If your cwd is just / (root), then .. is still just /.
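The relationship between relative paths, absolute paths, and the cwd can be tried out with a short sketch like the one below. It works in a scratch directory created by mktemp; the app and conf names simply mirror the earlier example and are otherwise arbitrary.

```bash
base=$(mktemp -d)              # scratch directory, so nothing real is touched
mkdir -p "$base/app/conf"

cd "$base/app"
touch conf/app.conf            # relative path, resolved against the cwd

# The same file is reachable through its absolute path.
[ -e "$base/app/conf/app.conf" ] && echo 'same file via absolute path'

cd ./conf/..                   # . and .. can appear anywhere in a path
cd / && cd .. && pwd           # the parent of / is still /

rm -rf "$base"                 # clean up the scratch directory
```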
File Permissions
File permissions are the bread and butter of UNIX security. They control who can
access a file, and what they can do with it. Each file has 3 modes associated
with it: the owner’s permissions, the group’s permissions, and everyone else’s
permissions. There are three values that can describe a set of permissions:
the read bit (4), the write bit (2), and the executable bit (1).
These values are then totalled together for the mode. For example, let’s say we
had a file owned by the user root
, and the group wheel
. Each mode is set to
read and write only. These permissions can be described the following ways:
rw-rw-rw-
42-42-42-
666
For regular files, these permission bits act in the way you would expect. The read bit allows you to read a file, the write bit allows you to write to a file, and the executable bit allows you to execute the file as a program. Directories use the same set of bits, but work a little bit differently. The read bit means you can list files in the directory, the write bit means you can create, rename, and delete files in the directory, and the executable bit lets you move into that directory and access the files inside it.
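To make the arithmetic behind the numeric form concrete, here is a small sketch that adds up the bit values for each of the three groups. The function name mode_to_octal is invented for this example, and it only handles the plain nine-character rwx form.

```bash
# mode_to_octal is a hypothetical helper: it converts a symbolic mode
# such as rw-r--r-- into its numeric form by summing 4, 2, and 1.
mode_to_octal() {
    local mode=$1 octal='' i triplet value
    for (( i = 0; i < 9; i += 3 )); do
        triplet=${mode:i:3}       # owner, then group, then everyone else
        value=0
        [[ ${triplet:0:1} == r ]] && (( value += 4 ))
        [[ ${triplet:1:1} == w ]] && (( value += 2 ))
        [[ ${triplet:2:1} == x ]] && (( value += 1 ))
        octal+=$value
    done
    printf '%s\n' "$octal"
}

mode_to_octal rw-rw-rw-   # prints 666
mode_to_octal rwxr-xr-x   # prints 755
```

The numeric form is what tools such as chmod accept, for example chmod 644 somefile.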
The Tilde Character
Before moving on, let’s discuss the ~
character real quick. It typically
refers to the current user’s home directory, whereas ~someuser
refers to the
home directory of someuser
. For example:
Terminal
$ echo ~
/home/uplime
$ echo ~root
/root
It is also important to know that these aren’t true paths. Instead, they are
better thought of as macros. POSIX has very specific behavior for how to expand
these macros, which Bash conforms to. This means that the filesystem isn’t
actually aware that ~
maps to the home directory:
Terminal
$ echo '~'
~
$ ls '~'
ls: cannot access '~': No such file or directory
Since a literal ~
was given to ls
(instead of Bash expanding it), it did not
try to look in the home directory.
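The same distinction shows up when a path is stored in a variable. In this sketch the directory name projects is only a placeholder; it assumes HOME is set, as it normally is.

```bash
dir=~/projects          # unquoted: Bash expands ~ when assigning
echo "$dir"             # the home directory followed by /projects

dir="~/projects"        # quoted: the ~ stays a literal character
echo "$dir"             # prints ~/projects

dir="$HOME/projects"    # $HOME is expanded even inside double quotes
echo "$dir"
```

When a path has to be quoted, using $HOME is therefore the usual way to refer to the home directory.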
Further Reading
- How to use unix domain socket without creating a socket file
- 3.170 Filename
- Absolute and relative paths
- Filesystem Hierarchy Standard
- Traditional Unix permissions
- Tilde Expansion
Executing Commands
Finally, we’re at the point of actually working with the shell. Bash can take commands in a variety of forms, however we’ll start with the simplest. This is essentially just a program name or a path to the program.
Specifying the Location to a Program
Bash can take either an absolute or a relative path to a program. If Bash is passed an absolute path, it looks there. Otherwise Bash looks in the specified path relative to the present working directory. For example, if you had a program in ~/bin, and your present working directory was ~, both of the following methods are equivalent for executing the program:
~/bin/program
or:
bin/program
If you were to move into ~/bin
, so that it becomes your present working
directory, you could then execute the program as:
./program
If the file does not have the executable bit set, the program won’t execute:
Terminal
$ touch code/does-not-run
$ code/does-not-run
bash: code/does-not-run: Permission denied
The PATH Environment Variable
If the command is just a name (and not a path to a program), Bash looks through a list of :-separated directories (in order from left to right) for a file with a matching name that has the executable bit set. This list is stored in an environment variable called PATH. If no such file exists in any of the directories, Bash prints the error “command not found”. The default value for PATH depends on how Bash was built, but a common one is:
/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
While it isn’t always clear where to put a program, these directories have some common usecases.
- /usr/local/bin - Programs that are specific to that machine. These tools don’t necessarily make sense across an entire cluster of machines, or they address one-off and esoteric issues.
- /usr/local/sbin - System management programs and tools specific to that machine.
- /usr/bin - Non-essential programs available to all users.
- /usr/sbin - Non-essential programs used for system management, such as network services.
- /bin - Essential programs available for all users on the system.
- /sbin - Programs essential for system management.
- . - Your current working directory. This negates the need to do ./program if . is in your PATH. While it is a default value, most systems ship a startup file that doesn’t include it in PATH.
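The search through PATH can be approximated in a few lines of Bash. This is only a sketch of the idea, not how Bash implements it internally (Bash also caches the locations it finds), and find_in_path is a made-up name.

```bash
# find_in_path is a hypothetical helper that mimics the PATH lookup.
find_in_path() {
    local name=$1 dir
    local -a dirs
    IFS=: read -r -a dirs <<< "$PATH"     # split PATH on colons
    for dir in "${dirs[@]}"; do
        [[ -z $dir ]] && dir=.            # an empty entry means the cwd
        if [[ -f $dir/$name && -x $dir/$name ]]; then
            printf '%s\n' "$dir/$name"    # first match wins
            return 0
        fi
    done
    return 1                              # "command not found"
}

find_in_path ls
type -a ls     # Bash's own answer, which also lists aliases and builtins
```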
Some common (external) commands you might find in PATH are:
- ls - List all files in a directory.
- cat - Read from standard input and write to standard output.
- clear - Clear your terminal window.
- cp - Copy a file from one place to another.
- mv - Move a file from one place to another. This tool is also used to rename a single file.
Program Arguments
In most cases, the commands you’ll be running will take arguments. Arguments are a list of whitespace-separated strings that can be used to configure and provide additional details to a program. In their simplest form, arguments look like:
command arg1 arg2 arg3 arg4
In the example above, there are a total of 5 arguments. The program name
(command
in this case) is also an argument, commonly referred to as argument
0. This is due to the fact that arrays traditionally start at index 0.
Escaping Arguments
There are certain symbols and characters that have special meaning to Bash. We
saw this earlier when discussing the ~
symbol. There is a large number of such
symbols that Bash may or may not treat specially under any number of
circumstances. To force Bash to pass any given character to a program as its
literal value, it should be escaped. Escaping can be done in one of three ways:
using the \
character, enclosing it in single quotes, or enclosing it in
double quotes. The \
(or backslash) escapes the character immediately
following it. A \\
evaluates to a literal \
. If the last character of the
line is a \
, it escapes the newline, telling Bash to continue processing the
arguments on the next line for the same command, instead of executing it. In
single quotes, Bash treats everything literally until the closing '
. This
includes \
, so it isn’t possible to nest single quotes within single quotes.
Bash does have a solution to this, but it will be discussed further in the
quoting section. Double quotes do allow for some escaping, meaning you can nest
double quotes. Bash only allows certain characters to be escaped within double
quotes however, otherwise it treats both the \
and the next character
literally.
Hello, world!
Keeping with tradition, let’s do the Bash hello! This will be a short section without much explanation, as we’ve finally laid enough groundwork to see some examples in action.
Terminal
$ echo Hello, world! # 3 arguments
Hello, world!
$ echo Hello,\ world! # 2 arguments
Hello, world!
$ echo "Hello, world!"
Hello, world!
$ echo 'Hello, world!'
Hello, world!
$ echo 'Hello,' 'world!' # 3 arguments
Hello, world!
$ echo \~ '~' ~
~ ~ /home/uplime
$ echo Hello, \
> world!
Hello, world!
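To check how Bash splits a line into arguments, a tiny helper that prints its argument count can be handy. The name count_args is hypothetical. Note that $# does not count argument 0, so each number here is one lower than the totals quoted above.

```bash
# count_args is a hypothetical helper; $# is the number of arguments
# passed to the function, not counting the function name itself.
count_args() { echo "$#"; }

count_args Hello, world!        # prints 2
count_args Hello,\ world!       # prints 1
count_args "Hello, world!"      # prints 1
count_args 'Hello,' 'world!'    # prints 2
count_args ''                   # prints 1: an empty string is still an argument
```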
Optional Parameters (Flags)
While programs are free to interpret arguments in any way they like, there are some common ways these arguments can be formatted so they are interpreted as options. These options allow you to configure the program at its startup. For each of these styles, options that don’t take a value are considered boolean and either enable or disable a feature within the program. Otherwise, the option can be thought of as a key/value pair. Let’s take a look at some of these formats.
-a -b -c value -d"another value"
This style of options is specified by POSIX, and is consequently the only portable style. Generally, when using this style, options can be grouped together as long as none of them, or only the last one, takes a value. For example, we could have written the first two options above as -ab, or the first three as -abc value. Every utility specified by POSIX that takes options uses this style (with one exception).
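Scripts can accept this style of option through the getopts builtin, which POSIX also specifies. The sketch below assumes three example options (-a and -b as booleans, -c taking a value); the function name parse_opts and the option letters are placeholders.

```bash
# parse_opts is a hypothetical function showing POSIX-style parsing.
parse_opts() {
    local opt OPTIND=1 a=off b=off c=''
    # The leading : enables silent error reporting; c: means -c takes a value.
    while getopts ':abc:' opt; do
        case $opt in
            a) a=on ;;
            b) b=on ;;
            c) c=$OPTARG ;;
            :) echo "option -$OPTARG needs a value" >&2; return 2 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 2 ;;
        esac
    done
    shift $(( OPTIND - 1 ))       # drop the options that were parsed
    printf 'a=%s b=%s c=%s rest=%s\n' "$a" "$b" "$c" "$*"
}

parse_opts -a -b -c value file   # prints a=on b=on c=value rest=file
parse_opts -abc value file       # grouped form, same result
```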
--option --other=value --parameter "another value"
This style, commonly called long opts (or long options), is not technically portable, but many programs will still accept options this way. In a lot of cases, programs will also provide an equivalent POSIX-style option. Bash itself takes options in both styles, as do most of the GNU utilities.
-option -other=value -parameter "another value"
This style is not used often, however besides only starting with a single -
,
it is the same as the previous style. The openssl
and gcc
tools both accept
options in this style.
key=value
This style is also not used often, probably due to the fact that it is difficult
to convey arguments in a similar format that shouldn’t be considered as options.
The dd
tool is the only POSIX utility that takes options in this manner.
Positional Parameters
For optional parameters, the order doesn’t usually matter. A program might be
passed the options -a -b
or -b -a
and still behave in the same manner.
Positional parameters, on the other hand, are usually dependent on their
position in relation to the other arguments (hence the name). For example, the
first non-option argument passed to bash
is (by default) considered a script
to execute. The following positional parameters are then passed as arguments to
that script. This means that bash myscript "some argument"
would not behave
the same as bash "some argument" myscript
. While optional parameters are
usually used to configure a program, positional parameters are usually treated
as input data.
End of Options
In some cases, you will need to pass a positional parameter to a program that
starts with a -
. For example, let’s say you wanted to search for the -v
flag in help printf
. Just passing the flag -v
to grep
won’t work, as that
is a standard grep
option:
Terminal
$ help printf | grep -v
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.
Instead, we could use the --
indicator, which tells the program that there are
no more options, only positional parameters (regardless of how they’re
formatted):
Terminal
$ help printf | grep -- -v
printf: printf [-v var] format [arguments]
      -v var    assign the output to shell variable VAR rather than
All POSIX utilities that take options (except for dd
again), as well as most
tools that take long options, also accept --
as the end of options indicator.
The ARG_MAX Value
On most operating systems, only so many arguments can be passed to a program.
This is not a limitation of Bash, or any shell really, but of the kernel itself.
This is an important distinction, because it means that an external program
(such as echo) may fail due to too many arguments, whereas the builtin
equivalent would work fine (since Bash doesn’t have to use execve
on the
builtin). The exact size of ARG_MAX is dependent on the operating system, and
there isn’t always a clean way to obtain it. On my system, it can be found
through the getconf
tool:
Terminal
$ getconf ARG_MAX
262144
It is important to note that this number doesn’t mean I can have 262144 separate arguments. Rather, the total size of all arguments combined can’t exceed 262144 bytes. The environment variables passed to the program are also included in this total.
The 0th Argument
As stated before, programs are free to interpret arguments however they like.
This is especially true for the 0th argument. Depending on the name, programs
might enable or disable options automatically. For example, if Bash’s 0th
argument is sh
, it will start as a shell that closely resembles POSIX sh
.
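Argument 0 can be inspected, and even chosen, from Bash itself. In this sketch, my-name and first are arbitrary strings; with bash -c, the first argument after the command string becomes $0. The second command relies on the -a option of the exec builtin, which is specific to Bash.

```bash
# With -c, the arguments after the command string fill $0, $1, and so on.
bash -c 'echo "argument 0 is: $0"; echo "argument 1 is: $1"' my-name first

# exec -a sets argument 0 for the new program; the subshell keeps the
# current shell from being replaced.
( exec -a sh bash -c 'echo "started as: $0"' )
```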
Further Reading
- Simple Commands
- ARG_MAX, maximum length of arguments for a new process
- What defines the maximum size for a command single argument?
- limits.h - implementation-defined constants
Pattern Matching
When working with your shell, there will probably be times when you need to operate on a group of files. It could be as specific as needing to work on all of the .txt files in the /tmp directory, or as general as all files in the current directory. Bash provides a handy feature called glob patterns (more commonly known as globs), which are patterns you can use to match the files you need.
A Quick Note About the Set and Shopt Builtins
The set
builtin isn’t directly used in matching patterns, however it will be
used a couple of times, so it is worth knowing about. Specified by POSIX, it is
used to enable or disable various settings internal to the shell. The main
option we’ll be using is set -x
, which tells the shell to show the actual
command it will be executing (among other things). It is also good to know about
the set -f
option, which disables most of the pattern matching we’ll be
discussing.
The shopt builtin is similarly used to enable or disable options; however, these options are non-POSIX (in other words, specific to Bash). We will be discussing several shopt options in this section, and how they affect pattern matching.
Basic Globs
When Bash is in its default mode, with no special options enabled in set
or
shopt
, it has a very basic set of globs. These globs are made up of 3
characters: *
, ?
, and [
. Since Bash is the one expanding these globs into filenames, it doesn’t matter what characters the filenames contain. The shell
will store them internally as separate entries. Let’s take another look at the
example we saw in the Filenames and Paths section, with
globs instead of ls
:
Terminal
$ ls
file one.txt
file two.txt
file *.txt
$ for file in *.txt; do echo "$file"; done
file one.txt
file two.txt
file *.txt
As you can see, despite these files having special characters in their names,
like *
or a space, Bash expands the glob into a list that we can iterate over
safely. Using set -x
, we can see how Bash expands this list behind the scenes:
Terminal
$ set -x
$ echo *.txt
+ echo 'file one.txt' 'file two.txt' 'file *.txt'
file one.txt file two.txt file *.txt
The *
, or star, matches 0 or more characters until the next literal is
matched. So, using the example above, the star matched file one
, file two
,
and file *
, and then the literal .txt
matched .txt
. The ?
will match
exactly any one character. If we had the files file.txt
and file-txt
, the
pattern *?txt
could be used to match both. In such a pattern, *
would match
file
, ?
would match .
and -
, and the literal txt
again matches txt
.
The [
, or open bracket, matches any character between it and a closing ]
, or
close bracket. It will match anything within it to exactly 1 corresponding
character in a filename. If you wanted to match any of the first 6 characters of
the alphabet (lowercase), you could use the pattern [abcdef]
. You can also
specify an inclusive range of characters using a -
. For example, the previous
pattern could be rewritten as [a-f]
. It is possible to stack ranges and
characters as well. The pattern [A-Za-z_]
will match any uppercase letter,
lowercase letter, or the underscore. If you want to match a literal -
, you
must either escape it (with a \
, single quotes, or double quotes), put it
right after the open bracket, or right before the close bracket. To match a
literal ]
, it also must be escaped, or put immediately after the open bracket.
You can also have character classes within brackets, which are essentially macros that match a set of characters. Bash supports the following classes, all of which are specified by POSIX except for [:ascii:] and [:word:]:
- [:alnum:] - Matches any digit, uppercase or lowercase characters.
- [:alpha:] - Matches any uppercase or lowercase characters.
- [:ascii:] - Matches any character from the ASCII character set.
- [:blank:] - Matches the space or tab characters.
- [:cntrl:] - Matches any control character.
- [:digit:] - Matches the digits 0 through 9.
- [:graph:] - Matches any character which has a graphic representation.
- [:lower:] - Matches any lowercase character.
- [:print:] - Matches the same character set as [:graph:], as well as space.
- [:punct:] - Matches the same character set as [:graph:], except for letters and digits.
- [:space:] - Matches all whitespace characters.
- [:upper:] - Matches any uppercase character.
- [:word:] - Matches any lowercase or uppercase character, digit, or the underscore.
- [:xdigit:] - Matches any hexadecimal digit.
These character classes can be stacked as well, alongside literals and ranges. The following patterns all match the same set of characters (any uppercase or lowercase character, digit, or the underscore):
[A-Za-z0-9_]
[[:upper:]a-z0-9_]
[[:upper:][:lower:]0-9_]
[[:upper:][:lower:][:digit:]_]
[[:alnum:]_]
[[:word:]]
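These bracket patterns are not limited to filenames; the case statement uses the same pattern syntax on arbitrary strings. The sketch below is one way to use them, with is_identifier as a made-up helper name. It also uses a leading ! inside the brackets, which negates the set.

```bash
# is_identifier is a hypothetical helper. The patterns in case use the
# same syntax as globs, but match against a string instead of filenames.
is_identifier() {
    case $1 in
        '')                return 1 ;;   # empty string
        [[:digit:]]*)      return 1 ;;   # must not start with a digit
        *[![:alnum:]_]*)   return 1 ;;   # contains some other character
        *)                 return 0 ;;
    esac
}

for name in file_1 2fast 'has space' _ok; do
    if is_identifier "$name"; then
        echo "$name: valid"
    else
        echo "$name: invalid"
    fi
done
```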
Patterns will match filenames in the current directory, until either the end of
the pattern is reached, or a /
is encountered. In the event of a /
, Bash
will start looking in any directories whose names match up to the /
. It then
treats the remaining characters as a new pattern, matching against filenames in
the matched directory. This repeats for every /
encountered, until the end of
the pattern is reached. If a pattern ends in a /
, then it only matches
directories in the ending directory. Let’s take a look at this in action:
Terminal
$ ls
abc def ghi
$ ls def
jkl
$ ls def/jkl
hi.txt
$ echo */???/*.txt
def/jkl/hi.txt
$ echo */*/
def/jkl/
In the first pattern above, the first * matched the def directory, the ??? matched the jkl directory (within def), the last * matched hi, and of course .txt matched .txt. A path only appears in the result if every part of the pattern matches it. So even though the first * could have matched abc, nothing inside abc matched the rest of the pattern, so Bash produced no results from abc.
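The walk-through above can be reproduced with a throwaway directory tree. This is only a sketch: the demo_dir variable and the expand_in_demo helper are hypothetical names, and the tree is created with mktemp so that nothing in your real working directory is touched.

```bash
#!/usr/bin/env bash

# Build a temporary tree mirroring the example: abc/, def/jkl/hi.txt, ghi/.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
mkdir -p "$demo_dir"/abc "$demo_dir"/def/jkl "$demo_dir"/ghi
touch "$demo_dir"/def/jkl/hi.txt

# Expand the pattern given as $1 from inside the demo tree and print one
# match per line. The subshell keeps the cd from affecting the caller.
expand_in_demo() {
    (
        cd "$demo_dir" || exit 1
        # Intentionally unquoted so that pathname expansion happens here.
        printf '%s\n' $1
    )
}

expand_in_demo '*/???/*.txt'   # each /-separated part is matched in turn
expand_in_demo '*/*/'          # the trailing / restricts matches to directories
```

Passing the pattern in single quotes and expanding it unquoted inside the helper is a deliberate choice here, so that the expansion happens relative to the demo tree rather than wherever the script was started.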
Dotglob
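By default, Bash won’t match files starting with a . unless the pattern itself explicitly starts with a . as well. However, a pattern such as .* will also include your cwd (.) and parent directory (..) in the results. (Bash 5.2 and newer enable the globskipdots option by default, which leaves these two entries out.) For example: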
Terminal
$ ls -A
.hidden .some-file another.txt document.txt file.txt
$ echo *
another.txt document.txt file.txt
$ echo .*
. .. .hidden .some-file
Bash provides an option called dotglob, which includes files starting with a . even when your pattern doesn’t explicitly start with one. It can be enabled with the shopt builtin:
shopt -s dotglob
Let’s take a look at a variation of the above example with that turned on:
Terminal
$ ls -A
.hidden .some-file another.txt document.txt file.txt
$ shopt -s dotglob
$ echo *
.hidden .some-file another.txt document.txt file.txt
As can be seen above, your cwd and parent directory aren’t included in the results.
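In a script, the effect of dotglob is easiest to see by counting matches. The following is a minimal sketch under the assumption of a temporary directory holding the same five files as the example; demo_dir and count_star_matches are hypothetical names.

```bash
#!/usr/bin/env bash

# Recreate the example directory in a temporary location.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir"/.hidden "$demo_dir"/.some-file \
      "$demo_dir"/another.txt "$demo_dir"/document.txt "$demo_dir"/file.txt

# Print how many names * expands to inside the demo directory.
# Pass "on" to enable dotglob first; anything else leaves it disabled.
# The subshell keeps both the cd and the shopt change local.
count_star_matches() {
    (
        cd "$demo_dir" || exit 1
        [[ $1 == on ]] && shopt -s dotglob
        matches=(*)
        echo "${#matches[@]}"
    )
}

echo "without dotglob: $(count_star_matches off)"   # 3 visible files
echo "with dotglob:    $(count_star_matches on)"    # 5, and no . or ..
```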
Nullglob
By default, if Bash can’t match any files to a pattern, it takes the pattern as a literal argument. For example:
Terminal
$ ls -A
.hidden .some-file another.txt document.txt file.txt
$ echo not*a?[f]ile
not*a?[f]ile
This isn’t always desired (or even expected) behavior, especially when we get to loops. This can be changed with the nullglob option:
shopt -s nullglob
This option instructs Bash to erase the pattern if no matches occur, instead of treating it as a literal:
Terminal
$ shopt -s nullglob
$ echo not*a?[f]ile

$ echo not*a?[f]ile Hello, world!
Hello, world!
It is important to note that with this option, the argument is completely gone. This means that in the last command of the above example, there were only three arguments: echo, Hello, (including the comma), and world!. We can see this with set -x:
Terminal
$ set -x
$ echo not*a?[f]ile Hello, world!
+ echo Hello, 'world!'
Hello, world!
Instead of passing an empty argument, Bash removes it.
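Loops were mentioned above as the case where this matters most. As a hedged sketch, the script below counts how often a for loop over a non-matching pattern runs, with and without nullglob. The names demo_dir and count_loop_iterations are invented for the example, and the *.log pattern is arbitrary.

```bash
#!/usr/bin/env bash

# An empty temporary directory, so *.log cannot match anything.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Print how many times a for loop over *.log runs in the empty directory.
# Pass "on" to enable nullglob first; anything else leaves it disabled.
count_loop_iterations() {
    (
        cd "$demo_dir" || exit 1
        [[ $1 == on ]] && shopt -s nullglob
        iterations=0
        for f in *.log; do
            iterations=$((iterations + 1))
        done
        echo "$iterations"
    )
}

echo "without nullglob: $(count_loop_iterations off)"   # 1: f is the literal *.log
echo "with nullglob:    $(count_loop_iterations on)"    # 0: the loop body never runs
```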
Nocaseglob
This is a fairly straightforward option. It can be enabled via:
shopt -s nocaseglob
This option tells Bash to match files in a case insensitive manner:
Terminal
$ echo [A-D]*
[A-D]*
$ shopt -s nocaseglob
$ echo [A-D]*
another.txt document.txt
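The same behavior can be checked from a script. This sketch uses the simpler pattern A* instead of a range, because how ranges such as [A-D] treat lowercase letters can depend on the locale and Bash version. demo_dir and list_capital_a are hypothetical names, and nullglob is enabled only so that a failed match prints nothing.

```bash
#!/usr/bin/env bash

# Recreate the three lowercase filenames from the example.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir"/another.txt "$demo_dir"/document.txt "$demo_dir"/file.txt

# Print the names matching A* in the demo directory, one per line.
# Pass "on" to enable nocaseglob first; anything else leaves it disabled.
list_capital_a() {
    (
        cd "$demo_dir" || exit 1
        shopt -s nullglob
        [[ $1 == on ]] && shopt -s nocaseglob
        matches=(A*)
        if (( ${#matches[@]} > 0 )); then
            printf '%s\n' "${matches[@]}"
        fi
    )
}

echo "case sensitive:   [$(list_capital_a off)]"
echo "case insensitive: [$(list_capital_a on)]"
```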
Failglob
This option causes Bash to print an error if a pattern matches no files, and to stop processing the command line immediately. It is enabled through the shopt builtin as well:
shopt -s failglob
Let’s see what happens when no files match a pattern:
Terminal
$ shopt -s failglob
$ echo not*a?[f]ile
bash: no match: not*a?[f]ile
$ echo not*a?[f]ile; echo Hello, world!
bash: no match: not*a?[f]ile
$ echo first line; echo not*a?[f]ile; echo Hello, world!
first line
bash: no match: not*a?[f]ile
$ echo not*a?[f]ile; echo Hello, \
> world!
bash: no match: not*a?[f]ile
As seen above, it doesn’t matter what else the line contains. Commands that come before the failing pattern still run, but as soon as Bash encounters a pattern that doesn’t match any files, it reports the error and discards the rest of the line. This option takes precedence over nullglob.
Extglob
As usual, this option is enabled via the shopt builtin:
shopt -s extglob
This option enables several additional kinds of globs for pattern matching. Each of these globs is of the form x(first|second|...). The x specifies the type of glob, and is one of the following characters: ?, *, +, @, or !. Everything between the opening ( and closing ) is a |-separated list of entries. The 5 kinds of extended globs are:
- ?(...) - Matches 0 or 1 occurrences of the entries in the list.
- *(...) - Matches 0 or more occurrences of the entries in the list.
- +(...) - Matches 1 or more occurrences of the entries in the list.
- @(...) - Matches exactly 1 of the entries in the list.
- !(...) - Matches anything except the entries in the list.
Let’s see some examples of this:
Terminal
$ shopt -s extglob
$ echo foo-?(a|b|c)-bar
foo-a-bar foo--bar foo-b-bar foo-c-bar
$ echo foo-*(a|b|c)-bar
foo-aaaa-bar foo-a-bar foo--bar foo-b-bar foo-c-bar
$ echo foo-+(a|b|c)-bar
foo-aaaa-bar foo-a-bar foo-b-bar foo-c-bar
$ echo foo-@(a|b|c)-bar
foo-a-bar foo-b-bar foo-c-bar
$ echo foo-!(a|b|c)-bar
foo-aaaa-bar foo--bar
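To double-check the five results above without relying on sort order, one option is to count the matches for each pattern. The sketch below assumes a temporary directory containing the same five filenames; demo_dir and count_matches are hypothetical names. The patterns are passed as quoted strings and expanded inside the helper, which keeps the example independent of when Bash parses each line.

```bash
#!/usr/bin/env bash

# Enable extended globs for this shell (subshells inherit the setting).
shopt -s extglob

# Recreate the five files used in the example.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir"/foo-a-bar "$demo_dir"/foo-b-bar "$demo_dir"/foo-c-bar \
      "$demo_dir"/foo--bar "$demo_dir"/foo-aaaa-bar

# Print how many names in the demo directory match the pattern given as $1.
count_matches() {
    (
        cd "$demo_dir" || exit 1
        shopt -s nullglob   # an unmatched pattern should count as 0, not 1
        # Intentionally unquoted so that the pattern is expanded here.
        matches=($1)
        echo "${#matches[@]}"
    )
}

for pattern in 'foo-?(a|b|c)-bar' 'foo-*(a|b|c)-bar' 'foo-+(a|b|c)-bar' \
               'foo-@(a|b|c)-bar' 'foo-!(a|b|c)-bar'; do
    printf '%-20s %s\n' "$pattern" "$(count_matches "$pattern")"
done
```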
Globstar
This enables a very handy glob: **. Being a shopt feature, it is enabled the usual way:
shopt -s globstar
By default, ** is treated exactly the same as *. However, with globstar enabled, it will match directories recursively. In the context of filesystems, recursively means searching within directories, then within any subdirectories they contain, and so on. If we had the following file hierarchy:
folder1/
    folder2/
        hey.txt
        there.txt
    folder3/
        another.txt
    1.txt
folder4/
    doc.txt
folder5/
    folder6/
        list.txt
        folder7/
            movies.txt
We could then list all .txt files with the pattern **/*.txt:
Terminal
$ shopt -s globstar
$ echo **/*.txt
folder1/1.txt folder1/folder2/hey.txt folder1/folder2/there.txt folder1/folder3/another.txt folder4/doc.txt folder5/folder6/folder7/movies.txt folder5/folder6/list.txt
As with regular globs, ending a pattern in / will only match directories:
Terminal
$ shopt -s globstar
$ echo **/*/
folder1/ folder1/folder2/ folder1/folder3/ folder4/ folder5/ folder5/folder6/ folder5/folder6/folder7/
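The difference between ** with and without globstar can be seen by building the hierarchy above in a temporary directory and counting matches. This is a sketch only: demo_dir and count_txt_files are hypothetical names, and globstar requires Bash 4.0 or newer.

```bash
#!/usr/bin/env bash

# Recreate the example hierarchy in a temporary location.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
mkdir -p "$demo_dir"/folder1/folder2 "$demo_dir"/folder1/folder3 \
         "$demo_dir"/folder4 "$demo_dir"/folder5/folder6/folder7
touch "$demo_dir"/folder1/1.txt \
      "$demo_dir"/folder1/folder2/hey.txt \
      "$demo_dir"/folder1/folder2/there.txt \
      "$demo_dir"/folder1/folder3/another.txt \
      "$demo_dir"/folder4/doc.txt \
      "$demo_dir"/folder5/folder6/list.txt \
      "$demo_dir"/folder5/folder6/folder7/movies.txt

# Print how many paths **/*.txt expands to inside the demo tree.
# Pass "on" to enable globstar first; anything else leaves it disabled.
count_txt_files() {
    (
        cd "$demo_dir" || exit 1
        shopt -s nullglob
        [[ $1 == on ]] && shopt -s globstar
        matches=(**/*.txt)
        echo "${#matches[@]}"
    )
}

echo "without globstar: $(count_txt_files off)"   # ** acts like *, one level deep
echo "with globstar:    $(count_txt_files on)"    # every .txt file in the tree
```

Without globstar the pattern behaves like */*.txt and only finds folder1/1.txt and folder4/doc.txt; with it, all seven files are found.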
Further Reading
Controlling Input and Output
Variables
Variables are an extremely useful tool in Bash. However, for a number of reasons that we’ll discuss in this chapter, they are often misused and abused.
The Stringly Typed Language
In programming language theory, a language’s type system is often described as weakly or strongly typed. I’m far from qualified to discuss either, but I am strongly of the opinion that Bash belongs in neither category. This is because Bash has an extremely simplified type system. One could even make the argument that it doesn’t really have one at all. In Bash (and most POSIX-based shells), unlike most general-purpose programming languages, almost everything is a string. Let’s take a look at the following command:
echo "Hello, world!"
Since Hello, world! was the only text wrapped in ", one might think it is the only string in the command. However, echo is actually a string as well. In Bash, the only role of quotes is to escape characters. They don’t denote or start a string literal like they do in other programming languages. Every argument passed to every command is a string. Even if the argument is all digits, it is still a string as far as Bash is concerned. This means that the following command is equivalent to the previous one:
"echo" "Hello, world!"
As is this one:
echo Hello,\ world!
Because of this, variables in Bash are essentially just strings as well. This should become more apparent as we continue on.
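To make this concrete, here is a small sketch that runs the three spellings of the command and compares their output, then shows that a value made of digits still behaves like a string. The variable and function names are made up for the example, and the [[ ... < ... ]] and (( ... )) constructs are used only to contrast string and integer interpretation.

```bash
#!/usr/bin/env bash

# All three spellings hand the same single argument to the same command.
plain=$(echo "Hello, world!")
quoted_command=$("echo" "Hello, world!")
escaped_space=$(echo Hello,\ world!)

[[ $plain == "$quoted_command" && $plain == "$escaped_space" ]] &&
    echo "all three commands printed: $plain"

# A value made of digits is still stored as a string of characters.
number=10
echo "length of $number as a string: ${#number}"

# Inside [[ ]], < compares strings, so "10" sorts before "9".
is_less_as_string() { [[ $1 < $2 ]]; }

# Inside (( )), the same strings are interpreted as integers instead.
is_less_as_integer() { (( $1 < $2 )); }

is_less_as_string 10 9 && echo "as strings, 10 sorts before 9"
is_less_as_integer 10 9 || echo "as integers, 10 is not less than 9"
```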