Learning shell scripting without manuals

Imagine you wake up one sunny, blissful morning, brew some drip coffee
with your V60, and start reading the morning paper–I mean the
morning reddit–on your homebrew Kindle-like contraption. Two dangerously
named files await you on the screen:

~/morning$ ls *
total 8
drwxrwxr-x  2 damian damian 4096 Jun 19 14:22 .
drwxr-xr-x 30 damian damian 4096 Jun 19 14:22 ..
-rw-rw-r--  1 damian damian    0 Jun 19 14:44 *
-rw-rw-r--  1 damian damian    0 Jun 19 14:45 -la

Now, that’s strange, because I invoked ls without -la. If I instead pass an ordinary directory name, /bin/ls behaves as intended.

~/morning$ ls /
bin    dev   initrd.img      lib64       mnt   root  srv  usr      vmlinuz.old
boot   etc   initrd.img.old  lost+found  opt   run   sys  var
cdrom  home  lib             media       proc  sbin  tmp  vmlinuz

In the two /bin/ls invocations above, neither has options, so we can conclude one
thing: wildcards, if used improperly, can lead to different behavior
depending on the contents of a directory. If you’re new to bash and that doesn’t scare you,
you may want to get your amygdala checked.

If you have a firm understanding of bash’s rules for variable
expansion and the basics of how options are parsed, you will stay on
top of dealing with these files. You should, however, try to get to
the bottom of where they came from, but that’s another story.

To safely simulate how bash might handle something more dangerous like:

  rm *

we can use /bin/echo. Like /bin/rm, /bin/echo is written in C and
receives its command line arguments from bash in exactly the same way. /bin/echo
simply writes those arguments to its standard output.

  /bin/echo /your/dangerous command call here

Let’s try this technique:

~/morning$ /bin/echo rm *
rm * -la

Fortunately, -l and -a are invalid options in rm so they cause an error without doing anything harmful. If a file had been named "-rf", then we’d have to be a bit more careful.
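The same rehearsal can be scripted in a throwaway directory. This is only a sketch: the helper name show_args and the scratch directory created with mktemp are assumptions of the example, not part of the session above. Printing each argument in brackets makes argument boundaries visible, which plain /bin/echo would blur.

```shell
# show_args (hypothetical helper): print each argument on its own line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Rehearse "rm *" in a scratch directory rather than a real one.
scratch=$(mktemp -d) || exit 1
(
    cd "$scratch" || exit 1
    touch -- '*' -la     # recreate the two dangerously named files
    show_args rm *       # prints [rm] plus one line per matched filename
)
rm -rf -- "$scratch"
```

Nothing is deleted inside the subshell, because the command being rehearsed is only ever passed to show_args as text.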

Without knowing bash’s rules, we still do not know how bash chops the string “/bin/echo rm *” into elements of the argument vector passed to the /bin/echo C program. This is important regardless of your choice of language, since Python, Perl, Java, and even bash itself provide an array of strings that is directly derived from the array passed to C’s main(). I can’t speak for Ruby.

More impressive than a V60 is a program called strace. It records most system call interactions between a process and the kernel. A system call is an API call made by a user space program to the kernel. For example, the ubiquitous open(), read(), write(), and close() are system calls.

We will run strace on /bin/echo simply to see how bash
parses an expression and chops it into individual string elements of an argument
array before it is passed to the program.

  $ strace /bin/echo rm *
  execve("/bin/echo", ["/bin/echo", "rm", "*", "-la"], [/* 57 vars */]) = 0

The very first system call is execve(). It is almost always the first
call trapped by strace. It is called by the C implementation of bash
to load a program into the child process. Here is the interface:

   int execve(const char *program_name, char *const argv[], char *const envp[])

It takes in the pathname of the program to run, the program’s
arguments as an array of null-terminated strings, and the environment
variables set for the program.

execve() is typically called after a fork(), which the parent process
(e.g. bash) uses to create the child process. If execve() is successful
in loading the program, it takes over the child process with the
program image and flow control begins at the start of the program’s
main() function. Otherwise, flow control continues in the program that
called fork() and execve(), which might be bash.
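A rough shell-level analogue of this sequence, offered as a sketch rather than a description of bash’s internals: parentheses start a subshell (a forked child), and the exec builtin replaces that child with the named program. The function name run_child is made up for the illustration.

```shell
# run_child is a hypothetical helper for this sketch.
run_child() {
    (
        # The parentheses fork a subshell; exec then replaces that child.
        exec /bin/echo "child ran: $1"
        # Not reached when exec succeeds: the subshell's image is gone.
        echo "exec failed"
    )
}

run_child hello
echo "parent continues"
```

The parent shell is untouched by the exec, so it carries on with the line after run_child.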

Digression #1

Most programs like bash, Perl, and Python are written in
C and do something like the following to call an external program:

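  #include <errno.h>
  #include <error.h>   /* GNU error() */
  #include <unistd.h>  /* fork(), execvp(), _exit() */

  pid_t child_pid = fork();
  if (child_pid < 0) {
    error(0, errno, "fork() failed");
  } else if (child_pid == 0) {
    /* child: replace this process image with program_name */
    execvp(program_name, argv);
    /* execvp() returns only on failure */
    error(0, errno, "could not execute program %s", program_name);
    _exit(127);
  }
  /* parent: continues here, typically to waitpid() for the child */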

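This idiom is called a “fork, exec”. In the new child process, fork() returns 0; in the parent, it returns the child’s process ID, a positive value. By convention, the parent continues on, and the child is responsible for loading the desired program and running it, which is accomplished by execve() (the execvp() used here is a C library wrapper that searches PATH and then calls execve()).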

execve is pretty nifty: if successful, the child won’t continue
to the next line. In fact, it will forget its current program image
and the program_name will be memory-mapped to replace it. Its main() function is then called.

  $ strace -- /bin/echo rm *
  execve("/bin/echo", ["/bin/echo", "rm", "*", "-la"], [/* 57 vars */]) = 0

If you are unfamiliar with the dash dash --, it tells most
programs that any arguments that follow are not to be interpreted as options.
This enables you to chain many commands together without ambiguity about the command
to which each option belongs. But is it useful for something else?

Digression #2

The POSIX standard defines the semantics of how options are parsed. A
great majority of programs bundled with Linux and Mac OS X follow
it. In fact, if you use getopt(), you automatically follow it for free.
Most core programs in UNIX use getopt() to parse the options so you
often get consistent option parsing behavior across a broad spectrum of programs.
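Shell scripts can get the same conventions from the getopts builtin, the shell counterpart of getopt(). The sketch below is illustrative only: the function parse_demo and its single -v flag are invented for the example. It shows that -- ends option parsing, so whatever follows is counted as an operand.

```shell
# parse_demo (hypothetical): accepts one flag, -v, then reports what it saw.
parse_demo() {
    OPTIND=1          # restart option parsing on every call
    verbose=0
    while getopts v opt; do
        case $opt in
            v) verbose=1 ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))   # drop the parsed options and the "--" marker
    printf 'verbose=%s operands=%s\n' "$verbose" "$#"
}

parse_demo -v file       # verbose=1 operands=1
parse_demo -- -v file    # verbose=0 operands=2
```

In the second call, -v comes after the --, so it is treated as an ordinary operand rather than as the flag.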

Now suppose we wanted to tunnel through a bastion/gateway named
host1 into a second machine host2, and remove a file whose name is stored in the
variable $file_to_remove. One could do:

  $ ssh -i ~/.ssh/key1 user1@host1 -- ssh -i /nfs/home/me/.ssh/key2 user2@host2 -- /bin/rm -- "$file_to_remove"

Now, as it is written, I know that the first ssh will stop interpreting options
after the first --, the second ssh will interpret only options in between the
first and second --, and the third command /bin/rm will not parse any options!

Eureka! So, let’s see if we can get /bin/ls to behave consistently
regardless of the contents of the current directory:

  ~/morning$ ls -- *
  *  -la

Excellent: --, without any options before it, effectively invokes
ls without any options.

Let’s assume we have a file we cannot afford to lose called important:

  ~/morning$ touch important

Now, let’s list the files without any special options:

~/morning$ ls -- *
*  important  -la

Alright, so bash is obviously interpreting the asterisk. We can safely
remove -la using the -- trick:

  ~/morning$ rm -- -la

Now that this file is gone, ls should behave correctly (but not consistently) without the --.

  ~/morning$ ls *
  *  important
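For readers who would rather not experiment in a real directory, here is a minimal sketch of the same removal in a scratch directory. The directory from mktemp and the variable name remaining are assumptions of the example.

```shell
scratch=$(mktemp -d) || exit 1
remaining=$(
    cd "$scratch" || exit 1
    touch -- -la important   # "--" lets touch accept a name starting with "-"
    rm -- -la                # likewise, rm treats -la as an operand here
    ls                       # no operands: list what is left
)
rm -rf -- "$scratch"
printf 'remaining: %s\n' "$remaining"
```

Only the file named -la is removed; important is the one name left for ls to report.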

Now that we think we understand things better, let’s create a file
that’s more dangerous than the innocuous “-la”:

  ~/morning$ touch -- -rf

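Without the --, touch would try to interpret -rf as options. The -- is the
simplest way to get touch to create a file with a name that begins with -;
prefixing the name with ./ (as in ./-rf) also works.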

Let’s call our friendly, non-destructive program /bin/echo
to simulate the interpretation of an asterisk:

  ~/morning$ /bin/echo *
  * important -rf

The expansion of a filename wildcard is named after glob(), a venerable
C library function that does the same job. In fact, we have an endearing name for this process:
globbing. My hypothesis is that bash passes an unglobbed asterisk when
it is double-quoted. Let’s see if bash globs when we enclose it with double quotes.

  ~/morning$ /bin/echo "*"
  *

Ahh, now let’s corroborate this with strace.

  ~/morning$ strace -- /bin/echo "*"
  execve("/bin/echo", ["/bin/echo", "*"], [/* 57 vars */]) = 0

And indeed, main() receives its asterisk unglobbed. Most system calls will
not glob their filename inputs including the system call used to delete files,
viz. unlink(). In fact, the commands rm, ls, etc. will not glob their arguments and neither will the system calls.
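The difference can also be measured without strace by counting the arguments a command receives. This is a sketch under assumptions: count_args is a made-up helper, and the files a, b, and c exist only in a scratch directory.

```shell
# count_args (hypothetical helper): report how many arguments arrived.
count_args() { echo "$#"; }

scratch=$(mktemp -d) || exit 1
(
    cd "$scratch" || exit 1
    touch -- a b c
    printf 'unquoted: %s\n' "$(count_args *)"     # unquoted: 3
    printf 'quoted:   %s\n' "$(count_args "*")"   # quoted:   1
)
rm -rf -- "$scratch"
```

The unquoted asterisk is replaced by the three filenames before count_args ever runs, while the quoted one arrives as a single literal character.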

Let’s be unusually brave and try:

   rm -- "*"

The double quotes keep bash from globbing, so /bin/rm receives only the literal name *, and the -- ensures that a name such as "-rf" could never be interpreted as options. Had -rf been interpreted, /bin/rm would have recursively removed everything the wildcard matched, including the contents of any subdirectories. Let’s use strace to see what’s happening under the hood and confirm that our asterisk is passed to unlink unglobbed:

  ~/morning$ strace -- /bin/rm -- "*"
  execve("/bin/rm", ["/bin/rm", "--", "*"], [/* 57 vars */]) = 0
  access("*", W_OK)          = 0
  unlink("*")                = 0

It is important to note in the execve() that, since the asterisk is not globbed by bash, -rf does not appear in our arguments. However, if it did, we would be protected from interpreting it as an option due to the dash dash.

Digression #3

strace traces lots of different system calls. If you are unfamiliar with a particular system call, you can look it up with man.

  man 2 access

The 2 tells man to only include pages about system calls. Use 1 instead for commands and 3 for C standard library calls.

To read up on glob, the function used to turn a wildcard expression string into a list of matches, run:

  man 3 glob

This implies glob is not part of the kernel. In fact, I believe it is true that no system call globs its arguments.

From the last strace, we can conclude several things:

  • the asterisk is not globbed when passed to the system calls so it will be treated as a filename without any special interpretation of the wildcard characters,
  • access(): rm is first checking if it can write to the file named “*”
  • unlink(): the file is removed
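These conclusions can be checked in a scratch directory without risking real files. The sketch below assumes a temporary directory from mktemp and a variable named survivors, both invented for the example.

```shell
scratch=$(mktemp -d) || exit 1
survivors=$(
    cd "$scratch" || exit 1
    touch -- '*' important
    rm -- "*"      # quoted: rm receives the one-character name "*"
    ls             # whatever survived the removal
)
rm -rf -- "$scratch"
printf 'survivors: %s\n' "$survivors"
```

Because the quoted asterisk reaches rm as a plain filename, only the file literally named * is removed and important survives.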

Now, let’s confirm the asterisk file is removed:

  ~/morning$ ls
  important  -rf

Now we can safely remove the “-rf” file so it does not wreak havoc when we’re feeling less mindful.

  ~/morning$ rm -- -rf

Now, only our important file remains without any dangerously named files:

  ~/morning$ ls
  important

How about files with spaces? Let’s create one:

 ~/morning$ touch -- "-rf *"
 ~/morning$ ls -l
total 0
-rw-rw-r-- 1 damian damian 0 Jun 19 15:24 important
-rw-rw-r-- 1 damian damian 0 Jun 19 16:02 -rf *

We see that double quotes suppress the glob, and they also keep a string containing spaces from being split into multiple command line arguments, so they can be used to manipulate files with spaces in their names. Let’s verify this behavior with strace and our innocuous /bin/echo:

  ~/morning$ strace -- /bin/echo -- "-rf *"
  execve("/bin/echo", ["/bin/echo", "--", "-rf *"], [/* 58 vars */]) = 0

Indeed, "-rf *" is not split. Let’s take it further and put two
double-quoted strings side-by-side:

  ~/morning$ strace -- /bin/echo -- "-rf *"" more stuff"
  execve("/bin/echo", ["/bin/echo", "--", "-rf * more stuff"], [/* 58 vars */]) = 0

Let’s now separate the two double quoted strings by whitespace.

  ~/morning$ strace -- /bin/echo -- "-rf *"   " more stuff"
  execve("/bin/echo", ["/bin/echo", "--", "-rf *", " more stuff"], [/* 58 vars */]) = 0

From these two recent straces, we notice two things: bash splits an expression into separate elements of the args array passed in execve() when there is unquoted whitespace, but without unquoted whitespace, they belong to the same string in a single element of the args array.
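The same two observations can be reproduced by counting arguments instead of reading strace output. A minimal sketch, with count_args as a hypothetical helper:

```shell
# count_args (hypothetical helper): report how many arguments arrived.
count_args() { echo "$#"; }

count_args "-rf *"" more stuff"      # adjacent quoted strings: prints 1
count_args "-rf *"   " more stuff"   # unquoted whitespace between them: prints 2
```

Any amount of unquoted whitespace separates arguments; quoted strings that touch are glued into one.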

Let’s be careful and rename our file. Knowing most standard programs follow the POSIX convention, let’s make a habit of preventing filenames from being interpreted as options.

  ~/morning$ mv -- important "my important file"

Now let’s store its name in a variable:

  ~/morning$ important_file="my important file"

Now let’s try echoing it unquoted:

  ~/morning$ strace -- /bin/echo $important_file
  execve("/bin/echo", ["/bin/echo", "my", "important", "file"], [/* 58 vars */]) = 0

bash breaks up “my important file” into three args: “my”, “important”, “file”.

With double quotes, we prevent whitespace from splitting the string into
separate args. Let’s try it.

  ~/morning$ strace -- /bin/echo -- "$important_file"
  execve("/bin/echo", ["/bin/echo", "--", "my important file"], [/* 58 vars */]) = 0

As intended, “my important file” survives as a single argument when we use double quotes. Let’s see what happens when we use single quotes:

  ~/morning$ strace -- /bin/echo -- '$important_file'
  execve("/bin/echo", ["/bin/echo", "--", "$important_file"], [/* 58 vars */]) = 0

This experiment shows that bash does not expand variables inside single quotes. And how about wildcards?

  ~/morning$ strace -- /bin/echo -- '*'
  execve("/bin/echo", ["/bin/echo", "--", "*"], [/* 58 vars */]) = 0
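The three quoting styles can be compared side by side with the shell’s own positional parameters. This is a sketch assuming the default IFS; the variable names unquoted, double, and single are invented for the example.

```shell
important_file="my important file"

set -- $important_file      # unquoted: split on whitespace
unquoted=$#
set -- "$important_file"    # double quotes: expanded, but kept whole
double=$#
set -- '$important_file'    # single quotes: no expansion at all
single=$1

printf 'unquoted=%s double=%s single=%s\n' "$unquoted" "$double" "$single"
# prints: unquoted=3 double=1 single=$important_file
```

Here set -- stands in for the argument array that bash would hand to execve().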

Let’s protect our important file by making it read only:

  ~/morning$ strace -- chmod -- 400 "$important_file"
  execve("/bin/chmod", ["chmod", "--", "400", "my important file"], [/* 58 vars */]) = 0

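As can be seen from chmod, -- does not completely protect us against injection: it only ends the parsing of dash-prefixed options. The first argument after it, 400, is still interpreted as a mode rather than as a filename, and programs that accept BSD-style options without a leading dash are similarly unprotected.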


To write more secure bash scripts, it is best to follow four rules of thumb. First, specify the fully qualified pathname to the intended program to execute. Next, specify all the intended options together and use a -- to separate the non-option arguments. Third, double quote your variables so they aren’t globbed and they survive as a single argument. Fourth, single quote variable expressions that shouldn’t be expanded.

   /full/path/to/command [intended options] -- "$x1" "$x2" ...

For example,

  • move a file with filename "$x" to directory foo:
    /bin/mv -- "$x" foo/
  • copy files "$x" and "$y" to directory dst:
    /bin/cp -- "$x" "$y" dst/

This helps ensure our scripts:

  • execute only the program that was intended,
  • are resilient to the most common option injections in command-line arguments
  • can safely operate on arguments/filenames that start with a dash as well as names containing wildcards or whitespace
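Putting the rules together, here is a small sketch of a wrapper that copies awkwardly named files. The function name copy_into, the directory dst, and the scratch directory are assumptions of the example, and /bin/cp is assumed to live at that path on your system.

```shell
# copy_into (hypothetical helper): copy the named files into the directory given first.
copy_into() {
    dst=$1
    shift
    # Full path, then "--", then every name double-quoted.
    /bin/cp -- "$@" "$dst"/
}

scratch=$(mktemp -d) || exit 1
(
    cd "$scratch" || exit 1
    mkdir dst
    x="-rf *"
    y="my important file"
    touch -- "$x" "$y"
    copy_into dst "$x" "$y"
    ls -- dst        # both names arrive intact
)
rm -rf -- "$scratch"
```

Neither the leading dash, the asterisk, nor the spaces cause trouble, because each name reaches /bin/cp as one operand after the --.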

In summary, strace and /bin/echo enabled us to learn a lot about bash without harm. The dangerously named files are gone. We can continue our morning coffee and news stress free now that we know how to better deal with subversive filenames.