What’s all this new-fangled DBus rubbish?

The 1990s called…

DBus has been one of the larger changes that swept through Linux in the last 20 years or so. I mostly (with the exception of a small amount of whinging) ignored it. Usually I’ve encountered it when it hasn’t worked or has got in the way of something or other. Naturally when things worked, I was generally unaware of it.

It’s basically:

  • An async RPC system with call, response, exception and asynchronous push.
  • Authentication (useful for talking to daemons running as root)
  • Introspection (the set of methods, arguments etc are exposed via DBus)
  • Central point for services to register objects and methods: you open a dbus connection as opposed to a random socket

Various tools exist to send and receive messages: Python libraries, shell commands via ‘dbus-send’ and so on. I’ve nothing against RPC in general, but it replaces what were essentially shell scripts triggered by the kernel with programs exchanging messages. You can do more with the latter (though often that isn’t necessary), but it also replaces easily discoverable scripts with reading documentation, something the programming community is not well known for writing, and as such it has a rather higher barrier to entry.

It has also been used to replace simple libraries with relatively complex IPC requiring complex setup. But I won’t blame a table saw for injuries it causes through misuse. Anyway, this post is a nearly linear stream of me learning DBus, with the mistakes and confusion removed.

The tools

I will be relying on the following:

  • The Python dbus module (import dbus)
  • The basic dbus commandline program: dbus-send
  • Qt’s qdbus, since in many cases it provides a nicer interface and nicer introspection
  • Gnome’s gdbus for more verbose introspection

The basics

I’m interested in manipulating the system, so I will be working with the system bus. There’s usually a system bus (for talking to OS related daemons) and a session bus (for all the programs in a logged in session). You can make more if you like but no one does.

In order to make an RPC call, you need:

  1. A program to talk to, e.g. NetworkManager. This is called the bus name, and usually looks like org.freedesktop.NetworkManager
  2. An object in the program. Each program exports a tree of objects, rooted at / and separated with forward slashes. Something like: /org/freedesktop/NetworkManager. Freedesktop likes redundant naming. They could equally well have exported /.
  3. The interface and method name. Just like any OO system each object can present zero or more interfaces with methods.

We can examine this. First, qdbus will show the names present on the system bus. For example, on my system (the names prefixed with colons are unique connection names assigned by the bus) I have:

qdbus --system  | grep -v :
 org.freedesktop.systemd1
 org.freedesktop.resolve1
 fi.epitest.hostap.WPASupplicant
 fi.w1.wpa_supplicant1
 org.freedesktop.NetworkManager
 org.gnome.DisplayManager
 org.freedesktop.ColorManager
 org.freedesktop.Avahi
 org.bluez
 org.freedesktop.UPower
 org.freedesktop.Accounts
 org.freedesktop.login1
 org.freedesktop.RealtimeKit1
 org.freedesktop.UDisks2
 org.freedesktop.ModemManager1
 org.freedesktop.bolt
 org.freedesktop.PackageKit
 org.freedesktop.PolicyKit1
org.freedesktop.DBus

There are a few one might recognise: UDisks for hotplug disks, NetworkManager for wifi control etc, ModemManager1 because this machine actually has a real, actual, physical 56k modem, and a few other well established ones like systemd. You can go further and query what’s on the bus. For example, I can query systemd using gdbus, which as you may recall is the verbose, detailed one. I’m going to query the root object to see what’s there:

$gdbus introspect  --system --dest org.freedesktop.systemd1   --object-path / 
node / {
  interface org.freedesktop.DBus.Peer {
    methods:
      Ping();
      GetMachineId(out s machine_uuid);
    signals:
    properties:
  };
  interface org.freedesktop.DBus.Introspectable {
    methods:
      Introspect(out s data);
    signals:
    properties:
  };
  interface org.freedesktop.DBus.Properties {
    methods:
      Get(in  s interface,
          in  s property,
          out v value);
      GetAll(in  s interface,
             out a{sv} properties);
      Set(in  s interface,
          in  s property,
          in  v value);
    signals:
      PropertiesChanged(s interface,
                        a{sv} changed_properties,
                        as invalidated_properties);
    properties:
  };
  node org {
  };
};

If you’ve ever read IDLs of any sort this will look vaguely familiar. Systemd is exporting an object which presents three interfaces:

  • org.freedesktop.DBus.Peer
  • org.freedesktop.DBus.Introspectable
  • org.freedesktop.DBus.Properties

and a subobject, org. Since gdbus doesn’t recurse by default, that’s all that’s displayed. The first of those interfaces has two methods, neither of which takes any input arguments, which makes them easy to call. So I shall ask systemd for the machine ID:

$dbus-send --print-reply --system --dest=org.freedesktop.systemd1 / org.freedesktop.DBus.Peer.GetMachineId
method return time=1632074720.055870 sender=:1.0 -> destination=:1.3006 serial=54374 reply_serial=2
   string "73f867af39124a3583c288e620019332"

and it responds. See how the bus name (dest), path (/) and interface.method (org.freedesktop.DBus.Peer.GetMachineId) are used. I can examine another common one. When given a bus but no object, qdbus will recurse, showing the object tree:

$qdbus --literal --system org.freedesktop.NetworkManager 
/
/org
/org/freedesktop
/org/freedesktop/NetworkManager
/org/freedesktop/NetworkManager/DnsManager
/org/freedesktop/NetworkManager/DHCP4Config
/org/freedesktop/NetworkManager/DHCP4Config/70
/org/freedesktop/NetworkManager/ActiveConnection
/org/freedesktop/NetworkManager/ActiveConnection/2
/org/freedesktop/NetworkManager/ActiveConnection/72
/org/freedesktop/NetworkManager/AccessPoint
/org/freedesktop/NetworkManager/AccessPoint/2686
/org/freedesktop/NetworkManager/Devices
/org/freedesktop/NetworkManager/Devices/3
/org/freedesktop/NetworkManager/Devices/2
/org/freedesktop/NetworkManager/Devices/1
/org/freedesktop/NetworkManager/Devices/25
/org/freedesktop/NetworkManager/Devices/4
/org/freedesktop/NetworkManager/AgentManager
/org/freedesktop/NetworkManager/Settings
/org/freedesktop/NetworkManager/Settings/8
/org/freedesktop/NetworkManager/Settings/7
/org/freedesktop/NetworkManager/Settings/6
/org/freedesktop/NetworkManager/Settings/5
/org/freedesktop/NetworkManager/Settings/4
/org/freedesktop/NetworkManager/Settings/3
/org/freedesktop/NetworkManager/Settings/2
/org/freedesktop/NetworkManager/Settings/1
/org/freedesktop/NetworkManager/Settings/24
/org/freedesktop/NetworkManager/Settings/9
/org/freedesktop/NetworkManager/IP6Config
/org/freedesktop/NetworkManager/IP6Config/3
/org/freedesktop/NetworkManager/IP6Config/204
/org/freedesktop/NetworkManager/IP6Config/203
/org/freedesktop/NetworkManager/IP6Config/6
/org/freedesktop/NetworkManager/IP4Config
/org/freedesktop/NetworkManager/IP4Config/3
/org/freedesktop/NetworkManager/IP4Config/204
/org/freedesktop/NetworkManager/IP4Config/203
/org/freedesktop/NetworkManager/IP4Config/6

For reference, gdbus (non recursive; recursive is too verbose for this blog post) gives:

$gdbus introspect  --system --dest org.freedesktop.NetworkManager   --object-path /
node / {
  node org {
  };
};

The root node isn’t very interesting, it just has the child org and nothing else. qdbus will also give a more compact method view, for example, I can query one of the devices:

$qdbus --literal --system org.freedesktop.NetworkManager /org/freedesktop/NetworkManager/Devices/3 
method QDBusVariant org.freedesktop.DBus.Properties.Get(QString interface_name, QString property_name)
method QVariantMap org.freedesktop.DBus.Properties.GetAll(QString interface_name)
signal void org.freedesktop.DBus.Properties.PropertiesChanged(QString interface_name, QVariantMap changed_properties, QStringList invalidated_properties)
method void org.freedesktop.DBus.Properties.Set(QString interface_name, QString property_name, QDBusVariant value)
method QString org.freedesktop.DBus.Introspectable.Introspect()
method QString org.freedesktop.DBus.Peer.GetMachineId()
method void org.freedesktop.DBus.Peer.Ping()
property read QDBusObjectPath org.freedesktop.NetworkManager.Device.ActiveConnection
property readwrite bool org.freedesktop.NetworkManager.Device.Autoconnect
property read QList<QDBusObjectPath> org.freedesktop.NetworkManager.Device.AvailableConnections
property read uint org.freedesktop.NetworkManager.Device.Capabilities
property read uint org.freedesktop.NetworkManager.Device.DeviceType
property read QDBusObjectPath org.freedesktop.NetworkManager.Device.Dhcp4Config
property read QDBusObjectPath org.freedesktop.NetworkManager.Device.Dhcp6Config
property read QString org.freedesktop.NetworkManager.Device.Driver
property read QString org.freedesktop.NetworkManager.Device.DriverVersion
property read bool org.freedesktop.NetworkManager.Device.FirmwareMissing
property read QString org.freedesktop.NetworkManager.Device.FirmwareVersion
property read QString org.freedesktop.NetworkManager.Device.Interface
property read uint org.freedesktop.NetworkManager.Device.Ip4Address
property read QDBusObjectPath org.freedesktop.NetworkManager.Device.Ip4Config
property read QDBusObjectPath org.freedesktop.NetworkManager.Device.Ip6Config
property read QString org.freedesktop.NetworkManager.Device.IpInterface
property read QDBusRawType::aa{sv} org.freedesktop.NetworkManager.Device.LldpNeighbors
property readwrite bool org.freedesktop.NetworkManager.Device.Managed
property read uint org.freedesktop.NetworkManager.Device.Metered
property read uint org.freedesktop.NetworkManager.Device.Mtu
property read bool org.freedesktop.NetworkManager.Device.NmPluginMissing
property read QString org.freedesktop.NetworkManager.Device.PhysicalPortId
property read bool org.freedesktop.NetworkManager.Device.Real
property read uint org.freedesktop.NetworkManager.Device.State
property read QDBusRawType::(uu) org.freedesktop.NetworkManager.Device.StateReason
property read QString org.freedesktop.NetworkManager.Device.Udi
method void org.freedesktop.NetworkManager.Device.Delete()
method void org.freedesktop.NetworkManager.Device.Disconnect()
method QDBusRawType::a{sa{sv}} org.freedesktop.NetworkManager.Device.GetAppliedConnection(uint flags, qulonglong& version_id)
method void org.freedesktop.NetworkManager.Device.Reapply(QDBusRawType::a{sa{sv}} connection, qulonglong version_id, uint flags)
signal void org.freedesktop.NetworkManager.Device.StateChanged(uint new_state, uint old_state, uint reason)
property read QList<QDBusObjectPath> org.freedesktop.NetworkManager.Device.Wireless.AccessPoints
property read QDBusObjectPath org.freedesktop.NetworkManager.Device.Wireless.ActiveAccessPoint
property read uint org.freedesktop.NetworkManager.Device.Wireless.Bitrate
property read QString org.freedesktop.NetworkManager.Device.Wireless.HwAddress
property read uint org.freedesktop.NetworkManager.Device.Wireless.Mode
property read QString org.freedesktop.NetworkManager.Device.Wireless.PermHwAddress
property read uint org.freedesktop.NetworkManager.Device.Wireless.WirelessCapabilities
signal void org.freedesktop.NetworkManager.Device.Wireless.AccessPointAdded(QDBusObjectPath access_point)
signal void org.freedesktop.NetworkManager.Device.Wireless.AccessPointRemoved(QDBusObjectPath access_point)
method QList<QDBusObjectPath> org.freedesktop.NetworkManager.Device.Wireless.GetAccessPoints()
method QList<QDBusObjectPath> org.freedesktop.NetworkManager.Device.Wireless.GetAllAccessPoints()
signal void org.freedesktop.NetworkManager.Device.Wireless.PropertiesChanged(QVariantMap properties)
method void org.freedesktop.NetworkManager.Device.Wireless.RequestScan(QVariantMap options)
property readwrite uint org.freedesktop.NetworkManager.Device.Statistics.RefreshRateMs
property read qulonglong org.freedesktop.NetworkManager.Device.Statistics.RxBytes
property read qulonglong org.freedesktop.NetworkManager.Device.Statistics.TxBytes
signal void org.freedesktop.NetworkManager.Device.Statistics.PropertiesChanged(QVariantMap properties)

Yikes! There’s a lot there. gdbus has a nice extra where it also queries the property values for you; I recommend trying that. Anyway, if you read through carefully, you can see the GetAccessPoints method. If I call it, I get a list of access points:

$dbus-send --print-reply --system --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager/Devices/3 org.freedesktop.NetworkManager.Device.Wireless.GetAccessPoints
method return time=1632075788.446527 sender=:1.12 -> destination=:1.3041 serial=431030 reply_serial=2
   array [
      object path "/org/freedesktop/NetworkManager/AccessPoint/2686"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2699"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2700"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2701"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2702"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2703"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2704"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2706"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2708"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2710"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2711"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2712"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2713"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2714"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2715"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2716"
      object path "/org/freedesktop/NetworkManager/AccessPoint/2717"
   ]

This is not very shell friendly, but I can persist. For example, if I examine one of the APs, I get:

$gdbus introspect  --system --dest org.freedesktop.NetworkManager   --object-path /org/freedesktop/NetworkManager/AccessPoint/2714
node /org/freedesktop/NetworkManager/AccessPoint/2714 {
  interface org.freedesktop.DBus.Properties {
    methods:
      Get(in  s interface_name,
          in  s property_name,
          out v value);
      GetAll(in  s interface_name,
             out a{sv} properties);
      Set(in  s interface_name,
          in  s property_name,
          in  v value);
    signals:
      PropertiesChanged(s interface_name,
                        a{sv} changed_properties,
                        as invalidated_properties);
    properties:
  };
  interface org.freedesktop.DBus.Introspectable {
    methods:
      Introspect(out s xml_data);
    signals:
    properties:
  };
  interface org.freedesktop.DBus.Peer {
    methods:
      Ping();
      GetMachineId(out s machine_uuid);
    signals:
    properties:
  };
  interface org.freedesktop.NetworkManager.AccessPoint {
    methods:
    signals:
      PropertiesChanged(a{sv} properties);
    properties:
      readonly u Flags = 0;
      readonly u WpaFlags = 0;
      readonly u RsnFlags = 0;
      readonly ay Ssid = [0x48, 0x50, 0x2d, 0x50, 0x72, 0x69, 0x6e, 0x74, 0x2d, 0x33, 0x31, 0x2d, 0x4f, 0x66, 0x66, 0x69, 0x63, 0x65, 0x6a, 0x65, 0x74, 0x20, 0x36, 0x36, 0x30, 0x30];
      readonly u Frequency = 2412;
      readonly s HwAddress = 'xx:xx:xx:xx:xx:xx';
      readonly u Mode = 2;
      readonly u MaxBitrate = 54000;
      readonly y Strength = 0x31;
      readonly i LastSeen = 2602413;
  };
};

That looks interesting. The SSID, which is a byte array, is a property. The way to read properties is with the Get method on the org.freedesktop.DBus.Properties interface, so you have to call that. Note it takes two arguments, both strings: one is the interface name, the other the property name, so you have to specify both. And querying AP number 2714 gives:

 $dbus-send --print-reply --system --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager/AccessPoint/2714 org.freedesktop.DBus.Properties.Get string:org.freedesktop.NetworkManager.AccessPoint string:Ssid
method return time=1632076552.174223 sender=:1.12 -> destination=:1.3082 serial=431816 reply_serial=2
   variant       array of bytes "HP-Print-31-Officejet 6600"

Oh look, one of my neighbours has one of the worst printers ever made. I had one of those printers.
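As an aside, the Ssid bytes from the introspection dump above can be decoded directly. Here’s a quick stdlib-only check (assuming the SSID is valid UTF-8, which the wifi spec doesn’t actually guarantee):

```python
# The Ssid property is an array of bytes (DBus type "ay"), not a string,
# because SSIDs aren't guaranteed to be valid text in any encoding.
ssid = [0x48, 0x50, 0x2d, 0x50, 0x72, 0x69, 0x6e, 0x74, 0x2d,
        0x33, 0x31, 0x2d, 0x4f, 0x66, 0x66, 0x69, 0x63, 0x65,
        0x6a, 0x65, 0x74, 0x20, 0x36, 0x36, 0x30, 0x30]

# Stitch the bytes back together and decode
print(bytes(ssid).decode('utf-8'))  # HP-Print-31-Officejet 6600
```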

Python provides a reasonable library for doing such things. Here’s some code which iterates over all available network interfaces and prints the SSID of whichever access points it finds:

import dbus
bus = dbus.SystemBus()

obj = bus.get_object('org.freedesktop.NetworkManager', '/org/freedesktop/NetworkManager')
network_manager = dbus.Interface(obj, 'org.freedesktop.NetworkManager')

#Iterate over all devices
for device in network_manager.GetDevices():

    #Get the wireless interface for each device
    obj = bus.get_object('org.freedesktop.NetworkManager', device)
    wlan =  dbus.Interface(obj, 'org.freedesktop.NetworkManager.Device.Wireless')
    
    #Note we don't get an error until we attempt to use the interface
    #I suspect there is a better way
    try:
        for ap_path in  wlan.GetAccessPoints():
            # Read the SSID property
            obj =  bus.get_object('org.freedesktop.NetworkManager', ap_path)
            ap_props =   dbus.Interface(obj, 'org.freedesktop.DBus.Properties')
            ssid = ap_props.Get('org.freedesktop.NetworkManager.AccessPoint', 'Ssid')
            print(''.join([str(v) for v in ssid]))

    except dbus.exceptions.DBusException as e:
        pass

So that’s it for the basics. There’s a whole introspection API as well (the Introspect method visible above), which returns the structure as XML.
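For a flavour of what that XML looks like, here’s a sketch that picks apart a hand-written fragment in the shape of Introspect()’s output. The fragment is my own reconstruction based on the gdbus dumps above, not captured from a real bus:

```python
import xml.etree.ElementTree as ET

# A small hand-made sample in the introspection format:
# node, interface, method and arg elements
xml_data = """
<node>
  <interface name="org.freedesktop.DBus.Peer">
    <method name="Ping"/>
    <method name="GetMachineId">
      <arg type="s" name="machine_uuid" direction="out"/>
    </method>
  </interface>
  <node name="org"/>
</node>
"""

root = ET.fromstring(xml_data)
for interface in root.findall('interface'):
    for method in interface.findall('method'):
        print(interface.get('name') + '.' + method.get('name'))
```

which prints the fully qualified method names, one per line.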

But to call methods, introspection isn’t needed.

Writing a service

Writing a service is pretty easy in Python. Here’s an example where dbus doesn’t steal the program’s entire main loop. Note that this runs on the session bus, not the system one, because of security.

import dbus
import dbus.service
import dbus.mainloop.glib
from gi.repository import GLib
import time

class Service(dbus.service.Object):
   def __init__(self):
      #Register dbus with GLib's main loop
      dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
      
      #Bus name 
      bus_name = dbus.service.BusName("com.hello.helloworld", dbus.SessionBus())

      #Object to export
      dbus.service.Object.__init__(self, bus_name, "/")

      self._count=0

   #Register two methods to the one interface
   @dbus.service.method("com.hello.helloworld.Message", in_signature='', out_signature='s')
   def get_message(self):
      self._count+=1
      return "Hello, world " + str(self._count)
    
   @dbus.service.method("com.hello.helloworld.Message", in_signature='i')
   def set_counter(self, i):
      self._count = i
    
   #A way of polling the main loop so that GLib doesn't 
   #steal the program's main loop
   def poll(self):
      loop = GLib.MainLoop()
      def quit():
          loop.quit()
      GLib.idle_add(quit)
      loop.run()

#Instantiate the service
service = Service()

#Poll it
while True:
    service.poll()
    time.sleep(.1)

And it works:

~ $qdbus --session com.hello.helloworld / get_message
Hello, world 1
~ $qdbus --session com.hello.helloworld / get_message
Hello, world 2
~ $qdbus --session com.hello.helloworld / get_message
Hello, world 3
~ $qdbus --session com.hello.helloworld / get_message
Hello, world 4
~ $qdbus --session com.hello.helloworld / get_message
Hello, world 5
~ $qdbus --session com.hello.helloworld / set_counter 0

~ $qdbus --session com.hello.helloworld / get_message
Hello, world 1
~ $qdbus --session com.hello.helloworld / get_message
Hello, world 2

qdbus is often a lot less verbose to use!

Writing a system service

If you try to run this on the system bus, it will fail, even if you run as root. This is because of the security policy in place. The policies allow non-root daemons to own system services, and no one special-cased root to be always allowed, which is fine.

The policies are in /etc/dbus-1/system.d/ and they’re sort of understandable, but only sort of. It is documented: essentially it’s default deny with allow/deny rules applied top to bottom based on a matching scheme. In order to run the service as root, the following file works:

<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>

  <!-- Only root can own the service -->
  <policy user="root">
    <allow own="com.hello.helloworld"/>
  </policy>

  <policy context="default">
    <allow send_destination="com.hello.helloworld"/>
  </policy>
</busconfig>

Essentially this allows only root to own the service, but allows anyone (the default context) to send messages to it. Once the file is created, you then need to reload DBus, which can be done with SIGHUP or with systemctl reload dbus on systemd based systems.

Starting a system service with systemd

I could do it manually, but perhaps I should bend to the winds of change.

This is simple, for this rather basic service. First, add #!/usr/bin/env python3 to the first line of the script and make it executable. Then copy it to /opt/helloservice/service.py. Then add a very simple unit file to /etc/systemd/system/hello.service:

[Unit]
Description=Hello world service

[Service]
ExecStart=/opt/helloservice/service.py

[Install]
WantedBy=multi-user.target

Is it me or is it weird that the old Windows 3 style .INI files have become a perverse sort of standard?

The future..

Anyway, now a simple:

sudo systemctl daemon-reload
sudo systemctl start hello

starts the service. And to enable it on boot:

sudo systemctl enable hello

which works by symlinking the unit file into the target’s .wants directory.

And that’s it

A new system service, running as root, controllable by a user. The purpose is to have some NeoPixels on a Raspberry Pi controlled as a user (the pixels can only be driven as root due to the need to access low level hardware).

Overwrite a file but only if it exists (in BASH)

Imagine you have a line in a script:

cat /some/image.img > /dev/mmcblk0

which dumps a disk image on to an SD card. I have such a line as part of the setup. Actually, a more realistic one is:

pv /some/image.img | sudo bash -c 'cat >/dev/mmcblk0'

which displays a progress bar and doesn’t require having the whole script being run as root. Either way, there’s a problem: if the SD card doesn’t happen to be in when that line runs, it will create /dev/mmcblk0. Then all subsequent writes will go really fast (at the speed of the main disk), and you will get confused and sad when none of the changes are reflected on the SD card. You might even reboot which will magically fix the problem (/dev gets nuked). That happened to me šŸ˜¦

The weird, out of place dd tool offers a nice fix:

pv /some/image.img | sudo dd conv=nocreat of=/dev/mmcblk0

You can specify a pseudo-conversion which tells dd not to create the file if it doesn’t already exist. It also serves the same purpose as the “sudo tee” idiom, but without dumping everything to stdout.

A simple hack but a useful one. You can do similar things like append, but only if it exists, too. The incantation is:

dd conv=nocreat,notrunc oflag=append of=/file/to/append/to

That is: don’t create the file, don’t truncate it on opening, and append. If you allow it to truncate, it will truncate then append, which is entirely equivalent to overwriting, but nonetheless you can still specify append without notrunc.
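The same guard can be had from other languages by opening without the create flag. Here’s a Python sketch of the idea (the file names are made up):

```python
import os
import tempfile

def overwrite_existing(path, data):
    # O_WRONLY|O_TRUNC without O_CREAT: like dd's conv=nocreat, this
    # raises FileNotFoundError rather than silently creating path
    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

tmp = tempfile.mkdtemp()
present = os.path.join(tmp, 'exists.img')
with open(present, 'wb') as f:
    f.write(b'old contents')

overwrite_existing(present, b'new contents')   # fine: the file exists

try:
    overwrite_existing(os.path.join(tmp, 'missing.img'), b'new contents')
except FileNotFoundError:
    print('refused to create the file')        # this branch is taken
```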

Warn or exit on failure in a shell script

Make has a handy feature where when a rule fails, it will stop whatever it’s doing. Often though you simply want a linear list of commands to be run in sequence (i.e. a shell script) with the same feature.

You can more or less hack that feature with BASH using the DEBUG trap. The trap executes a hook before every command is run, so you can use it to test the result of the previous command. That of course leaves the last command dangling, so you can put the same hook on the EXIT trap which runs after the last command finishes.

Here’s the snippet and example which warns (rather than exits) on failure:

function test_failure(){
  #Save the exit code, since anything else will trash it.
  v=$?
  if [ $v != 0 ]
  then
    echo -e Line $LINE command "\e[31m$COM\e[0m" failed with code $v
  fi
  #This trap runs before a command, so we use it to
  #test the previous command run. So, save the details for
  #next time.
  COM=${BASH_COMMAND}
  LINE=${BASH_LINENO}
}

#Set up the traps
trap test_failure EXIT
trap test_failure DEBUG

#Some misc stuff to test.
echo hello
sleep 2 ; bash -c 'exit 3'

echo world
false

echo what > /this/is/not/writable

echo the end

Running it produces:

$ bash errors.bash 
hello
Line 21 command bash -c 'exit 3' failed with code 3
world
Line 25 command false failed with code 1
errors.bash: line 27: /this/is/not/writable: No such file or directory
Line 27 command echo what > /this/is/not/writable failed with code 1
the end
$
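The same warn-and-continue behaviour is easy to sketch in Python, for comparison: run each command in its own shell and report non-zero exit codes as they happen (the command list below is made up for illustration):

```python
import subprocess

commands = [
    'echo hello',
    "sh -c 'exit 3'",
    'echo world',
    'false',
    'echo the end',
]

for i, cmd in enumerate(commands, 1):
    # Run each command; warn on failure but keep going, like the trap
    r = subprocess.run(['sh', '-c', cmd])
    if r.returncode != 0:
        print('Command %d "%s" failed with code %d' % (i, cmd, r.returncode))
```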

Simple unit testing with a Makefile

Automated unit tests are very useful. They’re an excellent way of making sure you haven’t broken something in an obvious way when you change things. You can also implement them with very little work and without needing to pull in an external framework, just by using Makefiles. Since Make understands dependencies, this also ensures that when edits are made, only the minimal number of tests need to be rerun.

This is a slightly simplified example of the method I’ve been using in the TooN library and my own projects.

Let’s say you have a very simple matrix class:

#ifndef MATRIX_H
#define MATRIX_H

#include <cmath>
#include <initializer_list>
#include <array>
#include <cassert>
#include <iostream>
struct Matrix2
{
	
	std::array<std::array<double, 2>, 2> data;

	public:

		Matrix2(std::initializer_list<double> i)
		{
			assert(i.size() == 4);
			auto d = i.begin();
			data ={ *(d+0), *(d+1), *(d+2), *(d+3)};
		}

		std::array<double,2>& operator[](int i)
		{
			return data[i];
		}	

		const std::array<double,2>& operator[](int i) const
		{
			return data[i];
		}	


		Matrix2 operator*(const Matrix2& m) const
		{
			Matrix2 ret = {0,0,0,0};

			for(int r=0; r < 2; r++)	
				for(int c=0; c < 2; c++)	
					for(int i=0; i < 2; i++)
						ret[r][c] += (*this)[r][i] * m[i][c];
			return ret;
		}

};

inline double norm_fro(const Matrix2& m)
{
	double f=0;
	for(int r=0; r < 2; r++)	
		for(int c=0; c < 2; c++)	
			f+=m[r][c];

	return sqrt(f);
}

inline Matrix2 inv(const Matrix2& m)
{
	double d = 1./(m[0][0]*m[1][1] - m[1][0]*m[0][1]);

	return {
		 m[1][1]*d,   m[0][1]*d ,
		 -m[1][0]*d,  m[0][0]*d 
	};
}

std::ostream& operator<<(std::ostream& o, const Matrix2& m)
{
	o<< m[0][0] << " " << m[0][1] << std::endl;
	o<< m[1][0] << " " << m[1][1] << std::endl;
	return o;
}

#endif

(did you spot the error?)

And you want to find the inverse of a 2×2 matrix:

#include "matrix.h"

using namespace std;

int main()
{
	Matrix2 m = {1, 1, 0, 1};
	cout << "Hello, this is a matrix:\n" << m << endl 
	     << "and this is its inverse:\n" << inv(m) << endl;

}

Simple enough. In order to build it, you can write a very simple makefile:

CXX=g++-5
CXXFLAGS=-std=c++14 -g -ggdb -Wall -Wextra -O3  -Wodr -flto


prog:prog.o
	$(CXX) -o prog prog.o $(LDFLAGS)	

We can make:

make

And get a result:

Hello, this is a matrix:
1 1
0 1

and this is its inverse:
1 -1
-0 1

Plausible, but is it right? (a clue: no.) So, let’s write a test program that creates random matrices, multiplies each by its inverse and checks the norm of the product against the norm of I. This will go in tests/inverse.cc


#include "matrix.h"
#include <random>
#include <iostream>
using namespace std;


int main()
{
	mt19937 rng;
	uniform_real_distribution<> r(-1, 1);
	int N=10000;
	double sum=0;
	
	for(int i=0; i < N; i++)
	{
		Matrix2 m = {r(rng), r(rng), r(rng), r(rng) };
		sum += norm_fro(m * inv(m))-sqrt(2);
	}

	cout << sum / N << endl;

	//Looks odd? Imagine if sum is NaN
	if(!(sum / N < 1e-8 ))
	{
		return EXIT_FAILURE;
	}

	cout << "OK\n";
}

And we get the output:

6.52107

So there’s an error. At the moment the test is ad-hoc. We have to remember to compile it (there’s no rule for that) and we have to remember to run it whenever we make some edits. This can all be automated with Make.

So, let’s first make a rule for building tests:


#Build a test executable from a test program. On compile error,
#create an executable which declares the error.
tests/%.test: tests/%.cc
	$(CXX) $(CXXFLAGS) $< -o $@ -I . $(LDFLAGS) || \
	{ \
	  echo "echo 'Compile error!'; exit 126" > $@ ; \
	  chmod +x $@; \
	}

This is a bit unusual, instead of just building the executable, if it fails, we make a working executable which indicates a compile error. This will eventually allow us to run a battery of tests and get a neat report of any failures and compile errors rather than the usual spew of compiler error messages.

So now we can (manually) initiate make and run the test. Note that if the test fails, the program returns an error.

We’re now going to take this a bit further. From the test program we’re going to generate a file with a similar name, but that has one line in it. The line will consist of the test name, followed by a status of the result. We do this in two stages. First, run the test and append either “OK”, “Failed” or “Crash!!” to the output depending on the exit status. If a program dies because of a signal, the exit status is 128+signal number, so a segfault has exit status 139. Exit status 126 is reserved for the compile-error placeholder script, in which case the message it printed is left as the status. From the intermediate file, we’ll then create the result file with the one line in it.
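The 128+signal convention is easy to sanity-check from Python (a quick demonstration using the stdlib subprocess module; a POSIX sh is assumed):

```python
import subprocess

# An ordinary failure: the exit status is just the code passed to exit
r = subprocess.run(['sh', '-c', "sh -c 'exit 3'; echo $?"],
                   capture_output=True, text=True)
print(r.stdout.strip())  # 3

# Death by signal: the parent shell reports 128 + signal number, so
# SIGSEGV (signal 11, a segfault) shows up as 139
r = subprocess.run(['sh', '-c', "sh -c 'kill -11 $$'; echo $?"],
                   capture_output=True, text=True)
print(r.stdout.strip())  # 139
```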

#Build a test executable from a test program. On compile error,
#create an executable which declares the error.
tests/%.test: tests/%.cc
	$(CXX) $(CXXFLAGS) $< -o $@ -I . $(LDFLAGS) || \
	{ \
	  echo "echo 'Compile error!'; exit 126" > $@ ; \
	  chmod +x $@; \
	}

#Run the program and either use its output (it should just say OK)
#or a failure message
tests/%.result_: tests/%.test
	$< > $@ ; \
	a=$$? ;\
	if [ $$a != 0 ]; \
	then \
	   if [ $$a -ge 128 ] ; \
	   then \
	       echo Crash!! > $@ ; \
	   elif [ $$a -ne 126 ] ;\
	   then \
	       echo Failed > $@ ; \
	   fi;\
	else\
	    echo OK >> $@;\
	fi


tests/%.result: tests/%.result_
	echo $*: `tail -1 $<` > $@

We can now make tests/inverse.result and we get the following output:

g++-5 -std=c++14 -g -ggdb -Wall -Wextra -O3  -Wodr -flto tests/inverse.cc -o tests/inverse.test -I .  || \
        { \
          echo "echo 'Compile error!'; exit 126" > tests/inverse.test ; \
          chmod +x tests/inverse.test; \
        }
tests/inverse.test > tests/inverse.result_ ; \
        a=$? ;\
        if [ $a != 0 ]; \
        then \
           if [ $a -ge 128 ] ; \
           then \
               echo Crash!! > tests/inverse.result_ ; \
           elif [ $a -ne 126 ] ;\
           then \
               echo Failed > tests/inverse.result_ ; \
           fi;\
        else\
            echo OK >> tests/inverse.result_;\
        fi
echo inverse: `tail -1 tests/inverse.result_` > tests/inverse.result

And the contents is:

inverse: Failed

Just to check the other options, we can add the following line to tests/inverse.cc

*(int*)0 = 1;

And sure enough we get:

inverse: Crash!!

So it seems to be working. The next thing is to be able to run all the tests at once and generate a report. So we’ll add the following lines to use every .cc file in tests/ as a test and process the strings accordingly:

#Every .cc file in the tests directory is a test
TESTS=$(notdir $(basename $(wildcard tests/*.cc)))


#Get the intermediate file names from the list of tests.
TEST_RESULT=$(TESTS:%=tests/%.result)


# Don't delete the intermediate files, since these can take a
# long time to regenerate
.PRECIOUS: tests/%.result_ tests/%.test


#Add the rule "test" so make test works. It's not a real file, so
#mark it as phony
.PHONY: test
test:tests/results


#We don't want this file hanging around on failure since we
#want the build to depend on it. If we leave it behind then typing make
#twice in a row will succeed, since make will find the file and not try
#to rebuild it.
.DELETE_ON_ERROR: tests/results 

tests/results:$(TEST_RESULT)
	cat $(TEST_RESULT) > tests/results
	@echo -------------- Test Results ---------------
	@cat tests/results
	@echo -------------------------------------------
	@ ! grep -qv OK tests/results 
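The variable manipulation above leans on GNU Make's text functions (wildcard, basename, notdir and the :%= substitution reference). Their combined effect can be mimicked in plain shell; this sketch uses a hypothetical scratch directory and test names just to show what TESTS and TEST_RESULT end up containing:

```shell
# Equivalent of TESTS=$(notdir $(basename $(wildcard tests/*.cc)))
dir=$(mktemp -d)
mkdir -p "$dir/tests"
touch "$dir/tests/determinant.cc" "$dir/tests/inverse.cc"
TESTS=$(cd "$dir" && for f in tests/*.cc; do b=${f##*/}; printf '%s ' "${b%.cc}"; done)
echo "TESTS = $TESTS"
# Equivalent of TEST_RESULT=$(TESTS:%=tests/%.result)
TEST_RESULT=$(for t in $TESTS; do printf 'tests/%s.result ' "$t"; done)
echo "TEST_RESULT = $TEST_RESULT"
```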

Now type “make test” and you’ll get the following output:

-------------- Test Results ---------------
inverse: OK
-------------------------------------------

The system is pretty much working. You can now very easily add tests. Create a .cc file in the tests directory and make it return a standard exit code and… that’s it. The very final stage is to make the target we want to build depend on the results of the test:

prog:prog.o tests/results
	$(CXX) -o prog prog.o $(LDFLAGS)	

At this point you can now type “make prog” and the executable will only build if all the tests pass. There’s one minor wrinkle remaining: make has no mechanism for scanning C++ source files to check for dependencies. So, if you update matrix.h then it won’t rerun the tests because it doesn’t know about the dependency of the test results on matrix.h. This problem can also be solved in make. The complete makefile (with the dependency scanner at the bottom) is:

CXX=g++-5
CXXFLAGS=-std=c++14 -g -ggdb -Wall -Wextra -O3  -Wodr -flto


prog:prog.o tests/results
	$(CXX) -o prog prog.o $(LDFLAGS)	

clean:
	rm -f tests/*.result tests/*.test tests/*.result_ prog *.o


#Every .cc file in the tests directory is a test
TESTS=$(notdir $(basename $(wildcard tests/*.cc)))




#Get the intermediate file names from the list of tests.
TEST_RESULT=$(TESTS:%=tests/%.result)


# Don't delete the intermediate files, since these can take a
# long time to regenerate
.PRECIOUS: tests/%.result_ tests/%.test

#Add the rule "test" so make test works. It's not a real file, so
#mark it as phony
.PHONY: test
test:tests/results


#We don't want this file hanging around on failure since we
#want the build to depend on it. If we leave it behind then typing make
#twice in a row will succeed, since make will find the file and not try
#to rebuild it.
.DELETE_ON_ERROR: tests/results 

tests/results:$(TEST_RESULT)
	cat $(TEST_RESULT) > tests/results
	@echo -------------- Test Results ---------------
	@cat tests/results
	@echo -------------------------------------------
	@ ! grep -qv OK tests/results 


#Build a test executable from a test program. On compile error,
#create an executable which declares the error.
tests/%.test: tests/%.cc
	$(CXX) $(CXXFLAGS) $< -o $@ -I . $(LDFLAGS) ||\
	{ \
	  echo "echo 'Compile error!' ; return 126" > $@ ; \
	  chmod +x $@; \
	}

#Run the program and either use its output (it should just say OK)
#or a failure message
tests/%.result_: tests/%.test
	$< > $@ ; \
	a=$$? ;\
	if [ $$a != 0 ]; \
	then \
	   if [ $$a -ge 128 ] ; \
	   then \
	       echo Crash!! > $@ ; \
	   elif [ $$a -ne 126 ] ;\
	   then \
	       echo Failed > $@ ; \
	   fi;\
	else\
	    echo OK >> $@;\
	fi
	
tests/%.result: tests/%.result_
	echo $*: `tail -1 $<` > $@

#Get the C style dependencies working. Note we need to massage the test dependencies
#to make the filenames correct
.deps:
	rm -f .deps .sourcefiles
	find . -name "*.cc" | xargs -IQQQ $(CXX) $(CXXFLAGS) -MM -MG QQQ | sed -e'/test/s!\(.*\)\.o:!tests/\1.test:!'  > .deps

include .deps

The result is a basic unit testing system written in about 30 lines of GNU Make/bash. Being make based, you get all the nice properties of make: the building and testing all runs in parallel if you ask it to, and if you update some file, it will only rerun the tests it needs to. The code along with some more sample tests is available here: https://github.com/edrosten/unit_tests_with_make

Learning shell scripting without manuals

Imagine you wake up one sunny, blissful morning, brew some drip coffee
with your V60, and start reading the morning paper–I mean the
morning reddit–on your homebrew Kindle-like contraption. Two dangerously
named files await you on the screen:

~/morning$ ls *
total 8
drwxrwxr-x  2 damian damian 4096 Jun 19 14:22 .
drwxr-xr-x 30 damian damian 4096 Jun 19 14:22 ..
-rw-rw-r--  1 damian damian    0 Jun 19 14:44 *
-rw-rw-r--  1 damian damian    0 Jun 19 14:45 -la
~/morning$

Now, that’s strange, because I invoked ls without -la. If I instead list a directory containing no strangely named files, /bin/ls behaves as intended.

~/morning$ ls /
bin    dev   initrd.img      lib64       mnt   root  srv  usr      vmlinuz.old
boot   etc   initrd.img.old  lost+found  opt   run   sys  var
cdrom  home  lib             media       proc  sbin  tmp  vmlinuz
~/morning$

In the two /bin/ls invocations above, I typed no options, so we can conclude one
thing: wildcards, if used improperly, can lead to different behavior
depending on the contents of a directory. If you’re new to bash and that doesn’t scare you,
you may want to get your amygdala checked.

If you have a firm understanding of bash’s rules for variable
expansion and the basics of how options are parsed, you will be able
to deal with these files safely. You should, however, try to get to
the bottom of where they came from, but that’s another story.

To safely simulate how bash might handle something more dangerous like:

  rm *

we can use /bin/echo. Like /bin/rm, /bin/echo is written in C and
uses the same library functions for parsing command line arguments. /bin/echo
writes its arguments to its standard output.

  /bin/echo /your/dangerous command call here

Let’s try this technique:

~/morning$ /bin/echo rm *
rm * -la

Fortunately, -l and -a are invalid options to rm, so they cause an error without doing anything harmful. If a file had been named "-rf", then we’d have to be a bit more careful.

Without knowing bash’s rules, we still do not know how bash chops the string “/bin/echo rm *” into elements of the argument vector passed to the /bin/echo C program. This is important regardless of your choice of language, since Python, Perl, Java, and even bash itself provide an array of strings that is directly derived from the array passed to C’s main(). I can’t speak for Ruby.
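This is easy to check: the array a higher-level language exposes is exactly the C argv, already split and globbed by bash before the interpreter even starts. A quick sketch (assuming python3 is installed):

```shell
# bash does the splitting; Python merely reports the argv it was handed.
out=$(python3 -c 'import sys; print(sys.argv[1:])' one "two words")
echo "$out"
```

The quoted "two words" survives as a single element of sys.argv.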

More impressive than a V60 is a program called strace. It records most system call interactions between a process and the kernel. A system call is an API call made between a user space program and the kernel. For example, the ubiquitous open(), read(), write(), and close() are system calls.

We will run strace on /bin/echo simply to see how bash
parses an expression and chops it into individual string elements of an argument
array before it’s passed to the program.

  $ strace /bin/echo rm *
  execve("/bin/echo", ["/bin/echo", "rm", "*", "-la"], [/* 57 vars */]) = 0
  ...

The very first system call is execve(). It is almost always the first
call trapped by strace.
It is called by the C implementation of bash
to load a program into the child process. Here is the interface:

   int execve(const char *program_name, char *const argv[], char *const envp[])

It takes in the pathname of the program to run, the program’s
arguments as an array of null-terminated strings, and the environment
variables set for the program.

execve() is typically called after a fork(), which the parent process
(e.g. bash) uses to create the child process. If execve() is successful
in loading the program, it takes over the child process with the
program image and flow control begins at the start of the program’s
main() function. Otherwise, flow control continues in the program that
called fork() and execve(), which might be bash.

Digression #1

Most programs like bash, Perl, and Python are written in
C and do something like the following to call an external program:

  int child_pid;
  child_pid = fork();
  if (child_pid < 0) {
    error(0, errno, "fork() failed");
  }
  if (child_pid == 0) {
    /* We are in the child: replace it with the requested program */
    int status = execvp(program_name, argv);
    if (status == -1) {
      error(0, errno, "could not execute program %s", program_name);
    }
  }

This idiom is called a “fork, exec”. In the parent, fork() returns the child’s (positive) process ID; in the new child process, it returns 0. By convention, the parent continues on, and the child is responsible for loading the desired program and running it, which is accomplished by execve().

execve is pretty nifty: if successful, the child won’t continue
to the next line. In fact, it forgets its current program image
entirely: the program named by program_name is memory-mapped in to replace it, and that program’s main() function is then called.
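You can feel this image replacement from the shell itself with the exec builtin, which calls execve() on the current shell process; running it inside a throwaway subshell makes it safe to try:

```shell
# exec replaces the subshell with /bin/echo, so the second echo never runs.
out=$( ( exec /bin/echo replaced; echo "never reached" ) )
echo "$out"
```

Only “replaced” is printed: nothing after a successful exec executes, because there is no longer any shell there to execute it.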

  $ strace -- /bin/echo rm *
  execve("/bin/echo", ["/bin/echo", "rm", "*", "-la"], [/* 57 vars */]) = 0
  ...

If you are unfamiliar with the dash dash --, it tells most
programs that any arguments that follow are not to be interpreted as options.
This enables you to chain many commands together without ambiguity about the command
to which each option belongs. But is it useful for something else?

Digression #2

The POSIX standard defines the semantics of how options are parsed. A
great majority of programs bundled with Linux and Mac OS X follow
it. In fact, if you use getopt(), you automatically follow it for free.
Most core programs in UNIX use getopt() to parse the options so you
often get consistent option parsing behavior across a broad spectrum of programs.
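Shell scripts get the same convention through the getopts builtin, which also honors -- as an end-of-options marker; a minimal sketch with made-up options -a and -b:

```shell
# Parse -a and -b VALUE; -- stops option parsing dead.
set -- -a -b hello -- -not-an-option
parsed=""
while getopts "ab:" opt; do
  case $opt in
    a) parsed="$parsed -a" ;;
    b) parsed="$parsed -b=$OPTARG" ;;
  esac
done
shift $((OPTIND - 1))      # discard the parsed options and the --
rest="$*"
echo "options:$parsed"
echo "operands: $rest"
```

Everything after the -- lands in the operands, dash or no dash.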

Now suppose we wanted to tunnel through a bastion/gateway named
host1 into a second machine host2, and remove a file whose name is stored in the
environment variable $file_to_remove. One could do:

  $ ssh -i ~/.ssh/key1 user1@host1 -- ssh -i /nfs/home/me/.ssh/key2 user2@host2 -- /bin/rm -- "$file_to_remove"

Now, as it is written, I know that the first ssh will stop interpreting options
after the first --, the second ssh will interpret only options in between the
first and second --, and the third command /bin/rm will not parse any options!

Eureka! So, let’s see if we can get /bin/ls to behave consistently
regardless of the contents of the current directory:

  ~/morning$ ls -- *
  *  -la

Excellent: -- with no options before it effectively invokes
ls without any options.

Let’s assume we have a file we cannot afford to lose called important:

  ~/morning$ touch important

Now, let’s list the files without any special options:

~/morning$ ls -- *
*  important  -la

Alright, so bash is obviously interpreting the asterisk. We can safely
remove -la using the -- trick:

  ~/morning$ rm -- -la

Now that this file is gone, ls should behave correctly (though still not consistently) without the --.

  ~/morning$ ls *
  *  important

Now that we think we understand things better, let’s create a file
that’s more dangerous than the innocuous “-la”:

  ~/morning$ touch -- -rf

In fact, -- (or prefixing the name with ./) is the only way we can get
touch to create a file with a name that begins with -.

Let’s call our friendly, non-destructive program /bin/echo
to simulate the interpretation of an asterisk:

  ~/morning$ /bin/echo *
  * important -rf

The expansion of a filename wildcard is performed by glob(), a venerable
POSIX library function. In fact, we have an endearing name for this process:
globbing. My hypothesis is that bash passes an unglobbed asterisk when
it is double-quoted. Let’s see whether bash globs the asterisk when we enclose it in double quotes.

  ~/morning$ /bin/echo "*"
  *

Ahh, now let’s corroborate this with strace.

  ~/morning$ strace -- /bin/echo "*"
  execve("/bin/echo", ["/bin/echo", "*"], [/* 57 vars */]) = 0

And indeed, main() receives its asterisk unglobbed. System calls do not
glob their filename inputs, including unlink(), the system call used to delete files.
Commands like rm and ls do not glob their arguments either: all the globbing is done by the shell.

Let’s be unusually brave and try:

   rm -- "*"

The -- ensures that the file named "-rf" cannot be interpreted as options, and the quotes suppress the glob, so /bin/rm receives a literal asterisk rather than the directory’s contents. Let’s use strace to see what’s happening under the hood and confirm that our asterisk is passed to unlink() unglobbed:

  ~/morning$ strace -- /bin/rm -- "*"
  execve("/bin/rm", ["/bin/rm", "--", "*"], [/* 57 vars */]) = 0
  ...
  access("*", W_OK)          = 0
  unlink("*")                = 0

It is important to note in the execve() that, since the asterisk is not globbed by bash, -rf does not appear in our arguments. However, if it did, the dash dash would protect us from it being interpreted as an option.

Digression #3

strace traces lots of different system calls. If you are unfamiliar with a particular system call, you can look it up with man.

  man 2 access

The 2 tells man to only include pages about system calls. Use 1 instead for commands and 3 for C standard library calls.

To read up on glob, the function used to turn a wildcard expression string into a list of matches,

  man 3 glob

This implies glob is not part of the kernel. In fact, no system call globs its arguments.

From the last strace, we can conclude several things:

  • the asterisk is not globbed when passed to the system calls so it will be treated as a filename without any special interpretation of the wildcard characters,
  • access(): rm is first checking if it can write to the file named “*”
  • unlink(): the file is removed

Now, let’s confirm the asterisk file is removed:

  ~/morning$ ls
  important  -rf

Now we can safely remove the “-rf” file so it does not wreak havoc when we’re feeling less mindful.

  ~/morning$ rm -- -rf

Now, only our important file remains without any dangerously named files:

  ~/morning$ ls
  important

How about files with spaces? Let’s create one:

 ~/morning$ touch -- "-rf *"
 ~/morning$ ls -l
total 0
-rw-rw-r-- 1 damian damian 0 Jun 19 15:24 important
-rw-rw-r-- 1 damian damian 0 Jun 19 16:02 -rf *

We see that double quotes suppress the glob, so they can be used to manipulate files with spaces; they can also be used to avoid splitting a string into multiple command line arguments. Let’s verify this behavior with strace and our innocuous /bin/echo:

  ~/morning$ strace -- /bin/echo -- "-rf *"
  execve("/bin/echo", ["/bin/echo", "--", "-rf *"], [/* 58 vars */]) = 0
  ...

Indeed, "-rf *" is not split. Let’s take it further and put two
double-quoted strings side-by-side:

  ~/morning$ strace -- /bin/echo -- "-rf *"" more stuff"
  execve("/bin/echo", ["/bin/echo", "--", "-rf * more stuff"], [/* 58 vars */]) = 0
  ...

Let’s now separate the two double quoted strings by whitespace.

  ~/morning$ strace -- /bin/echo -- "-rf *"   " more stuff"
  execve("/bin/echo", ["/bin/echo", "--", "-rf *", " more stuff"], [/* 58 vars */]) = 0

From these two recent straces, we notice two things: bash splits an expression into separate elements of the args array passed to execve() when there is unquoted whitespace; without unquoted whitespace, adjacent quoted strings are joined into a single element of the args array.

Let’s be careful and rename our file. Knowing most standard programs follow the POSIX convention, let’s make a habit of preventing filenames from being interpreted as options.

  ~/morning$ mv -- important "my important file"

Now let’s store its name in a variable:

  ~/morning$ important_file="my important file"

Now let’s try echoing it unquoted:

  ~/morning$ strace -- /bin/echo $important_file
  execve("/bin/echo", ["/bin/echo", "my", "important", "file"], [/* 58 vars */]) = 0

bash breaks up “my important file” into three args: “my”, “important”, “file”.

With double quotes, we prevent whitespace from splitting the string into
separate args. Let’s try it.

  ~/morning$ strace -- /bin/echo -- "$important_file"
  execve("/bin/echo", ["/bin/echo", "--", "my important file"], [/* 58 vars */]) = 0

As intended, “my important file” survives as a single argument when we use double quotes. Let’s see what happens when we use single quotes:

  ~/morning$ strace -- /bin/echo -- '$important_file'
  execve("/bin/echo", ["/bin/echo", "--", "$important_file"], [/* 58 vars */]) = 0

This experiment shows that bash does not expand variables inside single quotes. And how about wildcards?

  ~/morning$ strace -- /bin/echo -- '*'
  execve("/bin/echo", ["/bin/echo", "--", "*"], [/* 58 vars */]) = 0

Let’s protect our important file by making it read only:

  ~/morning$ strace -- chmod -- 400 "$important_file"
  execve("/bin/chmod", ["chmod", "--", "400", "my important file"], [/* 58 vars */]) = 0

As can be seen from chmod, -- does not completely protect us against option injection: chmod’s first operand is a mode, and a symbolic mode such as -w looks exactly like an option, so a variable expanded in the mode position can still alter behavior.
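We can demonstrate this in a scratch directory; the "untrusted" mode variable here is hypothetical, purely for illustration:

```shell
# chmod's first operand is a mode, and "-w" is a valid symbolic mode,
# so -- cannot stop it being treated as one.
dir=$(mktemp -d)
touch "$dir/f" && chmod 644 "$dir/f"
mode="-w"                  # imagine this arrived from untrusted input
chmod -- "$mode" "$dir/f"  # accepted as a mode, not rejected as an option
perms=$(ls -l "$dir/f" | cut -c1-10)
echo "$perms"
```

The file silently loses its write bits, even though we dutifully used --.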

Conclusion

To write more secure bash scripts, it is best to follow four rules of thumb. First, specify the fully qualified pathname of the intended program to execute. Next, specify all the intended options together and use a -- to separate the non-option arguments. Third, double quote your variables so they aren’t globbed and they survive as a single argument. Fourth, single quote variable expressions that shouldn’t be expanded.

   /full/path/to/command [intended options] -- "$x1" "$x2" ...

For example,

  • move a file with filename "$x" to directory foo:
    /bin/mv -- "$x" foo/
  • copy files "$x" and "$y" to directory dst:
    /bin/cp -- "$x" "$y" dst/

This helps ensure our scripts:

  • execute only the program that was intended,
  • are resilient to the most common option injections in command-line arguments
  • can safely operate on arguments/filenames that start with a dash as well as names containing wildcards or whitespace
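Putting the four rules together, here is what a defensive invocation actually hands to the program, simulated once more with our harmless /bin/echo (the filenames are hypothetical):

```shell
src="-rf"
dst="backup dir"
# Full path, options first, --, then quoted operands:
out=$(/bin/echo /bin/cp -- "$src" "$dst")
echo "$out"
```

The dash-named file and the space-laden directory both arrive as inert operands.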

In summary, strace and /bin/echo enabled us to learn a lot about bash without harm. The dangerously named files are gone. We can continue our morning coffee and news stress-free now that we know how to better deal with subversive filenames.