New paper: 3D Structure from 2D Microscopy images using Deep Learning

Benjamin J. Blundell, Christian Sieben, Suliana Manley, myself, QueeLim Ch’ng, and Susan Cox published a new paper in Frontiers in Bioinformatics (open access).


Understanding the structure of a protein complex is crucial in determining its function. However, retrieving accurate 3D structures from microscopy images is highly challenging, particularly as many imaging modalities are two-dimensional. Recent advances in Artificial Intelligence have been applied to this problem, primarily using voxel based approaches to analyse sets of electron microscopy images. Here we present a deep learning solution for reconstructing the protein complexes from a number of 2D single molecule localization microscopy images, with the solution being completely unconstrained. Our convolutional neural network coupled with a differentiable renderer predicts pose and derives a single structure. After training, the network is discarded, with the output of this method being a structural model which fits the data-set. We demonstrate the performance of our system on two protein complexes: CEP152 (which comprises part of the proximal toroid of the centriole) and centrioles.

Don’t pay for what you don’t use in libCVD

libCVD has a pretty spiffy image loading function. You can do:

Image<Rgb<byte>> img = img_load("a_file.ext");

and you’re ready to go accessing pixels. The img_load function takes care of a lot for you: it determines the file type, calls the appropriate handler, then converts whatever the pixel type on disk is (it could be binary, greyscale or high bit depth) into the type in your program (and you don’t even need to provide the type to img_load).

At this point, bear in mind that libCVD is a library for computer vision (especially frame-rate vision), where you generally know which pixel type you need at compile time. The automatic conversion would be a showstopper if you wanted to represent the file exactly, but for this domain you want the data in your chosen type.

This function is very easy to use, but potentially expensive because libCVD supports quite a wide variety of image types. That causes two problems for shipped code:

  1. The shipped binary will be larger than necessary because it will contain code to load image formats you probably don’t care to support in production (e.g. FITS).
  2. Increased attack surface. TIFF in particular is a very complex file format with a vast number of options, and as a result libtiff has had a number of serious CVEs, even recently.

You can compile libCVD without an external library (e.g. TIFF), but currently there’s no way of switching off the built-in formats. I could add that, but it creates another problem: I didn’t add those formats for no reason. During debugging or analysis it is often very useful to save and load internal state such as floating point images (for which you’d need TIFF or FITS). You’d then need to build libCVD in multiple configurations, with and without various options, and switch them in and out as necessary.

That’s a big administrative disadvantage, and an ongoing burden of wrangling multiple build configurations. The solution, it turns out, was remarkably simple:

Why not provide a type list? Then the linker can remove unused code.

– David McCabe

And that’s it, really! There are three minor variations:

Image<byte> i = img_load("file"); 
Image<byte> i = img_load<PNG::Reader, JPEG::Reader>("file"); 

using Files = std::tuple<PNG::Reader, JPEG::Reader>;
Image<byte> i = img_load<Files>("file");  

The first works as always and will load all supported image types. The second and third limit the list to only the specified readers, and the code for the other types won’t be included anywhere in the resulting binary. The two new variations are both provided for ergonomics: the second because it’s nice to use directly, the third because you can’t save a parameter pack.

Internally it’s implemented using tuples because converting from a pack to a tuple is easy, but the reverse is more annoying.

The implementation is pretty straightforward (edited for brevity; the runtime error checking stuff isn’t relevant). First the code to load given a typelist in a tuple:

template<class I, class ImageTypeList, int N=0>
void img_load_tuple(Image<I>& im, std::istream& i, [[maybe_unused]] int c){
	if constexpr (N==std::tuple_size_v<ImageTypeList>) {
		throw Exceptions::Image_IO::UnsupportedImageType();
	}
	else {
		using ImageReader = std::tuple_element_t<N, ImageTypeList>;
		if(ImageReader::first_byte_matches(c)) // each reader tests the first byte (match-test name paraphrased)
			CVD::Internal::readImage<I,ImageReader>(im, i);
		else
			img_load_tuple<I, ImageTypeList, N+1>(im, i, c);
	}
}

It’s a pretty run-of-the-mill compile time iteration scheme, which is now just a single function thanks to if constexpr. Note that loading makes its decision based on the first byte of the file, and each image loader has a function to test for a match. This replaces the old if-else chain, which I misremembered as a switch statement (edited):

	template<class I> void img_load(Image<I>& im, std::istream& i)
	{
	  unsigned char c = i.peek();

	  if(c == 'P')
	    CVD::Internal::readImage<I, PNM::Reader>(im, i);
	  else if(c == 0xff)
	    CVD::Internal::readImage<I, JPEG::reader>(im, i);
	  else if(c == 'I' || c == 'M') //Little or big endian TIFF
	    CVD::Internal::readImage<I, TIFF::tiff_reader>(im, i);
	  else if(c == 0x89)
	    CVD::Internal::readImage<I, PNG::png_reader>(im, i);
	  else if(c == 'B')
	    CVD::Internal::readImage<I, BMP::Reader>(im, i);
	  else if(c == 'S')
	    CVD::Internal::readImage<I, FITS::reader>(im, i);
	  else if(c == 'C')
	    CVD::Internal::readImage<I, CVDimage::reader>(im, i);
	  else if(c == ' ' || c == '\t' || isdigit(c) || c == '-' || c == '+')
	    CVD::Internal::readImage<I, TEXT::reader>(im, i);
	  else
	    throw Exceptions::Image_IO::UnsupportedImageType();
	}

Fortunately the image types are distinguishable from exactly one byte. This is handy because it allows all applicable types to be read from a non-seekable istream (e.g. one wrapping a pipe) without any modification, since you can always peek exactly one byte.

The neat bit that allows both interfaces is a template which turns its input into a tuple: give it a tuple or a parameter pack and you always get a tuple back:

template<class... T> struct as_tuple{
	using type = std::tuple<T...>;
};

template<class... T> struct as_tuple<std::tuple<T...>>{
	using type = std::tuple<T...>;
};
This then allows a single function which has no opinions on what it is meant to operate on:

template <class I, class Head = Internal::AllImageTypes, class... ImageTypes>
void img_load(Image<I>& im, std::istream& i)
{
	int c = i.peek();
	img_load_tuple<I, typename Internal::as_tuple<Head, ImageTypes...>::type>(im, i, c);
}

The only flourish is that the list of loaders is taken as two template arguments, one plain type and one parameter pack, because a bare pack can’t have a default but a single leading argument can.

The result, I think, gives the best of all worlds. It has a simple-to-use interface, lets the user not pay for what they don’t use, avoids compile-time configuration, and has a clear and straightforward implementation. It also enforces a more sensible separation: each image loader determines from the file’s magic bytes whether it can operate, rather than having two different parts of loading scattered across different places.

New paper: Y-Autoencoders: Disentangling latent representations via sequential encoding

Massimiliano Patacchiola, Patrick Fox-Roberts and I published a new paper in Pattern Recognition Letters (PDF here).

This work presents a new way of training autoencoders that allows separation of style and content, giving GAN-like performance with the ease of training of autoencoders.


In the last few years there have been important advancements in generative models with the two dominant approaches being Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). However, standard Autoencoders (AEs) and closely related structures have remained popular because they are easy to train and adapt to different tasks. An interesting question is if we can achieve state-of-the-art performance with AEs while retaining their good properties. We propose an answer to this question by introducing a new model called Y-Autoencoder (Y-AE). The structure and training procedure of a Y-AE enclose a representation into an implicit and an explicit part. The implicit part is similar to the output of an autoencoder and the explicit part is strongly correlated with labels in the training set. The two parts are separated in the latent space by splitting the output of the encoder into two paths (forming a Y shape) before decoding and re-encoding. We then impose a number of losses, such as reconstruction loss, and a loss on dependence between the implicit and explicit parts. Additionally, the projection in the explicit manifold is monitored by a predictor, that is embedded in the encoder and trained end-to-end with no adversarial losses. We provide significant experimental results on various domains, such as separation of style and content, image-to-image translation, and inverse graphics.

New paper: Large Scale Photometric Bundle Adjustment

Olly Woodford and I published a new paper: Large Scale Photometric Bundle Adjustment (PDF here), at BMVC 2020.

This work presents a fully photometric formulation for bundle adjustment. Starting from a classical system (such as COLMAP), the system performs a structure and pose refinement, where the cost function is essentially the normalised correlation cost of patches reprojected into the source images.


Direct methods have shown promise on visual odometry and SLAM, leading to greater accuracy and robustness over feature-based methods. However, offline 3-d reconstruction from internet images has not yet benefited from a joint, photometric optimization over dense geometry and camera parameters. Issues such as the lack of brightness constancy, and the sheer volume of data, make this a more challenging task. This work presents a framework for jointly optimizing millions of scene points and hundreds of camera poses and intrinsics, using a photometric cost that is invariant to local lighting changes. The improvement in metric reconstruction accuracy that it confers over feature-based bundle adjustment is demonstrated on the large-scale Tanks & Temples benchmark. We further demonstrate qualitative reconstruction improvements on an internet photo collection, with challenging diversity in lighting and camera intrinsics.

Realtime AR world transformations with occlusions

This is what my team, collaborators and I have been working on recently: world-transforming AR, specifically transforming the floor.

You can see occlusions, such as the pillars occluding the floor effect, but we have more sophisticated occlusion handling too:

You can’t tell from this video, but the occlusion handling is dynamic, so if the postbox managed to move, the occlusions would stay up to date. And here’s a gallery of nice shots:

If you have Snapchat and want to try it for yourself, here are the snapcodes: