ICS-E4020 exercise: Image Segmentation

Overview Examples Implementation Details Tasks Hints


Find the best way to partition the given figure in two parts: a monochromatic rectangle and a monochromatic background. The objective is to minimise the sum of squared errors.


Input OutputOutput

Input OutputOutput

Input OutputOutput

Input OutputOutput

Input OutputOutput


We will implement a command line tool pngsegment that you can use to do image segmentation in PNG files.


The program can be used as follows:

    ./pngsegment INPUT OUTPUT1 OUTPUT2

where INPUT is the input file and OUTPUT1 and OUTPUT2 will show the segmentation: OUTPUT1 will draw the approximation of the image as a monochromatic rectangle and a monochromatic background, while OUTPUT2 will show the location of the rectangle overlaid with the original figure.

For example, if you have a PNG file called example.png in the current directory, you can do image segmentation as follows:

    ./pngsegment example.png result1.png result2.png


You can find an almost-working implementation in our shared code repository, in the subdirectories is*. There is one directory per task; the templates are identical.

The only part that is missing is the implementation of the subroutine segment() in file is*/is.cc. This is the function that actually does segmentation. See is*/is.h for the description of the interface.

Once you have implemented the function correctly, you should be able to run make to compile your code, make test to check that it works correctly (at least for some test cases) and make benchmark to see how well it performs.

Detailed specification

You need to implement the following function:

    Result segment(int ny, int nx, const float* data)

Here data is a colour image with ny*nx pixels, and each pixel consists of three colour components, red, green, and blue. In total, there are ny*nx*3 floating point numbers in the array data.

The colour components are numbered 0 <= c < 3, x coordinates are numbered 0 <= x < nx, y coordinates are numbered 0 <= y < ny, and the value of this colour component is stored in data[c + 3 * x + 3 * nx * y].

Correct output

The function has to return an instance of the following structure that indicates the optimal segmentation:

    struct Result {
        int y0;
        int x0;
        int y1;
        int x1;
        float outer[3];
        float inner[3];

The first four fields indicate the location of the rectangle. The upper left corner of the rectangle is at coordinates (x0, y0), and the lower right corner is at coordinates (x1-1, y1-1). That is, the width of the rectangle is x1-x0 pixels and the height is y1-y0 pixels. The coordinates have to satisfy 0 <= y0 < y1 <= ny and 0 <= x0 < x1 <= nx.

The last two fields indicate the colour of the background and the rectangle. Field outer contains the three colour components of the background and field inner contains the three colour components of the rectangle.

Objective function

For each pixel (x,y) and colour component c, we define the error error(y,x,c) as follows:

The total cost of the segmentation is the sum of squared errors, that is, the sum of error(y,x,c) * error(y,x,c) over all 0 <= c < 3 and 0 <= x < nx and 0 <= y < ny.

Your task is to find a segmentation that minimises the total cost.

Numerical stability

The input and output are given as single-precision floating point numbers (type float). However, please use double-precision floating point numbers (type double) for all intermediate results, and only round the final result to single precision.


Check the course home page to see which tasks you are supposed to solve each week.

Task IS1

Subdirectory: is1, file to edit: is1/is.cc.

Implement an algorithm for image segmentation. Use the naive algorithm that tries out all possible locations 0 <= y0 < y1 <= ny and 0 <= x0 < x1 <= nx for the rectangle and finds the best one.

However, your implementation should be reasonably efficient. In particular, make sure that for each possible location of the rectangle you only need to perform O(1) operations to evaluate how good this position is. That is, the total complexity for an n × n figure has to be O(n4). See the hints for suggestions.

You are supposed to use both multiple threads and vector instructions to parallelise your code. You can do it, for example, as follows:

You can also try out different memory layouts and different ways to exploit vector instructions. For example, instead of storing 1 pixel per 1 vector, you can try to store 4 pixels per 3 vectors (one vector per one colour). However, please use double-precision floating point numbers in all arithmetic operations.

Task IS2 — challenging

Subdirectory: is2, file to edit: is2/is.cc.

Design a more efficient algorithm that (at least in typical cases) does not need to try out all possible locations of the rectangle.

Task IS3 — challenging

Subdirectory: is3, file to edit: is3/is.cu.

Use CUDA to implement an efficient algorithm that does image segmentation on GPU. You can use the naive approach from IS1, but make sure that it makes an efficient use of the GPU.


Task IS1

It is important that you can very efficiently calculate the cost of placing a rectangle at certain position. A little bit of math is needed to do this.

Let $P$ be the set of all $(x,y)$ coordinates. For each pixel $p \in P$, let $v_p$ be its colour in the input. That is, $v_p$ is a vector of size 3, where $v_{p,0}$ is the red component, $v_{p,1}$ is the green component, and $v_{p,2}$ is the blue component.

Let us partition $P$ in two disjoint sets, $X \cup Y = P$ and $X \cap Y = \emptyset$. In our application, $X$ is the “rectangle” and $Y$ is the “background”, but this is not yet important here.

We choose colour $a$ for set $X$ and colour $b$ for set $Y$. Again, these are vectors of size 3.

For each pixel $p \in X$ and colour component $c$ the error is $a_c - v_{p,c}$, and for each pixel $p \in Y$ and colour component $c$ the error is $b_c - v_{p,c}$. The total cost over all pixels and all colour components is therefore $$ f(X,Y,a,b) = \sum_c \sum_{p \in X} \left(a_c - v_{p,c}\right)^2 + \sum_c \sum_{p \in Y} \left(b_c - v_{p,c}\right)^2, $$ and our task is to find $(X,Y,a,b)$ that minimises the cost $f(X,Y,a,b)$. Equally well we can maximise the utility $$ g(X,Y,a,b) = - f(X,Y,a,b) + \sum_c \sum_{p \in P} v_{p,c}^2 $$

Let us introduce the shorthand notation $$ v_{X,c} = \sum_{p \in X} v_{p,c}, \quad v_{Y,c} = \sum_{p \in Y} v_{p,c}. $$ We can rewrite the cost as follows: $$ \begin{split} f(X,Y,a,b) &= \sum_c \left( \sum_{p \in X} a_c^2 - \sum_{p \in X} 2a_c v_{p,c} + \sum_{p \in X} v_{p,c}^2 + \sum_{p \in Y} b_c^2 - \sum_{p \in Y} 2b_c v_{p,c} + \sum_{p \in Y} v_{p,c}^2 \right) \\ &= \sum_c \left( |X| \cdot a_c^2 + |Y| \cdot b_c^2 - 2a_c v_{X,c} - 2b_c v_{Y,c} + \sum_{p \in P} v_{p,c}^2 \right). \end{split} $$ Therefore the utility that we want to maximise is simply $$ g(X,Y,a,b) = \sum_c \left( 2a_c v_{X,c} + 2b_c v_{Y,c} - |X| \cdot a_c^2 - |Y| \cdot b_c^2 \right). $$

To maximise $g(X,Y,a,b)$, we try all possible partitions $(X,Y)$, and for each partition we find the best colours $(a^*,b^*)$ that maximise $g(X,Y,a^*,b^*)$. By differentiation, it is easy to see that the optimal colours are the arithmetic means $$ a^*_c = \frac{v_{X,c}}{|X|}, \quad b^*_c = \frac{v_{Y,c}}{|Y|}. $$ We use the shorthand $h(X,Y) = g(X,Y,a^*,b^*)$ for the utility of partition $(X,Y)$ for the best choice of colours. Note that this function takes the following very simple form: $$ h(X,Y) = \sum_c \left( \frac{v_{X,c}^2}{|X|} + \frac{v_{Y,c}^2}{|Y|} \right). $$

Therefore to find an optimal segmentation, it is sufficient to check all possible rectangles $X$ and calculate $h(X,Y)$. To do that, we only need to know $v_{X,c}$ and $v_{Y,c}$. Naturally, $v_{Y,c} = v_{P,c} - v_{X,c}$. Hence to calculate $h(X,Y)$ efficiently, we only need to know how to calculate $v_{X,c}$ efficiently, for any rectangle $X$.

Note that $v_{X,c}$ is simply the sum of colour component $c$ for all pixels in rectangle $X$. With a little bit of preprocessing, we can calculate this sum in constant time.

Let us write $s(x_0,y_0,x_1,y_1,c) = v_{R,c}$, where $R = [x_0,x_1-1] \times [y_0,y_1-1]$ is the rectangle with the corners $(x_0,y_0)$ and $(x_1,y_1)$. Then by the inclusion–exclusion principle (see the illustration below), $$ \begin{split} s(x_0,y_0,x_1,y_1,c) {}={} &s(0,0,x_1,y_1,c) \\ {}-{} &s(0,0,x_0,y_1,c) \\ {}-{} &s(0,0,x_1,y_0,c) \\ {}+{} &s(0,0,x_0,y_0,c). \end{split} $$ So we only need to precompute $s(0,0,x,y,c)$ for all $x$, $y$, and $c$. With this precomputed information, we can then calculate $h(X,Y)$ for any rectangle $X$ in constant time. Vector types will help us to do all work conveniently for all colour components in parallel.

Applying the inclusion-exclusion principle in IS1

For further performance improvements, it may be helpful to organise the loops so that the outer loop tries out different sizes of the rectangle, and the inner loop tries all possible positions of the rectangle. Then $1/|X|$ and $1/|Y|$ remain constant in the inner loop.

If you are comfortable with studying the assembly code produced by the compiler, these observations related to branch predication may also be helpful.

Task IS2

Here is one approach: First use a coarse grid, with e.g. 10 × 10 pixels per grid cell. Then try all possible ways to place the rectangle in this coarse grid. Each coarse location represents a set of approx. 10000 fine-grained locations (so it is much faster to try out all possible coarse locations). For each coarse location, calculate the following two estimates (efficiently, in constant time):

After these calculations, you can hopefully rule out a majority of coarse-grained locations: you know that there are other locations where the cost is at most some value s, so you can then ignore all coarse-grained locations where the cost is larger than s.

Finally, zoom in to those coarse-grained locations that you have not yet ruled out.

You can also apply the grid idea recursively.