Increasing the speed of {raster} processing with R: Part 3/3 Cluster

Hey!

Today I am going to finish the series on how to increase the speed of processing raster images with R. In the last two posts I talked about changing the rasterOptions() and about parallelisation using foreach(). Both of these options might give you a decent speed boost and decrease your processing time. In this post, I will talk about processing in a cluster with the help of the clusterR function from the {raster} package.

clusterR() is a flexible interface for using a cluster with other functions. It "only works with functions that have a Raster* object as first argument and that operate on a cell by cell basis (i.e., there is no effect of neighboring cells) and return an object with the same number of cells as the input raster object" (from ?clusterR).

What this function does is outsource a process that normally runs on a single core to n cores specified by the user. This yields a major speed boost and minimises your processing time. clusterR() works very well with the following {raster} functions: calc(), overlay() and predict(). However, it does not work with merge(), crop(), mosaic(), (dis)aggregate(), resample(), projectRaster(), focal(), distance(), buffer() or direction().
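To make the examples below reproducible, here is a minimal sketch that builds a small demo RasterStack called ras (the object name and the 100 x 100 / 10-layer dimensions are my own choices, not from the original post), assuming the {raster} package is installed:

```r
library(raster)

# An empty 100 x 100 raster as a template
r <- raster(nrows = 100, ncols = 100)

# A stack of 10 layers filled with random values
set.seed(42)
ras <- stack(replicate(10, setValues(r, runif(ncell(r)))))
```

Any multi-layer Raster* object of your own will work the same way.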

Example without clusterR

Let's say ras is our RasterStack and we want to calculate the mean of every pixel across the stack. This is how you would do it with calc() using a single core:

ras.mean <- calc(ras, fun = mean, na.rm = TRUE)

Example using clusterR

Now imagine that it's a big raster with a lot of layers. This operation might take foreeeever to finish. This is where the clusterR() function comes in handy:

beginCluster(4)

ras.mean <- clusterR(ras, calc, args = list(fun = mean, na.rm = TRUE))

endCluster()

The command beginCluster(4) initialises the cluster on your machine and assigns 4 cores to the process. (Note that you are limited by the number of cores on your PC. If you use too many, your PC might crash.)
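If you don't know how many cores your machine has, you can query it with detectCores() from the built-in {parallel} package. A common rule of thumb (my suggestion, not from the original post) is to leave one core free for the operating system:

```r
library(parallel)

# Number of logical cores available on this machine
n.cores <- detectCores()

# Leave one core free so the machine stays responsive
beginCluster(n.cores - 1)
```

Calling beginCluster() without an argument also works; in that case the {raster} package picks the number of cores itself.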

Then the magic happens: clusterR() calls the function calc(), which in turn calls the function mean(). Note that the function called by calc() has to be passed inside a list() via the args argument.

After the process is finished, close the cluster with endCluster().

Use of clusterR and calc with a user-defined function

You can also combine the clusterR() and calc() function with your custom function:

a <- 10
f3 <- function(x) sum(x) + a
beginCluster(4)
z1 <- clusterR(ras, calc, args = list(fun = f3), export = 'a')
endCluster()

Here f3() is your custom function; simply plug it into the args argument inside a list. If your function uses an object from the global environment, like a in this example, you need to use export = 'object name' to make that object available to all the worker cores.
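The export argument accepts a character vector, so a function that uses several global objects can export them all at once. A small sketch (f4 and b are hypothetical names I introduce for illustration; they are not from the original post):

```r
a <- 10
b <- 2
# Custom function that depends on two global objects, a and b
f4 <- function(x) (sum(x) + a) / b

beginCluster(4)
z2 <- clusterR(ras, calc, args = list(fun = f4), export = c('a', 'b'))
endCluster()
```

Forgetting to export an object typically shows up as an "object not found" error on the worker cores, so this is the first thing to check when a clusterR() call fails.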

You can find more information and examples on the clusterR() function in the {raster} package documentation (?clusterR).

I hope you enjoyed this last tutorial on increasing the raster processing speed with R {raster}. Let me know if you have any questions. See you soon!

Martin

About This Author

Martin was born in the Czech Republic and studied at the University of Natural Resources and Life Sciences, Vienna. He is currently working at GeoVille - an Earth Observation company based in Austria, specialised in land monitoring. His main interests are: open-source applications like R, (geospatial) statistics and data management, web mapping and visualisation. He loves travelling, geocaching, photography and sports.

4 Comments



  • Really awesome blog post Martin – I have been using your tips in my own raster processing. What I am having a hard time with is processing a bunch of rasters and performing a focal() operation with a large circular radius (5km). This operation just chugs – is there a way to process focal() operations fast? I have been using your 2/3 blog post implementing the foreach loop and it works great for my 90m and 1km radius focal() operation… but when the weight is 5km it just takes forever! Any thoughts on this? Again, thanks for your posts, they have been really helpful

    Dan 6 years ago Reply


    • Hey Dan!
      Thanks for the feedback! 🙂
      Are you using a custom function inside the focal or the default weighted sum?
      The default should run quite fast, however when you use a custom function, you need to make sure that it is programmed as efficiently as possible.
      Unfortunately it's a fact that larger window sizes increase the processing time. I haven't come up with a solution for this yet. Let me know if you figure something out.

      Cheers martin

      Martin 6 years ago Reply


  • […] with the same number of cells as the input raster object (from ?clusterR). Martin Šiklar has a nice post on increasing the speed of raster processing using […]

    clusterR is sweet – but beware when using with predict fxn | Tim Assal 4 years ago Reply


  • […] Increasing the speed of {raster} processing with R: Part 3/3 Cluster […]

    Raster Parallelization – Avian Ecologist 4 years ago Reply

