Increasing the speed of {raster} processing with R: Part 3/3 Cluster

Hey!

Today I am going to finish the series on how to increase the speed of processing raster images with R. In the last two posts I talked about changing rasterOptions() and about parallelisation using foreach(). Both of these options might give you a decent speed boost and decrease your processing time. In this post, I will talk about processing on a cluster with the help of the clusterR() function from the {raster} package.

clusterR() is a flexible interface for using a cluster with other functions. This function only works with functions that have a Raster* object as their first argument, that operate on a cell-by-cell basis (i.e., there is no effect of neighboring cells), and that return an object with the same number of cells as the input Raster* object. (from ?clusterR)

So what this function does is outsource a process that normally runs on a single core to n cores specified by the user: the raster is split into blocks of cells, and each block is processed by a different worker. This can give a major speed boost and cut your processing time considerably, especially for large rasters (for small ones, the overhead of starting the workers and sending data to them may eat up the gain). clusterR() works very well with the following {raster} functions: calc(), overlay() and predict(). However, it does not work with merge(), crop(), mosaic(), (dis)aggregate(), resample(), projectRaster(), focal(), distance(), buffer() or direction().
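The idea behind this, and the reason for the cell-by-cell restriction quoted above, can be sketched with base R's parallel package. This is a conceptual toy, not how clusterR() is actually implemented: a plain matrix (cells x layers) stands in for a RasterStack, its rows are split into blocks, each block is processed on a worker and the pieces are glued back together. For a cell-by-cell function such as the per-cell mean, this gives the same result as processing everything at once. For a neighbourhood function such as a moving average (comparable to focal()), cells at the block edges lose their neighbours, so the result changes.

```r
library(parallel)

set.seed(1)
# Toy stand-in for a RasterStack: 4000 cells (rows) x 3 layers (columns)
cells <- matrix(runif(12000), ncol = 3)

# Cell-by-cell function: each output value depends only on its own row
cell_mean <- function(block) rowMeans(block)

# Split the row indices into 4 contiguous blocks
blocks <- split(seq_len(nrow(cells)), cut(seq_len(nrow(cells)), 4, labels = FALSE))

# Process each block on one of 2 workers, then stitch the pieces together in order
cl <- makeCluster(2)
parts <- parLapply(cl, blocks, function(idx, m, f) f(m[idx, , drop = FALSE]),
                   m = cells, f = cell_mean)
stopCluster(cl)
chunked <- unlist(parts, use.names = FALSE)

all.equal(chunked, cell_mean(cells))   # TRUE: block-wise processing is safe here

# Neighbourhood function (like focal()): each value depends on adjacent cells
moving_avg <- function(v) as.numeric(stats::filter(v, rep(1/3, 3), sides = 2))
x <- cells[, 1]
chunked_ma <- unlist(lapply(blocks, function(idx) moving_avg(x[idx])), use.names = FALSE)

isTRUE(all.equal(chunked_ma, moving_avg(x)))   # FALSE: block edges lose their neighbours
```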

Example without clusterR

Let’s say ras is our RasterStack and we want to calculate the mean of every pixel across all layers of the stack. This is how you would do it with calc() using a single core:

ras.mean <- calc(ras, mean, na.rm=TRUE)
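If you want to try this without your own data, here is a minimal sketch that builds a hypothetical stack of random values (the number of layers and their size are arbitrary choices) and runs the same single-core calc() call on it:

```r
library(raster)

set.seed(42)
# Hypothetical test data: 10 layers of 200 x 200 cells filled with random values
layers <- lapply(1:10, function(i) raster(nrows = 200, ncols = 200, vals = runif(200 * 200)))
ras <- stack(layers)

# Per-cell mean across all 10 layers, computed on a single core
ras.mean <- calc(ras, mean, na.rm = TRUE)

# Sanity check against the row means of the (cells x layers) value matrix
all.equal(values(ras.mean), rowMeans(values(ras)))
```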

Example using clusterR

Now imagine that it's a big raster with a lot of layers. This operation might take foreeeever to finish. This is where the clusterR() function comes in handy:

beginCluster(4)

ras.mean <- clusterR(ras, calc, args=list(fun=mean, na.rm=TRUE))

endCluster()

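The command beginCluster(4) initialises the cluster on your PC and assigns 4 cores to the process. (Note that you are limited by the number of cores on your PC. Requesting more workers than you have cores will not make things faster, and since every worker is a separate R process with its own memory, too many of them can make your PC unresponsive or run it out of memory.)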
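Rather than hard-coding 4, you can derive the number of workers from your machine. A small sketch using detectCores() from base R's parallel package; leaving one core free is a common rule of thumb, not a requirement. (If I read ?beginCluster correctly, calling beginCluster() without a number does a similar detection on its own.)

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Leave one core free for the operating system, but never go below 1 worker
n_workers <- max(1L, n_cores - 1L)
n_workers

# then, for example: beginCluster(n_workers)
```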

Then the magic happens: clusterR() calls the function calc(), which in turn calls mean(). Note that all arguments for calc() (here the function mean and na.rm=TRUE) have to be passed to clusterR() as a list() via its args argument.
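Why a list? clusterR() has to forward those arguments to calc() on every block. The sketch below uses a hypothetical helper, call_with_args(), built on base R's do.call(), to show the general pattern of splicing an argument list into a function call after the data argument. I'm assuming clusterR() does something along these lines internally; this is not its actual code, and base R's apply() stands in for calc().

```r
# Hypothetical helper, for illustration only: splices an 'args' list into a
# call after the data argument, i.e. fun(x, args[[1]], args[[2]], ...)
call_with_args <- function(x, fun, args = NULL) {
  do.call(fun, c(list(x), args))
}

# A 2 x 3 matrix standing in for 2 cells with 3 layers each
m <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)

# Equivalent to apply(m, MARGIN = 1, FUN = mean, na.rm = TRUE): per-cell mean
call_with_args(m, apply, args = list(MARGIN = 1, FUN = mean, na.rm = TRUE))
# [1] 3 4
```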

After the process is finished, close the cluster with endCluster().
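Putting it together, here is a self-contained sketch on a hypothetical random stack that runs the single-core and the cluster version side by side and checks that they agree. It uses 2 workers so it also runs on small machines, and wraps the cluster call in tryCatch(..., finally = endCluster()) so the workers are released even if something fails. On a toy raster this small, the cluster version may well be slower because of the start-up and data-transfer overhead; the gain shows up on large rasters.

```r
library(raster)

set.seed(42)
# Hypothetical test data: a stack of 10 layers, 500 x 500 cells of random values
ras <- stack(lapply(1:10, function(i) raster(nrows = 500, ncols = 500, vals = runif(500 * 500))))

# Single-core reference run
t_single <- system.time(mean_single <- calc(ras, mean, na.rm = TRUE))

# Cluster run with 2 workers; 'finally' shuts the workers down even on error
beginCluster(2)
t_cluster <- tryCatch(
  system.time(mean_cluster <- clusterR(ras, calc, args = list(fun = mean, na.rm = TRUE))),
  finally = endCluster()
)

# Both approaches should produce the same cell values
all.equal(values(mean_single), values(mean_cluster))

# Elapsed wall-clock seconds of both runs
rbind(single = t_single, cluster = t_cluster)[, "elapsed"]
```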

Using clusterR and calc with a user-defined function

You can also combine the clusterR() and calc() functions with your own custom function:

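a <- 10                          # global variable used inside f3
f3 <- function(x) sum(x) + a     # custom function: per-cell sum of all layers plus a
beginCluster(4)
z1 <- clusterR(ras, calc, args=list(fun=f3), export='a')  # export sends 'a' to every worker
endCluster()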

Here f3() is your custom function; simply plug it into the args argument inside a list. If your function uses a variable that is not one of its arguments, like a in this example, you need to pass its name via export='a' to make it available on all worker cores.
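What export does can be reproduced with base R's parallel package, using clusterExport() instead of the {raster} cluster, so treat this as an analogy rather than clusterR()'s own code. Each worker is a fresh R session, so a global variable such as a simply does not exist there until it is copied over:

```r
library(parallel)

a <- 10
f3 <- function(x) sum(x) + a   # 'a' is a free (global) variable, not an argument

cl <- makeCluster(2)

# Without exporting, the workers have no 'a', so f3 fails there
res_no_export <- tryCatch(parLapply(cl, list(1:3, 4:6), f3),
                          error = function(e) conditionMessage(e))

# clusterExport() copies 'a' from this session into each worker's global environment
clusterExport(cl, "a")
res_export <- parLapply(cl, list(1:3, 4:6), f3)

stopCluster(cl)

unlist(res_export)   # 16 25
```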

You can find more information and examples in the help page of the clusterR() function (?clusterR).

I hope you enjoyed this last tutorial on increasing the raster processing speed with R {raster}. Let me know if you have any questions. See you soon!

Martin

About This Author

Martin was born in the Czech Republic and studied at the University of Natural Resources and Life Sciences, Vienna. He is currently working at an Earth Observation company in Austria, specialised in Land Monitoring. His main interests are open-source applications like R, (geospatial) statistics and data management, web mapping and visualization. He loves travelling, geocaching, photography and TV series.
