Increasing the speed of {raster} processing with R: Part 1/3

Hey there rasterR folks!
I love to code with R. I spend most of my time processing raster data with the {raster} package, because it’s quite intuitive and easy to use compared to other scripting languages. Sometimes, however, when I work on larger datasets or have to process a lot of files, the sloooooow processing times make me wonder if R is always the best choice… Of course it is! You just have to know how to improve R’s performance, either by writing more efficient code or by parallelising your R functions across several cores or a cluster. Since this seems to be a big stumbling block for many R beginners and even some advanced coders, I decided to write a tutorial on the topic. It will consist of three parts, in which I describe how to speed up processing with the {raster} package.

Today we start with a short introduction to the topic, and I will show you how to speed up processing by adjusting some rasterOptions() for your R session, which gives you a good initial speed boost.

rasterOptions()

The rasterOptions() function from the {raster} package lets you customize how {raster} behaves in your R session. If you call rasterOptions() without arguments, it prints the current (default) settings:

> rasterOptions()
format        : raster
datatype      : FLT8S
overwrite     : FALSE
progress      : none
timer         : FALSE
chunksize     : 1e+07
maxmemory     : 1e+08
tmpdir        : /var/folders/lk/hg070f5n4md9hrchf__th6080000gn/T//RtmpQrCgZD/raster//
tmptime       : 168
setfileext    : TRUE
tolerance     : 0.1
standardnames : TRUE
warn depracat.: TRUE
header        : none
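The settings can also be read from within a script, which is handy if you want to log them or restore them later. This is a minimal sketch, assuming a {raster} version in which rasterOptions() returns the current settings invisibly as a named list whose element names match the printed labels; `opts` is just an illustrative variable name.

```r
library(raster)

# rasterOptions() prints the settings and (assumed) also returns them invisibly
opts <- rasterOptions()

# Pick out the two settings that matter most for performance
opts$chunksize
opts$maxmemory

# Reset every option to the package defaults, e.g. after experimenting
rasterOptions(default = TRUE)
```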

There are two parameters that have a lot of influence on the performance of the {raster} package: chunksize and maxmemory.

  • chunksize: integer. Maximum number of cells to read/write in a single chunk while processing (chunk by chunk) disk based Raster* objects
  • maxmemory: integer. Maximum number of cells to read into memory.
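To get a feeling for what “chunk by chunk” means, here is a minimal sketch of the block-wise read/process/write pattern for disk-based rasters, built from the {raster} functions blockSize(), writeStart(), getValues(), writeValues() and writeStop(). The input is a hypothetical random layer written to a temporary file, and I assume that blockSize() derives its default block size from the chunksize option; the exact rule may differ between package versions.

```r
library(raster)

# Hypothetical input: a random layer written to a temporary file, so it is
# disk-based (the case chunksize applies to)
r <- raster(nrows = 2000, ncols = 2000)
r <- setValues(r, runif(ncell(r)))
r <- writeRaster(r, filename = tempfile(fileext = ".grd"))

# blockSize() proposes row blocks (start rows, number of rows, block count)
bs <- blockSize(r)

out <- raster(r)   # empty raster with the same geometry as r
out <- writeStart(out, filename = tempfile(fileext = ".grd"), overwrite = TRUE)
for (i in seq_len(bs$n)) {
  v <- getValues(r, row = bs$row[i], nrows = bs$nrows[i])  # read one chunk
  out <- writeValues(out, v^2 + v, bs$row[i])               # write result
}
out <- writeStop(out)
```

Only one block of cells is held in memory at a time, which is what keeps this pattern usable on files larger than RAM, at the cost of more disk reads and writes.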

The default value of maxmemory is 1e+08. Let’s see whether, and by how much, the processing time of a simple raster calculation changes when we increase the maxmemory limit. To increase the limit, simply write:

rasterOptions(maxmemory = 1e+09)
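Why should this help? Before an operation, {raster} decides whether the data fit into memory or have to be processed chunk by chunk on disk. The sketch below asks canProcessInMemory() that question for an empty raster with the same dimensions as the Landsat 8 subset used in the benchmark, once per limit. The factor n = 10 (the number of copies the operation is assumed to need) is an arbitrary assumption, and newer {raster} releases also take the available RAM into account and may interpret maxmemory differently, so treat the result as indicative only.

```r
library(raster)

# Empty raster with the dimensions of the Landsat 8 subset (3485 x 4606);
# no cell values are needed to ask the memory question
r <- raster(nrows = 3485, ncols = 4606)

# Can an operation needing ~10 copies of r (assumed) run in memory?
rasterOptions(maxmemory = 1e+08)
low <- canProcessInMemory(r, n = 10)

rasterOptions(maxmemory = 1e+09)
high <- canProcessInMemory(r, n = 10)

print(c(low = low, high = high))

# Return to the package defaults (this also resets any other options)
rasterOptions(default = TRUE)
```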

Benchmark

To test how the maxmemory setting influences processing time, I loaded a Landsat 8 subset into R and performed a simple calculation ten times with a maxmemory limit of 1e+08 and ten times with a limit of 1e+09.

Here is a summary of the raster image used for the benchmark:

class       : RasterStack
dimensions  : 3485, 4606, 16051910, 5  (nrow, ncol, ncell, nlayers)
resolution  : 30, 30  (x, y)

And here is the code for the simple (nonsense) raster calculation used to measure the time:

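library(raster)

# ras is the Landsat 8 RasterStack summarised above
start <- Sys.time()
ras_calc <- ras ^ 2 + ras  # simple (nonsense) raster algebra
end <- Sys.time()
difftime(end, start)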
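The whole benchmark (ten runs per limit, then a boxplot) can be wrapped in a few lines. This is a minimal sketch, not the exact script I used: the input is a hypothetical 5-layer random brick, much smaller than the Landsat 8 subset, written to a temporary file so that it is disk-based like the original; `time_calc` and `timings` are illustrative names. Expect different absolute timings on your machine.

```r
library(raster)

# Hypothetical stand-in for the Landsat 8 subset: 5 random layers written to
# a temporary file (much smaller than the original data)
set.seed(1)
template <- raster(nrows = 1000, ncols = 1000)
layers <- lapply(1:5, function(i) setValues(template, runif(ncell(template))))
ras <- writeRaster(stack(layers), filename = tempfile(fileext = ".grd"))

# Run the same calculation n times under one maxmemory limit and return the
# elapsed wall-clock seconds of each run
time_calc <- function(x, maxmemory, n = 10) {
  rasterOptions(maxmemory = maxmemory)
  replicate(n, system.time(x ^ 2 + x)[["elapsed"]])
}

timings <- data.frame(
  maxmemory = rep(c("1e+08", "1e+09"), each = 10),
  seconds   = c(time_calc(ras, 1e+08), time_calc(ras, 1e+09))
)
rasterOptions(default = TRUE)   # restore the package defaults

# Mean time per setting and a boxplot comparing the two limits
aggregate(seconds ~ maxmemory, data = timings, FUN = mean)
boxplot(seconds ~ maxmemory, data = timings, ylab = "Processing time (s)")
```

Using system.time() instead of two Sys.time() calls keeps each measurement in one expression and returns plain seconds, which makes collecting the twenty runs into a data frame straightforward.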

Result:

Increasing the maxmemory limit from 1e+08 to 1e+09 saved a substantial amount of time. With the lower limit (1e+08), the simple calculation took 61 seconds on average; with the higher limit (1e+09), the same task took only about 35 seconds, roughly 43% less time. You can see the results in the boxplot below:

[Figure: Maxmemory Benchmark]

I hope this was useful to some of you. The second part of the tutorial on “Increasing the speed of {raster} processing with R” will show how you can parallelise your R code using the {foreach} and {doParallel} packages.
See you soon!

Martin

About This Author

Martin was born in the Czech Republic and studied at the University of Natural Resources and Life Sciences, Vienna. He currently works at an Earth observation company in Austria, specialised in land monitoring. His main interests are open-source applications such as R, (geospatial) statistics and data management, web mapping and visualization. He loves travelling, geocaching, photography and TV series.

4 Comments



  • Thanks! That stopped my script from crashing on a cluster.

    fa, 11 months ago


    • Happy to hear! Cheers Martin

      Martin, 11 months ago


  • Hi Martin, first of all, thanks for sharing these useful hints on processing with the ‘raster’ R package.

    I’d like to know whether there is a limit to maxmemory. For instance, could we estimate the maxmemory limit as a function of the computer’s RAM? What do you think?

    Thanks in advance

    Jose Lucas, 4 months ago

