Hey there rasterR folks!
I love to code with R. Most of my time I spend processing raster data with the {raster} package, because it is quite intuitive and easy to use compared to other scripting languages. Sometimes, however, when I work on larger datasets or have to process many files, the sloooooow processing times make me wonder if R is always the best choice… Of course it is! You just have to know how to improve the performance of R, either by writing more efficient code or by clustering and parallelising your R functions. Since this seems to be a big hurdle for many R beginners and even some advanced coders, I decided to write a tutorial on the topic: it will consist of three parts, in which I will describe how to speed up processing with the {raster} package.
Today we will have a short introduction on the topic and I will show you how you can speed up the processing time by setting some local rasterOptions() in your R session, which will give you a good initial speed boost.
rasterOptions()
The rasterOptions() function from the {raster} package allows you to customize your R session. If you type rasterOptions() with no arguments, you can see the current (default) settings:
> rasterOptions()
format        : raster
datatype      : FLT8S
overwrite     : FALSE
progress      : none
timer         : FALSE
chunksize     : 1e+07
maxmemory     : 1e+08
tmpdir        : /var/folders/lk/hg070f5n4md9hrchf__th6080000gn/T//RtmpQrCgZD/raster//
tmptime       : 168
setfileext    : TRUE
tolerance     : 0.1
standardnames : TRUE
warn depracat.: TRUE
header        : none
There are two parameters that have a lot of influence on the performance of the {raster} package: chunksize and maxmemory.
- chunksize: integer. Maximum number of cells to read/write in a single chunk while processing (chunk by chunk) disk-based Raster* objects.
- maxmemory: integer. Maximum number of cells to read into memory.
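Whether a calculation runs fully in memory or chunk by chunk from disk is decided by these two settings. You can check what {raster} would do for your own data with canProcessInMemory(); a quick sketch (the file path is a placeholder):

```r
library(raster)

# placeholder path - point this at your own multi-band raster file
ras <- stack("path/to/landsat_subset.tif")

# reports whether {raster} would keep the whole computation in RAM
# (n is the number of copies of the data the calculation needs)
canProcessInMemory(ras, n = 2)

# total number of cell values competing against the maxmemory limit
ncell(ras) * nlayers(ras)
```

If canProcessInMemory() returns FALSE, the operation falls back to chunked disk-based processing, which is where chunksize comes into play.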
The default value of maxmemory is 1e+08. Let's have a look at whether and how the processing time of a simple raster calculation changes when we increase the maxmemory limit. To increase the limit, you can simply write:
rasterOptions(maxmemory = 1e+09)
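You can also combine this with a few other useful options in a single call; a sketch (the values are examples, adjust them to your machine):

```r
library(raster)

# example settings - pick values that fit your available RAM
rasterOptions(maxmemory = 1e+09,   # read up to 1e9 cells into memory
              chunksize = 1e+08,   # larger chunks for disk-based processing
              progress  = "text",  # show a progress bar for long operations
              timer     = TRUE)    # print elapsed time after each operation
```

These settings are local to your current R session; call rasterOptions() again without arguments to verify what is active.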
Benchmark
In order to test the influence of the maxmemory setting on the processing time, I loaded a Landsat 8 subset into R and performed a simple calculation: ten times with a maxmemory limit of 1e+08 and ten times with a limit of 1e+09.
Here is a summary of the raster image that was used for the benchmark:
class      : RasterStack
dimensions : 3485, 4606, 16051910, 5  (nrow, ncol, ncell, nlayers)
resolution : 30, 30  (x, y)
And here is the code for the simple (nonsense) raster calculation used to measure the time:
start <- Sys.time()
ras_calc <- ras ^ 2 + ras
end <- Sys.time()
difftime(end, start)
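To repeat the measurement ten times per setting, the timing can be wrapped in a small helper; a sketch (the stack `ras` is assumed to be loaded as above, and `time_calc` is a hypothetical helper, not part of {raster}):

```r
library(raster)

# hypothetical helper: time the calculation once under a given maxmemory limit
time_calc <- function(mem_limit) {
  rasterOptions(maxmemory = mem_limit)
  start <- Sys.time()
  ras_calc <- ras ^ 2 + ras
  as.numeric(difftime(Sys.time(), start, units = "secs"))
}

# ten runs per setting, as in the benchmark
times_1e08 <- replicate(10, time_calc(1e+08))
times_1e09 <- replicate(10, time_calc(1e+09))

# compare the two settings side by side
boxplot(list(`1e+08` = times_1e08, `1e+09` = times_1e09),
        ylab = "seconds")
```

Running each setting several times averages out disk caching and background load, which can otherwise dominate a single measurement.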
Result:
The time saved by increasing the maxmemory limit from 1e+08 to 1e+09 was significant. With the lower limit (1e+08), the simple calculation took 61 seconds on average. With the higher limit (1e+09) it performed much better, needing only about 35 seconds for the same task. You can see the final result in the boxplot below:
I hope this was useful to some of you. The 2nd part of the tutorial on “Increasing the speed of {raster} processing with R” will be on how you can parallelise your R code using the {foreach} and {doParallel} packages.
See you soon!
Martin
4 Comments
thanks! that made my script not crash on a cluster
fa 7 years ago
Happy to hear! Cheers Martin
Martin 7 years ago
Hi Martin, firstly thanks for sharing useful hints of ‘raster’ R package processing.
I’d like to know if there is a limit of maxmemory. For instance, could we estimate maxmemory limit as a function of the computer ram? What do you think?
Thanks in advance
Jose Lucas 7 years ago
Yes, there is a limit. You can find a detailed answer here: http://stackoverflow.com/questions/38368939/rasteroptions-difference-between-chunksize-and-maxmemory
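As a rough rule of thumb (my own estimate, not from the package documentation): maxmemory counts cells, and each cell of a FLT8S raster is an 8-byte double, so you can back out a cell limit from the RAM you are willing to dedicate:

```r
# rough sketch: derive a maxmemory value (in cells) from available RAM
ram_bytes      <- 8 * 1024^3   # e.g. 8 GB of RAM - adjust for your machine
bytes_per_cell <- 8            # FLT8S values are 8-byte doubles
fraction       <- 0.5          # leave headroom for R and the OS

max_cells <- fraction * ram_bytes / bytes_per_cell
rasterOptions(maxmemory = max_cells)
```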
Martin 7 years ago