## Benchmark: R vs. Python Rasterio vs. GDAL

Hey there!
Today I would like to do a small comparsion between three raster processing tools:
R, Python (rasterio) and GDAL.
When I started with GIS and raster processing, I didn’t really pay much attention to the performance of the scripts I wrote and tools I used. But now, working with bigger data, higher spatial resolution, a fast processing of my raster files is the key to success. Therefore I will compare the performance of three raster processing tools for you, namely on a simple NDVI calculation using a Landsat 8 subset:

### Data:

First, lets have a look at the data I’ve used for the benchmark: It’s a standard Landsat 8 image downloaded from the USGS EarthExplorer Webpage.
It has spatial resolution of 30mx30m and the dimensions of the subset are 3485×4606 pixels. The scene depicts the Lake Baikal in Siberia, Russia:

### Tested processing tools:

The processing tools I have tested in this benchmark are the following: R, Python rasterio and GDAL.

#### R

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. Here I focused on the {raster} package that is widely used for the handling of raster files in R. [read more here]

I tested two algorithms:

• The first using the overlay() function from the raster package. This functions enables users to perform raster algebra with two or more input bands. This is necessary for the calculation of the NDVI.
• A second one, using standard R syntax.

You can have a look at both code snippets below:

```library(raster)

#----------1. ALGORITHM - RASTER PACKAGE with OVERLAY FUNCTION -------------
ras <- stack("baikal_subset.tif")

ndvi <- overlay(ras, fun=function(x){(x[1]-x[2])/(x[1]+x[2])})

writeRaster(ndvi, "ndvi_subset_raster.tif",
datatype="FLT8S",
options=c("compress=lzw"),
overwrite=T)

#---------- 2. ALGORITHM - RASTER with STANDARD R SYNTAX -----------------
ras <- stack("baikal_subset.tif")

b1 <- ras[[1]]
b2 <- ras[[2]]

ndvi = (b1-b2)/(b1+b2)

writeRaster(ndvi, "ndvi_subset_R.tif",
datatype="FLT8S",
options=c("compress=lzw"),
overwrite=T)```

#### Python Rasterio

Rasterio employs GDAL under the hood for file I/O and raster formatting. Its functions typically accept and return Numpy ndarrays. Rasterio is designed to make working with geospatial raster data more productive and more fun.  [read more here]

You can have a look at the code below:

```import rasterio
import numpy

with rasterio.drivers():
with rasterio.open('baikal_subset.tif') as src:
b1, b2, b3, b4, b5 = src.read()

profile = src.profile
profile.update(
dtype=rasterio.float64,
count=1,
compress='lzw')

ndvi = numpy.zeros(b1.shape)
ndvi = (b1-b2)/(b1+b2)

with rasterio.open('ndvi_python.tif', 'w', **profile) as dst:
dst.write(ndvi.astype(rasterio.float64), 1)```

#### GDAL

The last tool I used is GDAL and the function gdal_calc.py:

GDAL is a translator library for raster and vector geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single raster abstract data model and vector abstract data model to the calling application for all supported formats. It also comes with a variety of useful command line utilities for data translation and processing. The NEWS page describes the January 2016 GDAL/OGR 2.0.2 release. [read more here]

### Benchmark Results

For the benchmark I ran all the scripts described above 5 times each and got the following very interesting result:

The fastest processing was done by GDAL with a mean value of 5.8seconds. The python rasterio package was slightly slower with an average processing time of 7.95 seconds. Unfortunately my favourite software R was the slowest: using the standard R syntax an average processing time of 15.3 seconds was recorded and what was shocking (!) to me: the overlay() function from the raster package needed 105.84 seconds on average to calculate the NDVI from the Landsat 8 subset.

Note: This benchmark was carried out on a MacBook Air with 1,8 GHz Intel Core i5 and 8GB 1600 MHz DDR3 with OS X El Capitan. Here my version numbers: R 3.2.3, Python 2.7, GDAL 1.11.3 and Rasterio 0.31.0.

Let me know if you have any questions.

Cheers

Martin

Martin was born in Czech Republic and studied at the University of Natural Resources and Life Sciences, Vienna. He is currently working at GeoVille - an Earth Observation Company based in Austria, specialised in Land Monitoring. His main interests are: Open-source applications like R, (geospatial) statistics and data-management, web-mapping and visualization. He loves travelling, geocaching, photography and sports.

You can post comments in this post.

• Thanks Martin for your speed testing. The experiences I have made with the raster package in R confirm that in some way it is very slow and often unstable.
Now, why is GDAL the winner? imho because it is written in C. Do you agree?

• Dear Simon,
I am happy to hear from you. Yes, as you already pointed out it’s due to the fact that GDAL is written in C / Python which makes it much faster. I recently stumbled upon the possibility of parallelisation with R, this increases the performance of many processing tasks enormously. You should definitely check out my posts on that topic.
Hope you are doing fine.
Cheers Martin