Hey there,

welcome to part 3 of our short introduction to extreme value analysis using the `extRemes`

package in R.

Having discussed the block maxima method the last time, we will now have a look at the threshold excess method.

According to Coles (2001), the threshold approach is more efficient than the block maxima method if complete (time) series without gaps are available, as all values exceeding a certain threshold can serve as a basis for model fitting. In some cases, fitting distributions to block maxima data is a wasteful approach as only one value per block is used for modelling, while a threshold excess approach potentially provides more information on extremes.

However, analogous to the choice of the block size in the block maxima approach, the selection of the threshold value for partial duration models is also subject to a trade-off between bias (low threshold) and variance (high threshold).

Coles (2001) describes two different methods for threshold selection. Firstly, there is an exploratory approach based on the mean residual life plot. This technique is applied prior to the actual model fitting. Secondly, an alternative approach is to assess the stability of parameter estimates. Hence, this sensitivity analysis for model fitting is performed across a range of different thresholds.

However, selecting an appropriate threshold is probably the most crucial part of performing an extreme value analysis using partial duration series. Scarrott and MacDonald provide a quite good overview of approaches for threshold estimation in their 2012 article *A review of extreme value threshold estimation and uncertainty quantification* (REVSTAT 10(1): 33–59).

Having found an appropriate threshold, the resulting subset of extreme values exceeding this threshold is used for fitting a generalized Pareto distribution.

According to the Pickands-Balkema-de Haan theorem, the distribution of values exceeding a threshold can be approximated by a generalized pareto distribution.

The following code shows a short practical example of fitting a generalized pareto distribution to a time series of precipitation data using the `extRemes`

package in R. The sample data set features precipitation data of Oberwang (Salzburg, Austria) from 1981 through 2013 and is provided by the hydrographic services of Austria via via eHYD.

# load packages library(extRemes) library(ismev) library(RCurl) # get data from eHYD link <- getURL("http://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id=105700&file=2") df <- read.csv(text = link, sep=";", dec=",", skip=21, header=FALSE) df[df == "Lücke"] = NA colnames(df) <- c("Date","Precip") df$Date <- as.Date(df$Date, format = "%d.%m.%Y") df$Precip <- as.numeric(gsub(",", ".", as.character(df$Precip))) df <- na.omit(df) head(df) # mean residual life plot: mrlplot(df$Precip, main="Mean Residual Life Plot") # The mean residual life plot depicts the Thresholds (u) vs Mean Excess flow. # The idea is to ﬁnd the lowest threshold where the plot is nearly linear; # taking into account the 95% conﬁdence bounds. # fitting the GPD model over a range of thresholds gpd.fitrange(df$Precip, umin=30, umax=45, nint = 16) # set threshold th <- 40 # maximum likelihood estimation pot.mle <- fevd(df$Precip, method = "MLE", type="GP", threshold=th) plot(pot.mle) # diagnostic plots rl.mle <- return.level(pot.mle, conf = 0.05, return.period= c(2,5,10,20,50,100)) # L-moments estimation pot.lmom <- fevd(df$Precip, method = "Lmoments", type="GP", threshold=th) plot(pot.lmom) # diagnostic plots rl.lmom <- return.level(pot.lmom, conf = 0.05, return.period= c(2,5,10,20,50,100)) # return level plots par(mfcol=c(1,2)) # return level plot w/ MLE plot(pot.mle, type="rl", main="Return Level Plot for Oberwang w/ MLE", ylim=c(0,200), pch=16) loc <- as.numeric(return.level(pot.mle, conf = 0.05,return.period=100)) segments(100, 0, 100, loc, col= 'midnightblue',lty=6) segments(0.01,loc,100, loc, col='midnightblue', lty=6) # return level plot w/ LMOM plot(pot.lmom, type="rl", main="Return Level Plot for Oberwang w/ L-Moments", ylim=c(0,200)) loc <- as.numeric(return.level(pot.lmom, conf = 0.05,return.period=100)) segments(100, 0, 100, loc, col= 'midnightblue',lty=6) segments(0.01,loc,100, loc, col='midnightblue', lty=6) # comparison of return levels res <- t(data.frame(mle=as.numeric(rl.mle), lmom=as.numeric(rl.lmom))) colnames(res) <- c(2,5,10,20,50,100) round(res,1)

This is an example that illustrates nicely why the approach based on L-moments may be preferred over maximum likelihood estimation, as the right image proves clearly that the influence of the outlier is much smaller when using L-moments estimation.

*Reference: Coles, Stuart (2001). An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. London: Springer.*

## 4 Comments

You can post comments in this post.

Congratulations for this post. It is really helpful.

I am working on extremes in R and Ι have already estimated the parameters (scale, shape) for GEV and Pareto distributions using “MLE” and “L moments” methods. Moreover, I have to calculate the values of shape and scale for GEV distribution, using the Bayesian method. But I can’t estimate these parameters for Pareto distribution using the Bayesian method. More specifically I am writing

`fevd(data, threshold, type="GP", method="Bayesian")`

and I am taking the following message from R:

`Error in colMeans(x$chain.info[, 2:(pdim[2] - 1)], na.rm = TRUE):`

.'x' must be an array of at least two dimensions

Could you please help me with my problem?

Thanks in advance!

Georgia L 9 months ago

Hey,

it is a bit difficult to reproduce this error message. I guess that this is not an issue related to the extRemes package, but rather to your input data set.

What do your input data data look like?

A minimal working example for using Bayesian parameter estimation would be:

This takes a couple of seconds, but just works fine.

Regards,

Matthias

Matthias 6 months ago

Hi, thanks for this post. Very clear and simple to understand.

I’m working on extreme event in R using

`fevd {extRemes}`

. I have a big sample (more than 4 million rows). I’m trying to fit my data but I found a huge AIC and it doesn’t match with the plot. Bigger AIC better fits my data, the return period are empty for the smallest AIC. So my question is: should I select my model using the plots or the AIC?thanks for your help

my R code :

Feriel B 2 months ago

Hi,

without knowing your data set I can only provide the following recommendations:

`extRemes`

provides several useful options, which can be controlled by the argument`method`

`.`

I hope this helps.

Regards,

Matthias

Matthias 2 months ago

## Post A Reply