welcome to part 3 of our short introduction to extreme value analysis using the
extRemes package in R.
Having discussed the block maxima method the last time, we will now have a look at the threshold excess method.
According to Coles (2001), the threshold approach is more efficient than the block maxima method if complete (time) series without gaps are available, as all values exceeding a certain threshold can serve as a basis for model fitting. In some cases, fitting distributions to block maxima data is a wasteful approach as only one value per block is used for modelling, while a threshold excess approach potentially provides more information on extremes.
However, analogous to the choice of the block size in the block maxima approach, the selection of the threshold value for partial duration models is also subject to a trade-off between bias (low threshold) and variance (high threshold).
Coles (2001) describes two different methods for threshold selection. Firstly, there is an exploratory approach based on the mean residual life plot. This technique is applied prior to the actual model fitting. Secondly, an alternative approach is to assess the stability of parameter estimates. Hence, this sensitivity analysis for model fitting is performed across a range of different thresholds.
However, selecting an appropriate threshold is probably the most crucial part of performing an extreme value analysis using partial duration series. Scarrott and MacDonald provide a quite good overview of approaches for threshold estimation in their 2012 article A review of extreme value threshold estimation and uncertainty quantification (REVSTAT 10(1): 33–59).
Having found an appropriate threshold, the resulting subset of extreme values exceeding this threshold is used for fitting a generalized Pareto distribution.
According to the Pickands-Balkema-de Haan theorem, the distribution of values exceeding a threshold can be approximated by a generalized pareto distribution.
The following code shows a short practical example of fitting a generalized pareto distribution to a time series of precipitation data using the
extRemes package in R. The sample data set features precipitation data of Oberwang (Salzburg, Austria) from 1981 through 2013 and is provided by the hydrographic services of Austria via via eHYD.
# load packages library(extRemes) library(ismev) library(RCurl) # get data from eHYD link <- getURL("http://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id=105700&file=2") df <- read.csv(text = link, sep=";", dec=",", skip=21, header=FALSE) df[df == "Lücke"] = NA colnames(df) <- c("Date","Precip") df$Date <- as.Date(df$Date, format = "%d.%m.%Y") df$Precip <- as.numeric(gsub(",", ".", as.character(df$Precip))) df <- na.omit(df) head(df) # mean residual life plot: mrlplot(df$Precip, main="Mean Residual Life Plot") # The mean residual life plot depicts the Thresholds (u) vs Mean Excess flow. # The idea is to ﬁnd the lowest threshold where the plot is nearly linear; # taking into account the 95% conﬁdence bounds. # fitting the GPD model over a range of thresholds gpd.fitrange(df$Precip, umin=30, umax=45, nint = 16) # set threshold th <- 40 # maximum likelihood estimation pot.mle <- fevd(df$Precip, method = "MLE", type="GP", threshold=th) plot(pot.mle) # diagnostic plots rl.mle <- return.level(pot.mle, conf = 0.05, return.period= c(2,5,10,20,50,100)) # L-moments estimation pot.lmom <- fevd(df$Precip, method = "Lmoments", type="GP", threshold=th) plot(pot.lmom) # diagnostic plots rl.lmom <- return.level(pot.lmom, conf = 0.05, return.period= c(2,5,10,20,50,100)) # return level plots par(mfcol=c(1,2)) # return level plot w/ MLE plot(pot.mle, type="rl", main="Return Level Plot for Oberwang w/ MLE", ylim=c(0,200), pch=16) loc <- as.numeric(return.level(pot.mle, conf = 0.05,return.period=100)) segments(100, 0, 100, loc, col= 'midnightblue',lty=6) segments(0.01,loc,100, loc, col='midnightblue', lty=6) # return level plot w/ LMOM plot(pot.lmom, type="rl", main="Return Level Plot for Oberwang w/ L-Moments", ylim=c(0,200)) loc <- as.numeric(return.level(pot.lmom, conf = 0.05,return.period=100)) segments(100, 0, 100, loc, col= 'midnightblue',lty=6) segments(0.01,loc,100, loc, col='midnightblue', lty=6) # comparison of return levels res <- t(data.frame(mle=as.numeric(rl.mle), lmom=as.numeric(rl.lmom))) colnames(res) <- c(2,5,10,20,50,100) round(res,1)
This is an example that illustrates nicely why the approach based on L-moments may be preferred over maximum likelihood estimation, as the right image proves clearly that the influence of the outlier is much smaller when using L-moments estimation.
Reference: Coles, Stuart (2001). An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. London: Springer.