Hey there,
welcome to part 3 of our short introduction to extreme value analysis using the extRemes
package in R.
Having discussed the block maxima method the last time, we will now have a look at the threshold excess method.
According to Coles (2001), the threshold approach is more efficient than the block maxima method if complete (time) series without gaps are available, as all values exceeding a certain threshold can serve as a basis for model fitting. In some cases, fitting distributions to block maxima data is a wasteful approach as only one value per block is used for modelling, while a threshold excess approach potentially provides more information on extremes.
However, analogous to the choice of the block size in the block maxima approach, the selection of the threshold value for partial duration models is also subject to a trade-off between bias (low threshold) and variance (high threshold).
Coles (2001) describes two different methods for threshold selection. Firstly, there is an exploratory approach based on the mean residual life plot. This technique is applied prior to the actual model fitting. Secondly, an alternative approach is to assess the stability of parameter estimates. Hence, this sensitivity analysis for model fitting is performed across a range of different thresholds.
However, selecting an appropriate threshold is probably the most crucial part of performing an extreme value analysis using partial duration series. Scarrott and MacDonald provide a quite good overview of approaches for threshold estimation in their 2012 article A review of extreme value threshold estimation and uncertainty quantification (REVSTAT 10(1): 33–59).
Having found an appropriate threshold, the resulting subset of extreme values exceeding this threshold is used for fitting a generalized Pareto distribution.
According to the Pickands-Balkema-de Haan theorem, the distribution of values exceeding a threshold can be approximated by a generalized pareto distribution.
The following code shows a short practical example of fitting a generalized pareto distribution to a time series of precipitation data using the extRemes
package in R. The sample data set features precipitation data of Oberwang (Salzburg, Austria) from 1981 through 2014 and is is provided by the hydrographic services of Austria via via eHYD. The function read_ehyd()
for importing the data set can be found at Reading data from eHYD using R.
# load packages library(extRemes) library(xts) # get data from eHYD ehyd_url <- "http://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id=105700&file=2" precipitation_xts <- read_ehyd(ehyd_url) # mean residual life plot: mrlplot(precipitation_xts, main="Mean Residual Life Plot") # The mean residual life plot depicts the Thresholds (u) vs Mean Excess flow. # The idea is to find the lowest threshold where the plot is nearly linear; # taking into account the 95% confidence bounds. # fitting the GPD model over a range of thresholds threshrange.plot(precipitation_xts, r = c(30, 45), nint = 16) # ismev implementation is faster: # ismev::gpd.fitrange(precipitation_xts, umin=30, umax=45, nint = 16) # set threshold th <- 40 # maximum likelihood estimation pot_mle <- fevd(as.vector(precipitation_xts), method = "MLE", type="GP", threshold=th) # diagnostic plots plot(pot_mle) rl_mle <- return.level(pot_mle, conf = 0.05, return.period= c(2,5,10,20,50,100)) # L-moments estimation pot_lmom <- fevd(as.vector(precipitation_xts), method = "Lmoments", type="GP", threshold=th) # diagnostic plots plot(pot_lmom) rl_lmom <- return.level(pot_lmom, conf = 0.05, return.period= c(2,5,10,20,50,100)) # return level plots par(mfcol=c(1,2)) # return level plot w/ MLE plot(pot_mle, type="rl", main="Return Level Plot for Oberwang w/ MLE", ylim=c(0,200), pch=16) loc <- as.numeric(return.level(pot_mle, conf = 0.05,return.period=100)) segments(100, 0, 100, loc, col= 'midnightblue',lty=6) segments(0.01,loc,100, loc, col='midnightblue', lty=6) # return level plot w/ LMOM plot(pot_lmom, type="rl", main="Return Level Plot for Oberwang w/ L-Moments", ylim=c(0,200)) loc <- as.numeric(return.level(pot_lmom, conf = 0.05,return.period=100)) segments(100, 0, 100, loc, col= 'midnightblue',lty=6) segments(0.01,loc,100, loc, col='midnightblue', lty=6) # comparison of return levels results <- t(data.frame(mle=as.numeric(rl_mle), lmom=as.numeric(rl_lmom))) colnames(results) <- c(2,5,10,20,50,100) round(results,1)
This is an example that illustrates nicely why the approach based on L-moments may be preferred over maximum likelihood estimation, as the right image proves clearly that the influence of the outlier is much smaller when using L-moments estimation. In addition to these classical estimation methods, extRemes
offers Generalized Maximum Likelihood Estimation (GMLE, Martins and Stedinger, 2000) and Bayesian estimation methods (Gilleland and Katz, 2016).
Reference: Coles, Stuart (2001). An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. London: Springer.
9 Comments
You can post comments in this post.
Congratulations for this post. It is really helpful.
I am working on extremes in R and Ι have already estimated the parameters (scale, shape) for GEV and Pareto distributions using “MLE” and “L moments” methods. Moreover, I have to calculate the values of shape and scale for GEV distribution, using the Bayesian method. But I can’t estimate these parameters for Pareto distribution using the Bayesian method. More specifically I am writing
fevd(data, threshold, type="GP", method="Bayesian")
and I am taking the following message from R:
Error in colMeans(x$chain.info[, 2:(pdim[2] - 1)], na.rm = TRUE):
.'x' must be an array of at least two dimensions
Could you please help me with my problem?
Thanks in advance!
Georgia L 7 years ago
Hey,
it is a bit difficult to reproduce this error message. I guess that this is not an issue related to the extRemes package, but rather to your input data set.
What do your input data data look like?
A minimal working example for using Bayesian parameter estimation would be:
This takes a couple of seconds, but just works fine.
Regards,
Matthias
Matthias 7 years ago
Hi, thanks for this post. Very clear and simple to understand.
I’m working on extreme event in R using
fevd {extRemes}
. I have a big sample (more than 4 million rows). I’m trying to fit my data but I found a huge AIC and it doesn’t match with the plot. Bigger AIC better fits my data, the return period are empty for the smallest AIC. So my question is: should I select my model using the plots or the AIC?thanks for your help
my R code :
Feriel B 7 years ago
Hi,
without knowing your data set I can only provide the following recommendations:
extRemes
provides several useful options, which can be controlled by the argumentmethod
.
I hope this helps.
Regards,
Matthias
Matthias 7 years ago
Hi Matthias, your posts here are really amazing. Great job! I am also working now on a similar topic, basically on extreme events in R where I should use Block Maxima Method and Peaks Over Thershold Method, and also different distributions to calculate Value at Risk (VaR) and Expected Shortfall (ES) of given data. Could you be able to help me with the codes?
Best regards,
Martina
Martina 6 years ago
please can i get the code on how to draw the peak over threshold image drawn above
Emmanuel Koranteng 5 years ago
How can i read the precipitation value from NetCDF ( TRMM dataset ) file for analysis of Daily Extreme precipitation value.
Pawan 5 years ago
Hello and thank you for your really helpful blog.
How can I get the matrix of the mrplot? The one with the columns of the threshold, number of exceedances, mean excess etc. that is returned invisibly.
Thank you in advance,
Paspati Maria
Maria Paspati 5 years ago
I am working on extremes rainfall in R but I failed to estimate parameters for gamma pareto and gamma generalized pareto distributions using R. Could you please help me with the codes in R studio?
Jean 4 years ago
Post A Reply