## Introduction to Extreme Value Analysis in R – Part 2: Block Maxima Approach  Hey there,

welcome to part 2 of our short introduction to extreme value analysis using the `extRemes` package in R.

As some of you might know, there are two common approaches for practical extreme value analysis. Today, we will focus on the first of these two approaches, called the block maxima method.

This approach for modelling extremes of a (time) series of observations is based on the utilization of maximum or minimum values of these observations within a certain sequence of constant length. For a  sufficiently  large  number  n  of  established  blocks, the  resulting  peak  values of  these  n  blocks of equal  length  can  be  used  for  fitting  a  suitable  distribution  to  these  data.  While  the  block  size  is basically freely selectable, a trade-off has to be made between bias (small blocks) and variance (large blocks). Usually, the length of the sequences is often chosen to correspond to a certain familiar time period, in most cases a year. The resulting vector of annual maxima (or minima, respectively) is calles “Annual Maxima (Minima) Series” or simply AMS.

According to the Fisher–Tippett–Gnedenko theorem, the distribution of block maxima can be approximated by a generalized extreme value distribution.

The following code shows a short practical example of fitting a generalized extreme value distribution to a time series of precipitation data using the `extRemes` package in R. The sample data set features precipitation data of Bärnkopf (Lower Austria) from 1971 through 2014 and is is provided by the hydrographic services of Austria via eHYD. The function `read_ehyd()` for importing the data set can be found at Reading data from eHYD using R.

```# load required packages
library(extRemes)
library(xts)

# get data from eHYD

# derive AMS for maximum precipitation
ams <- apply.yearly(precipitation_xts, max)

# maximum-likelihood fitting of the GEV distribution
fit_mle <- fevd(as.vector(ams), method = "MLE", type="GEV")
# diagnostic plots
plot(fit_mle)
# return levels:
rl_mle <- return.level(fit_mle, conf = 0.05, return.period= c(2,5,10,20,50,100))

# fitting of GEV distribution based on L-moments estimation
fit_lmom <- fevd(as.vector(ams), method = "Lmoments", type="GEV")
# diagnostic plots
plot(fit_lmom)
# return levels:
rl_lmom <- return.level(fit_lmom, conf = 0.05, return.period= c(2,5,10,20,50,100))

# return level plots
par(mfcol=c(1,2))
# return level plot w/ MLE
plot(fit_mle, type="rl",
main="Return Level Plot for Bärnkopf w/ MLE",
ylim=c(0,200), pch=16)
loc <- as.numeric(return.level(fit_mle, conf = 0.05,return.period=100))
segments(100, 0, 100, loc, col= 'midnightblue',lty=6)
segments(0.01,loc,100, loc, col='midnightblue', lty=6)

# return level plot w/ LMOM
plot(fit_lmom, type="rl",
main="Return Level Plot for Bärnkopf w/ L-Moments",
ylim=c(0,200))
loc <- as.numeric(return.level(fit_lmom, conf = 0.05,return.period=100))
segments(100, 0, 100, loc, col= 'midnightblue',lty=6)
segments(0.01,loc,100, loc, col='midnightblue', lty=6)

# comparison of return levels
results <- t(data.frame(mle=as.numeric(rl_mle),
lmom=as.numeric(rl_lmom)))
colnames(results) <- c(2,5,10,20,50,100)
round(results,1)
``` In this case, both results are quite similar. In most cases, L-moments estimation is more robust than maximum likelihood estimation. In addition to these classical estimation methods, `extRemes` offers Generalized Maximum Likelihood Estimation (GMLE, Martins and Stedinger, 2000) and Bayesian estimation methods (Gilleland and Katz, 2016). Moreover, I have made the observation that maximum likelihood estimation works more reliable in other R packages in some cases (e.g. `fExtremes`, `ismev`).

Reference: Coles, Stuart (2001). An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. London: Springer.

Matthias studied Environmental Information Management at the University of Natural Resources and Life Sciences Vienna and holds a PhD in environmental statistics. The focus of his thesis was on the statistical modelling of rare (extreme) events as a basis for vulnerability assessment of critical infrastructure. He is working at the Austrian national weather and geophysical service (ZAMG) and at the Institute of Mountain Risk Engineering at BOKU University. He currently focuses the (statistical) assessment of adverse weather events and natural hazards, and disaster risk reduction. His main interests are statistical modelling of environmental phenomena as well as open source tools for data science, geoinformation and remote sensing.

You can post comments in this post.

• i want a help of you

• Sure. Can you specify your request?

• plz send me the r-coding for fitting generalized pareto distribution,estimating the parameters estimation by mle method

• my research topic is “non stationary frequency analysis of extreme precipitation”.but i did not know my track. how i will proceed my job?

• I also wrote a post about dealing with non-stationarity, you can find further information there:
Introduction to Extreme Value Analysis in R – Part 4: Dealing with trends
The `fevd()` function from the package `extRemes` is quite powerful and well documented, I recommend to have a look at its help page.

• Hi I am currently doing a thesis on Garch-EVT approach to estimate Value at Risk and Expected Shortfall. Can you send me the R procedure for this approach? Thanks

• Hi Gaby,

I have not yet worked with GARCH-EVT-copulas, if that’s what you intend to do. I am pretty sure that there is no specific R package targeted at this topic. So you probably have to split your procedure into several steps and deal with each step individually. Stack overflow is most likely a very good place to get some information and see the workflow of other people working with this topic. For a start, also take a look at `fGarch` (which is part of the `Rmetrics`-suite, just as the excellent package `fExtremes`) or `rugarch` for GARCH modelling.
In addition, you might want to have a look at the package `GEVStableGarch` which seemingly employs maximum likelihood extimation of GARCH models with a GEV distribution, maybe this is sufficient for your needs.

Regards,
Matthias

• I’ll just be working with garch and evt. The procedure consists of applying a garch model to the return series and then extract the residuals from the model. Then we apply evt on the residuals extracted from the garch model. However I am having difficulties selecting the threshold from the residuals so as to apply EVT-POT or block maxima method to it. Can you help me?

Regards
Gaby

• I’m shocked that I found this info so easily.

• Hi

Your website is very useful and make this topic easy to follow

How did you change your points on your return level plots to green?

• Hi,

unfortunately, the plotting methods for objects of type `fevd` are – albeit comprehensive – hardly customizable. Since I didn’t like the default plotting, I modified the plotting functions `plot.fevd.mle` and `plot.fevd.lmoments` to support better visualization options within the `extRemes` framework.

Regards,
Matthias

• Matthias,

i am still learning phase of R.

Would you be able to provide me with some guidance on how to get this done. Thanks

• Hi Misha,

I have written the following function for creating a simple return level plot (base graphics) from `fevd` objects that have been fitted of type GEV.
You can control both the color of points and the lines of the fitted function as well as the number of ticks on your y-axis.

```rlplot_gev <- function(fevd_obj,
pointcolor = "firebrick",
linecolor = "darkgreen",
yticks = seq(0,225,25)){
rperiods = c(1+1e-15, 1.25, 1.5, 2, 3,4,5, 7.5, 10, 15, 20,
30, 40, 50, 60, 75, 80, 100, 120, 200, 250)
model <- fevd_obj\$type
if(model != "GEV"){
stop("this plotting function only works for GEV")
}
pars <- fevd_obj\$results\$par
loc <- pars["location"]
scale <- pars["scale"]
shape <- pars["shape"]
xrl <- -1/(log(1 - 1/rperiods))
yrl <- rlevd(rperiods, loc = loc, scale = scale,
shape = shape)

# empty plot
plot(xrl, yrl, type = "n", log = "x", xlab = "Return Period",
ylab = "Return Level", lwd = 2, yaxt = "n", xlim = c(1,200),
main = "Return Level Plot")
par(las = 1)
axis(side = 2, at = yticks)

# background grid
abline(h = yticks,col = "lightgrey", lty = 2)
abline(v = c(1,2,5,10,20,50,100,200), col = "lightgrey", lty = 2)

# plot line
lines(xrl, yrl, col = linecolor, lwd = 2)

# ci lines for MLE base plot
bds <- ci(fevd_obj, return.period = rperiods)
lines(xrl, bds[, 1], col = linecolor, lty = 4)
lines(xrl, bds[, 3], col = linecolor, lty = 4)

# Weibull plotting position of points
n <- fevd_obj\$n
xp <- ppoints(n = n, a = 0)
ytmp <- datagrabber(fevd_obj)
y <- c(ytmp[, 1])
points(-1/log(xp), sort(y), pch = 16, col = pointcolor)
}
```

Please note that this is a very simple approach, I'm mainly re-using code from `plot.fevd.mle()`.
In order to create a similar plot for the threshold excess approach, you would have to make slight adjustments to parameters of the GP and to the plotting positions of the extreme values:

```sdat <- sort(y)
points(-1/log(xp2)[sdat > u]/x\$npy, sdat[sdat > u], pch="+", col = pointcolor)
```

Using `rlplot_gev(mlefit, pointcolor = "darkblue")` on the code of my post should yield return level plots similar to the default `plot(mlefit, type = "rl", xlim = c(1, 200), ylim = c(25, 225))`

Regards,
Matthias

• Hey!

I am currently writing my Bachelor Thesis on evds.

Would it be okay for you if I used your code as an example?

• Hi Lukas,

yes, sure. However, if you have enough time it might be worth to have a deeper look at the packages presented above.
This post is really meant as an introduction to the topic.

Regards,
Matthias

• When modelling the effects of extreme events (say of variable X) on volatility of prices of a commodity (say Y), is possible to fit GEV to X and then insert it as an exogenous variable in mean or volatility equation? If not, how can I approach this issue?

• Hi,

I am not really sure if I understand what you are intending to do.
What is the question you are trying to answer?
Modelling the influence of X on Y sounds more like a standard regression model in the first place.

• Hello!
I have a question:
How can I use R and the Gumbel distribution to predict discharge on specific return periods?
And how do I know the return period?
Thank you!

• Hi,

basically, you can stick to my example code above.
If you specifically want to fit a Gumbel distribution, simply set the `type` argument in the `fevd()` function to `type="Gumbel"`.
Return levels can be obtained by using the function `return.level()`.

Regards,
Matthias

• Hi Matthias, I have a question too:
I want to use Block maxima Method to estimate Value at risk. How can I do this using EVT BMM? From Generalized Extreme Value distribution I need to estimate the shape, scale and loval parameters, but which function in R delivers this result? And I should use different blocks in the length of “months”, “quartal” and “6 months”. And after that to estimate with the help of the parameters the Value at Risk.

Regards,
Martina

• Hi Matthias,

Thanks so much for this post, it’s helped me so much with my project.

Do you know what are the possible reasons why ML estimation might work for some sample data but not others? As I have found this during my project, and have had to apply L-moments instead.

Many thanks,
Peter

• Hi Peter,

apart from the fact that it is of course possible that MLE may fail to converge (due to the existence of an unbounded likelihood function), you might want to try the implementations of either `fExtremes` or `ismev`. I have found that they may produce sensible MLE estimates in case the `extRemes` implementation fails. I have not yet found the leisure to check the `fevd()`function thoroughly in this respect, since it is a real monster of a function, which is written in an overly way complex way, imho.

Best regards,
Matthias

• Okay, I don’t think it is due to the existence of an unbounded likelihood func, but L-moments estimation should be fine for my project anyway.

Thanks again, Peter

• Hi Matthias,
I’ve tried to make a “similar return level plot for the threshold excess approach” (see your comment above concerning `sdat < - sort(y)` and so on), but it does not work. Would you be so kind and give me some advice?
Thanks,
Tilo

• Hi Tilo,
basically, all you need is a slight modification of the above function. I will post my somewhat hacky solution using base graphics in the next couple of days. I always wanted to come up with a ggplot solution for return level plots, but I haven’t found time so far to implement it.

• Hey, I’m very excited I found this,
however I cant seem to download the sample data set, keep getting a

`Error during wrapup: scan() expected 'a real', got 'LÃ¼cke'`

when running `read_ehyd(ehyd_url)` using code you provided at http://www.gis-blog.com/read-ehyd/

I also tried using Rcurl (that one works, however I also get the error

```Error in if (ipars2["scale"] <= 0) ipars2["scale"] <- 1e-08 :
missing value where TRUE/FALSE needed
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In var(x) : NAs introduced by coercion
```

Which is the same when I'm using my own dataset, I'm guessing its a formatting issue on my end, maybe if you could provide just a sample of what the xts object is supposed to look like or code that formats dataset in the correct format I should be able to figure it out hopefully, I will be reading your awesome R-Help documentation and examples while I look forward to your reply.

Cameron Samson 4 years ago Reply

• I found a solution from using a An Introduction to Extreme Value Statistics by Marielle Pinheiro and Richard Grotjahn. (http://grotjahn.ucdavis.edu/EWEs/extremes_primer_v9_22_15.pdf) its based on the Extremes package and has a intro to the package and how to format it,

In the end it was data format and I managed to get it to work.

Thanks again for the blog though it was helpful,

Cameron

• Hi Cameron,

the problem you describe is due to an encoding issue. The data files contain the German word `Lücke’ (meaning `gap’ or `blank space’), which indicates an NA-value.
The trouble is caused by the Umlaut ü — you can find some information on that issue at the Mojibake article on Wikipedia.
I have set the target file encoding in the `read_ehyd()`-function to `latin1`, this should fix the issue.
The code runs fine on RStudio using Debian derivatives, but I think there might be some encoding issues with R on Windows machines, if the default encoding for the RStudio editor is windows-1252 instead of UTF-8.

Regards,
Matthias

• Hellow, am currently working on Gumbel and weighted inverse Rayleigh distribution. so using flood data i want to find out the return level through r for the given data. sir, please provide me r package and code please
Regards,

• Hey, thanks a lot for the post, it was quite useful!

Juan Ossa 3 years ago Reply

• Hi Matthias,
I want a return level plot, where the x-axis is turned by 180° or something like that, so that i can do the plot for low discharge events. Lower values should have a high return level. Is there a way to do this?

Katharina Müller 2 years ago Reply

• Hi Katharina,

this use case prominently occurs is some research areas, for instance when working with droughts. You might want to have a look at the `lfstat`package in this context.

•  