Unsupervised kMeans classification of satellite imagery using R

Lake Neusiedl as seen by Sentinel-2

Hey folks!
Today I am going to show you how to perform a very basic kMeans unsupervised classification of satellite imagery using R. We will do this on a small subset of a Sentinel-2 image.

Sentinel-2 is a satellite mission launched by the European Space Agency, and its data is freely accessible.

The image I am going to use shows the northern part of Lake Neusiedl (east of Vienna, Austria). The region is famous for sailing, good wine and its beautiful nature (reeds and protected wetlands under the Ramsar Convention). I will be using bands 2 – blue, 3 – green, 4 – red and 8 – near infrared for the classification. Here is what the image looks like – on the left you can see the true color composite (432) and on the right a false color near infrared composite (843):

Lake Neusiedl as seen by Sentinel-2. The lake is surrounded by a reed belt and intensive agricultural activity. In the north-west of the image a deciduous forest can be seen.

Please note that you can use any other image of your choice to perform the classification. The code will stay the same!

Load the image into R:

To perform the unsupervised image classification we first need to load the image into R. This is as simple as these two lines:

library(raster) # load the raster package
image <- stack("path/To/YourImage/stack.tif") # read all bands of the GeoTIFF as a RasterStack

Voilà! Our image is in R! If you don’t have the raster package installed, please run install.packages("raster") first.
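Before classifying, it can help to check that the bands were read as expected and to redraw the two composites mentioned above. The following is only a sketch: it builds a small synthetic four-band stack (hypothetical data) so that it runs without a file, and it assumes the layers are stored in the order band 2, 3, 4, 8. The layer names are made up for illustration.

```r
library(raster)

# Synthetic 4-band stand-in (hypothetical data) so the sketch runs without a file.
# With a real image you would use: image <- stack("path/To/YourImage/stack.tif")
set.seed(1)
template <- raster(nrows = 50, ncols = 50, xmn = 0, xmx = 50, ymn = 0, ymx = 50)
image <- stack(lapply(1:4, function(i) {
  setValues(template, runif(ncell(template), 0, 3000))
}))
names(image) <- c("B2_blue", "B3_green", "B4_red", "B8_nir")  # assumed band order

# Basic checks: number of bands and image dimensions (rows, columns, layers)
nlayers(image)
dim(image)

# True color composite (4-3-2) next to a false color near infrared composite (8-4-3)
par(mfrow = c(1, 2))
plotRGB(image, r = 3, g = 2, b = 1, stretch = "lin")
plotRGB(image, r = 4, g = 3, b = 2, stretch = "lin")
```

If your file stores the bands in a different order, the r, g and b indices need to be adjusted accordingly.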

Classify!

kMeans unsupervised classification can sound very confusing and hard if you have never classified an image before or if you are new to machine learning. No worries! You will actually only need about 3-4 lines of code and we are done 🙂 All we need is the ‘kmeans’ function. We specify the number of classes we want to “detect” in the image and the function takes care of the rest. It iteratively groups the pixels into so-called clusters (= groups of spectrally similar pixels, which ideally correspond to a land cover class). In my case I want to detect six classes, so let’s see how the unsupervised classification performs:

# run kmeans on the image values (extracted with the empty square brackets)
# and search for 6 clusters (centers = 6)
kMeansResult <- kmeans(image[], centers=6)

#create a dummy raster using the first layer of our image 
#and replace the values of the dummy raster with the clusters (classes) of the kMeans classification
result <- raster(image[[1]])
result <- setValues(result, kMeansResult$cluster)

#plot the result
plot(result)

This is what the result looks like:

kMeans unsupervised classification using R

 

The default visualisation is not the best I have ever seen, but okay… it’s a good start. You can see six different colors each corresponding to a spectrally similar area.
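Two practical details are worth keeping in mind with the code above: kmeans starts from random centers, so cluster numbers can differ between runs, and it stops with an error if the input contains NA values (for example masked pixels). A minimal sketch of one way to handle both, again on synthetic stand-in data (hypothetical values); the nstart and iter.max settings are just example choices:

```r
library(raster)

set.seed(42)  # fix the random start so the cluster numbering is repeatable

# Synthetic 4-band stand-in (hypothetical data) with a few NA pixels in band 1
template <- raster(nrows = 50, ncols = 50, xmn = 0, xmx = 50, ymn = 0, ymx = 50)
vals <- matrix(runif(ncell(template) * 4, 0, 3000), ncol = 4)
vals[1:10, 1] <- NA
image <- stack(lapply(1:4, function(i) setValues(template, vals[, i])))

pixels <- getValues(image)        # one row per pixel, one column per band
valid  <- complete.cases(pixels)  # TRUE where every band has a value

# Cluster only the complete pixels; nstart tries several random starts
km <- kmeans(pixels[valid, ], centers = 6, nstart = 5, iter.max = 50)

# Write the cluster numbers back, leaving NA where the input had NA
clusters <- rep(NA_integer_, nrow(pixels))
clusters[valid] <- km$cluster

result <- setValues(raster(image[[1]]), clusters)
plot(result)
```

With a real image, only the part that creates the synthetic stack would be replaced by the stack() call from the loading step.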

The first and necessary step after an unsupervised kMeans classification is to assign class names to the clusters detected by the algorithm. Let’s have a look at our classified image and compare it to our satellite scene from above; this will help us assign names to the detected clusters/classes:

  • 1 and 2 indicate reeds or forest
  • Water is classified as 3
  • Agriculture is roughly depicted by 4, 5 and 6.

Let’s change the coloring of our plot and see how it looks:

plot(result, col=c("darkgreen", "darkgreen","blue",
 "orange", "orange","orange"))

Unsupervised kMeans classification with class coloring. Dark green indicates reeds and forest, orange depicts agriculture and water is shown in blue.
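Instead of only recoloring the plot, the cluster-to-class assignment can also be written into a new raster. The sketch below uses reclassify from the raster package on a small synthetic cluster raster (hypothetical data). The mapping follows the assignment from this run (1 and 2 = reeds/forest, 3 = water, 4 to 6 = agriculture); the class codes 1 to 3 are my own choice, and since cluster numbers change between kmeans runs, the mapping has to be checked against your own result.

```r
library(raster)

# Synthetic stand-in for the kmeans result (hypothetical data): cluster ids 1..6
set.seed(7)
result <- raster(nrows = 20, ncols = 20, xmn = 0, xmx = 20, ymn = 0, ymx = 20)
result <- setValues(result, sample(1:6, ncell(result), replace = TRUE))

# Two-column matrix: "is" (cluster id) and "becomes" (class code)
# 1 = reeds/forest, 2 = water, 3 = agriculture (codes chosen for illustration)
rcl <- matrix(c(1, 1,
                2, 1,
                3, 2,
                4, 3,
                5, 3,
                6, 3), ncol = 2, byrow = TRUE)

classes <- reclassify(result, rcl)

# One color per class instead of one per cluster
plot(classes, col = c("darkgreen", "blue", "orange"))

# Number of pixels per class
freq(classes)
```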

Conclusion

We can see that water is very well captured by the classifier and agriculture is also detected quite well. Reeds (surrounding the lake) and forest (north-west of the lake), however, seem to be mixed, and there is considerable confusion between those two classes. The unsupervised kMeans classifier is a fast and easy way to detect patterns inside an image and is usually used to make a first rough classification. It is popular because of its good performance and widely used because no sample points are needed for its application (as opposed to a supervised classification).

My intention, however, was to detect six different classes and the algorithm was only able to distinguish roughly three. By increasing the “centers” parameter it might be possible to detect more classes. A second option for a better classification would be to add more bands as input (for example the red-edge or short wave infrared Sentinel-2 bands). In an upcoming post I will show you how to perform a supervised classification (with sample points) using R, and then we will compare both results.
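If you want to experiment with the “centers” parameter, one simple option (not used in this post, just a suggestion) is to run kmeans for several values and compare the total within-cluster sum of squares that the function returns as tot.withinss. Here is a rough sketch with base R only, on a made-up pixel matrix with three well separated spectral groups; with a real image you would pass the image values instead:

```r
set.seed(123)

# Hypothetical pixel matrix: 3 well separated spectral groups, 4 bands each
pixels <- rbind(
  matrix(rnorm(400, mean = 500,  sd = 50), ncol = 4),
  matrix(rnorm(400, mean = 1500, sd = 50), ncol = 4),
  matrix(rnorm(400, mean = 3000, sd = 50), ncol = 4)
)

# Run kmeans for 2 to 8 centers and keep the total within-cluster sum of squares
ks  <- 2:8
wss <- sapply(ks, function(k) {
  kmeans(pixels, centers = k, nstart = 10)$tot.withinss
})

# The curve usually flattens once additional centers stop separating real groups
plot(ks, wss, type = "b",
     xlab = "number of centers",
     ylab = "total within-cluster sum of squares")
```

A clear bend in this curve is only a hint, not a rule. Whether the clusters make sense as land cover classes still has to be checked against the image.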

Cheers

Martin

 

About This Author

Martin was born in the Czech Republic and studied at the University of Natural Resources and Life Sciences, Vienna. He is currently working at GeoVille, an Earth Observation company based in Austria, specialised in Land Monitoring. His main interests are open-source applications like R, (geospatial) statistics and data management, web mapping and visualization. He loves travelling, geocaching, photography and sports.

2 Comments

  • Thank you for this code. Could you please write the code for the unsupervised classification using maximum likelihood? I want to work on land use / land cover and its change detection, and I want to do this using R code. You will be the coauthor of this paper.

    Muhammad Waqas, 7 months ago

