Today I am going to show you how to perform a very basic kMeans unsupervised classification of satellite imagery using R. We will do this on a small subset of a Sentinel-2 image.
Sentinel-2 is an Earth observation mission of the European Space Agency (ESA), and its data is freely accessible, for example here.
The image I am going to use shows the northern part of Lake Neusiedl (southeast of Vienna, Austria). The region is famous for sailing, good wine and its beautiful nature (reeds and wetlands protected under the Ramsar Convention). I will be using bands 2 (blue), 3 (green), 4 (red) and 8 (near-infrared) for the classification. Here is what the image looks like: on the left you can see the true color composite (432) and on the right a false color near-infrared composite (843):
Please note that you can use any other image of your choice to perform the classification. The code will stay the same!
Load the image into R:
To perform the unsupervised image classification we first need to load the image into R. This is as simple as these two lines:
library(raster)  # load the raster package
image <- stack("path/To/YourImage/stack.tif")  # read all bands of the multiband file into a RasterStack
Voila! Our image is in R! If you don’t have the raster package installed, please run install.packages("raster") first.
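The true color (432) and false color (843) composites shown above can be drawn with plotRGB() from the raster package, which takes the layer numbers to use for the red, green and blue channels. This is a sketch assuming your stack holds bands 2, 3, 4 and 8 in that order, so composite 432 becomes r = 3, g = 2, b = 1 and 843 becomes r = 4, g = 3, b = 2. To stay self-contained, it builds a hypothetical random stand-in stack called demo; with real data, use image instead.

```r
library(raster)

# hypothetical stand-in for the real image: four 50 x 50 layers of random values,
# ordered like the post's input (band 2 blue, 3 green, 4 red, 8 near-infrared)
set.seed(1)
demo <- stack(lapply(1:4, function(i) raster(matrix(runif(2500), nrow = 50, ncol = 50))))
names(demo) <- c("blue", "green", "red", "nir")

plotRGB(demo, r = 3, g = 2, b = 1, stretch = "lin")  # true color composite (432)
plotRGB(demo, r = 4, g = 3, b = 2, stretch = "lin")  # false color infrared composite (843)
```

The layer numbers refer to positions in the stack, not to Sentinel-2 band numbers, which is why the assumed band order matters.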
kMeans unsupervised classification can sound confusing and hard if you have never classified an image before or if you are new to machine learning. No worries! You will actually need only about 3-4 lines of code and we are done 🙂 All we need is R’s built-in ‘kmeans’ function (from the stats package). We have to specify the number of classes we want to “detect” in the image, and the function takes care of the rest. Roughly speaking, it places that many cluster centers in the spectral space, assigns every pixel to its closest center, moves each center to the mean of its pixels and repeats until the assignments stop changing. The resulting clusters are groups of spectrally similar pixels, each of which ideally constitutes a land cover class. In my case I want to detect six classes, so let’s see how the unsupervised classification performs:
# execute kmeans on the image values: image[] returns a matrix with one row per pixel
# and one column per band; search for 6 clusters (centers = 6)
kMeansResult <- kmeans(image[], centers = 6)
# create a dummy raster from the first layer of our image
# and replace its values with the clusters (classes) of the kmeans classification
result <- raster(image[[1]])
result <- setValues(result, kMeansResult$cluster)
# plot the result
plot(result)
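Two practical details the snippet above glosses over are sketched below in base R, on a made-up matrix that stands in for image[] (one row per pixel, one column per band; the names vals, ok and clusters are hypothetical). First, kmeans() cannot handle NA values, so nodata pixels (for example outside the scene footprint) have to be left out before clustering and put back afterwards. Second, kmeans() starts from randomly chosen centers, so the cluster numbers, and sometimes the clusters themselves, can change between runs. set.seed() makes a run repeatable, nstart tries several random starts and keeps the best, and a larger iter.max helps when kmeans() warns that it did not converge (the default is 10 iterations).

```r
# fix the seed so the random starting centers, and thus the cluster numbers, are repeatable
set.seed(42)
# hypothetical stand-in for image[]: 1000 pixels x 4 bands
vals <- matrix(runif(4000), ncol = 4,
               dimnames = list(NULL, c("blue", "green", "red", "nir")))
vals[sample(nrow(vals), 50), ] <- NA  # simulate 50 nodata pixels

ok <- complete.cases(vals)  # TRUE for pixels with a value in every band
km <- kmeans(vals[ok, ], centers = 6, iter.max = 100, nstart = 5)

clusters <- rep(NA_integer_, nrow(vals))  # one entry per pixel, NA by default
clusters[ok] <- km$cluster                # fill in the classified pixels
# with raster, this full-length vector can go into setValues(result, clusters)
```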
This is what the result looks like:
The default visualisation is not the best I have ever seen, but okay… it’s a good start. You can see six different colors, each corresponding to one cluster of spectrally similar pixels.
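How k-means arrives at those clusters can be sketched in a few lines of base R. This is a simplified version of Lloyd's algorithm written for illustration, not the code behind R's kmeans(), which by default uses the Hartigan-Wong algorithm (Lloyd's variant is available via algorithm = "Lloyd"). The function name simpleKmeans and the made-up pixel values are assumptions for this example.

```r
# Minimal sketch of Lloyd's k-means algorithm (not the stats::kmeans source).
# 'pixels' is a matrix with one row per pixel and one column per band.
simpleKmeans <- function(pixels, k, maxIter = 20) {
  # start from k randomly chosen pixels as the initial cluster centers
  centers <- pixels[sample(nrow(pixels), k), , drop = FALSE]
  cluster <- integer(nrow(pixels))
  for (iter in seq_len(maxIter)) {
    # squared Euclidean distance of every pixel (rows) to every center (columns)
    d <- sapply(seq_len(k), function(j) colSums((t(pixels) - centers[j, ])^2))
    # assignment step: each pixel joins its spectrally closest center
    newCluster <- max.col(-d, ties.method = "first")
    if (all(newCluster == cluster)) break  # no pixel changed cluster: converged
    cluster <- newCluster
    # update step: move each center to the mean spectrum of its pixels
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(pixels[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers)
}

# made-up example: 50 "water-like" and 50 "vegetation-like" pixels (blue, green, red, nir)
set.seed(1)
water <- matrix(rnorm(200, mean = c(0.06, 0.05, 0.03, 0.02), sd = 0.01), ncol = 4, byrow = TRUE)
veg   <- matrix(rnorm(200, mean = c(0.04, 0.08, 0.04, 0.40), sd = 0.01), ncol = 4, byrow = TRUE)
pixels <- rbind(water, veg)
res <- simpleKmeans(pixels, k = 2)
table(res$cluster)
```

The two groups differ mainly in the near-infrared band, so the algorithm separates them within a few iterations; on a real scene the clusters overlap far more, which is where confusions like reeds versus forest come from.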
The first and necessary step after an unsupervised kMeans classification is to assign class names to the clusters detected by the algorithm. Let’s have a look at our classified image and compare it to the satellite scene from above; this will help us assign names to the detected clusters/classes:
- Clusters 1 and 2 indicate reeds or forest
- Water is classified as 3
- Agriculture is roughly depicted by 4, 5 and 6.
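Looking at the cluster centers can make this naming step less of a guessing game: kMeansResult$centers holds the mean value of every band for each cluster, and a simple index such as NDVI = (NIR - red) / (NIR + red), computed from those means, tends to be negative or near zero for water and high for dense vegetation. The sketch below uses made-up reflectance values for three surfaces (hypothetical, not taken from the Lake Neusiedl scene) and assumes the columns are named blue, green, red and nir.

```r
set.seed(7)
# made-up mean reflectances (blue, green, red, nir) for three surface types
means <- rbind(water = c(0.06, 0.05, 0.03, 0.02),
               veg   = c(0.04, 0.08, 0.04, 0.40),
               soil  = c(0.10, 0.13, 0.16, 0.25))
# 100 noisy pixels per surface type, one row per pixel
pixels <- do.call(rbind, lapply(rownames(means), function(s)
  matrix(rnorm(400, mean = means[s, ], sd = 0.01), ncol = 4, byrow = TRUE)))
colnames(pixels) <- c("blue", "green", "red", "nir")

km <- kmeans(pixels, centers = 3, nstart = 10)

# mean spectrum of each cluster plus its NDVI and number of pixels
centers <- km$centers
ndvi <- (centers[, "nir"] - centers[, "red"]) / (centers[, "nir"] + centers[, "red"])
round(cbind(centers, ndvi = ndvi, size = km$size), 3)
```

The cluster with negative NDVI is the water candidate and the one with the highest NDVI the vegetation candidate; the final names should still be checked visually against the image.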
Let’s change the coloring of our plot and see how it looks:
plot(result, col=c("darkgreen", "darkgreen","blue", "orange", "orange","orange"))
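Recoloring only changes the display; the raster still holds six cluster numbers. To actually merge clusters into named classes, the cluster IDs can be passed through a lookup vector. Below is a minimal base R sketch using the interpretation from the list above, which is only valid for that particular run because cluster numbers change between runs. A few made-up cluster IDs stand in for kMeansResult$cluster, and the names lookup, classNames and classCodes are hypothetical.

```r
# hypothetical cluster IDs for a handful of pixels (in practice: kMeansResult$cluster)
clusters <- c(3L, 1L, 5L, 2L, 6L, 4L, 3L)

# position i holds the class name for cluster i, following the interpretation above
lookup <- c("reeds/forest", "reeds/forest", "water",
            "agriculture", "agriculture", "agriculture")

classNames <- lookup[clusters]                   # class name of every pixel
classCodes <- match(classNames, unique(lookup))  # 1 = reeds/forest, 2 = water, 3 = agriculture
table(classNames)
```

The integer classCodes could then be written into a raster with setValues(), in the same way as the cluster numbers in the classification step, giving a three-class map that needs only three colors.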
We can see that water is captured very well by the classifier, and agriculture is also detected quite well. Reeds (surrounding the lake) and forest (north-west of the lake), however, are mixed: the classifier confuses these two classes heavily.
The unsupervised kMeans classifier is a fast and easy way to detect patterns in an image and is usually used to produce a first, rough classification. It is popular for its speed and widely used because no sample points are needed for its application (as opposed to a supervised classification).
My intention, however, was to detect six different classes, and the algorithm was only able to distinguish roughly three. Increasing the “centers” parameter might make it possible to separate more classes. A second option for improving the classification would be to add more bands as input (for example the red-edge or shortwave infrared Sentinel-2 bands). In an upcoming post I will show you how to perform a supervised classification (with sample points) using R, and then we will compare both results.
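If you want to experiment with the “centers” parameter, one common heuristic (not something this post relies on) is the so-called elbow method: run kmeans() for a range of cluster counts and plot the total within-cluster sum of squares, which kmeans() returns as tot.withinss. The curve typically drops steeply until the number of centers matches the structure in the data and flattens afterwards. This base R sketch runs on made-up data with three spectral groups (all values hypothetical); on a real image you would use image[] with NA rows removed, possibly on a random sample of pixels to keep run times down.

```r
set.seed(3)
# made-up pixel matrix: three spectrally distinct groups, 4 bands (blue, green, red, nir)
means <- rbind(c(0.06, 0.05, 0.03, 0.02),
               c(0.04, 0.08, 0.04, 0.40),
               c(0.10, 0.13, 0.16, 0.25))
pixels <- do.call(rbind, lapply(1:3, function(i)
  matrix(rnorm(400, mean = means[i, ], sd = 0.01), ncol = 4, byrow = TRUE)))

ks <- 2:8
# total within-cluster sum of squares for each number of centers
wss <- sapply(ks, function(k) kmeans(pixels, centers = k, nstart = 10, iter.max = 50)$tot.withinss)
plot(ks, wss, type = "b", xlab = "centers", ylab = "total within-cluster sum of squares")
```

Here the curve bends at three centers; on a real scene the bend is usually much less pronounced, so the plot is a guide rather than a definitive answer.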