Nearest Neighbour Search for Spatial Points in R

Dear all,

I just came across a very useful, fast and efficient function for matching points in one data frame to their nearest neighbours in another data frame. Of course, I’d like to share my newly acquired pearls of wisdom with you.

I’d like to outline the problem definition by providing a specific example of application: I wanted to match numerous GPS-tracks (about 200 GPX files indicating bike routes in Austria) to an underlying road graph covering all roads in Austria. Having read the track points from the GPX files, I had two simple feature collections of geometry type POINT I wanted to match:

  • object road_graph, which is an sf object containing a graph of the Austrian road network (frc between 000 and 005) in intervals of 50 meters (coordinates refer to the mean of each segment)
  • object bike_graph, which is an sf object containing the gps waypoints of the bike tracks

Basically, one could use the function snapPointsToLines() from the packagemaptools or the rgeos implementation of gDistance() to perform this task. However, these functions are extremely inefficient if you have large data sets, since you have to calculate all distances between all possible pairs of points and subsequently select the nearest point based on the minimum distance.

This is where the function nn2() from the package RANN comes into play.

# libraries

# get coordinate matrices
bike_coords <-, st_geometry(bike_graph))
bike_coords <- cbind(moto_coords, 1:nrow(bike_coords))
graph_coords <-, st_geometry(road_graph))

# fast nearest neighbour search
closest <- nn2(bike_coords[,1:2], graph_coords, k = 1, searchtype = "radius", radius = 0.001)
closest <- sapply(closest, cbind) %>% as_tibble

# create logical vector indicating bike routes and add it to the road graph
road_graph$bikeroute <-ifelse(closest$nn.idx == 0, FALSE, TRUE)

# define smoother function via run length encoding
track_smoother <- function(route, smooth_length=100){
  r <- rle(route)
  index <- r$lengths < smooth_length
  r$values[index] <- 1

# apply smoother on all tracks across all roads
road_graph <- road_graph %>%
  group_by(road) %>% 
  mutate(bikeroute_smooth = track_smoother(bikeroute))

Some explanatory remarks on the nn2() function:

  • The function uses a kd-tree to find the k number of near neighbours for each point. Specifying k = 1 yields only the ID of the nearest neighbor.
  • Since I basically simply wanted to flag bike routes, I used searchtype = "radius" to only searches for neighbours within a specified radius of the point. If no waypoints (i.e. bike routes) lie within this radius,  nn.idx will contain 0 and nn.dists will contain 1.340781e+154 for that point. I used this information to establish a logical vector indicating bike routes in the subsequent ifelse-statement.
  • Note that the radius is the distance based on the decimal between lon/lat coordinates. Having had a look at the Wikipedia page on decimal degrees (mpre precisely: the table about degree precision versus length), we can see that 3 decimal places (0.001 degrees) correspond to 111.32 m in N/S and 78.71 m E/W at 45N/S. Thus, radius = 0.001 will search for the nearest point within approx. 110 in N/S direction and approx. 75 meters in W/E direction in Austria.

Here are the results in a simple visualization via QGIS. The grey dots are the underlying road graph, blue stars indicate the GPS waypoints, red dots indicate the fitted segments lying within the radius of the GPS tracks and the green dots indicate the smoothed bike tracks:

Best regards,

About This Author

Matthias studied Environmental Information Management at the University of Natural Resources and Life Sciences Vienna and holds a PhD in environmental statistics. The focus of his thesis was on the statistical modelling of rare (extreme) events as a basis for vulnerability assessment of critical infrastructure. He is working at the Austrian national weather and geophysical service (ZAMG) and at the Institute of Mountain Risk Engineering at BOKU University. He currently focuses the (statistical) assessment of adverse weather events and natural hazards, and disaster risk reduction. His main interests are statistical modelling of environmental phenomena as well as open source tools for data science, geoinformation and remote sensing.

1 Comment

You can post comments in this post.

  • This is so helpful! Thank you!

    Robbie Roemer 6 years ago Reply

Post A Reply