
We work with a lot of large raster datasets on the eBird Status & Trends project, and processing them is becoming a real bottleneck in our R workflow. For example, we make weekly estimates of bird abundance at 3 km resolution across the entire Western Hemisphere, which results in raster stacks with billions of cells! To produce seasonal abundance maps, we need to average the weekly layers across all weeks within each season using the raster function calc(), and it takes forever with these huge files! In this post, I'm going to try to understand how raster processes data and explore how this can be tweaked to improve computational efficiency. Most of this material is covered in greater detail in the raster package vignette, especially Chapter 10 of that document.

In general, R holds objects in memory, which limits the size of the objects that can be processed. This poses a problem for raster datasets, which can be much larger than the available system memory. The raster package addresses this by storing only references to raster files within its Raster* objects. Depending on the memory requirements of a given raster calculation, and the memory available, the package functions will either read the whole dataset into R for processing or process it in smaller chunks.
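To see what this looks like in practice, here's a minimal sketch using the small example grid that ships with the raster package; the specific file and checks are just an illustration of the behavior described above.

```r
library(raster)

# a Raster* object created from a file stores a reference to that file,
# not the cell values themselves; this example grid ships with raster
f <- system.file("external/test.grd", package = "raster")
r_ref <- raster(f)

inMemory(r_ref)  # FALSE: the cell values have not been loaded into RAM
fromDisk(r_ref)  # TRUE: values are read from the file as needed

# raster uses canProcessInMemory() to decide whether a calculation can be
# run on the whole dataset at once or must be processed in chunks
canProcessInMemory(r_ref, n = 4)
```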

Let's start by importing an example dataset generated using the simulate_species() function from the prioritizr package. The raster has dimensions 1000x1000 and 9 layers.
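The import looks roughly like this; the file name is an assumption based on the layer names, the printed summary is abridged, and the plotting call simply labels the nine panels as bands.

```r
library(raster)

# multi-layer file; the name is assumed from the layer names shown below
r <- stack("large.raster.tif")
r
#> names  : large.raster.1, large.raster.2, ..., large.raster.9
#> extent : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)

# plot all nine layers, labelling the panels "Band 1" to "Band 9"
spplot(r, names.attr = paste("Band", seq_len(nlayers(r))))
```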

We can calculate the total number of values this raster can store and the associated memory requirements, assuming 8 bytes per cell value. To measure the memory actually used, we can override the default raster behavior and read the contents of the file into R using readAll().
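A sketch of both steps (the object names are arbitrary):

```r
# total number of cell values across all layers
n_values <- ncell(r) * nlayers(r)   # 1000 x 1000 x 9 = 9,000,000 values

# estimated memory at 8 bytes per value, in megabytes
8 * n_values / 2^20                 # roughly 69 MB

# override the default behavior: read every cell value into R,
# then measure the size of the resulting object
r_mem <- readAll(r)
print(object.size(r_mem), units = "Mb")
```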
So we were fairly close in our estimate: it looks like it takes a little under 70 MB of memory to hold this object in the R session.

Let's apply the calc() function to this dataset to calculate the cell-wise mean across layers.
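A minimal version of that call is below; the second line is an assumption about how we'd handle a truly large raster, where passing a filename tells calc() to write the result to disk instead of holding it in memory.

```r
# cell-wise mean across the nine layers
r_mean <- calc(r, fun = mean, na.rm = TRUE)

# for very large rasters, write the result directly to file
# (the file name here is just a placeholder)
r_mean <- calc(r, fun = mean, na.rm = TRUE, filename = "mean.tif")
```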