Thursday, 7 May 2015

Extract values from numerous rasters in less time

These days I was working with a Shiny app for which the computation time is a big problem.
Basically this app takes some coordinates, extract values from 1036 rasters for these coordinates and make some computations. 
As far as I can (and please correct me if I'm wrong!) tell there are two ways of doing this task:
1) load all the 1036 rasters and extract the values from each of them in a loop
2) create a stack raster and extract the values only one time

In the first approach it helps if I have all my rasters in one single folder, since in that case I can run the following code:

f <- list.files(getwd()) 
ras <- lapply(f,raster) 
ext <- lapply(ras,extract,MapUTM) 
ext2 <- unlist(ext)
t1 <- Sys.time()


The first line creates a list of all the raster file in the working directory, then with the second line I can read them in R using the package raster.
The third line extracts from each raster the values that corresponds to the coordinates of the SpatialPoints object named MapUTM. The object ext is a list, therefore I have to change it into a numeric vector for the computations I will do later in the script.
This entire operation takes 1.835767 mins.

Since this takes too much time I thought of using a stack raster. I can just run the following line to create
a RasterStack object with 1036 layers. This is almost instantaneous.

STACK <- stack(ras)
 
The object looks like this:
> STACK
class       : RasterStack 
dimensions  : 1217, 658, 800786, 1036  (nrow, ncol, ncell, nlayers)
resolution  : 1000, 1000  (x, y)
extent      : 165036.8, 823036.8, 5531644, 6748644  (xmin, xmax, ymin, ymax)
coord. ref. : NA 
names       :      Dir_10,      Dir_11,      Dir_12,      Dir_13,      Dir_14,      Dir_15,      Dir_16,      Dir_17,      Dir_18,      Dir_19,      Dir_20,      Dir_21,      Dir_22,      Dir_23,      Dir_24, ... 
min values  :   59.032657,  141.913933,   84.781970,  147.634633,   39.723591,  154.615133,   45.868360,  197.306633,   85.839959,  272.336367,   93.234409,  339.732100,   79.106781,  566.522933,  175.075968, ... 
max values  :  685.689288, 2579.985700,  840.835621, 3575.341167, 1164.557067, 5466.193933, 2213.728126, 5764.541400, 2447.792437, 4485.639133, 1446.003349, 5308.407167, 1650.665136, 5910.945967, 2038.332471, ... 


At this point I can extract the coordinates from all the rasters in one go, with the following line:
ext <- extract(STACK,MapUTM)
  

This has the advantage of creating a numeric vector, but unfortunately this operation is only slightly faster than the previous one, with a total time of 1.57565 mins

At this point, from a suggestion of a colleague Kirill Müller (http://www.ivt.ethz.ch/people/muelleki), I tested ways of translating the RasterStack into a huge matrix and then query it to extract values.
I encountered two problems with this approach, first is the amount of RAM needed to create the matrix and second is identify the exact row to extract from it.
In the package raster I can transform a Raster object into a matrix simply by calling the function as.matrix. However my RasterStack object has 800786 cells and 1036 layers, meaning that I would need to create a 800786x1036 matrix and I do have enough RAM for that.
I solved this problem using the package ff. I can create a matrix object in R that is associated with a physical object on disk. This approach allowed me to use a minimum amount of RAM and achieve the same results. This is the code I used:

mat <- ff(vmode="double",dim=c(ncell(STACK),nlayers(STACK)),filename=paste0(getwd(),"/stack.ffdata"))
 
for(i in 1:nlayers(STACK)){
mat[,i] <- STACK[[i]][]
}
save(mat,file=paste0(getwd(),"/data.RData"))

With the first line I create an empty matrix with the characteristics above (800786 rows and 1036 columns) and place it on disk.
Then in the loop I fill the matrix row by row. There is probably a better way of doing it but this does the job and that is all I actually care about. finally I save the ff object into an RData object on disk, simply because I had difficulties loading the ff object from disk.
This process takes 5 minutes to complete, but it is something you need to do just once and then you can load the matrix from disk and do the rest.

At this point I had the problem of identifying the correct cell from which to extract all the values. I solved it by creating a new raster and fill it with integers from 1 to the maximum number of cells. I did this using the following two lines:

ID_Raster <- raster(STACK[[1]])
ID_Raster[] <- 1:ncell(STACK[[1]])

Now I can use the extract function on this raster to identify the correct cell and the extract the corresponding values from the ff matrix, with the following lines:

ext_ID <- extract(ID_Raster,MapUTM)
ext2 <- mat[as.numeric(ext_ID),]

If I do the extract this way I complete the process in 2.671 secs, which is of great importance for the Shiny interface.


R code snippets created by Pretty R at inside-R.org

No comments:

Post a Comment

Note: only a member of this blog may post a comment.