Extracting class info from classified raster images in a loop in R

I have 800 classified raster images (7 classes), and the area of each class in each image needs to be calculated in square meters. So far it works with one image but not with the loop. What can I do to solve it?
report_files <- list(...)  # a list of 800 tif files
for (i in report_files){
  reportfiles_single <- raster(report_files[i])
  df <- as.data.frame(table(reportfiles_single))
  df2 <- as.data.frame(df$Freq * (0.070218 * 0.070218))
  report_mean <- df2
}
This piece works for a single image; here is an example file: https://ufile.io/rb7tj
a<-raster("test060707.tif")
val<-values(a)
table_val<-data.frame(val)
df<-as.data.frame(table(table_val))
df2<-as.data.frame(df$Freq*(0.070218*0.070218))

Your loop is not correct. You write for (i in report_files), but later you use i as if you had written for (i in 1:length(report_files)).
Inside the loop you call table(reportfiles_single); the equivalent of your working example would be table(values(reportfiles_single)).
Furthermore, you do not keep any of the results, as you overwrite report_mean in each iteration.
So you should really start out by learning how looping works, and first make some simple examples.
Also, are you sure the spatial resolution of your data is about 7 by 7 cm? That is possible, but seems unlikely.
Here is an example of what you might do. I use the freq method because it is memory-safe, unlike table(values()).
library(raster)
f <- system.file("external/test.grd", package="raster")
report_files <- c(f, f, f)
out <- list()
for (i in 1:length(report_files)){
  r <- raster(report_files[i])
  # note the digits = -2 for this example; you probably do not want that with your data
  fr <- freq(r, digits = -2)       # value/count matrix for this image
  cellsize <- prod(res(r))         # cell area in map units
  fr[, 2] <- fr[, 2] * cellsize    # counts -> areas
  out[[i]] <- fr
}
Instead of using a loop, you could do:
s <- stack(report_files)
x <- freq(s, digits=-2)
And multiply by the cell size afterwards.
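For example (a sketch; freq on a stack returns a list with one value/count matrix per layer, and this assumes all 800 files share the same extent and resolution so they can be stacked):
cellsize <- prod(res(s))   # cell area in map units
areas <- lapply(x, function(m) cbind(value = m[, 1], area = m[, 2] * cellsize))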

Related

Speeding up a loop over rasters

I have a big dataset with 30000 rasters. My goal is to extract a mean value from each raster using a polygon located within it, and to create a file with the extracted raster values and the dates taken from the raster filenames.
I succeeded in doing this by performing the following loop:
ext <- vector()
for (i in 1:length(rasters2014)){
  a <- raster(rasters2014[i])
  ext[i] <- as.vector(extract(a, poligon2, fun = mean, na.rm = TRUE, df = F))
}
output2 <- data.frame(ext, filename = filename2014)
The problem is that the loop presented above takes about 2.5 hours to complete the calculation. Does anyone have an idea how I could speed up this process?
If your rasters are all properly aligned (same ncol, nrow, extent, origin, resolution), you could try identifying the "cell numbers" to be extracted by looking at the first file, then extracting based on those. This can speed up the processing because raster does not need to compute which cells to extract for every file. Something like this:
rast1 <- raster(rasters2014[1])
# with cellnumbers = TRUE, extract() returns one matrix per polygon,
# with a "cell" column (one polygon assumed here)
cells <- extract(rast1, poligon2, cellnumbers = TRUE)[[1]][, "cell"]
ext <- list()
for (i in 1:length(rasters2014)){
  a <- raster(rasters2014[i])
  ext[[i]] <- mean(a[cells], na.rm = TRUE)  # mean over the precomputed cells
}
Note that I am also using a list to store the results to avoid "growing" a vector, which is usually wasteful.
Alternatively, as suggested by @qdread, you could build a RasterStack using raster::stack(rasters2014, quick = TRUE) and call extract on the stack to avoid the for loop. I don't know which would be faster.
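Something like this, for instance (a sketch, untested; it assumes the rasters are aligned so they can be stacked, and extract on a stack returns one column per layer):
s <- raster::stack(rasters2014, quick = TRUE)
ext <- extract(s, poligon2, fun = mean, na.rm = TRUE)  # 1 row per polygon, 1 column per layer
output2 <- data.frame(ext = as.vector(ext), filename = filename2014)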
HTH
If your polygons do not overlap (and in most cases they don't), an alternative route is:
library(raster)
x <- rasterize(poligon2, raster(rasters2014[1]))  # rasterize needs a Raster object, not a filename
s <- raster::stack(rasters2014, quick = TRUE)
z <- zonal(s, x, "mean")
PS: Faster is nicer, but I would suggest getting lunch while this runs.
Thanks for your help! I've tried all of the proposed solutions and the computation time is generally the same regardless of the applied method. Therefore, I guess that it is just not possible to significantly speed up this computation.

Replace rbind in for-loop with lapply? (2nd circle of hell)

I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:
Some initialisations and a function definition:
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
columns <- 6
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe), ncol=columns+3)
myfunction <- function(frame, columns){
  athing <- 0
  if(columns == 5){
    athing <- 100
  } else {
    athing <- 1000
  }
  values[columns + 1] <- athing
  return(values)
}
The problematic for-loop looks like this:
columns = 6
for(i in 1:nrow(myframe)){
  values <- myfunction(as.matrix(myframe[i,]), columns)
  values[columns+2] <- i
  values[columns+3] <- myframe[i,3]
  # more columns added with simple operations (i.e. sum)
  solution <- rbind(solution, values)
  # solution is a large matrix from outside the for-loop
}
The problem seems to be the rbind function. I frequently get error messages regarding the size of solution, which seems to become too large after a while (more than 50 MB).
I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.
myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])
I have not really gotten further than this, although I tried applying a very good introduction to parallel processing.
How do I reconstruct the for-loop without having to change myfunction? Obviously I am open to different solutions...
Edit: This problem seems to be straight out of the 2nd circle of hell of the R Inferno. Any suggestions?
The reason that using rbind in a loop like this is bad practice is that in each iteration you enlarge your solution data frame and then copy it to a new object, which is a very slow process and can also lead to memory problems. One way around this is to create a list whose ith component will store the output of the ith loop iteration. The final step is to call rbind on that list (just once, at the end). This will look something like:
my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
  # Call all necessary commands to create values
  my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
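If you would rather have the lapply version the question asks for, the same pattern can be written without the explicit loop (a sketch reusing the names from the question, with columns and myfunction defined as above):
my.list <- lapply(seq_len(nrow(myframe)), function(i){
  values <- myfunction(as.matrix(myframe[i,]), columns)
  values[columns+2] <- i
  values[columns+3] <- myframe[i,3]
  values
})
solution <- do.call(rbind, my.list)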
A bit too long for a comment, so I put it here:
If columns is known in advance:
myfunction <- function(frame){
  athing <- 0
  if(columns == 5){
    athing <- 100
  } else {
    athing <- 1000
  }
  values[columns + 1] <- athing
  return(values)
}
apply(myframe, 1, myfunction)  # row-wise, matching the loop over rows
If columns is not given via the environment, you can use apply(myframe, 1, myfunction, columns) with your original myfunction definition.

How to automatically remove an empty raster from a RasterStack in R?

For my research I create RasterStacks of satellite data of an area with a lot of ice; because of this, a lot of images are completely filled with NAs. These I would like to remove from the stacks automatically.
Suppose I have a RasterStack:
r <- raster(nrow=10, ncol=10)
s1 <- s2 <- list()
for (i in 1:12) {
  s1[[i]] <- setValues(r, rnorm(ncell(r), i, 3))
  s2[[i]] <- setValues(r, rnorm(ncell(r), i, 3))
}
s1 <- stack(s1)
s3 <- subset(s1, 1)
s3[] <- NA
s2 <- stack(s2)
s <- stack(s1, s3, s2)
The middle image, image 13, is completely NA. I could delete it using the subset function, but how could I get R to remove this layer automatically, so that I get the same as:
s_no_na <- stack(s1,s2)
Here is another approach. If all values are NA, the minimum value is also NA. So you can do:
i <- !is.na(minValue(s))
s_no_na <- s[[i]]
This could be very fast if the minValue is known (otherwise it needs to be computed).
What do you mean by "automatically"? You have to test for it.
Try testing each raster with something like !any(is.na(values(s))) or all(is.na(values(s))) where s is a raster. Put that in a loop in a function that builds your final stack.
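Something like this (a sketch; s is the combined stack from the question, and stack() accepts a list of layers):
keep <- list()
for (i in 1:nlayers(s)) {
  lyr <- s[[i]]  # extract layer i
  if (!all(is.na(values(lyr)))) keep[[length(keep) + 1]] <- lyr
}
s_no_na <- stack(keep)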
If you want a one-liner, this uses Filter to select from a list, and then do.call to apply stack to the filtered list:
sf <- do.call(stack, Filter(function(e) !all(is.na(values(e))), list(s1, s3, s2)))
I prefer this approach, which is short and sweet, assuming the empty rasters are NULL entries in a list:
result <- rasters[!sapply(rasters, is.null)]

How to create a loop to generate a list of random samples in R?

I'm trying to create a loop that creates a series of objects, each of which contains a random sample, like this:
sample <- ceiling(runif(9, min=0, max=20))
(This is an example for a rounded uniform, but it can be replaced by a normal, Poisson, or whatever you want.)
So I built a loop to automatically generate several of those samples, with the objective of including them in a data frame. The loop I designed was this:
N = 50
dep = as.vector(N)
count = 1
for (i in 1:N){
  dep[count] <- ceiling(runif(9, min=0, max=20))
  count = count + 1
}
But it didn't work! For each dep[i] I have only a number, not a list of nine.
How should I do it? And what if I want to include every dep[i] in a data frame?
Thanks so much, I hope you understand what I want.
It's because you've made dep a vector (these are 1D by default), but you're trying to store a 2-dimensional object in it.
You can start dep off as NULL and rbind (row-bind) to it in the loop. Also, note that instead of using count in your loop you can just use i:
dep <- NULL
for (i in 1:N){
  dep <- rbind(dep, ceiling(runif(9, min=0, max=20)))
}
# if you look at dep now, it's a 2D matrix.
# We'll convert it to a data frame:
dep <- as.data.frame(dep)
However, there's a simpler way to do this. You don't have to generate dep row-by-row; you can generate it up front, by making a vector containing 9*N of your rounded uniform numbers:
dep <- ceiling(runif(9*N,min=0,max=20))
Now, dep is currently a vector of length 9*N. Let's make it into an N-by-9 matrix:
dep <- matrix(dep,nrow=N)
Done!
So you can do all your code above in one line:
dep <- matrix( ceiling(runif(9*N,min=0,max=20)), nrow=N )
If you want, you can call data.frame on dep (after it has been put into its 2D matrix form) to get a data frame.
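For example:
dep <- data.frame(dep)  # columns get default names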
As @mathematical.coffee explained. But also, it seems that in your case you can use sample instead of runif. And actually sample.int is more reliable, and about 3x faster than using runif here:
N <- 1000000
system.time( dep <- matrix(sample.int(20, 9*N, replace=TRUE), N) ) # 0.16 secs
range(dep) # 1 20
system.time( dep <- matrix(ceiling(runif(9*N, min=0, max=20)), N) ) # 0.45 secs
range(dep) # 1 20

plyr application, creating a list of matrices each of which corresponds to a subset of the data

With some help, I figured out how to transform an edgelist (aka an adjacency list) into an adjacency matrix. I want to learn how to automate this for a large number of edgelists and then put the resulting adjacency matrices in a list.
I'm guessing plyr is the best way to do this, but if you want to tell me how to do it with loops I'd be grateful for that as well. For the curious, the data represents social networks in different schools.
Here's what I've got so far:
# extract one school edgelist from the dataframe
aSchool <- myDF[which(myDF$school==1), c("school", "id", "x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")]
# figure out unique ids
edgeColumns <- c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")
ids <- unique(unlist(aSchool[edgeColumns]))
ids <- ids[!is.na(ids)]
# make an empty matrix
m <- matrix(0,nrow=length(ids),ncol=length(ids))
rownames(m) <- colnames(m) <- as.character(ids)
# fill in the matrix
for(col in edgeColumns){
  theseEdges <- aSchool[c("id", col)]
  theseEdges <- na.omit(theseEdges)
  theseEdges <- apply(theseEdges, 1, as.character)
  theseEdges <- t(theseEdges)
  m[theseEdges] <- m[theseEdges] + 1
}
for(i in 1:nrow(m)) m[i,i] <- 0
Check out the sna package and its as.edgelist.sna() and as.sociomatrix.sna() functions.
In particular, as.sociomatrix.sna() seems like the perfect solution here: it's designed to convert an edgelist to an adjacency matrix in a single step (without losing attributes such as vertex names, etc.). Wrap it all up in a call to lapply() and I think you've got yourself yet another (maybe less labor intensive?) solution.
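For instance, skipping sna entirely, you could wrap the question's own matrix-building loop in a function and lapply it over the school IDs (a sketch reusing the question's names, so the school and id columns and edgeColumns are assumed to exist in myDF):
makeSchoolAdj <- function(schoolID){
  aSchool <- myDF[myDF$school == schoolID, c("school", "id", edgeColumns)]
  ids <- unique(unlist(aSchool[edgeColumns]))
  ids <- ids[!is.na(ids)]
  m <- matrix(0, nrow = length(ids), ncol = length(ids),
              dimnames = list(as.character(ids), as.character(ids)))
  for (col in edgeColumns) {
    theseEdges <- t(apply(na.omit(aSchool[c("id", col)]), 1, as.character))
    m[theseEdges] <- m[theseEdges] + 1
  }
  diag(m) <- 0  # no self-loops, as in the question's final loop
  m
}
adjList <- lapply(unique(myDF$school), makeSchoolAdj)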
If you'd like to see a more expressive answer, I think it would be helpful to either provide more complete sample data or a clearer description of exactly what is in myDF
Also, I don't have the reputation on SO to do so, but I would add some tags to this post to signal that it's about network analysis.
It's hard to answer your question without a workable example, but if I understand it correctly, here is a function that should work (it returns a list containing symmetric adjacency matrices):
makeADJs <- function(...)
{
  require(plyr)
  dfs <- list(...)
  e2adj <- function(x)
  {
    IDs <- unique(c(as.matrix(x)))
    df <- apply(x, 2, match, IDs)
    adj <- matrix(0, length(IDs), length(IDs))
    colnames(adj) <- rownames(adj) <- IDs
    a_ply(rbind(df, df[, 2:1]), 1, function(y){ adj[y[1], y[2]] <<- 1 })
    return(adj)
  }
  llply(dfs, e2adj)
}
Example:
makeADJs(
  cbind(letters[sample(1:26)], letters[sample(1:26)]),
  cbind(letters[sample(1:26)], letters[sample(1:26)]),
  cbind(letters[sample(1:26)], letters[sample(1:26)]),
  cbind(letters[sample(1:26)], letters[sample(1:26)])
)
Edit:
Or without plyr:
makeADJs <- function(...)
{
  dfs <- list(...)
  e2adj <- function(x)
  {
    IDs <- unique(c(as.matrix(x)))
    df <- apply(x, 2, match, IDs)
    adj <- matrix(0, length(IDs), length(IDs))
    colnames(adj) <- rownames(adj) <- IDs
    apply(rbind(df, df[, 2:1]), 1, function(y){ adj[y[1], y[2]] <<- 1 })
    return(adj)
  }
  lapply(dfs, e2adj)
}
Edit2:
And to plot them all in a single pdf file:
library(qgraph)
library(plyr)  # for l_ply
# adjs is the list of adjacency matrices returned by makeADJs above
pdf("ADJplots.pdf")
l_ply(adjs, function(x) qgraph(x, labels = colnames(x)))
dev.off()
