For my research I create RasterStacks of satellite data of an area with a lot of ice. Because of this, many images are completely filled with NAs, and I would like to remove these from the stacks automatically.
Suppose I have a RasterStack:
library(raster)
r <- raster(nrow=10, ncol=10)
s1 <- s2 <- list()
for (i in 1:12) {
  s1[[i]] <- setValues(r, rnorm(ncell(r), i, 3))
  s2[[i]] <- setValues(r, rnorm(ncell(r), i, 3))
}
s1 <- stack(s1)
s2 <- stack(s2)
# a single layer in which every cell is NA
s3 <- subset(s1, 1)
s3[] <- NA
# 25 layers: s1 (1-12), the all-NA layer (13), s2 (14-25)
s <- stack(s1, s3, s2)
The middle image, layer 13, is completely NA. I could delete it with the subset function, but how can I get R to remove this layer automatically, so that I get the same result as:
s_no_na <- stack(s1,s2)
Here is another approach. If all values are NA, the minimum value is also NA. So you can do:
i <- !is.na(minValue(s))
s_no_na <- s[[i]]
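This is very fast when the minimum values are already known. If they are not (for example, for file-based data), compute them first with s <- setMinMax(s); otherwise minValue returns NA (with a warning) and every layer would be dropped.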
What do you mean by "automatically"? You have to test for it.
Try testing each raster with something like !any(is.na(values(s))) or all(is.na(values(s))) where s is a raster. Put that in a loop in a function that builds your final stack.
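A minimal sketch of that per-layer test, assuming the raster package and rebuilding a stack like the one in the question. Unlike the Filter() one-liner, which tests each whole stack in a list, this tests every layer separately, so it also works when the all-NA layer sits inside one multi-layer stack:

```r
library(raster)

# Rebuild an example stack: 25 random layers, with layer 13 set entirely to NA
r <- raster(nrow = 10, ncol = 10)
layers <- lapply(1:25, function(i) setValues(r, rnorm(ncell(r), i, 3)))
layers[[13]] <- setValues(r, rep(NA_real_, ncell(r)))
s <- stack(layers)

# TRUE for every layer that has at least one non-NA cell
keep <- sapply(seq_len(nlayers(s)), function(k) !all(is.na(values(s[[k]]))))

# Keep only those layers
s_no_na <- s[[which(keep)]]
```

Reading values() layer by layer loads each layer into memory, which is fine for moderately sized rasters; for very large files the minValue() approach above avoids that.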
If you want a one-liner, this uses Filter to select from a list, and then do.call to apply stack to the filtered list:
sf = do.call(stack, Filter(function(e){!all(is.na(values(e)))},list(s1,s3,s2)))
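I prefer this approach, which is short and sweet (note that it drops NULL elements from a list of rasters, so it assumes the empty images are stored as NULL):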
result <- rasters[!sapply(rasters, is.null)]
For this problem, I'm trying to create multiple vectors that I can apply a function to. The function I'm using is patchwork::area.
For example, if I were to explicitly write out a design area to plot using patchwork I would do something like this:
library(patchwork)
# Explicitly writing out each area
design <- c(
area(1, 1, 9, 9),
area(1,10),
area(2,10),
area(3,10),
area(4,10),
area(5,10),
area(6,10),
area(7,10),
area(8,10),
area(9,10)
)
# example of what the plot area would look like
plot(design)
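And this would look like this (plot not reproduced here: a large 9 x 9 area on the left and nine 1 x 1 areas stacked in column 10).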
Essentially, I am trying to automate the design vector above. I attempted this by using lapply to create a list of areas, like so:
# create some data
vals = seq(1:9)
maxVal <- max(vals)
# use lapply
areaList <- lapply(vals, function(x) area(x, maxVal+1))
This creates a list of the areas, excluding the first one from the design object above, but I can't figure out how to turn it into the design object above.
A naive attempt (which doesn't work) is something like the code below, which tries to include the first area of the design object:
designTest <- c(area(1, 1, maxVal, maxVal),
areaList)
Any suggestion as to how I could achieve this?
You need to combine the elements of areaList like this before plotting:
library(patchwork)
library(purrr)
vals = seq(1:9)
maxVal <- max(vals)
areaList <- lapply(vals, function(x) area(x, maxVal+1))
a <- reduce(areaList, c)
plot(a)
This should work!
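The code above combines only the small areas. A minimal sketch that also prepends the large 9 x 9 area from the original design, using base R's Reduce() in place of purrr::reduce() (same idea: fold c() over the list so that patchwork's c() method combines the areas pairwise):

```r
library(patchwork)

vals <- 1:9
maxVal <- max(vals)

# One 1x1 area per row, in the column to the right of the big block
areaList <- lapply(vals, function(x) area(x, maxVal + 1))

# Put the large area first, then fold c() over the whole list into one design
design <- Reduce(c, c(list(area(1, 1, maxVal, maxVal)), areaList))
plot(design)
```

Wrapping the big area in list() before c() matters: it makes c() concatenate two plain lists, so Reduce() then sees 10 separate area objects.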
I have 800 classified raster images (7 classes), and for each image I need to calculate the area of each class in square meters. So far this works for one image, but not in a loop. What can I do to solve it?
report_files<-list(list of 800 tif files)
for( i in report_files){
reportfiles_single<-raster(report_files[i])
df<-as.data.frame(table(reportfiles_single))
df2<-as.data.frame(df$Freq*(0.070218*0.070218))
{report_mean<- df2}
}
This piece works for a single image; here is an example file: https://ufile.io/rb7tj
a<-raster("test060707.tif")
val<-values(a)
table_val<-data.frame(val)
df<-as.data.frame(table(table_val))
df2<-as.data.frame(df$Freq*(0.070218*0.070218))
Your loop is not correct. You write for( i in report_files){, so i takes the file names as its values, but later on you use i as a position index, as if you had written for( i in 1:length(report_files)){.
Inside the loop you do table(reportfiles_single), whereas in the working code example you have, the equivalent would be table(values(reportfiles_single)).
Furthermore, you do not keep any of the results, as you overwrite report_mean in each iteration.
So you should really start by learning how loops work, and first try some simple examples.
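A small base-R sketch of the looping pattern described above, with plain matrices standing in for the classified rasters (an assumption for illustration; with real files you would read each one with raster(report_files[i])). It loops over positions rather than file names and stores each result in a list instead of overwriting it:

```r
# Stand-in for the 800 classified images: three 50 x 50 matrices of class codes 1-7
set.seed(42)
images <- lapply(1:3, function(k) matrix(sample(1:7, 2500, replace = TRUE), nrow = 50))

cell_area <- 0.070218^2  # m^2 per cell, the value used in the question

# Area per class for one image: cell count per class times the area of one cell
class_area <- function(img, cell_area) {
  counts <- table(img)
  data.frame(class = as.integer(names(counts)),
             area_m2 = as.numeric(counts) * cell_area)
}

# Loop over positions (not file names) and keep every result
results <- vector("list", length(images))
for (i in seq_along(images)) {
  results[[i]] <- class_area(images[[i]], cell_area)
}
```

Preallocating results with vector("list", n) and filling results[[i]] is the key change compared with the question's loop, where report_mean is overwritten on every iteration.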
Also, are you sure the spatial resolution of your data is about 7 by 7 cm? That is possible, but seems unlikely.
Here is an example of what you might do. I use the freq method because it is memory-safe, unlike table(values()).
library(raster)
f <- system.file("external/test.grd", package="raster")
report_files <- c(f, f, f)
# named 'out' so that it does not shadow the res() function used below
out <- list()
for (i in 1:length(report_files)) {
  r <- raster(report_files[i])
  # note the -2 for this example, you probably do not want that with your data
  fr <- freq(r, digits = -2)
  # area of one cell, in squared map units (m^2 for a projected CRS in meters)
  cellsize <- prod(res(r))
  fr[, 2] <- fr[, 2] * cellsize
  out[[i]] <- fr
}
Instead of using a loop, you could do:
s <- stack(report_files)
x <- freq(s, digits=-2)
And multiply with the cellsize later.
I'm working on a project that has large raster objects associated with variables that are modified inside a function. I already sort the variables I need inside the function, but now I want to return not just the sorted matrix of my variables but also the raster associated with those values. I could run the function twice and return the second object on the second run, but that seems terrible and a waste of time. I am very new to programming, and R is my first language. The code below throws the same error as my more complicated function:
"Error in temp2[i, ] = t(as.matrix(temp)) :
incorrect number of subscripts on matrix "
Any advice would be very helpful, thank you.
require('raster')
r1 <- raster(nrows=108, ncols=21, xmn=0, xmx=10)
Test = function(x,y,z){
temp = matrix(NA,4,length(x))
temp2 = matrix(NA,4,length(x))
for(i in 1:length(x)){
temp=c(r1,x[i],y[i],z[i])
temp2[i,]=t(as.matrix(temp))
}
return(temp2)
}
x = c(1,2,3,4)
y = c(1,2,3,4)
z = c(1,2,3,4)
final_answer <- Test(x,y,z)
Your question is not clear at all, but here is how you can sort raster values:
library(raster)
s <- stack(system.file("external/rlogo.grd", package="raster"))
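# calc's ... are passed to writeRaster, not to fun, so wrap sort to pass na.last
ss <- calc(s, fun = function(x) sort(x, na.last = TRUE))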
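On the other part of the question (getting two objects out of one function call without running it twice), the usual R pattern is to return a list. A minimal base-R sketch; the function name and the small matrix are hypothetical:

```r
# Sort the values in each row and return both the sorted and the original data
sort_and_keep <- function(m) {
  sorted <- t(apply(m, 1, sort))  # apply() returns one column per row, so transpose back
  list(sorted = sorted, original = m)
}

m <- matrix(c(3, 1, 2,
              9, 7, 8), nrow = 2, byrow = TRUE)
res <- sort_and_keep(m)
res$sorted    # rows sorted: 1 2 3 / 7 8 9
res$original  # the unsorted input, returned from the same call
```

A list element can hold any R object, so a RasterLayer or RasterStack can be returned the same way, e.g. list(sorted = sorted, raster = r1).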
I have a set of sequences with 2 variables for a 3rd variable (device). Now I want to break the sequence for each device into sets of 300. dsl is a data frame that contains d, the device id, and s, the number of sequences of length 300.
First, I label (column Sid) all the sequences: rep(1,300), followed by rep(2,300), and so on up to rep(s,300). Whatever remains unlabelled, i.e. still has the initial label 0, needs to be ignored. The actual labelling happens with the seqid vector, though.
I had to do this because I want to stack each set of 300 data points and then transpose it. This forms one row of my predata data frame. For each predata data frame I run k-means to generate 5 clusters, which I store in finaldata.
Essentially, for every device I will have 5 clusters that I can then pull by referencing the row number in finaldata (mapped to device id).
#subset processed data by device
for (ds in 1:387){
d <- dsl[ds,1]
s <- dsl[ds,3]
temp.data <- subset(data,data$Device==d)
temp.data$Sid <- 0
temp.data[1:(s*300),4] <- rep(1:300,s)
temp.data <- subset(temp.data,temp.data$Sid!="0")
seqid <- NA
for (j in 1:s){ seqid[(300*(j-1)+1):(300*j)] <- j }
temp.data$Sid <- seqid
predata <- as.data.frame(matrix(numeric(0),s,600))
for(k in 1:s){
temp.data2 <- subset(temp.data[,c(1,2)], temp.data$Sid==k)
predata[k,] <- t(stack(temp.data2)[,1])
}
ob <- kmeans(predata,5,iter.max=10,algorithm="Hartigan-Wong")
finaldata <- rbind(finaldata,(unique(fitted(ob,method="centers"))))
}
Being new to R, I ended up with 3 nested loops (the code did work when the outermost loop ran over a single value). It has been running for 5 hours now. I need a faster way to go about this.
Any help will be appreciated.
Thanks
OK, I am going to suggest a radical simplification of the code inside your loop. However, without sample data it is hard to verify that my assumptions are right, so please check that my predata actually equals yours.
First the code:
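# finaldata must start empty (its initialization is not shown in the question)
finaldata <- NULL
for (ds in 1:387) {
  d <- dsl[ds, 1]
  s <- dsl[ds, 3]
  temp.data <- subset(data, data$Device == d)
  # keep only the first s complete sets of 300 rows
  temp.data <- temp.data[1:(s*300), ]
  # row k = 300 values of column 1 followed by 300 values of column 2 for set k
  predata <- cbind(matrix(temp.data[, 1], byrow = TRUE, ncol = 300),
                   matrix(temp.data[, 2], byrow = TRUE, ncol = 300))
  ob <- kmeans(predata, 5, iter.max = 10, algorithm = "Hartigan-Wong")
  finaldata <- rbind(finaldata, unique(fitted(ob, method = "centers")))
}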
What I understand you are doing: take the first 300*s rows of subset(data, data$Device == d). This is easily done with the command
temp.data <- temp.data[1:(s*300),]
Afterwards, you build a matrix whose first row is c(temp.data[1:300, 1], temp.data[1:300, 2]), and so on for all further rows. I do this using the matrix command as above.
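To check the reshaping against the original per-set loop, as suggested above, here is a self-contained comparison on simulated data. The column names v1/v2 and the sizes are hypothetical; the loop mirrors the question's utils::stack() step, which puts column 1's values before column 2's:

```r
# Hypothetical device data: two numeric variables, 3 full sets of 300 rows plus 50 leftover rows
set.seed(7)
n_sets <- 3
temp.data <- data.frame(v1 = rnorm(n_sets * 300 + 50), v2 = rnorm(n_sets * 300 + 50))
temp.data <- temp.data[1:(n_sets * 300), ]  # drop the incomplete trailing set

# Vectorized version: each row = 300 values of v1 followed by 300 values of v2
predata_fast <- cbind(matrix(temp.data[, 1], byrow = TRUE, ncol = 300),
                      matrix(temp.data[, 2], byrow = TRUE, ncol = 300))

# Loop version, one row per set of 300, as in the question
predata_loop <- matrix(NA_real_, n_sets, 600)
for (k in 1:n_sets) {
  rows <- (300 * (k - 1) + 1):(300 * k)
  predata_loop[k, ] <- utils::stack(temp.data[rows, 1:2])[, 1]
}

identical(predata_fast, predata_loop)
```

utils::stack is written out explicitly because loading the raster package masks stack().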
I assume that your outer loop could be transformed into a call to tapply or something similar, but for that we would need more context.
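One way the outer loop might be replaced, as a sketch: split the data by device and lapply() a function over the pieces. This assumes s for each device is simply the number of complete 300-row sets (the question takes it from dsl instead), and it uses kmeans()$centers rather than unique(fitted(...)). The data and column names are made up:

```r
set.seed(1)
# Hypothetical data: device "a" has 30 full sets of 300 rows, device "b" has 24 (plus leftovers)
data <- data.frame(v1 = rnorm(9100 + 7250),
                   v2 = rnorm(9100 + 7250),
                   Device = rep(c("a", "b"), times = c(9100, 7250)))

cluster_device <- function(dd, set_len = 300, k = 5) {
  s <- nrow(dd) %/% set_len           # number of complete sets for this device
  dd <- dd[seq_len(s * set_len), ]    # drop the incomplete trailing set
  predata <- cbind(matrix(dd$v1, byrow = TRUE, ncol = set_len),
                   matrix(dd$v2, byrow = TRUE, ncol = set_len))
  kmeans(predata, k, iter.max = 10, algorithm = "Hartigan-Wong")$centers
}

# One 5 x 600 matrix of cluster centers per device, then stacked into one matrix
centers <- lapply(split(data, data$Device), cluster_device)
finaldata <- do.call(rbind, centers)
```

Collecting the results in a list and calling rbind once at the end also avoids growing finaldata inside the loop. Rows 1-5 of finaldata belong to the first device in names(centers), rows 6-10 to the second, and so on.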
With some help, I figured out how to transform an edgelist (aka an adjacency list) into an adjacency matrix. I want to learn how to automate this for a large number of edgelists and then put the resulting adjacency matrices in a list.
I'm guessing plyr is the best way to do this, but if you want to tell me how to do it with loops I'd be grateful for that as well. For the curious, the data represents social networks in different schools.
Here's what I've got so far:
# extract one school edgelist from the dataframe
aSchool <- myDF[which(myDF$school==1), c("school", "id", "x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")]
# figure out unique ids
edgeColumns <- c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")
ids <- unique(unlist(aSchool[edgeColumns]))
ids <- ids[!is.na(ids)]
# make an empty matrix
m <- matrix(0,nrow=length(ids),ncol=length(ids))
rownames(m) <- colnames(m) <- as.character(ids)
# fill in the matrix
for(col in edgeColumns){
theseEdges <- aSchool[c("id",col)]
theseEdges <- na.omit(theseEdges)
theseEdges <- apply(theseEdges,1,as.character)
theseEdges <- t(theseEdges)
m[theseEdges] <- m[theseEdges] + 1
}
for(i in 1:nrow(m)) m[i,i] <- 0
Check out the sna package and its as.edgelist.sna() and as.sociomatrix.sna() functions.
In particular, as.sociomatrix.sna() seems like the perfect solution here: it's designed to convert an edgelist to an adjacency matrix in a single step (without losing attributes such as vertex names, etc.). Wrap it all up in a call to lapply() and I think you've got yourself yet another (maybe less labor intensive?) solution.
If you'd like a more expressive answer, it would help to provide either more complete sample data or a clearer description of exactly what is in myDF.
Also, I don't have the reputation on SO to do so, but I would add some tags to this post to signal that it's about network analysis.
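For comparison, a base-R sketch of the wrap-it-in-lapply() idea, without sna or plyr. It assumes the wide layout from the question (an id column plus nomination columns) and uses a small made-up myDF. Unlike the question's code, ids here also include students who appear only in the id column, which avoids out-of-bounds indexing when someone nominates others but is never nominated. Like m[theseEdges] + 1, each cell counts nominations, and the diagonal is set to 0:

```r
# Build one adjacency matrix from one school's rows (hypothetical helper)
school_adj <- function(df, edgeColumns) {
  # Reshape the wide nominations into a two-column edge list (from, to)
  edges <- data.frame(from = rep(df$id, times = length(edgeColumns)),
                      to   = unlist(df[edgeColumns], use.names = FALSE))
  edges <- edges[!is.na(edges$to), ]
  ids <- sort(unique(c(edges$from, edges$to)))
  # Cross-tabulate: cell [i, j] counts nominations from i to j
  m <- unclass(table(factor(edges$from, levels = ids),
                     factor(edges$to,   levels = ids)))
  diag(m) <- 0
  m
}

# Made-up data with two schools and two nomination columns
myDF <- data.frame(school = c(1, 1, 1, 2, 2),
                   id     = c(1, 2, 3, 10, 11),
                   x1     = c(2, 1, 1, 11, 10),
                   x2     = c(3, NA, NA, NA, NA))

# One matrix per school, in a list named by school
adjs <- lapply(split(myDF, myDF$school), school_adj, edgeColumns = c("x1", "x2"))
```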
It's hard to answer your question without a workable example. But if I understand your question correctly, here is a function that should work (it returns a list containing symmetric adjacency matrices):
makeADJs <- function(...)
{
require(plyr)
dfs <- list(...)
e2adj <- function(x)
{
IDs <- unique(c(as.matrix(x)))
df <- apply(x,2,match,IDs)
adj <- matrix(0,length(IDs),length(IDs))
colnames(adj) <- rownames(adj) <- IDs
a_ply(rbind(df,df[,2:1]),1,function(y){adj[y[1],y[2]] <<- 1})
return(adj)
}
llply(dfs,e2adj)
}
Example:
makeADJs(
cbind(letters[sample(1:26)],letters[sample(1:26)]),
cbind(letters[sample(1:26)],letters[sample(1:26)]),
cbind(letters[sample(1:26)],letters[sample(1:26)]),
cbind(letters[sample(1:26)],letters[sample(1:26)])
)
Edit:
Or without plyr:
makeADJs <- function(...)
{
dfs <- list(...)
e2adj <- function(x)
{
IDs <- unique(c(as.matrix(x)))
df <- apply(x,2,match,IDs)
adj <- matrix(0,length(IDs),length(IDs))
colnames(adj) <- rownames(adj) <- IDs
apply(rbind(df,df[,2:1]),1,function(y){adj[y[1],y[2]] <<- 1})
return(adj)
}
lapply(dfs,e2adj)
}
Edit2:
And to plot them all in a single PDF file (here adjs is the list returned by makeADJs(), and l_ply comes from plyr):
library(plyr)
library(qgraph)
pdf("ADJplots.pdf")
l_ply(adjs,function(x)qgraph(x,labels=colnames(x)))
dev.off()