I have a loop over a series of raster images. For each image I want to extract the cells equal to 150, and then add up the number of those pixels across the whole loop. With the code I have, I have only managed to get the count for each image separately, not the overall total. Thanks
m=52419 #total pixels basin
for(i in 1:4){
b1<-raster(myras1[i])
bc = b1 == 150 #Values eq 150
nbc = cellStats(bc,stat="sum")
print(nbc)
pc=nbc*100/m
}
[1] 34962
[1] 38729
[1] 52389
[1] 52176
In general, it is recommended not to use loops in R for things that can be easily vectorised. Rather than trying to fix the (several) problems with your loop, I show instead a better way. You can perform the whole calculation in a single vectorised line:
sum(cellStats(myras1==150, stat="sum")) * 100/m
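Breaking this down: cellStats applied to a raster stack returns a vector of values, one for each layer. sum then adds these together. We then divide by the number of cells in the whole stack (all layers combined, so m must be the total across layers rather than the cell count of a single image) and multiply by 100 to convert to a percentage. This assumes myras1 is a RasterStack, not a vector of file names.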
Testing this on some reproducible dummy test data:
set.seed(123)
myras1 = list(
raster(nrows = 100, ncols = 100, vals = sample(140:150,10000,T)),
raster(nrows = 100, ncols = 100, vals = sample(140:150,10000,T)),
raster(nrows = 100, ncols = 100, vals = sample(140:150,10000,T)),
raster(nrows = 100, ncols = 100, vals = sample(140:150,10000,T))
)
myras1 = stack(myras1)
m = ncol(myras1) * nrow(myras1) * nlayers(myras1)
sum(cellStats(myras1==150, stat="sum")) * 100/m
# [1] 8.815
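If you would rather keep the loop, the missing piece is a running total that is created before the loop and updated inside it. The following is a minimal sketch of that idea, not a drop-in replacement: it uses plain matrices as stand-ins for the rasters (an assumption, so that it runs without the raster package), and the names layers and total are made up for the illustration. With real rasters, the per-layer count would come from cellStats(b1 == 150, stat = "sum") as in the question.

```r
set.seed(123)
# Stand-ins for the four rasters: plain 100 x 100 matrices (an assumption made
# so the sketch runs without the raster package)
layers <- replicate(4, matrix(sample(140:150, 10000, TRUE), 100, 100),
                    simplify = FALSE)
m <- sum(lengths(layers))          # total number of cells over all layers

total <- 0                         # running total, created before the loop
for (i in seq_along(layers)) {
  nbc <- sum(layers[[i]] == 150)   # count of cells equal to 150 in this layer
  total <- total + nbc             # accumulate instead of overwriting
}
pc <- total * 100 / m              # overall percentage, computed once at the end
```

Computing pc after the loop, rather than inside it, is what turns the per-image figures into a single overall percentage.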
How does one draw a sample within a sapply function without replacement? Consider the following MWE below. What I am trying to achieve is for a number in idDRAW to receive a letter from chrSMPL (given the sample size of chrSMPL). Whether a number from idDRAW receives a letter is determined by the respective probabilities, risk factors and categories. This is calculated in the sapply function and stored in tmp.
The issue is sample replacement, leading to a number being named with a letter more than once. How can one avoid replacement whilst still using the sapply function? I have tried to adjust the code from this question (Alternative for sample) to suit my needs, but no luck. Thanks in advance.
set.seed(3)
chr<- LETTERS[1:8]
chrSMPL<- sample(chr, size = 30, replace = TRUE)
idDRAW<- sort(sample(1:100, size = 70, replace = FALSE))
p_mat<- matrix(runif(16, min = 0, max = 0.15), ncol = 2); rownames(p_mat) <- chr ## probability matrix
r_mat <- matrix(rep(c(0.8, 1.2), each = length(chr)), ncol = 2); rownames(r_mat) <- chr ## risk factor matrix
r_cat<- sample(1:2, 70, replace = TRUE) ## risk categories
# find number from `idDRAW` to be named a letter:
Out<- sapply(chrSMPL, function(x){
tmp<- p_mat[x, 1] * r_mat[x, r_cat]
sample(idDRAW, 1, prob = tmp)
})
> sort(Out)[1:3]
G B B
5 5 5
I managed with an alternative solution using a for loop as seen below. If anyone can offer suggestions on how the desired result can be achieved without using a for loop it would be greatly appreciated.
set.seed(3)
Out <- c()
for(i in 1:length(chrSMPL)){
tmp <- p_mat[chrSMPL[i], 1] * r_mat[chrSMPL[i], r_cat]
Out <- c(Out, sample(idDRAW, 1, prob = tmp))
rm <- which(idDRAW == Out[i])
idDRAW <- idDRAW[-rm]
r_cat <- r_cat[-rm]
}
names(Out) <- chrSMPL
sort(Out)[1:3]
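One possible way to avoid the explicit for loop, offered as a sketch rather than a definitive answer: keep idDRAW and r_cat intact and track which ids are still available in a logical vector, setting the sampling weight of already drawn ids to zero. The vector available and the use of <<- to update it from inside the function are choices made for this illustration; the draws are still sequential, so this changes the style more than the performance. Because the weights are handled differently, the draws will not match the for loop above number for number, even with the same seed.

```r
set.seed(3)
chr <- LETTERS[1:8]
chrSMPL <- sample(chr, size = 30, replace = TRUE)
idDRAW <- sort(sample(1:100, size = 70, replace = FALSE))
p_mat <- matrix(runif(16, min = 0, max = 0.15), ncol = 2,
                dimnames = list(chr, NULL))   # probability matrix
r_mat <- matrix(rep(c(0.8, 1.2), each = length(chr)), ncol = 2,
                dimnames = list(chr, NULL))   # risk factor matrix
r_cat <- sample(1:2, 70, replace = TRUE)      # risk categories

available <- rep(TRUE, length(idDRAW))        # TRUE while an id is undrawn

Out <- vapply(chrSMPL, function(x) {
  # weight is zero for ids that were already drawn
  w <- p_mat[x, 1] * r_mat[x, r_cat] * available
  pick <- sample.int(length(idDRAW), 1, prob = w)
  available[pick] <<- FALSE                   # mark this id as used
  idDRAW[pick]
}, integer(1))

anyDuplicated(Out)   # 0 means no id was drawn twice
```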
I would like to generate 1000 samples of size 25 from a standard normal distribution, calculate the variance of each one, and create a histogram. I have the following:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
Then I would like to randomly select one sample from those 1000 samples and take 1000 bootstraps from that sample. Then calculate the variance of each and plot a histogram. So far, I have:
sub.sample = sample(samples, 1)
Then this is where I'm stuck, I know a for loop is needed for bootstrapping here so I have:
rep.boot2 <- numeric(lengths(sub.sample))
for (i in 1:lengths(sub.sample)) {
index2 <- sample(1:1000, size = 25, replace = TRUE)
a.boot <- sub.sample[index2, ]
rep.boot2[i] <- var(a.boot)[1, 2]
}
but running the above produces an "incorrect number of dimensions" error. Which part is causing the error?
I can see 2 problems here. One is that you are trying to subset sub.sample with two-dimensional indexing, as you would a matrix or data frame, but it is actually a list of length 1.
a.boot <- sub.sample[index2, ]
To fix this, you can change
sub.sample = sample(samples, 1)
to
sub.sample = as.vector(unlist(sample(samples, 1)))
The second problem is that you are generating a sample of 25 indexes from between 1 and 1000
index2 <- sample(1:1000, size = 25, replace = TRUE)
but then you try to extract these indexes from a vector with a length of only 25. So you will end up with mostly NA values in a.boot.
If I understand what you want to do correctly then this should work:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
sub.sample = as.vector(unlist(sample(samples, 1)))
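rep.boot2 <- numeric(1000) # one variance per bootstrap replicate
for (i in 1:1000) {
index2 <- sample(1:25, size = 25, replace = TRUE) # resample positions with replacement
a.boot <- sub.sample[index2]
rep.boot2[i] <- var(a.boot)
}
hist(rep.boot2)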
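The loop is not strictly required. As a minimal sketch, the same bootstrap can be written with replicate(), resampling the values directly instead of going through an index vector. Here sub.sample is regenerated with rnorm() so the block runs on its own, and boot.var is a name chosen for the illustration.

```r
set.seed(1)
sub.sample <- rnorm(25)   # stand-in for the one sample picked from the 1000

# 1000 bootstrap resamples of size 25, drawn with replacement,
# keeping only the variance of each resample
boot.var <- replicate(1000, var(sample(sub.sample, replace = TRUE)))

hist(boot.var)
```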
I would like to get non-NA values extracted from random coordinates of a raster within each grid cell.
An example of a raster
library(raster)
r <- raster(ncol = 10, nrow = 10, xmx = -80, xmn = -150, ymn = 20, ymx = 60)
values(r) <- runif(ncell(r))
An example of a grid
grid <- raster(extent(r))
res(grid) <- 15
proj4string(grid)<- proj4string(r)
gridpolygon <- rasterToPolygons(grid)
plot(r)
plot(gridpolygon, add = T)
How can I extract a value with random coordinates for each raster portions inside each grid cells?
I am really new at this kind of stuff so any suggestions will be very welcome.
Thanks.
You didn't specify all the conditions for sampling, so I'm going by some assumptions here.
One can sample a point per grid polygon and extract the value. Here's how you can do it in one go and hope for the best:
# pick random points per each grid cell and plot
set.seed(357)
pickpts <- sapply(gridpolygon@polygons, spsample, n = 1, type = "random")
sapply(pickpts, plot, add = TRUE)
# extract values of raster cells at specified points
sapply(pickpts, FUN = extract, x = r)
Or you can do it in a loop and sample until you get a non-NA value.
N <- length(gridpolygon@polygons)
result <- rep(NA, times = N)
for (i in 1:N) {
message(sprintf("Trying polygon %d", i))
pl <- gridpolygon@polygons[[i]]
candval <- result[i] # start with NA
# sample until you get a non-NA hit
while (is.na(candval)) {
pickpoint <- spsample(pl, n = 1, type = "random")
candval <- extract(x = r, y = pickpoint)
}
result[i] <- candval
}
result
[1] 0.4235214 0.6081435 0.9126583 0.1710365 0.7788590 0.9413206 0.8589753
[8] 0.0376722 0.9662231 0.1421353 0.0804440 0.1969363 0.1519467 0.1398272
[15] 0.4783207
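One caveat with the while loop: if a grid polygon covers only NA cells, it never terminates. A hedged sketch of a safeguard is to cap the number of attempts. The helper sample_until_valid and its max_tries argument are hypothetical names, and the example uses a plain vector in place of the raster so that it runs without the raster and sp packages; in the loop above, draw would wrap the spsample() and extract() calls.

```r
# Hypothetical helper: call draw() until it returns a non-NA value,
# giving up after max_tries attempts and returning NA in that case
sample_until_valid <- function(draw, max_tries = 100) {
  for (k in seq_len(max_tries)) {
    val <- draw()
    if (!is.na(val)) return(val)
  }
  NA_real_
}

set.seed(357)
cells <- c(NA, 0.42, NA, 0.61)   # stand-in for raster values under one polygon
hit <- sample_until_valid(function() sample(cells, 1))

# a "polygon" with no valid cells stops after 10 tries instead of hanging
all_na <- sample_until_valid(function() NA_real_, max_tries = 10)
```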
I have a function in R which I call
RS1 = t(cbind(Data[,18], Data[,20]))
RS2 = t(cbind(Data[,19], Data[,21]))
p = t(Data[23:24])
rand_x <- function (p, x) {
n.goods <- dim (p)[1]
n.obs <- dim (p)[2]
xRC = NaN*matrix(1, n.goods, n.obs)
for(i in 1:n.obs) {
xRC[1,i] <- RS1[1,i] + RS1[2,i]
xRC[2,i] <- RS2[1,i] + RS2[2,i]
}
result <- xRC
return(result)
}
Given these two inputs, this function generates a 2x50 matrix with some random numbers. I want to call rand_x 1000 times, obtain 1000 matrices, and then bind the results into one final matrix. I have tried to write a loop for this, but I am still struggling. Any help will be much appreciated.
If you intend to add each element of column 18 to the corresponding element of column 20 (that is what your code does), try using rowSums().
Try:
xRC <- rbind(
rowSums (Data [, c(18, 20)]),
rowSums (Data [, c(19, 21)])
)
The output will be a matrix.
I do not see where randomness appears in your function though. If you just want a 2x50 matrix with random numbers you may want to use:
xRC <- matrix (rnorm(50*2), 2) # for standard-normal generated numbers
xRC <- matrix (sample(1:100, replace = T, size = 100), 2) # for numbers between 1 and 100, uniformly distributed
To do this 1000 times, try:
xRC <- NULL # start empty, then append two rows per iteration
for (i in 1:1000) {
xRC <- rbind(xRC,
rowSums (Data [, c(18, 20)]),
rowSums (Data [, c(19, 21)])
)
}
# or if you just want to generate random numbers, performance is way faster when you use:
xRC <- matrix(rnorm(1000 * 2 * 50), ncol = 50)
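If the function really does return a different random matrix on each call, a common pattern is to collect the results in a list and bind them once at the end, which avoids growing a matrix inside a loop. This is a sketch under that assumption: rand_x_demo is a hypothetical stand-in for rand_x, since the original depends on Data, which is not shown.

```r
set.seed(42)

# Hypothetical stand-in for rand_x(): returns a 2 x 50 matrix of random draws
rand_x_demo <- function(n.goods = 2, n.obs = 50) {
  matrix(runif(n.goods * n.obs), nrow = n.goods, ncol = n.obs)
}

# call the function 1000 times, keeping each 2 x 50 result as a list element
runs <- replicate(1000, rand_x_demo(), simplify = FALSE)

# stack all results row-wise into one 2000 x 50 matrix
final <- do.call(rbind, runs)
dim(final)
```

Rows 1 and 2 of final come from the first call, rows 3 and 4 from the second, and so on.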
I'm trying to create an evenly spaced (in time or in depth) subset of a larger dataset in R. My original data isn't evenly spaced.
These are the functions that need improvement:
# calculate step size and subsets df accordingly
spacedSS <- function(df, n, var){
stp <- (max(var)-min(var))/(n - 1) #calculate step size
stps <- min(var)+0:(n-1)*stp #calculate step values
res <- lookupDepth(df, stps, var)
return(as.data.frame(res))
}
# finds values in var closest to stps, returns subsetted df
lookupDepth <- function(df, stps, var){
indxs <- rep(0, times=length(stps)) # create empty index vector
for(i in seq_along(stps)) { # for every subsample row
# find the one closest to the step value
# TODO: only if it isn't already in the df
indxs[i] <- which.min((var - stps[i])^2)
}
sampls <- df[indxs, ] #subset by these new indexes
return(as.data.frame(sampls))
}
And here they are applied to data similar to my own to illustrate the problem:
# generate data
depth <- c(seq(650, 750, length.out = 50), seq(750, 760, length.out = 3),
seq(760, 780, length.out = 5), seq(780, 800, length.out = 20))
age <- c(seq(40, 41, length.out = 50), seq(41, 42, length.out = 3),
seq(42, 47, length.out = 5), seq(47, 48, length.out = 20))
id <- seq_along(age)
dat <- data.frame(id, depth, age)
# subset 10 samples of dat evenly spaced in depth/age
ss.depth <- spacedSS(dat, 10, dat$depth)
ss.age <- spacedSS(dat, 10, dat$age)
Here's a plot of the data:
# plot it using my depthplotter function
source("https://raw.githubusercontent.com/japhir/DepthPlotter/master/DepthPlotter.R")
DepthPlotter(dat[, c("depth", "age")], xlab = "Age (Ma)")
segments(30, ss.depth$depth, ss.depth$age, col = "blue")
segments(ss.age$age, 640, y1 = ss.age$depth, col = "red")
So the problem I'm trying to solve is that the subset function currently doesn't look at the indices that are already used:
# the problem I'm trying to solve:
length(unique(ss.age$id)) != length(unique(ss.depth$id))
TRUE
# it picked the same samples sometimes because they were the closest ones!
ss.age$id
[1] 1 45 53 55 55 56 57 57 61 78
So as you can see, the problem is that when it is subsetting, it currently doesn't take into account the samples that have already been selected. Any idea on how to fix this?
So I ended up asking a friend to help me out, and we've constructed a rather complicated Simulated Annealing approach.
Basically we created a function to see if there are any duplicate index values, and if so fixes them really simply. A mutation function then randomly changes the index values. The loss of this new subset is checked against the original dataset, random mutations are generated and selected if they are better than the previous selection. The selection criteria are rather loose at first but get more stringent over time, resulting in a pretty cool optimised subset of the data!
If you are interested in the code we used, comment below and we'll put it up somewhere.
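For comparison, here is a much simpler greedy alternative. It is not the simulated annealing approach described above, only a sketch of the TODO in the question: for each step value, pick the closest row that has not been used yet. The function name lookup_unique is made up for the illustration, and it assumes the number of steps does not exceed the number of rows. Greedy selection guarantees unique rows but not an optimal spacing, which is presumably where the annealing approach does better.

```r
# Greedy variant of lookupDepth(): the closest row per step, never reusing a row
lookup_unique <- function(df, stps, var) {
  used <- logical(length(var))       # TRUE once a row has been selected
  indxs <- integer(length(stps))
  for (i in seq_along(stps)) {
    d <- (var - stps[i])^2
    d[used] <- Inf                   # exclude rows that are already taken
    indxs[i] <- which.min(d)
    used[indxs[i]] <- TRUE
  }
  df[indxs, , drop = FALSE]
}

# the example data from the question
depth <- c(seq(650, 750, length.out = 50), seq(750, 760, length.out = 3),
           seq(760, 780, length.out = 5), seq(780, 800, length.out = 20))
age <- c(seq(40, 41, length.out = 50), seq(41, 42, length.out = 3),
         seq(42, 47, length.out = 5), seq(47, 48, length.out = 20))
dat <- data.frame(id = seq_along(age), depth, age)

n <- 10
stps <- seq(min(dat$age), max(dat$age), length.out = n)   # evenly spaced ages
ss.age <- lookup_unique(dat, stps, dat$age)
anyDuplicated(ss.age$id)   # 0 means every selected row is distinct
```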