Generating "Non-Random" Numbers in R? - r

I know how to generate 100 random numbers in R (without replacement):
random_numbers = sample.int(100, 100, replace = FALSE)
I was now curious about learning how to generate 100 "non-random" numbers (without replacement). The first idea that comes to mind is to generate a random number; the next number is then the previous number + 1 with probability 0.5, or a fresh random number with probability 0.5. Thus, these numbers are not "fully random".
This was my attempt to write this code for numbers in a range of 0 to 100 (suppose I want to repeat this procedure 100 times):
library(dplyr)
all_games <- vector("list", 100)
for (i in 1:100) {
  index_i = i
  guess_sets <- 1:100
  prob_i = runif(n = 1, min = 1e-12, max = .9999999999)
  guess_i = ifelse(prob_i > 0.5, sample.int(1, 100, replace = FALSE), guess_i + 1)
  guess_sets_i <- setdiff(guess_sets_i, guess_i)
  all_games_i = as.list(index_i, guess_i, all_games_i)
  all_games[[i]] <- all_games_i
}
all_games <- do.call("rbind", all_games)
I tried to make a list that stores all guesses such that the range for the next guess automatically excludes numbers that have already been guessed, but I get this error:
Error in sample.int(1, 100, replace = FALSE) :
cannot take a sample larger than the population when 'replace = FALSE'
Ideally, I am trying to get the following results (format doesn't matter):
index_1: 5,6,51,4,3,88,87,9 ...
index_2: 77,78,79,2,65,3,1,99,100,4...
etc.
Can someone please show me how to do this? Are there easier ways in R to generate "non-random numbers"?
Thank you!
Note: I think an extra piece of logic needs to be added. Suppose I guess the number 100; since 100 + 1 is not included in the original range, I must then guess a new random number. Similarly, if I guess 5, then 17, then 4, the loop tells me to guess 4 + 1 next, which is impossible because 5 has already been guessed. In such cases, I would also have to guess a new random number?

It would be tricky to make your algorithm very efficient in R... it doesn't lend itself nicely to vectorization. Here's how I'd write it directly as a for loop:
semirandom = function(n) {
  # sample() treats a length-one numeric x as 1:x, so guard against that
  safe_sample = function(x, ...) {
    if (length(x) == 1) return(x)
    sample(x, ...)
  }
  result = numeric(n)
  result[1] = sample.int(n, size = 1)
  for (i in 2:length(result)) {
    # with probability .5, use the previous value + 1, provided it is
    # in range and not already used; otherwise draw from the unused values
    if (runif(1) < .5 &&
        result[i - 1] < n &&
        !((result[i - 1] + 1) %in% result)) {
      result[i] = result[i - 1] + 1
    } else {
      result[i] = safe_sample(x = setdiff(1:n, result), size = 1)
    }
  }
  result
}
# generate 10 semirandom numbers 5 times
replicate(semirandom(10), n = 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 6 4 4 2 6
# [2,] 3 5 5 3 7
# [3,] 4 3 6 4 5
# [4,] 5 1 2 5 2
# [5,] 7 9 3 6 3
# [6,] 9 10 10 1 1
# [7,] 10 2 8 9 4
# [8,] 2 8 1 8 10
# [9,] 1 7 9 10 9
# [10,] 8 6 7 7 8

You get the error cannot take a sample larger than the population when 'replace = FALSE' because sample.int(1, 100, replace = FALSE) attempts to draw 100 values from a population of size one without replacement; the first argument of sample.int() is the population size, the second is the sample size.
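To see the argument order at a glance:
sample.int(1, 100, replace = FALSE) # error: 100 draws from the population 1:1
sample.int(100, 1)                  # fine: one draw from 1:100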
The following draws numbers between 1 and 100, drawing each number at most once. If the previous number + 1 has not been drawn yet, there is a 50 percent chance of drawing it and a 50 percent chance of drawing another random number; if the previous number + 1 has already been drawn (or is out of range), another random number is drawn with certainty.
# sample(x, 1) on a length-one numeric x draws from 1:x, so guard the
# final iterations, where only one unused number remains
resample <- function(x) x[sample.int(length(x), 1L)]
i <- sample.int(100, 1)
j <- i
for (x in 1:99) {
  if ((i + 1L) %in% j) {
    i <- resample((1:100)[-j])
  } else {
    if (runif(1L) > 0.5 || i == 100L) {
      i <- resample((1:100)[-j])
    } else {
      i <- i + 1L
    }
  }
  j <- c(j, i)
}
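After the loop, j holds all 100 numbers in the order they were drawn; a quick sanity check (my addition) confirms that it is a permutation of 1:100:
length(j) == 100   # TRUE
setequal(j, 1:100) # TRUE: each number drawn exactly once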

Related

R: Logical Conditions Not Being Respected

I am working with the R programming language. I am trying to build a loop that performs the following:
Step 1: Keep generating two random numbers "a" and "b" until both "a" and "b" are greater than 12
Step 2: Track how many random numbers had to be generated until it took for Step 1 to be completed
Step 3: Repeat Step 1 and Step 2 100 times
Since I do not know how to keep generating random numbers until a condition is met, I tried to generate a large amount of random numbers hoping that the condition is met (there is probably a better way to write this):
results <- list()
for (i in 1:100) {
  # do until break
  repeat {
    # repeat many random numbers
    a = rnorm(10000, 10, 1)
    b = rnorm(10000, 10, 1)
    # does any pair meet the requirement
    if (any(a > 12 & b > 12)) {
      # put it in a data.frame
      d_i = data.frame(a, b)
      # end repeat
      break
    }
  }
  # select all rows until the first time the requirement is met
  # it must be met, otherwise the loop would not have ended
  d_i <- d_i[1:which(d_i$a > 10 & d_i$b > 10)[1], ]
  # prep other variables and only keep last row (i.e. the row where the condition was met)
  d_i$index = seq_len(nrow(d_i))
  d_i$iteration = as.factor(i)
  e_i = d_i[nrow(d_i), ]
  results[[i]] <- e_i
}
results_df <- do.call(rbind.data.frame, results)
Problem: When I look at the results, I noticed that the loop is incorrectly considering the condition to be met, for example:
head(results_df)
a b index iteration
4 10.29053 10.56263 4 1
5 10.95308 10.32236 5 2
3 10.74808 10.50135 3 3
13 11.87705 10.75067 13 4
1 10.17850 10.58678 1 5
14 10.14741 11.07238 1 6
For instance, in each one of these rows - both "a" and "b" are smaller than 12.
Does anyone know why this is happening and can someone please show me how to fix this problem?
Thanks!
How about this? Since you tagged while-loop, I tried using one. (Incidentally, the reason your version keeps rows where both values are barely above 10 is that your subsetting line tests d_i$a > 10 & d_i$b > 10, while the target condition is > 12.)
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100) {
  a <- rnorm(1, 10, 1)
  b <- rnorm(1, 10, 1)
  i <- 1
  while (a < 12 | b < 12) {
    a <- rnorm(1, 10, 1)
    b <- rnorm(1, 10, 1)
    i <- i + 1
  }
  x <- c(a, b, i)
  res <- rbind(res, x)
}
head(res)
[,1] [,2] [,3]
x 12.14232 12.08977 399
x 12.27158 12.01319 1695
x 12.57345 12.42135 302
x 12.07494 12.64841 600
x 12.03210 12.07949 82
x 12.34006 12.00365 782
dim(res)
[1] 100 3
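For what it's worth, the same idea can be packaged a bit more idiomatically (a sketch of mine, not part of the answer above) by wrapping the loop in a function and letting replicate() collect the rows:
draw_until_both_exceed_12 <- function() {
  i <- 0
  repeat {
    a <- rnorm(1, 10, 1)
    b <- rnorm(1, 10, 1)
    i <- i + 1
    if (a > 12 && b > 12) return(c(a = a, b = b, pairs = i))
  }
}
res <- t(replicate(100, draw_until_both_exceed_12()))
head(res)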

How to do large combinations with condition in R efficiently?

A survey shows an average score of 4.2 out of 5, with a sample size of 14. How do I create a data frame that provides the combinations of individual results that achieve a score of 4.2?
I tried this, but it got too big:
library(tidyverse)
n <- 14
avg <- 4.2
df <- expand.grid(rep(list(1:5), n))
df <- df %>%
  rowwise() %>%
  mutate(avge = mean(c_across(everything()))) %>%
  filter(avge >= 4)
The aim, given the limited information above, is to know the distribution of combinations of individual scores, to see which combinations are more likely to occur, and to see how many low and high scores are needed to reach the average above.
Thanks!
If you can tolerate doing this randomly, then
set.seed(42) # only so that you get the same results I show here
n <- 14
iter <- 1000000
scores <- integer(0)
while (iter > 0) {
  tmp <- sample(1:5, size = n, replace = TRUE)
  if (mean(tmp) > 4) {
    scores <- tmp
    break
  }
  iter <- iter - 1
}
mean(scores)
# [1] 4.142857
scores
# [1] 5 3 5 5 5 3 3 5 5 2 5 5 4 3
Notes:
The reason I use iter in there is to preclude the possibility of an "infinite" loop. While here it reacts rather quickly and is highly unlikely to go that far, if you change the conditions then it is possible your conditions could be infeasible or just highly improbable. If you don't need this, then remove iter and use instead while (TRUE) ...; you can always interrupt R with Escape (or whichever mechanism your IDE provides).
The reason I prefill scores with an empty vector and use tmp is so that you won't accidentally assume that scores having values means you have your average. That is, if the constraints are too tight, then you should find nothing, and therefore scores should not have values.
FYI: if you're looking for an average of 4.2, two things to note:
change the conditional to be what you need, such as looking for 4.2 ... but ...
looking for floating-point equality is going to bite you hard (see Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754), so I suggest looking within a tolerance, perhaps
tol <- 0.02
# ...
if (abs(mean(tmp) - 4.2) < tol) {
  scores <- tmp
  break
}
# ...
where tol is some meaningful number. Unfortunately, using this seed (and my iter limit) there is no combination of 14 votes (of 1 to 5) that produces a mean within tol = 0.01 of 4.2. Indeed, with 14 integer votes the closest achievable mean to 4.2 is 59/14 ≈ 4.2143, so no tolerance below about 0.0143 can ever succeed:
set.seed(42)
n <- 14
iter <- 100000
scores <- integer(0)
tol <- 0.01
while (iter > 0) {
  tmp <- sample(1:5, size = n, replace = TRUE)
  # if (mean(tmp) > 4) {
  if (abs(mean(tmp) - 4.2) < tol) {
    scores <- tmp
    break
  }
  iter <- iter - 1
}
iter
# [1] 0 # <-- this means the loop exited on the iteration-limit, not something found
scores
# integer(0)
if you instead set tol = 0.02 then you will find something:
tol <- 0.02
# ...
scores
# [1] 4 4 4 4 4 5 4 5 5 5 3 4 3 5
mean(scores)
# [1] 4.214286
You can try the code below
n <- 14
avg <- 4.2
repeat {
  x <- sample(1:5, n, replace = TRUE)
  if (sum(x) == round(avg * n)) break
}
Comparing integer sums via sum(x) == round(avg * n) sidesteps the floating-point equality problem mentioned in the previous answer, and you will see
> x
[1] 5 5 5 5 5 5 4 5 5 4 1 5 1 4
> mean(x)
[1] 4.214286
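If you also want the distribution of combinations the question asks about, a rough sketch (my addition, reusing n, avg and the repeat loop above) is to collect many qualifying draws and tabulate the sorted score mixes:
combos <- replicate(1000, {
  repeat {
    x <- sample(1:5, n, replace = TRUE)
    if (sum(x) == round(avg * n)) break
  }
  paste(sort(x), collapse = "-")
})
head(sort(table(combos), decreasing = TRUE)) # most frequent score mixes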

Choose closest x elements by index in a list/vector

If I have a vector such as x <- c(1,2,3,4,5,6,7,8,9), I want a function
f(vector, index, num) that takes the vector and gives me the num "closest" elements to the one at that index
Examples:
f(x,3,4) = c(1,2,4,5)
f(x,1,5) = c(2,3,4,5,6)
f(x,8,3) = c(6,7,9)
Since with an odd num we need to choose whether to take the extra element from the left or the right side, let's go with the left side (but the right side is OK too),
i.e. f(x,4,5) = c(1,2,3,5,6) and f(x,7,3) = c(5,6,8)
I hope my question is clear, thank you for any help/responses!
edit: The original vector of c(1:9) is arbitrary; the vector could be a vector of strings, or a vector of length 1000 with shuffled numbers with repeats, etc., e.g. c(1,7,4,2,3,7,2,6,234,56,8)
num_closest_by_indices <- function(v, idx, num) {
  # Try the base case, where idx is not within (num/2) of the edge
  i <- abs(seq_along(v) - idx)
  i[idx] <- +Inf # sentinel
  # If there are not enough elements in the base case, incrementally add more
  for (cutoff_idx in seq(floor(num/2), num)) {
    if (sum(i <= cutoff_idx) >= num) {
      # This can add two extra indices per iteration. Strictly, with an even
      # length we should add the leftmost one first and continue (next in R),
      # to break ties towards the left.
      return(v[i <= cutoff_idx])
    }
  }
}
Here's an illustration of this algorithm: we rank the indices in order of desirability, then pick the lowest num legal ones:
> seq_along(x)
1 2 3 4 5 6 7 8 9
> seq_along(x) - idx
-2 -1 0 1 2 3 4 5 6
> i <- abs(seq_along(x) - idx)
2 1 0 1 2 3 4 5 6
> i[idx] <- +Inf # sentinel to prevent us returning the element itself
2 1 Inf 1 2 3 4 5 6
Now we can just find the num elements with the smallest values (break ties arbitrarily, unless you have a preference, e.g. left).
Our first guess is all indices <= (num/2); this might not be enough if index is within (num/2) of the start/end.
> i <= 2
TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
> v[i <= 2]
1 2 4 5
So, adapting @dash2's code to handle the corner cases where some indices are illegal (nonpositive, or greater than length(vec)), i.e. not %in% 1:length(vec): when elems[1] is below 1, the window is shifted right by 1 - elems[1] to make all indices legal (and likewise shifted left when max(elems) exceeds the length).
Notes:
in the end the code is simpler and faster handling it as three piecewise cases. Aww.
it actually seems to simplify things if we pick (num+1) indices, then remove idx before returning the answer, using setdiff(elems, elem) to remove it.
Like so:
f <- function(vec, elem, n) {
  elems <- seq(elem - ceiling(n/2), elem + floor(n/2))
  if (max(elems) > length(vec)) elems <- elems - (max(elems) - length(vec))
  if (elems[1] < 1) elems <- elems + (1 - elems[1])
  elems <- setdiff(elems, elem)
  vec[elems]
}
Giving results:
> f(1:9, 1, 5)
[1] 2 3 4 5 6
> f(1:9, 9, 5)
[1] 4 5 6 7 8
> f(1:9, 2, 5)
[1] 1 3 4 5 6
> f(1:9, 4, 5)
[1] 1 2 3 5 6
> f(1:9, 4, 4)
[1] 2 3 5 6
> f(1:9, 2, 4)
[1] 1 3 4 5
> f(1:9, 1, 4)
[1] 2 3 4 5
> f(1:9, 9, 4)
[1] 5 6 7 8
Start a function with the variable argument x first, and the reference table and n after
.nearest_n <- function(x, table, n) {
The algorithm assumes that table is numeric, without any duplicates, and all values finite; n has to be less than or equal to the length of the table
## assert & setup
stopifnot(
is.numeric(table), !anyDuplicated(table), all(is.finite(table)),
n <= length(table)
)
Sort the table and then 'clamp' maximum and minimum values
## sort and clamp
table <- c(-Inf, sort(table), Inf)
len <- length(table)
Find the interval in table where x occurs; findInterval() uses an efficient search. Use the interval index as the initial lower index, and add 1 for the upper index, making sure to stay in-bounds.
## where to start?
lower <- findInterval(x, table)
upper <- min(lower + 1L, len)
Find the nearest n neighbors by comparing the lower and upper index distance to x, record the nearest value, and increment the lower or upper index as appropriate, making sure to stay in-bounds
## find
nearest <- numeric(n)
for (i in seq_len(n)) {
  if (abs(x - table[lower]) < abs(x - table[upper])) {
    nearest[i] = table[lower]
    lower = max(1L, lower - 1L)
  } else {
    nearest[i] = table[upper]
    upper = min(len, upper + 1L)
  }
}
Then return the solution and finish the function
nearest
}
The code might seem verbose, but is actually relatively efficient because the only operations on the entire vector (sort(), findInterval()) are implemented efficiently in R.
A particular advantage of this approach is that it can be vectorized in its first argument, calculating the test for using lower (use_lower = ...) as a vector and using pmin() / pmax() as clamps.
.nearest_n <- function(x, table, n) {
  ## assert & setup
  stopifnot(
    is.numeric(table), !anyDuplicated(table), all(is.finite(table)),
    n <= length(table)
  )
  ## sort and clamp
  table <- c(-Inf, sort(table), Inf)
  len <- length(table)
  ## where to start?
  lower <- findInterval(x, table)
  upper <- pmin(lower + 1L, len)
  ## find
  nearest <- matrix(0, nrow = length(x), ncol = n)
  for (i in seq_len(n)) {
    use_lower <- abs(x - table[lower]) < abs(x - table[upper])
    nearest[, i] <- ifelse(use_lower, table[lower], table[upper])
    lower[use_lower] <- pmax(1L, lower[use_lower] - 1L)
    upper[!use_lower] <- pmin(len, upper[!use_lower] + 1L)
  }
  ## return
  nearest
}
For instance
> set.seed(123)
> table <- sample(100, 10)
> sort(table)
[1] 5 29 41 42 50 51 79 83 86 91
> .nearest_n(c(30, 20), table, 4)
[,1] [,2] [,3] [,4]
[1,] 29 41 42 50
[2,] 29 5 41 42
Generalize this by taking any argument and coercing it to the required form using a reference look-up table table0 and the indexes into it table1
nearest_n <- function(x, table, n) {
  ## coerce to common form
  table0 <- sort(unique(c(x, table)))
  x <- match(x, table0)
  table1 <- match(table, table0)
  ## find nearest
  m <- .nearest_n(x, table1, n)
  ## result in original form
  matrix(table0[m], nrow = nrow(m))
}
As an example...
> set.seed(123)
> table <- sample(c(letters, LETTERS), 30)
> nearest_n(c("M", "Z"), table, 5)
[,1] [,2] [,3] [,4] [,5]
[1,] "o" "L" "O" "l" "P"
[2,] "Z" "z" "Y" "y" "w"

How to generate random integers in R so that no two consecutive numbers are the same

Is there a method to generate random integers in R such that any two consecutive numbers are different? It is probably along the lines of x[k+1] != x[k] but I can't work out how to put it all together.
Not sure if there is a function available for that. Maybe this function can do what you want:
# n = number of elements
# sample_from = draw random numbers from this range
random_non_consecutive <- function(n = 10, sample_from = seq(1, 5)) {
  y <- c()
  while (length(y) != n) {
    # over-draw, then drop any element equal to its predecessor
    y <- c(y, sample(sample_from, n - length(y), replace = TRUE))
    y <- y[!c(FALSE, diff(y) == 0)]
  }
  return(y)
}
Example:
random_non_consecutive(20,c(2,4,6,8))
[1] 6 4 6 2 6 4 2 8 4 2 6 2 8 2 8 2 8 4 8 6
Hope this helps.
The function above has a long worst-case runtime. We can keep that worst case more nearly constant with, for example, the following implementation:
# n = number of elements
# sample_from = draw random numbers from this range
random_non_consecutive <- function(n = 10, sample_from = seq(1, 5)) {
  y <- rep(NA, n)
  prev <- -1 # change this if -1 is in your range, to e.g. max(sample_from) + 1
  for (i in seq(n)) {
    # draw from the range minus the previous value
    y[i] <- sample(setdiff(sample_from, prev), 1)
    prev <- y[i]
  }
  return(y)
}
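It is called the same way as the first version, e.g. random_non_consecutive(20, c(2, 4, 6, 8)).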
Another approach is to over-sample and remove the disqualifying ones as follows:
# assumptions
n <- 5 # population size
sample_size <- 1000
# answer
# expected number of disqualifying (equal-neighbour) draws and its variance,
# used to size the buffer of additional draws
mu <- sample_size * 1/n
vr <- sample_size * 1/n * (1 - 1/n)
addl_draws <- round(mu + vr, 0)
index <- seq_len(n)
sample_index <- sample(index, sample_size + addl_draws, replace = TRUE)
qualified_sample_index <- sample_index[which(diff(sample_index) != 0)]
qualified_sample_index <- qualified_sample_index[1:sample_size]
# In the very unlikely event the number of qualified samples < sample_size,
# NAs will fill the vector. This will print those NAs:
print(which(is.na(qualified_sample_index)))
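A quick check (my addition) that the qualified sample really contains no two equal neighbours:
any(diff(qualified_sample_index) == 0, na.rm = TRUE) # should be FALSE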

Identifying overlap zones in R raster package

Package:
raster
Data:
A rasterStack with 10 bands.
Each of the bands contains an image area surrounded by NAs
Bands are logical, i.e. "1" for image data and "0"/NA for surrounding area
The "image areas" of each band do not align completely with each other, though most have partial overlaps
Objective:
Write a fast function that can return either a rasterLayer or cell numbers for each "zone", for instance a pixel containing data only from bands 1 and 2 falls in zone 1, a pixel containing data only from bands 3 and 4 falls in zone 2, etc. If a rasterLayer is returned, I need to be able to match the zone value with band numbers later.
First attempt:
# Possible band combinations
values = integer(0)
for (i in 1:nlayers(myraster)) {
  combs = combn(1:nlayers(myraster), i)
  for (j in 1:ncol(combs)) {
    values = c(values, list(combs[, j]))
  }
}

# Define the zone finding function
find_zones = function(bands) {
  # The intersection of the bands of interest
  a = subset(myraster, 1)
  values(a) = TRUE
  for (i in bands) {
    a = a & myraster[[i]]
  }
  # Union of the remaining bands
  b = subset(myraster, 1)
  values(b) = FALSE
  for (i in seq(1:nlayers(myraster))[-bands]) {
    b = b | myraster[[i]]
  }
  # plot(a & !b)
  cells = Which(a & !b, cells = TRUE)
  return(cells)
}

# Applying the function
results = lapply(values, find_zones)
My current function takes a very long time to execute. Can you think of a better way? Note that I don't simply want to know how many bands have data at each pixel, I also need to know which bands. The purpose of this is to process the different areas differently afterwards.
Note also that the real-life scenario is a 3000 x 3000 or more raster with potentially more than 10 bands.
EDIT
Some sample data consisting of 10 offset image areas:
# Sample data
library(raster)
for (i in 1:10) {
  start_line = i * 10 * 1000
  end_line = 1000000 - 800 * 1000 - start_line
  offset = i * 10
  data = c(rep(0, start_line), rep(c(rep(0, offset), rep(1, 800), rep(0, 200 - offset)), 800), rep(0, end_line))
  current_layer = raster(nrows = 1000, ncols = 1000)
  values(current_layer) = data
  if (i == 1) {
    myraster = stack(current_layer)
  } else {
    myraster = addLayer(myraster, current_layer)
  }
}
NAvalue(myraster) = 0 # You may not want to do this depending on your solution...
EDIT: Answer updated using Nick's trick and matrix multiplication.
You could try the following function, optimized by using Nick's trick and matrix multiplication. The bottleneck now is filling up the stack with the separate layers, but I guess the timings are quite OK now. Memory usage is a bit lower, but given your data and the nature of R, I don't know if you can nibble off a bit more without hampering the performance big time.
> system.time(T1 <- FindBands(myraster,return.stack=T))
user system elapsed
6.32 2.17 8.48
> system.time(T2 <- FindBands(myraster,return.stack=F))
user system elapsed
1.58 0.02 1.59
> system.time(results <- lapply(values, find_zones))
Timing stopped at: 182.27 35.13 217.71
The function returns either a rasterStack with the different level combinations present in the plot (that's not all possible level combinations, so you have some gain there already), or a matrix with the level number and level names. This allows you to do something like:
levelnames <- attr(T2,"levels")[T2]
to get the level names for each cell point. As shown below, you can easily put that matrix inside a rasterLayer object.
The function :
FindBands <- function(x, return.stack = FALSE) {
  dims <- dim(x)
  Values <- getValues(x)
  nn <- colnames(Values)
  vec <- 2^((1:dims[3]) - 1)
  # Get all band combinations and their names
  id <- unlist(
    lapply(1:dims[3], function(x) combn(1:dims[3], x, simplify = FALSE)),
    recursive = FALSE)
  nameid <- sapply(id, function(i) {
    x <- sum(vec[i])
    names(x) <- paste(i, collapse = "-")
    x
  })
  # Nick's approach: one matrix multiplication gives a unique code per cell
  layers <- Values %*% vec
  # Find out which levels we need
  LayerLevels <- unique(sort(layers))
  LayerNames <- c("No Layer", names(nameid[nameid %in% LayerLevels]))
  if (return.stack) {
    myStack <- lapply(LayerLevels, function(i) {
      r <- raster(nr = dims[1], nc = dims[2])
      r[] <- as.numeric(layers == i)
      r
    })
    myStack <- stack(myStack)
    layerNames(myStack) <- LayerNames
    return(myStack)
  } else {
    LayerNumber <- match(layers, LayerLevels)
    LayerNumber <- matrix(LayerNumber, ncol = dims[2], byrow = TRUE)
    attr(LayerNumber, "levels") <- LayerNames
    return(LayerNumber)
  }
}
Proof of concept, using the data of RobertH :
r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
aRaster <- stack(s)
> X <- FindBands(aRaster,return.stack=T)
> plot(X)
> X <- FindBands(aRaster,return.stack=F)
> X
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1 1 1 2
[3,] 2 2 2 2 2 2 2 2 2 2
[4,] 2 2 2 2 2 2 2 2 2 4
[5,] 4 4 4 4 4 4 4 4 4 8
[6,] 8 8 8 8 8 8 8 8 8 8
[7,] 7 7 7 7 7 7 7 7 7 7
[8,] 5 5 5 5 5 5 5 5 5 5
[9,] 5 5 5 5 5 5 5 5 5 6
[10,] 6 6 8 7 7 3 3 3 1 1
attr(,"levels")
[1] "No Layer" "1" "2" "3" "1-2" "1-3"
"2-3" "1-2-3"
> XX <- raster(ncol=10,nrow=10)
> XX[] <- X
> plot(XX)
I'm not familiar with raster, but from what I grasp from the above, you essentially have a 10*3000*3000 array, right?
If so, for each position in the raster (second and third indices, currow and curcol), you can calculate a unique identifier for its 'zone' by using binary: run i over the 'bands' (first index) and sum r[i,currow, curcol]*2^(i-1). Depending on the internal workings of raster, it should be possible to have a rather quick implementation of this.
This results in a new 'raster' of size 3000*3000 holding the unique identifiers of each position. Finding the unique values in there gives you back the zones that actually occur in your data, and reversing the binary logic should give you the bands that belong to a given zone.
Pardon me if my interpretation of raster is incorrect; in that case, please ignore my musings. Either way, this is not a complete solution.
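A minimal sketch of that suggestion (my addition, assuming myraster is the logical rasterStack from the question, with NAs treated as 0):
library(raster)
v <- getValues(myraster) # one row per cell, one column per band
v[is.na(v)] <- 0
# one matrix multiplication computes sum(v[band] * 2^(band - 1)) for every cell
ids <- as.vector(v %*% 2^(0:(nlayers(myraster) - 1)))
zones <- setValues(raster(myraster), ids) # rasterLayer of unique zone identifiers
sort(unique(ids)) # the zones that actually occur
# reversing the binary logic: which bands make up a given zone code?
decode <- function(code, nbands) which(bitwAnd(code, 2^(0:(nbands - 1))) > 0)
decode(ids[1], nlayers(myraster))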
How about this?
library(raster)
#setting up some data
r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
plot(stack(s))
# write a vectorized function that classifies the data
#
fun=function(x,y,z)cbind(x+y+z==0, x==1&y+z==0, y==1&x+z==0, z==1&x+y==0, x==0&y+z==2, y==0&x+z==2, z==0&x+y==2,x+y+z==3)
z <- overlay(s[[1]], s[[2]], s[[3]], fun=fun)
# equivalent to
#s <- stack(s)
#z <- overlay(s[[1]], s[[2]], s[[3]], fun=fun)
ln <- c("x+y+z==0", "x==1&y+z==0", "y==1&x+z==0", "z==1&x+y==0", "x==0&y+z==2", "y==0&x+z==2", "z==0&x+y==2", "x+y+z==3")
layerNames(z) <- ln
x11()
plot(z)
more generic:
s <- stack(s)
fun=function(x)as.numeric(paste(which(x==1), collapse=""))
x <- calc(s,fun)
this is not good when nlayers(s) has double digits (layers "1" and "2" paste to the same code as layer "12"), and in those cases you could use the function below (fun2) instead:
fun2=function(x)as.numeric(paste(c(9, x), collapse=""))
x2 <- calc(s,fun2)
unique(x)
# [1] 1 2 3 12 13 23 123
unique(x2)
# [1] 9000 9001 9010 9011 9100 9101 9110 9111
for the toy example only:
plot(x)
text(x)
p=rasterToPolygons(x)
plot(p, add=T)
I've written code for @Nick Sabbe's suggestion, which I think is very concise and relatively fast. This assumes that the input rasterStack already has logical 1 or 0 data:
# Set the channels to 2^i instead of 1
bands = nlayers(myraster)
a = stack()
for (i in 1:bands) {
  a = addLayer(a, myraster[[i]] * 2^i)
}
coded = sum(a)
# plot(coded)
values = unique(coded)[-1]
remove(a, myraster)

# Function to retrieve which coded value means which channels
which_bands = function(value) {
  single = numeric()
  for (i in bands:1) {
    if ((0 < value) & (value >= 2^i)) {
      value = value - 2^i
      single = c(single, i)
    }
  }
  return(single)
}
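For example (my addition, using the objects above), decoding every observed zone code back to its band numbers:
zone_bands <- lapply(values, which_bands)
names(zone_bands) <- values
zone_bands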
