I'm working with a matrix where I need to replace values coded as 1 with new randomly generated numbers. The starting point is a matrix like this:
set.seed(38921)
p <- matrix(nrow = 10, ncol = 25)
for (i in 1:10) {
  p[i, seq(1, floor(runif(1, min = 1, max = 25)), 1)] <- 1
}
In the resulting p matrix, the 1s in each row need to be replaced with randomly generated integers between 1 and 25, where the numbers cannot repeat within a row.
For example, on the first row of the matrix, there should be 6 randomly drawn numbers between 1 and 25, where none of the numbers are repeated, and 19 NA columns. On row two, there should be 12 randomly drawn numbers between 1 and 25 with no repeats and 13 NA columns.
Any help is greatly appreciated.
You can simply multiply your matrix by another matrix of random numbers; NAs will remain NA. Fill the permutation row-wise (byrow = TRUE) so that the recycling cannot repeat a value within a row:
p * matrix(sample(1:25), 10, 25, byrow = TRUE)
Or if the dimensions change:
p * matrix(sample(1:25), nrow(p), ncol(p), byrow = TRUE)
Where you have:
p[i, seq(1, floor(runif(1, min = 1, max = 25)), 1)] = 1
you're assigning to a whole range of cells, so instead of assigning 1 you need to assign an appropriately sized vector holding the elements you want. Such a vector can be generated with sample(1:25, desiredLength, replace = FALSE).
set.seed(38921)
p <- matrix(nrow = 10, ncol = 25)
for (i in 1:10) {
  n <- floor(runif(1, min = 1, max = 25))
  p[i, seq(1, n, 1)] <- sample(1:25, n, replace = FALSE)
}
Assuming you've created your initial matrix, here's one way to do it:
t(apply(p, 1, function(x)
  replace(x, which(x == 1), sample(1:25, sum(x == 1, na.rm = TRUE)))))
I'm trying to run a simulation with a combination of static variables and values within columns, sum the output, and store the individual outputs in a vector or dataframe.
mean1 <- 2.4
sd1 <- 0.5
df <- data.frame(x = c(2, 3, 4), y = c(5, 6, 7))
What I want to do is:
divide each row in column x by each row in column y,
multiply by a normal distribution using mean1 and sd1,
sum the resulting row values, so I'd have a single value per simulation.
I think I understand how I'd get the value if I wasn't going row by row, so for row 1 it'd be:
v1 <- replicate(n = 1, expr = rnorm(n = 100, mean = mean1, sd = sd1) * 2 / 5, simplify = TRUE)
But where I'm drawing a blank is how to run that for each row and then sum the results across rows for each simulation: in this case, summing the three row values 100 times, so I'd have an output with 100 values.
Dividing x by y is constant, so you can do it once and save the result in a variable. You can then use replicate to run 100 iterations, generating one random number at each iteration to multiply by and sum.
val <- df$x/df$y
n <- 100
replicate(n, {
sum(val * rnorm(n = 1, mean = mean1, sd = sd1))
})
Or you can generate all 100 random values up front and sum them with sapply:
r_val <- rnorm(n, mean = mean1, sd = sd1)
sapply(r_val, function(x) sum(val * x))
Ronak answered my question with:
val <- df$x/df$y
n <- 100
replicate(n, {
sum(val * rnorm(n = 1, mean = mean1, sd = sd1))
})
I had to add back the df$column reference (df$x here) rather than creating a constant, since the actual application had more variables and more complicated math than the example, but the structure worked perfectly.
Thank you!
I've googled extensively and can't seem to find an answer to my problem; apologies if this has been asked before. I have two matrices, a and b, each with the same dimensions. What I am trying to do is iterate over the rows of a (from i = 1 to the number of rows in a) and check whether any elements in row i of matrix a appear in the corresponding row of matrix b. I have a solution using sapply, but this becomes quite slow with very large matrices. Is it possible to vectorise my solution somehow? Examples below:
# create example matrices
a <- matrix(1:9, nrow = 3)
b <- matrix(4:12, nrow = 3)
# iterate over rows in a....
# returns TRUE for each row of a where any element in ith row is found in the corresponding row i of matrix b
sapply(1:nrow(a), function(x){ any(a[x,] %in% b[x,])})
# however, for large matrices this performs quite poorly. is it possible to vectorise?
a <- matrix(runif(14000000), nrow = 7000000)
b <- matrix(runif(14000000), nrow = 7000000)
system.time({
sapply(1:nrow(a), function(x){ any(a[x,] %in% b[x,])})
})
Use apply to find any 0 differences (note that this checks whether a and b match at the same position, not whether an element of one row appears anywhere in the other):
a <- sample(1:3, 9, replace = TRUE)
b <- sample(1:3, 9, replace = TRUE)
a <- matrix(a, ncol = 3)
b <- matrix(b, ncol = 3)
diff <- (a - b)
apply(diff, 1, function(x) which(x == 0)) # indices where the difference is 0
apply(diff, 1, function(x) any(x == 0)) # row check only
Maybe you can try intersect + asplit like below:
lengths(Map(intersect, asplit(a, 1), asplit(b, 1))) > 0
I want to generate two integer vectors of size 50, where the first vector can vary from 0 to 20 and the second from 0 to 100. Each pair formed by the two vectors must be unique (50 different couples of indices), but individual values may repeat: for example, you can select 1 twice in the first vector, or 100 twice in the second.
Bad solution:
Vector A (1, 1, ...)
Vector B (100, 100, ...)
Good solution:
Vector A (1, 1, 2, ...)
Vector B (100, 99, 100, ...)
At the moment I'm using:
a <- sample(1:20, 50, replace = TRUE)
b <- sample(1:100, 50, replace = TRUE)
But of course, I have many non-unique values.
You could increase your sample a bit and only keep the unique combinations:
library(dplyr)

a <- sample(1:20, 100, replace = TRUE)
b <- sample(1:100, 100, replace = TRUE)
df <- tibble(a = a, b = b)

df %>%
  distinct() %>%
  slice_head(n = 50)
I am repeatedly drawing large matrices with random values in a Monte Carlo simulation. As I explore a large parameter space, the simulation will most likely run for several days, so I am trying to shave off as much time as possible. Consider the following code with a 500x18 matrix as an example.
U <- matrix(sample.int(500, size = 500 * 18, replace = TRUE), nrow = 500, ncol = 18)
X <- matrix(nrow = 500, ncol = 18)
Marginals <- matrix(runif(500 * 18, min = 0, max = 1), 500, 18)
for (i in 1:18) {
  for (k in 1:500) {
    X[k, i] <- Marginals[U[k, i], i]
  }
}
The randomly drawn values in U serve as row indices into Marginals, while the column index is the column of U itself. I know that loops are usually not the R way; is there a more efficient approach, e.g. using apply?
Following Yogos's suggestion, the most efficient code can make do without the k loop:
U <- matrix(sample.int(500, size = 500 * 18, replace = TRUE), nrow = 500, ncol = 18)
X <- matrix(nrow = 500, ncol = 18)
Marginals <- matrix(runif(500 * 18, min = 0, max = 1), 500, 18)
for (i in 1:18) {
  X[, i] <- Marginals[U[, i], i]
}
You can speed up by calculating column by column:
for (i in 1:18) X[, i] <- Marginals[U[, i], i]
Alternatively, the following is equivalent to your code:
X <- replicate(18, sample(runif(500), replace = TRUE))
(this will not be much faster than my first variant, but the code is more compact)
I have a binary raster (r) at 1 m resolution and I want to convert it to a percentage value at 4 m resolution. Each pixel of the new raster would represent the percentage calculated from the total frequency of 1s among the 16 contributing pixels. I looked at the raster package, which has an aggregate function; however, this doesn't work:
newras <- aggregate(r, fact = 4, fun = percent)
What you do does not work because there is no function called percent. But you can make one. In this case, the mean value is the fraction, so you multiply it by 100 to get the percentage.
Example data
library(raster)
r <- raster()
set.seed(0)
values(r) <- sample(0:1, ncell(r), replace=TRUE)
Aggregate
a <- aggregate(r, 4, fun = function(x, ...) 100 * mean(x))
# or
a <- 100 * aggregate(r, 4, mean)
Consider NA values
r[sample(ncell(r), 0.9 * ncell(r))] <- NA
# Make a function and use it
percentage <- function(x, ...) { x <- na.omit(x); 100 * mean(x) }
a <- aggregate(r, 4, fun=percentage)
# or do
a <- 100 * aggregate(r, 4, fun=mean, na.rm=TRUE)
Here's a method just using matrices. I am using a 40 by 40 matrix. The method will require some thought if dimensions are not multiples of 4.
Original matrix:
mtx <- matrix(sample(0:1, 40^2, TRUE), 40, 40)
Indices to use as arguments for grouping:
inds <- Map(seq, seq(1, 37, 4), seq(4, 40, 4))
Group into 4 by 4 blocks. blockarray has 16 rows (one per element within a block) and 100 columns (one per block). Note that 40 x 40 = 16 x 100.
blockarray <- mapply(function(i, j) mtx[i, j],
rep(inds, times = 10),
rep(inds, each = 10))
To get the percentage matrix:
pcts <- matrix(colMeans(blockarray)*100, 10, 10)
Visual inspection of results:
image(mtx, zlim = 0:1, col = c("white", "black"))
image(pcts, zlim = c(0, 100), col = colorRampPalette(c("white", "black"))(11))
Validation of results:
sum(mtx[1:4, 5:8])/16*100
pcts[1, 2]