Does anyone know a fast way to create a matrix like the following one in R?
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    2    2    2
[3,]    1    2    3    3
[4,]    1    2    3    4
The matrix above is 4x4 and I want to create something like 10000x10000.
You can do:
N <- 4
m <- matrix(nrow = N, ncol = N)
m[] <- pmin.int(col(m), row(m))
or a shorter version, as suggested by @dickoa:
m <- outer(1:N, 1:N, pmin.int)
These also work and are both faster:
m <- pmin.int(matrix(1:N, nrow = N, ncol = N, byrow = TRUE),
              matrix(1:N, nrow = N, ncol = N, byrow = FALSE))
m <- matrix(pmin.int(rep(1:N, each = N), 1:N), nrow = N)
Finally, here is a cute one using a matrix product but it is rather slow:
x <- matrix(1, N, N)
m <- lower.tri(x, diag = TRUE) %*% upper.tri(x, diag = TRUE)
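If you want to compare these approaches yourself, here is a quick benchmark sketch; it assumes the microbenchmark package is installed and uses N = 1000 rather than 10000 so that it finishes quickly:
library(microbenchmark)
N <- 1000  # smaller than the target 10000 so the benchmark runs in seconds
microbenchmark(
  row_col  = {m <- matrix(nrow = N, ncol = N); m[] <- pmin.int(col(m), row(m)); m},
  outer    = outer(1:N, 1:N, pmin.int),
  two_mats = pmin.int(matrix(1:N, nrow = N, ncol = N, byrow = TRUE),
                      matrix(1:N, nrow = N, ncol = N, byrow = FALSE)),
  rep_each = matrix(pmin.int(rep(1:N, each = N), 1:N), nrow = N),
  times = 10
)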
Note that a 10k-by-10k matrix is big for R (roughly 400 to 800 MB, depending on whether it is stored as integer or double), so I hope you don't run out of memory.
I want to do something like:
vector <- c(runif(3), rnorm(1), runif(3), rnorm(1))
I've tried:
vector <- rep(c(runif(3), rnorm(1)), times = 2)
but the problem is that it repeats the same sequence twice instead of drawing new values each time.
Any help would be appreciated. Have a nice day.
This is exactly what replicate is meant for.
From the help('replicate') page (my emphasis):
replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).
set.seed(1234)
vector <- replicate(2, c(runif(3),rnorm(1)))
vector
#           [,1]        [,2]
#[1,] 0.1137034 0.640310605
#[2,] 0.6222994 0.009495756
#[3,] 0.6092747 0.232550506
#[4,] 0.3143686 0.429124689
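For reference, this is roughly the sapply call that replicate wraps (a sketch, not the exact internals); with the same seed it produces the same numbers, because the sequence of random-number calls is identical:
set.seed(1234)
sapply(1:2, function(i) c(runif(3), rnorm(1)))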
Edit
After the explanation in this comment, I believe the following is closer to what the question asks for. Note that each 2x2 matrix contains the elements of the previous output, in the correct order.
set.seed(1234)
W <- array(dim = c(2, 2, 2))
W[] <- replicate(2, c(runif(3), rnorm(1)))
W
#, , 1
#
#          [,1]      [,2]
#[1,] 0.1137034 0.6092747
#[2,] 0.6222994 0.3143686
#
#, , 2
#
#            [,1]      [,2]
#[1,] 0.640310605 0.2325505
#[2,] 0.009495756 0.4291247
You can do this by declaring the full vector first, then filling the indices for each distribution all at once:
out_length = 4L * 2L
# every fourth element will come from rnorm; the rest from runif
norm_idx = seq(4L, out_length, by = 4L)
n_norm = length(norm_idx)
# declare output
out = numeric(out_length)
out[norm_idx] = rnorm(n_norm)
out[-norm_idx] = runif(out_length - n_norm)
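If you need this pattern for more than two blocks, the same idea wraps naturally into a small helper; the function name runif3_rnorm1 below is just for illustration (a sketch):
# Hypothetical helper: n_rep blocks of (3 uniform draws followed by 1 normal draw)
runif3_rnorm1 <- function(n_rep) {
  out_length <- 4L * n_rep
  norm_idx <- seq(4L, out_length, by = 4L)   # every 4th position gets a normal draw
  out <- numeric(out_length)
  out[norm_idx] <- rnorm(length(norm_idx))
  out[-norm_idx] <- runif(out_length - length(norm_idx))
  out
}
runif3_rnorm1(2)  # same shape as the vector in the question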
Alternatively, here's a tricky way to accomplish this using matrix indexing:
set.seed(394839)
m = matrix(0, nrow = 4L, ncol = 2L)
m[1:3, ] = runif(3L * ncol(m))
m[4L, ] = rnorm(ncol(m))
c(m)
# [1] 0.4478556 0.1336022 0.5860134 -0.1626707 0.7055598 0.7631879 0.3132743 1.5485366
In R, matrices are just vectors with dimensions, and they are filled column by column -- hence we can declare this matrix:
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    3    5    7    9
# [2,]    2    4    6    8   10
like this:
matrix(1:10, nrow = 2L, ncol = 5L)
With that in mind, we can replicate your 3-1-3-1 pattern by making 3-1 the pattern within each column.
You can confirm it's working by scaling up (so small sample effects are muted):
nrep = 1e4
set.seed(39893)
m = matrix(0, nrow = 4L, ncol = nrep)
m[1:3, ] = runif(3L * nrep)
m[4L, ] = rnorm(nrep)
out = c(m)
idx = seq(4L, length(out), by = 4L)
plot(density(out[idx]), main = 'Normally distributed')
plot(density(out[-idx]), main = 'Uniformly distributed')
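If you prefer a numeric check over eyeballing the density plots, a Kolmogorov-Smirnov test against the reference distributions is one option (a sketch; rnorm and runif were called with their defaults, so the standard normal and uniform(0, 1) are the right references):
# With 10,000 draws from the correct distributions these should typically give large p-values
ks.test(out[idx], "pnorm")     # positions filled by rnorm vs. the standard normal
ks.test(out[-idx], "punif")    # positions filled by runif vs. uniform(0, 1)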
I am thinking about a problem: how do I compute in R the matrix series A + A^2/2 + A^3/3 + ... + A^k/k (A is a square matrix, k is any natural number) WITHOUT a "for" loop?
If I've interpreted your notation correctly, perhaps something like this in base R...
A <- matrix(c(1,2,3,4), nrow = 2) #example matrix
k <- 10
B <- Reduce(`%*%`, rep(list(A), k), accumulate = TRUE) # list of A^i for i in 1:k
BB <- lapply(1:k, function(i) B[[i]]/i) # list of A^i/i
Reduce(`+`, BB) #sum of series BB
         [,1]    [,2]
[1,] 603684.8 1319741
[2,] 879827.1 1923425
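As a sanity check (a sketch; the explicit loop here is only for verification, not the proposed solution), the same series can be accumulated step by step and compared with the Reduce result:
S <- matrix(0, nrow(A), ncol(A))  # running sum of A^i / i
P <- diag(nrow(A))                # running power of A
for (i in 1:k) {
  P <- P %*% A
  S <- S + P / i
}
all.equal(S, Reduce(`+`, BB))  # should be TRUE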
I have the following four equations (a, b, c, d) in several variables (x, t, v, w, n, f). My goal is to find sets of variable values that make all of the equations (a, b, c, d) positive (and non-zero). A regular loop would go through each number of the generated sequences and systematically check whether it produces a positive value. Instead, I want R to pick random numbers from each sequence and test them against the others.
For example (x=8, t = 2.1,v=13,w=1,n=10,f=1) is a possible set of combinations.
Please do not suggest solving these analytically and then finding the values. These are simply representations of the equations I am dealing with; the real equations are quite complex and involve more than 15 variables.
#Equations
a <- x * t - 2*x
b <- v - x^2
c <- x - w*t - t*t
d <- (n - f)/t
x <- seq(from = 0.0001, to = 1000, by = 0.1)
t <- seq(from = 0.0001, to = 1000, by = 0.1)
v <- seq(from = 0.0001, to = 1000, by = 0.1)
w <- seq(from = 0.0001, to = 1000, by = 0.1)
n <- seq(from = 0.0001, to = 1000, by = 0.1)
f <- seq(from = 0.0001, to = 1000, by = 0.1)
For a start, it might be better to organize your equations and your probe values into lists:
set.seed(1222)
values <- list(x = x, t = t, v = v, w = w, n = n, f = f)
eqs <- list(
a = expression(x * t - 2 * x),
b = expression(v - x^2),
c = expression(x - w*t - t*t),
d = expression((n - f)/t)
)
Then we can define a number of samples to take randomly from each probe vector:
samples <- 3
values.sampled <- lapply(values, sample, samples)
$x
[1] 642.3001 563.1001 221.3001
$t
[1] 583.9001 279.0001 749.1001
$v
[1] 446.6001 106.7001 0.7001
$w
[1] 636.0001 208.8001 525.5001
$n
[1] 559.8001 28.4001 239.0001
$f
[1] 640.4001 612.5001 790.1001
We can then iterate over each stored equation, evaluating the equation within the "sampled" environment:
results <- sapply(eqs, eval, envir = values.sampled)
            a          b         c          d
[1,] 373754.5 -412102.82 -711657.5 -0.1380373
[2,] 155978.8 -316975.02 -135533.2 -2.0935476
[3,] 165333.3  -48973.03 -954581.8 -0.7356827
From there you can remove any value that is 0 or less:
results[results <= 0] <- NA
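A possible follow-up (a sketch): after the NA assignment, the rows with no NA left are exactly the sampled parameter sets for which every equation came out strictly positive, so you can keep just those:
# Keep only the samples where a, b, c and d are all positive
keep <- complete.cases(results)
results[keep, , drop = FALSE]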
If every independent variable can take on the same set of values (e.g. seq(from = 0.0001, to = 1000, by = 0.1)), we can approach this with much greater rigor and avoid the possibility of generating duplicates. First we create a masterFun that is essentially a wrapper for all of the functions you want to evaluate:
masterFun <- function(y) {
## y is a vector with 6 values
## y[1] -->> x
## y[2] -->> t
## y[3] -->> v
## y[4] -->> w
## y[5] -->> n
## y[6] -->> f
fA <- function(x, t) {x * t - 2*x}
fB <- function(v, x) {v - x^2}
fC <- function(x, w, t) {x - w*t - t*t}
fD <- function(n, f, t) {(n - f)/t}
## one can easily filter out negative
## results as @jdobres has done.
c(a = fA(y[1], y[2]), b = fB(y[3], y[1]),
c = fC(y[1], y[4], y[2]), d = fD(y[5], y[6], y[2]))
}
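As a quick usage sketch, masterFun expects the six values in the order x, t, v, w, n, f; for instance, the combination mentioned in the question would be evaluated as:
# x = 8, t = 2.1, v = 13, w = 1, n = 10, f = 1
masterFun(c(8, 2.1, 13, 1, 10, 1))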
Now, using permuteSample from RcppAlgos (I am the author), which can generate random permutations of a vector and apply any user-defined function to each permutation, we have:
## Not technically the domain, but this variable name
## is concise and very descriptive
domain <- seq(from = 0.0001, to = 1000, by = 0.1)
library(RcppAlgos)
## number of variables ... x, t, v, w, n, f
## ||
## \/
permuteSample(domain, m = 6, repetition = TRUE,
n = 3, seed = 123, FUN = masterFun)
[[1]]
             a              b              c              d
 218830.316100 -608541.146040 -310624.596670      -1.415869

[[2]]
             a              b              c              d
 371023.322880 -482662.278860 -731052.643620       1.132836

[[3]]
              a               b               c               d
 18512.60761001 -12521.71284001 -39722.27696002     -0.09118721
In short, the underlying algorithm is capable of generating the nth lexicographical result, which allows us to apply a mapping from 1 to "# of total permutations" to the permutations themselves. For example, given the permutations of the vector 1:3:
permuteGeneral(3, 3)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
[3,]    2    1    3
[4,]    2    3    1
[5,]    3    1    2
[6,]    3    2    1
We can easily generate the 2nd and the 5th permutation above without generating the first permutation or the first four permutations:
permuteSample(3, 3, sampleVec = c(2, 5))
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    3    1    2
This allows us to have a more controlled and tangible grasp of our random samples as we can now think of them in a more familiar way (i.e. a random sample of numbers).
If you actually want to see which variables were used in the above calculation, we simply drop the FUN argument:
permuteSample(domain, m = 6, repetition = TRUE, n = 3, seed = 123)
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
[1,] 780.7001 282.3001 951.5001 820.8001 289.1001 688.8001
[2,] 694.8001 536.0001  84.9001 829.2001 757.3001 150.1001
[3,] 114.7001 163.4001 634.4001  80.4001 327.2001 342.1001
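To tie this back to the original goal, one could take a larger sample, apply masterFun, and keep only the draws where every equation is positive (a sketch; the sample size of 1000 is arbitrary):
res <- permuteSample(domain, m = 6, repetition = TRUE,
                     n = 1000, seed = 123, FUN = masterFun)
pos <- Filter(function(v) all(v > 0), res)
length(pos)  # how many of the 1000 random draws satisfy the constraints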
I would like to compute the standard deviation of the nearest neighbors (3*3 moving window) of each element in a matrix. I wrote some code in R to implement it:
library(FNN)
df <- matrix(1:10000, nrow = 100, ncol = 100, byrow = TRUE)
df_ <- reshape2::melt(df)
df_index <- df_[, c(1,2)]
df_query <- df_index
neighbor_index <- knnx.index(df_index, df_query, k = 9, algorithm = 'kd_tree')
neighbor_coor<- apply(neighbor_index, 1, function(x) df_query[x, ])
neighbor_sd <- lapply(neighbor_coor, function(x) sd(df[x[, 1], x[, 2]]))
sd <- do.call(rbind, neighbor_sd)
But the speed is too slow. Would you give me some advice to speed up? Are there other ways to implement it?
As @romanlustrik proposed in his comment, we can use raster::focal() for this problem.
library(raster)
df <- matrix(1:10000, nrow = 100, ncol = 100, byrow = TRUE)
dfR <- raster(df)
dfSD <- as.matrix(focal(dfR, w = matrix(1,3,3), fun = sd))
Here w is the matrix representing the nearest neighbors and their weighting within fun (in this case a 3x3 window, i.e. the cell itself and its 8 neighbors). Thus, any neighborhood pattern is possible, as long as it can be represented by a matrix.
matrix(1,3,3)
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    1    1    1
# [3,]    1    1    1
An example with only the 4 neighbors (excluding diagonals and the cell itself):
matrix(c(0,1,0,1,0,1,0,1,0), 3, 3)
#      [,1] [,2] [,3]
# [1,]    0    1    0
# [2,]    1    0    1
# [3,]    0    1    0
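One practical note (a sketch; check ?focal for the exact edge semantics, which depend on how na.rm interacts with the window): focal also has pad, padValue and na.rm arguments, which can be used to pad the border with NA cells and let sd drop them:
# Pad the border with NA cells and let sd ignore them via na.rm = TRUE
dfSD_edges <- as.matrix(focal(dfR, w = matrix(1, 3, 3), fun = sd,
                              pad = TRUE, padValue = NA, na.rm = TRUE))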
I am trying to get a function that is the opposite of diff().
I want to add the values of adjacent columns in a matrix for each column in the matrix.
I do NOT need the sum of the entire column or row.
For example:
If I had:
[ 1 2 4;
3 5 8 ]
I would end up with:
[ 3 6;
8 13 ]
Of course for just one or two columns this is simple as I can just do x[,1]+x[,2], but these matrices are quite large.
I'm surprised that I cannot seem to find an efficient way to do this.
m <- matrix(c(1,3,2,5,4,8), nrow=2)
m[,-1] + m[,-ncol(m)]
     [,1] [,2]
[1,]    3    6
[2,]    8   13
Or, just for the fun of it:
n <- ncol(m)
x <- suppressWarnings(matrix(c(1, 1, rep(0, n-1)),
                             nrow = n, ncol = n-1))
m %*% x
     [,1] [,2]
[1,]    3    6
[2,]    8   13
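In case the recycling trick looks opaque: c(1, 1, 0, 0) is recycled to length 6 and filled column by column, so column j of x has ones exactly in rows j and j+1, which is why m %*% x adds adjacent columns of m. Concretely, x looks like this:
x
#      [,1] [,2]
# [1,]    1    0
# [2,]    1    1
# [3,]    0    1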
Dummy data
mat <- matrix(sample(0:9, 100, replace = TRUE), nrow = 10)
Solution:
sum.mat <- lapply(1:(ncol(mat)-1), function(i) mat[,i] + mat[,i+1])
sum.mat <- matrix(unlist(sum.mat), byrow = FALSE, nrow = nrow(mat))
You could use:
m <- matrix(c(1,2,4,3,5,8), nrow=2, byrow=T)
sapply(2:ncol(m), function(x) m[,x] + m[,(x-1)])
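Since each element computed inside sapply is a column of length nrow(m), sapply binds them back together as columns, so this also yields an nrow(m) by (ncol(m) - 1) matrix matching the other answers.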