(Very) amateur coder and statistician working on a problem in R.
I have four integer lists: A, B, C, D.
A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)
I want R to generate all of the permutations from picking 1 item from each of these lists (I know this code will take forever to run), and then take the mean of each of those permutations. So, for instance, [1, 100, 200, 400] -> 175.25.
Ideally, what I would have at the end is a list of all of these means.
Any ideas?
Here's how I'd do this for a smaller but similar problem:
A <- 1:13
B <- 1:26
C <- 1:26
D <- c(1:13, 27:40)
mymat <- expand.grid(A, B, C, D)
names(mymat) <- c("A", "B", "C", "D")
mymat <- as.matrix(mymat)
mymeans <- rowSums(mymat)/4
You'll probably crash R if you just up all the indices, but you could probably set up a loop, something like this (not tested):
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)
for(A in 1:133) {
mymat <- expand.grid(A, B, C, D)
names(mymat) <- c("A", "B", "C", "D")
mymat <- as.matrix(mymat)
mymeans <- rowSums(mymat)/4
write.table(mymat, file = paste("matrix", A, "txt", sep = "."))
write.table(mymeans, file = paste("means", A, "txt", sep = "."))
rm(mymat, mymeans)
}
to get them all. That still might be too big, in which case you could do a nested loop, or loop over D (since it's the biggest)
Alternatively,
n <- 1e7
A <- sample(133, size = n, replace= TRUE)
B <- sample(266, size = n, replace= TRUE)
C <- sample(266, size = n, replace= TRUE)
D <- sample(x = c(1:133, 267:400), size = n, replace= TRUE)
mymeans <- (A+B+C+D)/4
will give you a large sample of the means and take no time at all.
hist(mymeans)
Even creating a vector of means as large as your permutations will use up all of your memory: 133 * 266 * 266 * 267 is about 2.5 billion values, or roughly 20 GB stored as doubles. You will have to split this into smaller problems; look up writing objects to Excel and then removing objects from memory here (both on SO).
As for the code to do this, I've tried to keep it as simple as possible so that it's easy to 'grow' your knowledge:
#this is how to create vectors of sequential integers in R
a <- c(1:33)
b <- c(1:33)
c <- c(1:33)
d <- c(1:33,267:300)
#this is how to create an empty vector
means <- rep(NA,length(a)*length(b)*length(c)*length(d))
#set up for a loop
i <- 1
#how you run a loop to perform this operation
for(j in 1:length(a)){
for(k in 1:length(b)){
for(l in 1:length(c)){
for(m in 1:length(d)){
y <- c(a[j],b[k],c[l],d[m])
means[i] <- mean(y)
i <- i+1
}
}
}
}
#and to graph your output
hist(means, col='brown')
#lets put a mean line through the histogram
abline(v=mean(means), col='white', lwd=2)
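If only the distribution of the means is needed, a different technique from the loops above may help: count how many picks produce each possible sum, one list at a time, instead of storing every mean. The sketch below assumes the full-size lists from the question; `add_list`, `counts` and `mean_dist` are names made up for illustration.

```r
A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)

# counts[s + 1] holds the number of ways to reach the sum s
# (the +1 offset is needed because R vectors start at index 1).
# Hypothetical helper: fold one more list of values into the counts.
add_list <- function(counts, values) {
  out <- numeric(length(counts) + max(values))
  for (v in values) {
    idx <- seq_along(counts) + v
    out[idx] <- out[idx] + counts
  }
  out
}

counts <- 1  # with no picks there is exactly one way to reach a sum of 0
for (values in list(A, B, C, D)) {
  counts <- add_list(counts, values)
}

sums <- seq_along(counts) - 1
keep <- counts > 0
# One row per distinct mean, with the number of picks that produce it
mean_dist <- data.frame(mean = sums[keep] / 4, n = counts[keep])
```

The result has about a thousand rows rather than billions of values, and `sum(mean_dist$n)` should equal the total number of picks. It does not tell you which pick gave which mean, so it only fits if the tally is what you are after.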
Script:
a <- c(10, 20)
b <- c(100, 200)
c <- c(50 , 1000)
d <- c(3000, 4300)
for (i in c(a,b,c,d))
{
print(prop.test(a,b))
}
So essentially I want every 2 objects to be paired up. I hope I am somewhat clear.
You can put the vectors in a list and use a for loop as follows -
list_data <- list(a, b, c, d)
result <- vector('list', length(list_data)/2)
for(i in seq_along(result)) {
n <- (i -1) * 2 + 1
result[[i]] <- prop.test(list_data[[n]], list_data[[n+1]])
print(result[[i]])
}
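The same pairing can probably be written without an explicit loop. This is only a sketch, assuming the four vectors from the question and an even number of list elements; `odd` is a name introduced here for illustration.

```r
a <- c(10, 20)
b <- c(100, 200)
c <- c(50, 1000)
d <- c(3000, 4300)

list_data <- list(a, b, c, d)

# Positions 1, 3, ...: the first element of each pair
odd <- seq(1, length(list_data), by = 2)

# Map() calls prop.test(list_data[[1]], list_data[[2]]), then
# prop.test(list_data[[3]], list_data[[4]]), and returns a list of results
result <- Map(prop.test, list_data[odd], list_data[odd + 1])
```

Each element of `result` is the usual `prop.test` object, so `result[[1]]$p.value` and so on work as before.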
If I have a function
estimator <- function(A,B) {
A*(B+23)
}
How can I invert this function to find the values of A, for B running over a sequence from 0 to 120 (B = 0, 1, 2, ..., 120), that give a fixed result, say C = 20?
I would use it to map the values of A that satisfy the equation A*(B+23) = C, with B taken from b.list (0 to 120), for each of the different C values in c.list.
b.list <- seq(0,120,by=1)
c.list <- tibble(seq(10,32,by=2))
In the end, I would like to plot the lines of curves of the function for different C using purrr or similar.
I.e.: given that the height of a tree in metres at age 100 follows the function C = A*(B+23), solve for the A that gives the result C = 10 when B (age) is a list of years between 0 and 120.
Here's a link showing what I'm trying to make!
Here's another one
Many thanks!
The inverse is quick to derive:
A = C/(B+23)
One answer could be :
B <- seq(0, 120)
C <- seq(10, 32, 2)
A <- matrix(0,
nrow = length(B),
ncol = length(C))
for(i in 1:ncol(A)){
A[,i] <- C[i] / (B + 23)
}
matplot(B, A, type ="l", col = "black")
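As a possible shortcut for the loop above, `outer()` can fill the whole matrix in one call. This is a sketch under the same definitions of B and C; the argument names `b` and `target` are just illustrative.

```r
B <- seq(0, 120)
C <- seq(10, 32, 2)

# A[i, j] is the value of A that gives C[j] when B = B[i],
# using the inverse A = C / (B + 23)
A <- outer(B, C, function(b, target) target / (b + 23))

# One curve per value of C
matplot(B, A, type = "l", col = "black", xlab = "B", ylab = "A")
```

Plugging a column back in, `A[, 1] * (B + 23)` should return the first value of C for every B.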
For a more complex function you do need to solve the problem numerically. One way is to treat it as an optimisation problem where you minimise the squared distance from C:
B <- seq(1, 120)
C <- seq(10, 32, 2)
A <- matrix(0,
nrow = length(B),
ncol = length(C))
fct <- function(A, B, C){
paramasi <- 25
parambeta<- 7395.6
paramb2 <- -1.7829
refB <- 100
d <- parambeta*(paramasi^paramb2)
r <- (((A-d)^2)+(4*parambeta*A*(B^paramb2)))^0.5
si_est <- (A+d+r)/ (2+(4*parambeta*(refB^paramb2)) / (A-d+r))
return(sum((si_est - C)^2))}
for(c in 1:length(C)){
for(b in 1:length(B)){
# fix B and C, then optimise over A
# (optim() warns that Nelder-Mead is unreliable for one-dimensional problems)
res <- optim(par = 1, fn = fct, B = B[b], C = C[c])
A[b, c] <- res$par
}
}
matplot(B, A, type = "l", col = "black")
Be careful, though: in your case I think you could find an analytical formula for the inverse, which would be better.
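Another way to invert a function numerically, when it is continuous in A and changes sign over a known interval, is root finding with `uniroot()` rather than minimisation. The sketch below uses the simple `estimator` from the question so the result can be checked against C/(B+23); `solve_A` and the search interval are assumptions made for illustration.

```r
estimator <- function(A, B) {
  A * (B + 23)
}

# Hypothetical helper: find the A for which estimator(A, B) equals target.
# The interval c(0, 100) is an assumption; estimator(A, B) - target must
# have opposite signs at its two ends for uniroot() to work.
solve_A <- function(B, target, interval = c(0, 100)) {
  uniroot(function(A) estimator(A, B) - target,
          interval = interval, tol = 1e-9)$root
}

B <- seq(0, 120)
# One root per value of B, for the fixed result C = 20
A <- vapply(B, solve_A, numeric(1), target = 20)
```

For a different function, only `estimator` and the interval would need to change.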
Good luck !
I know I can use expand.grid for this, but I am trying to learn actual programming. My goal is to take what I have below and use a recursion to get all 2^n binary sequences of length n.
I can do this for n = 1, but I don't understand how I would use the same function in a recursive way to get the answer for higher dimensions.
Here is for n = 1:
binseq <- function(n){
binmat <- matrix(nrow = 2^n, ncol = n)
r <- 0 #row counter
for (i in 0:1) {
r <- r + 1
binmat[r,] <- i
}
return(binmat)
}
I know I have to use probably a cbind in the return statement. My intuition says the return statement should be something like cbind(binseq(n-1), binseq(n)). But, honestly, I'm completely lost at this point.
The desired output should basically recursively produce this for n = 3:
binmat <- matrix(nrow = 8, ncol = 3)
r <- 0 # current row of binmat
for (i in 0:1) {
for (j in 0:1) {
for (k in 0:1) {
r <- r + 1
binmat[r,] <- c(i, j, k)}
}
}
binmat
It should just be a matrix as binmat is being filled recursively.
I quickly wrote this function to generate all N^K permutations of length K for given N characters. Hope it will be useful.
gen_perm <- function(str=c(""), lst=5, levels = c("0", "1", "2")){
if (nchar(str) == lst){
cat(str, "\n")
return(invisible(NULL))
}
for (i in levels){
gen_perm(str = paste0(str,i), lst=lst, levels=levels)
}
}
# sample call
gen_perm(lst = 3, levels = c("x", "T", "a"))
I will return to your problem when I get more time.
UPDATE
I modified the code above to work for your problem. Note that the matrix being populated lives in the global environment. The function also uses the tmp variable to pass rows to the global environment. This was the easiest way for me to solve the problem. Perhaps, there are other ways.
levels <- c(0,1)
nc <- 3
m <- matrix(numeric(0), ncol = nc)
gen_perm <- function(row=numeric(), lst=nc, levels = c(0, 1)){
if (length(row) == lst){
assign("tmp", row, .GlobalEnv)
with(.GlobalEnv, {m <- rbind(m, tmp); rownames(m) <- NULL})
return(invisible(NULL))
}
for (i in levels){
gen_perm(row=c(row,i), lst=lst, levels=levels)
}
}
gen_perm(lst=nc, levels=levels)
UPDATE 2
To get the expected output you provided, run
m <- matrix(numeric(0), ncol = 3)
gen_perm(lst = 3, levels = c(0,1))
m
Here levels specifies the set of values to build permutations from (binary in our case), m is an empty matrix to fill up, gen_perm generates rows and adds them to the matrix m, and lst is the length of each permutation (it matches the number of columns in the matrix).
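For comparison, here is a sketch of a version that returns the matrix directly instead of filling one in the global environment, along the lines of the cbind idea in the question. It is not the code above, just one possible way to write the recursion, and it assumes n >= 1.

```r
binseq <- function(n) {
  # Base case: the two sequences of length 1
  if (n == 1) {
    return(matrix(c(0, 1), ncol = 1))
  }
  # All sequences of length n - 1 ...
  prev <- binseq(n - 1)
  # ... each prefixed once with 0 and once with 1
  rbind(cbind(0, prev), cbind(1, prev))
}

binseq(3)
```

The rows come out in the same order as the nested loops in the question, with the last column changing fastest.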
I am trying to find a way to efficiently use permutations of a vector in R without blowing up my computer. This is what I am trying to do:
n = 3 # I would need n>1000 instead, this is just to show what I am trying to achieve
t = 3
library(gtools)
m <- permutations(n = n, r = t, repeats.allowed = F, v = 1:n)
mm <- as.numeric(m)
df = data.frame()
for (i in 1:nrow(m)) {
mat <- matrix(0, nrow = ncol(m), ncol = n)
idx = m[i,]
mat[cbind(seq_along(idx), idx)] = 1
df = rbind(df, mat)
}
However, permutations() is too time- and memory-consuming for large n (e.g. >1000). It looks like using "sample" is a great solution (proposed here):
v = 1:n
N <- t(replicate(length(v)^4, sample(v, t)))
# compare with: permutations(n = n, r = t, repeats.allowed = F, v = 1:n)
sum(duplicated(N))
m <- N[!(duplicated(N)), ] # then continue with code above
However, I am still unsure about the number of samples to take to be sure to cover all the possibilities. Does anybody have any ideas on number of samples, or how to make sure that all possibilities are covered? Thank you!
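One rough way to think about the number of samples is the coupon collector problem: with M equally likely outcomes, the expected number of random draws needed to see every outcome at least once is M * (1 + 1/2 + ... + 1/M), which is approximately M * (log(M) + 0.5772) + 0.5. The sketch below applies that to ordered draws without repeats; `k` stands for the t in the question, and the other names are made up for illustration.

```r
n <- 3   # size of the pool, as in the question
k <- 3   # length of each draw (called t in the question)

# Number of distinct ordered draws without repeats: n! / (n - k)!
M <- round(exp(lfactorial(n) - lfactorial(n - k)))

# Coupon collector approximation to the expected number of random
# draws needed before every one of the M outcomes has appeared
euler_gamma <- 0.5772156649
expected_draws <- M * (log(M) + euler_gamma) + 0.5
```

This is only an expected value, so it gives no guarantee that every possibility is covered. For n > 1000 and k = 3, M is already around a billion, so sampling until everything is covered would need even more draws than enumerating the permutations.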
I've got a dataset with 4 binary variables, one per column. How do I create a 4 x 4 grid with the tally of each pair combination of the variables?
Here's an example data frame:
Person <- c("Bob", "Jim", "Sarah", "Dave")
A <- c(1,0,1,1)
B <- c(1,1,1,0)
C <- c(0,0,0,1)
D <- c(1,0,0,0)
So in the 4x4 grid, the intersection of A and B would have a 2 because Bob and Sarah have 1 for A and B.
For two vectors A and B the tally is their inner (dot) product:
res <- A %*% B
or
res <- crossprod(A, B)
To make a matrix of all combinations, use a two-level for loop or apply:
data <- list(A = A, B = B, C = C, D = D)
n <- length(data)
res <- matrix(NA, nrow = n, ncol = n, dimnames = list(names(data), names(data)))
for(i in 1:n) {
for(j in 1:i) {
res[i,j] <- crossprod(data[[i]], data[[j]])
}
}
Here I fill only one half of the matrix. You then can copy the values across like this:
res[upper.tri(res)] <- t(res)[upper.tri(res)]
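If the four vectors are bound into a matrix first, `crossprod()` should give the whole grid in one call. A small sketch, assuming the example vectors from the question; `X` is a name introduced here.

```r
A <- c(1, 0, 1, 1)
B <- c(1, 1, 1, 0)
C <- c(0, 0, 0, 1)
D <- c(1, 0, 0, 0)

# One column per variable, one row per person
X <- cbind(A, B, C, D)

# t(X) %*% X: entry [i, j] counts the people with a 1 in both columns
res <- crossprod(X)
res
```

The diagonal then holds the number of 1s in each variable, and `res["A", "B"]` is the tally for the A and B pair.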