I made only some minor changes, which you do not need to worry about. This for loop approach may be helpful.
1. Run all of your code
s <- 60000
t <- 20
# mu(x, t) as defined in the question, with a reduction factor for t <= 2
mu <- function(x, t) {
  A <- .00022
  B <- 2.7*10^(-6)
  c <- 1.124
  mutemp <- A + B*c^(x + t)
  out <- ifelse(t <= 2, 0.9^(2 - t)*mutemp, mutemp)
  out
}
# f(x) = 0 defines the next value of s; note it reads s and t from the global environment
f <- function(x) (s - x - 0.05*(0.04*x + 1810.726 - mu(40, t)*(s - x)))
2. Run the for loop below to iterate
2.1 Predefine the length of the output vector. In your case it is 400 (t/0.05 = 20/0.05 = 400).
output <- vector(mode = "numeric", length = t/0.05)
2.2 Run through the for loop from 1 to 400. Save each uniroot result into the vector from step 2.1, and then reassign s and t accordingly.
for (i in 1:400) {
  output[i] <- uniroot(f, lower=0.1, upper=100000000)$root
  s <- output[i]        # the next iteration starts from the root just found
  t <- 20 - i * 0.05    # step t down by 0.05
}
3. Inspect the result
output
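To see which t each root belongs to: since t is updated after the root is stored, output[i] was computed with t = 20 - (i - 1)*0.05. A quick inspection sketch of my own (not part of the steps above):
head(output)
tvals <- seq(20, 0.05, by = -0.05)  # the t at which each root was computed
plot(tvals, output, type = "l", xlab = "t", ylab = "root")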
Hope this is helpful.
You could use vapply on a defined t sequence.
s <- 6e4
tseq <- seq.int(19.95, 0, -.05)
x <- vapply(tseq, \(t) {
  s <<- uniroot(\(x) (s - x - 0.05*(0.04*x + 1810.726 - mu(40, t)*(s - x))),
                lower=0.1, upper=100000000)$root
}, numeric(1L))
Note that <<- modifies s in the global environment, so after the loop s holds the last root.
s
# [1] 2072.275
res <- cbind(t=tseq, x)
head(res)
#          t        x
# [1,] 19.95 59789.92
# [2,] 19.90 59580.25
# [3,] 19.85 59371.01
# [4,] 19.80 59162.18
# [5,] 19.75 58953.77
# [6,] 19.70 58745.77
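If you would rather avoid the <<- side effect, the same recursion can be written with Reduce(..., accumulate = TRUE), which threads s through the t sequence explicitly. A minimal sketch of this alternative (my variant, assuming mu from above is defined):
roots <- Reduce(function(s, t)
    uniroot(function(x) s - x - 0.05*(0.04*x + 1810.726 - mu(40, t)*(s - x)),
            lower = 0.1, upper = 100000000)$root,
  tseq, init = 6e4, accumulate = TRUE)[-1]  # drop the initial s; roots matches x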
I am trying to fill a matrix with specific values based on a different matrix.
I created an empty matrix with:
n <- 64 #length of matrix
vector_walk <- matrix(0, ncol = 1, nrow = n)
Then I filled that matrix with draws from this distribution:
library(circular)
n <- 100000 #How many observations in the distribution
mu <- 180 #Mean direction of the distribution
kappa <- 1.5 #Concentration: larger kappa = narrower distribution
mu <- circular(mu, units = c("degrees"),
               modulo = c("2pi"),
               template = ("geographics"))
Dist <- rvonmises(n, mu, kappa, control.circular=list(units="degrees"))
Filling the matrix
for (i in 1:n){
  vector_walk[i] <- sample(Dist, 1)
}
Now here is where I am stuck. I want to fill a new matrix "grid_walk" with 2 columns, conditionally based on the values in the "vector_walk" matrix. For example, whenever "vector_walk" has a value greater than 0 and less than 23, I want to put a 0 in the first column and a 1 in the second; the other ranges are listed below. I am trying this for loop with if statements:
for (j in 1:2){
  for(i in 1:n){
    if(vector_walk[i] > 0 & vector_walk[i] < 23) grid_walk[i,j] <- c(0,1)
    if(vector_walk[i] > 23 & vector_walk[i] < 68) grid_walk[i,j] <- c(1,1)
    if(vector_walk[i] > 68 & vector_walk[i] < 113) grid_walk[i,j] <- c(1,0)
    if(vector_walk[i] > 113 & vector_walk[i] < 158) grid_walk[i,j] <- c(1,-1)
    if(vector_walk[i] > 158 & vector_walk[i] < 203) grid_walk[i,j] <- c(0,-1)
    if(vector_walk[i] > 203 & vector_walk[i] < 248) grid_walk[i,j] <- c(-1,-1)
    if(vector_walk[i] > 248 & vector_walk[i] < 293) grid_walk[i,j] <- c(-1,0)
    if(vector_walk[i] > 293 & vector_walk[i] < 338) grid_walk[i,j] <- c(-1,1)
    if(vector_walk[i] > 338 & vector_walk[i] < 361) grid_walk[i,j] <- c(0,1)
  }
}
Any help would be appreciated, as I have never used if or for statements in R.
I would implement this one column at a time, because for loops are (generally) slow in R. We start by creating a matrix of zeros and then replace values wherever the conditions hold.
library(Rfast)
library(circular)
n <- 100000 #How many observations in the distribution
mu <- 180 #Mean vector for distribution
kappa <- 1.5 #How wide the distribution is
mu <- circular(mu, units = c("degrees"),
modulo = c("2pi"),
template = ("geographics"))
Dist <- rvonmises(n, mu, kappa, control.circular=list(units="degrees"))
# Equivalent to your current "Filling the matrix" step but avoids the loop
vector_walk <- sample(Dist)
# Start by filling everything with zeros
grid_walk <- matrix(numeric(1), nrow = length(vector_walk), ncol = 2)
# If vector_walk is between 23 and 158, replace the first column's default zero with a 1
grid_walk[vector_walk > 23 & vector_walk < 158, 1] <- 1
# If vector_walk is between 203 and 338, replace the first column's default zero with a -1
grid_walk[vector_walk > 203 & vector_walk < 338, 1] <- -1
# Repeat for the second column
grid_walk[(vector_walk > 0 & vector_walk < 68) | (vector_walk > 293 & vector_walk < 361), 2] <- 1
grid_walk[vector_walk > 113 & vector_walk < 248, 2] <- -1
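Since the nine ranges are consecutive sectors, another option is to build the step lookup once and index it with findInterval. A sketch along the same lines (breaks, steps and grid_walk2 are names I made up; note that boundary values such as exactly 23 fall into the upper sector here, whereas the original if-chain leaves them unset):
breaks <- c(0, 23, 68, 113, 158, 203, 248, 293, 338, 361)
steps  <- rbind(c(0,1), c(1,1), c(1,0), c(1,-1), c(0,-1),
                c(-1,-1), c(-1,0), c(-1,1), c(0,1))  # (col1, col2) per sector
sector <- findInterval(as.numeric(vector_walk), breaks)
grid_walk2 <- steps[sector, ]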
I have a matrix of 1s and 0s where the rows are individuals and the columns are events. A 1 indicates that an event happened to an individual and a 0 that it did not.
I want to find the set of (in the example) 5 columns/events that covers the most rows/individuals.
Test Data
#Make test data
set.seed(123)
d <- sapply(1:300, function(x) sample(c(0,1), 30, T, c(0.9,0.1)))
colnames(d) <- 1:300
rownames(d) <- 1:30
My attempt
My initial attempt simply combined the 5 columns with the highest colMeans:
#Get top 5 columns with highest row coverage
col_set <- head(sort(colMeans(d), decreasing = T), 5)
#Have a look the set
col_set
>
      197       199        59        80        76
0.2666667 0.2666667 0.2333333 0.2333333 0.2000000
#Check row coverage of the column set
sum(apply(d[,colnames(d) %in% names(col_set)], 1, sum) > 0) / 30 #top 5
>
[1] 0.7
However, this set does not cover the most rows. I tested this by pseudo-randomly sampling 10,000 different sets of 5 columns and then finding the set with the highest coverage:
#Get 5 random columns using colMeans as prob in sample
##Random sample 10,000 times
set.seed(123)
result <- lapply(1:10000, function(x){
  col_set2 <- sample(colMeans(d), 5, F, colMeans(d))
  cover <- sum(apply(d[,colnames(d) %in% names(col_set2)], 1, sum) > 0) / 30 #random 5
  list(set = col_set2, cover = cover)
})
##Have a look at the best set
result[which.max(sapply(result, function(x) x[["cover"]]))]
>
[[1]]
[[1]]$set
        59        169        262         68        197
0.23333333 0.10000000 0.06666667 0.16666667 0.26666667
[[1]]$cover
[1] 0.7666667
The reason for supplying the colMeans to sample is that the columns with the highest coverages are the ones I am most interested in.
So, using pseudo-random sampling I can find a set of columns with higher coverage than just taking the top 5 columns. However, since my actual data sets are larger than this example, I am looking for a more efficient and systematic way of finding the set of columns with the highest coverage.
EDIT
For the interested, I decided to microbenchmark the 3 solutions provided:
#Defining G. Grothendieck's coverage function outside his solutions
coverage <- function(ix) sum(rowSums(d[, ix]) > 0) / 30
#G. Grothendieck top solution
solution1 <- function(d){
  cols <- tail(as.numeric(names(sort(colSums(d)))), 20)
  co <- combn(cols, 5)
  itop <- which.max(apply(co, 2, coverage))
  co[, itop]
}
#G. Grothendieck "Older solution"
solution2 <- function(d){
require(lpSolve)
ones <- rep(1, 300)
res <- lp("max", colSums(d), t(ones), "<=", 5, all.bin = TRUE, num.bin.solns = 10)
m <- matrix(res$solution[1:3000] == 1, 300)
cols <- which(rowSums(m) > 0)
co <- combn(cols, 5)
itop <- which.max(apply(co, 2, coverage))
co[, itop]
}
#user2554330 solution
bestCols <- function(d, n = 5) {
  result <- numeric(n)
  for (i in seq_len(n)) {
    result[i] <- which.max(colMeans(d))
    d <- d[d[,result[i]] != 1,, drop = FALSE]
  }
  result
}
#Benchmarking...
microbenchmark::microbenchmark(solution1 = solution1(d),
                               solution2 = solution2(d),
                               solution3 = bestCols(d), times = 10)
>
Unit: microseconds
      expr        min         lq       mean      median         uq       max neval
 solution1 390811.850 497155.887 549314.385 578686.3475 607291.286 651093.16    10
 solution2  55252.890  71492.781  84613.301  84811.7210  93916.544 117451.35    10
 solution3    425.922    517.843   3087.758    589.3145    641.551  25742.11    10
This looks like a relatively hard optimization problem, because of the way columns interact. An approximate strategy is to pick the column with the highest mean, then delete the rows with ones in that column, and repeat. This is the classic greedy heuristic for maximum coverage: it won't necessarily find the best solution, but it is guaranteed to achieve at least 1 - 1/e (about 63%) of the optimal coverage, so you should get a fairly good one.
For example,
set.seed(123)
d <- sapply(1:300, function(x) sample(c(0,1), 30, T, c(0.9,0.1)))
colnames(d) <- 1:300
rownames(d) <- 1:30
bestCols <- function(d, n = 5) {
  result <- numeric(n)
  for (i in seq_len(n)) {
    result[i] <- which.max(colMeans(d))
    # drop the rows already covered by the chosen column
    d <- d[d[,result[i]] != 1,, drop = FALSE]
  }
  cat("final dim is", dim(d), "\n")
  result
}
col_set <- bestCols(d)
sum(apply(d[,colnames(d) %in% col_set], 1, sum) > 0) / 30 #top 5
This gives 90% coverage.
The following is a heuristic for finding an approximate solution. Take the N = 20 columns, say, with the most ones (cols), and then use brute force to evaluate every subset of 5 of those 20 columns. The subset with the highest coverage is shown below; its coverage is 93.3%.
coverage <- function(ix) sum(rowSums(d[, ix]) > 0) / 30
N <- 20
cols <- tail(as.numeric(names(sort(colSums(d)))), N)
co <- combn(cols, 5)
itop <- which.max(apply(co, 2, coverage))
co[, itop]
## [1] 90 123 197 199 286
coverage(co[, itop])
## [1] 0.9333333
Repeating this for N = 5, 10, 15 and 20 gives coverages of 83.3%, 86.7%, 90% and 93.3%. The larger N is, the better the coverage but the longer the run time.
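For instance, reusing the coverage function above, a small wrapper (coverN is a name I made up) reproduces that trade-off:
coverN <- function(N, k = 5) {
  cols <- tail(as.numeric(names(sort(colSums(d)))), N)
  co <- combn(cols, k)
  max(apply(co, 2, coverage))
}
sapply(c(5, 10, 15, 20), coverN)
## [1] 0.8333333 0.8666667 0.9000000 0.9333333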
Older solution
We can approximate the problem with a knapsack problem that chooses the 5 columns with the largest number of ones, using integer linear programming.
We get the 10 best solutions to this approximate problem and collect all columns that appear in at least one of the 10 solutions. There are 14 such columns, and we then use brute force to find which subset of 5 of those 14 columns has the highest coverage.
library(lpSolve)
ones <- rep(1, 300)
res <- lp("max", colSums(d), t(ones), "<=", 5, all.bin = TRUE, num.bin.solns = 10)
coverage <- function(ix) sum(rowSums(d[, ix]) > 0) / 30
# each column of m is a logical 300-vector defining one possible solution
m <- matrix(res$solution[1:3000] == 1, 300)
# cols is the set of columns which are in any of the 10 solutions
cols <- which(rowSums(m) > 0)
length(cols)
## [1] 14
# use brute force to find the 5 best columns among cols
co <- combn(cols, 5)
itop <- which.max(apply(co, 2, coverage))
co[, itop]
## [1] 90 123 197 199 286
coverage(co[, itop])
## [1] 0.9333333
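Brute force over the 14 candidate columns is cheap here, since only choose(14, 5) = 2002 subsets have to be scored:
choose(14, 5)
## [1] 2002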
You can test whether some other column improves the coverage and exchange it with one currently in the selection.
n <- 5 #Number of columns / events
i <- rep(1, n)
for(k in 1:10) { #How many times to iterate at most
  tt <- i
  for(j in seq_along(i)) {
    # rows already covered by the other selected columns
    x <- +(rowSums(d[,i[-j]]) > 0)
    # replace column j by the column covering the most currently uncovered rows
    i[j] <- which.max(colSums(x == 0 & d == 1))
  }
  if(identical(tt, i)) break  # stop once the selection no longer changes
}
sort(i)
#[1] 90 123 197 199 286
mean(rowSums(d[,i]) > 0)
#[1] 0.9333333
Since the initial selection influences the result, you can also use random starts.
n <- 5 #Number of columns / events
# exclude columns that are dominated by (add no coverage over) some other column
x <- apply(d, 2, function(x) colSums(x == 0 & d == 1))
diag(x) <- -1
idx <- which(!apply(x==0, 1, any))
# add back duplicated columns, which the dominance check removed entirely
x <- apply(d, 2, function(x) colSums(x != d))
diag(x) <- -1
x[upper.tri(x)] <- -1
idx <- unname(c(idx, which(apply(x==0, 1, any))))
res <- sample(idx, n)
for(l in 1:100) { #Number of random starts
  i <- sample(idx, n)
  for(k in 1:10) { #How many times to iterate at most
    tt <- i
    for(j in seq_along(i)) {
      x <- +(rowSums(d[,i[-j]]) > 0)
      i[j] <- which.max(colSums(x == 0 & d == 1))
    }
    if(identical(tt, i)) break
  }
  if(sum(rowSums(d[,i]) > 0) > sum(rowSums(d[,res]) > 0)) res <- i  # keep the best start
}
sort(res)
#[1] 90 123 197 199 286
mean(rowSums(d[,res]) > 0)
#[1] 0.9333333
sort(res)
#[1] 90 123 197 199 286
mean(rowSums(d[,res]) > 0)
#[1] 0.9333333
I am trying to calculate the square of the first k elements of a vector v.
If k is greater than the length of the vector n, then set k = n.
myfun1 = function(v,k){
  p = ifelse(k > length(v), k = length(v), v[1:k]^2)
  return(p)
}
myfun1(v=seq(1,20,by=0.5), k=10)
When I run the code it gives me this error:
Error in ifelse(k > length(v), k = length(v), v[1:k]^2) :
unused argument (k = length(v))
Can anyone help to resolve this?
k = length(v) inside your ifelse does not make sense: ifelse() takes the arguments (test, yes, no), so k = length(v) is passed as an extra argument named k, which ifelse() does not have; that is what the "unused argument" error means. Here's a simpler approach -
myfun1 <- function(v, k) {
  v[1:min(k, length(v))]^2
}
myfun1(1:10, 5)
[1] 1 4 9 16 25
myfun1(1:10, 15)
[1] 1 4 9 16 25 36 49 64 81 100
Thanks to @josliber, here's another simple way to do this -
myfun1 <- function(v, k) {
  head(v, k)^2
}
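Since head(v, k) simply returns all of v when k exceeds its length, this behaves the same as above:
myfun1(1:10, 5)
[1] 1 4 9 16 25
myfun1(1:10, 15)
[1]   1   4   9  16  25  36  49  64  81 100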
How would I efficiently go about taking a 1-by-1 ascending random sample of the values 1:n, making sure that each randomly sampled value is always higher than the previous one?
e.g.:
For the values 1:100, pick a random number, say 61 (current list = 61).
Then pick another number between 62 and 100, say 90 (current list = 61, 90).
Then pick another number between 91 and 100, say 100.
Stop the process, as the max value has been hit (final list = 61, 90, 100).
I have been stuck in loop land, thinking in this clunky manner:
a1 <- sample(1:100, 1)
if(a1 < 100) {
  a2 <- sample((a1+1):100, 1)
}
etc etc...
I want to report a final vector that is the concatenation of a1, a2, ..., an:
result <- c(a1,a2)
Even though this sounds like a homework question, it is not. I thankfully left the days of homework many years ago.
Coming late to the party, but I think this is gonna rock your world:
unique(cummax(sample.int(100)))
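Why this works: sample.int(100) produces a random permutation of 1:100, cummax() replaces it by its non-decreasing running maximum, and unique() then keeps each new record value once, giving a strictly increasing sequence. Since 100 occurs somewhere in the permutation, the result always ends at 100. A quick look (seed chosen arbitrarily):
set.seed(42)
x <- sample.int(100)
head(cummax(x))    # non-decreasing running maximum
unique(cummax(x))  # strictly increasing record values, ending at 100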
This uses a while loop and is wrapped in a function:
# from ?sample
resample <- function(x, ...) x[sample.int(length(x), ...)]
sample_z <- function(n){
  z <- numeric(n)
  new <- 0
  count <- 1
  while(new < n){
    from <- seq(new+1, n, by=1)
    new <- resample(from, size = 1)
    z[count] <- new
    if(new < n) count <- count+1
  }
  z[1:count]
}
set.seed(1234)
sample_z(100)
## [1] 12 67 88 96 100
Edit
Note the change to deal with the case when the new sample is 100, and the way sample() treats an integer x as opposed to a vector.
Edit 2
Actually, reading the help for sample gave the useful resample function, which avoids the pitfalls when length(x) == 1.
Not particularly efficient but:
X <- 0
samps <- c()
while (X < 100) {
  if(is.null(samps)) {z <- 1} else {z <- 1 + samps[length(samps)]}
  if (z == 100) {
    samps <- c(samps, z)
  } else {
    samps <- c(samps, sample(z:100, 1))
  }
  X <- samps[length(samps)]
}
samps
EDIT: Trimming a little fat from it:
samps <- c()
while (is.null(samps[length(samps)]) || samps[length(samps)] < 100) {
  if(is.null(samps)) {z <- 1} else {z <- 1 + samps[length(samps)]}
  if (z == 100) {
    samps <- c(samps, z)
  } else {
    samps <- c(samps, sample(z:100, 1))
  }
}
samps
Even later to the party, but just for kicks:
X <- Y <- sample(100L)
while(length(X <- Y) != length(Y <- X[c(TRUE, diff(X)>0)])) {}
> print(X)
[1] 28 44 60 98 100
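Each pass of that while loop keeps the first element and every element larger than its immediate predecessor; iterating until the length stops changing leaves exactly the running-maximum records, the same values unique(cummax(...)) would produce. A tiny hand-picked example:
X <- c(3, 5, 2, 4, 6)
X[c(TRUE, diff(X) > 0)]  # first pass: 3 5 4 6
# a second pass gives 3 5 6, which is stable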
Sorting Random Vectors
Create a vector of random integers and sort it afterwards.
sort(sample(1:1000, size = 10, replace = FALSE), decreasing = FALSE)
This gives 10 random integers between 1 and 1000, in ascending order.
> sort(sample(1:1000, size = 10, replace = FALSE), decreasing = FALSE)
[1] 44 88 164 314 617 814 845 917 944 995
This of course also works with random decimals.
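For example, with decimals drawn from runif instead of sample:
sort(runif(10, min = 0, max = 1000))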