I have written the following code, which should generate an output matrix of 1 row and 10 columns. This worked properly before I corrected my loop by replacing nchrom labels with q, but it now generates the following error message:
Error in rmultinom(1, size * q/2, prob = c(num_chrom)) :
no positive probabilities
If anyone can help me identify where the problem in the code is, I would be highly appreciative. My code at the moment is as follows:
randomdiv <- function(nchrom, ndivs, size) {
  chrom <- matrix(nrow = 1, ncol = ndivs)
  q <- nchrom
  for (i in 1:ndivs) {
    sz <- matrix(nrow = nchrom, ncol = ndivs)
    for (j in 1:nchrom) {
      n <- size
      for (i in 1:ndivs) {
        old_subs <- rbinom(1, n, 0.5) # roughly halving the number of subunits per chromosome, representing segregation of chromosomes
        num_chrom <- rep(1 / q, q) # vector of probabilities for the multinomial, based on the number of chromosomes per cell
        new_subs <- rmultinom(1, size * q / 2, prob = c(num_chrom)) # multinomial to generate randomness in the number of new subunits translated per cell (the ideal being half of the total subunit pool)
        total_subs <- cbind(old_subs, new_subs) # required step to allow the ifelse below to work on individual rows
        m <- as.matrix(ifelse(total_subs[, 1] > 0, total_subs[, 1] + total_subs[, 2], total_subs[, 1])) # ensures that if a chromosome reaches 0 subunits, no new subunits are added to that chromosome
        zeros <- colSums(m == 0) # number of zeros in the columns of m, as a vector (two values are returned; only the first, for m[1, 1], is of interest)
        k <- c(-1, 1)
        s <- sample(k, zeros[1], replace = TRUE) # random samples taken from c(-1, 1); the number of samples equals the number of zeros that have occurred
        new_nchrom <- q + sum(s) # the sum of the samples determines the number of chromosomes to add to or remove from the cell (random element)
        chrom[, i] <- new_nchrom # inserts the new number of chromosomes into the matrix for output
        q <- new_nchrom
        sz[j, i] <- m[1, 1] # puts m[1, 1] into the sz matrix; a record of subunit numbers is needed because subunits reaching 0 determines changes in chromosome number
        n <- m
      }
    }
  }
  return(chrom)
}

> randomdiv(10, 10, 3)
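The error message points at the prob argument. If q ever falls to zero (and it can here, since new_nchrom <- q + sum(s) may step downward whenever zeros occur), then rep(1 / q, q) evaluates to an empty vector and rmultinom() is left with no positive probabilities. A minimal illustration of this suspected failure mode:

q <- 0
num_chrom <- rep(1 / q, q) # numeric(0): an empty probability vector
rmultinom(1, 3, prob = c(num_chrom))
# Error in rmultinom(1, 3, prob = c(num_chrom)) : no positive probabilities

(If q went negative instead, rep() itself would fail with a different message, so q reaching zero is the likely culprit.)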
I want to solve an optimization problem to search for the best weights for groups of vectors. Could you give some suggestions about how to solve it in R? Thanks very much.
The problem is as follows.
There are N groups, and we know the similarity matrix S among these N groups. The dimension of S is N*N.
In each group there are K vectors, each with M elements whose values are 0 or 1.
From these K vectors we can fit an average vector for the group.
Based on these average vectors, we can calculate the correlation among the groups.
The objective is to minimize the difference between the correlation matrix C and the known similarity matrix S.
Because you didn't provide any data, I will generate random data and demonstrate a way you can approach your problem.
Similarity matrix:
N <- 6
S <- matrix(runif(N^2, -1, 1), ncol = N, nrow = N)
similarity_matrix <- (S + t(S)) / 2
N is the number of groups. Each value of the similarity matrix is between -1 and 1, and the matrix is symmetric (because you want to compare it to a covariance matrix, this makes sense).
group vectors:
M <- 10
K <- 8
group_vectors <- replicate(N, replicate(K, sample(c(0, 1), M, TRUE)), FALSE)
M is the dimension of each vector and K is the number of binary vectors in each group.
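For orientation, group_vectors is a list of N matrices, each M x K, holding one group's K binary vectors as columns:

str(group_vectors[[1]]) # num [1:10, 1:8]: one M x K matrix per group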
fitness function
fitness <- function(W, group_vectors, similarity_matrix){
W <- as.data.frame(matrix(W, nrow = K, ncol = N))
SS <- cov(
mapply(function(x,y) rowSums(sweep(x, 2, y, "*")), group_vectors, W)
)
sum(abs(SS - similarity_matrix))
}
For the given weights, fitness computes the described covariance matrix and its distance from similarity_matrix (note that K and N are read from the calling environment, not passed as arguments).
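The question asks for correlation rather than covariance; if that distinction matters, a variant with the same structure (the hypothetical fitness_cor below just swaps cov for cor) would be:

fitness_cor <- function(W, group_vectors, similarity_matrix){
  W <- as.data.frame(matrix(W, nrow = K, ncol = N)) # K and N read from the calling environment, as above
  CC <- cor(
    mapply(function(x, y) rowSums(sweep(x, 2, y, "*")), group_vectors, W)
  )
  sum(abs(CC - similarity_matrix))
}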
differential evolution approach
res <- DEoptim::DEoptim(
fn = fitness,
lower = rep(-1, K*N),
upper = rep(1, K*N),
group_vectors = group_vectors,
similarity_matrix = similarity_matrix,
control = DEoptim::DEoptim.control(VTR = 0, itermax = 1000, trace = 50, NP = 100)
)
W <- matrix(res$optim$bestmem, nrow = K, ncol = N)
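DEoptim reports the best objective value directly, which gives a quick convergence check (this just reuses the fitness defined above):

res$optim$bestval # final distance from similarity_matrix
fitness(res$optim$bestmem, group_vectors, similarity_matrix) # the same value, recomputed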
genetic algorithm approach
res <- GA::ga(
type = "real-valued",
fitness = function(W, ...) -fitness(W, ...),
lower = rep(-1, K*N),
upper = rep(1, K*N),
group_vectors = group_vectors,
similarity_matrix = similarity_matrix,
maxiter = 10000,
run = 200
)
W <- matrix(res@solution[1,], nrow = K, ncol = N)
(ga() returns an S4 object, so the solution slot is accessed with @ rather than $.)
I am trying to write code to solve the following problem (as stated in HW5 of the CalTech course Learning from Data):
In this problem you will create your own target function f (probability in this case) and data set D to see how Logistic Regression works. For simplicity, we will take f to be a 0/1 probability so y is a deterministic function of x. Take d = 2 so you can visualize the problem, and let X = [-1, 1] × [-1, 1] with uniform probability of picking each x ∈ X. Choose a line in the plane as the boundary between f(x) = 1 (where y has to be +1) and f(x) = 0 (where y has to be -1) by taking two random, uniformly distributed points from X and taking the line passing through them as the boundary between y = ±1. Pick N = 100 training points at random from X, and evaluate the outputs yn for each of these points xn. Run Logistic Regression with Stochastic Gradient Descent to find g, and estimate Eout (the cross-entropy error) by generating a sufficiently large, separate set of points to evaluate the error. Repeat the experiment for 100 runs with different targets and take the average. Initialize the weight vector of Logistic Regression to all zeros in each run. Stop the algorithm when ||w(t-1) - w(t)|| < 0.01, where w(t) denotes the weight vector at the end of epoch t. An epoch is a full pass through the N data points (use a random permutation of 1, 2, ..., N to present the data points to the algorithm within each epoch, and use different permutations for different epochs). Use a learning rate of 0.01.
I am required to calculate the nearest value to Eout for N=100, and the average number of epochs for the required criterion.
I wrote and ran the code, but I'm not getting the right answers (as stated in the solutions, Eout is near 0.1 and the number of epochs is near 350). The required number of epochs for a delta-w of 0.01 comes out far too small (around 10), leaving the error too big (around 2). I then tried replacing the criterion with ||w(t-1) - w(t)|| < 0.001 (rather than 0.01); then the average required number of epochs was about 250 and the out-of-sample error was about 0.35.
Is there something wrong with my code/solution, or is it possible that the answers provided are faulty? I've added comments to indicate what I intend to do at each step. Thanks in advance.
library(pracma)
h<- 0 # h will later be updated to number of required epochs
p<- 0 # p will later be updated to Eout
C <- matrix(ncol=10000, nrow=2) # Testing set, used to calculate out of sample error
d <- matrix(ncol=10000, nrow=1)
for(i in 1:10000){
C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
d[1, i] <- sign(C[2, i] - f(C[1, i]))
}
for(g in 1:100){ # 100 runs of the experiment
x <- runif(2, min = -1, max = 1)
y <- runif(2, min = -1, max = 1)
fit = (lm(y~x))
t <- summary(fit)$coefficients[,1]
f <- function(x){ # Target function
t[2]*x + t[1]
}
A <- matrix(ncol=100, nrow=2) # Sample data
b <- matrix(ncol=100, nrow=1)
norm_vec <- function(x) {sqrt(sum(x^2))} # vector norm calculator
w <- c(0,0) # weights initialized to zero
for(i in 1:100){
A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
b[1, i] <- sign(A[2, i] - f(A[1, i]))
}
q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch
l= 1
while(l < 1001){
E <- function(z){ # cross entropy error function
x = z[1]
y = z[2]
v = z[3]
return(log(1 + exp(-v*t(w)%*%c(x, y))))
}
err <- function(xn1, xn2, yn){ #gradient of error function
return(c(-yn*xn1, -yn*xn2)*(exp(-yn*t(w)*c(xn1,xn2))/(1+exp(-yn*t(w)*c(xn1,xn2)))))
}
e = matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point
e[,1:100] = 0
perm = sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices
for(j in 1:100){ # One complete Epoch
r = A[,perm[j]] # pick the perm[j]th entry in A
s = b[perm[j]] # pick the perm[j]th entry in b
e[,perm[j]] = err(r[1], r[2], s) # Gradient of the error
w = w - 0.01*e[,perm[j]] # update the weight vector according to the formula involving step size and gradient
}
q[,l] = w # the lth entry is the weight vector at the end of the lth epoch
if(l > 1 & norm_vec(q[,l] - q[,l-1])<0.001){ # given criterion to terminate the algorithm
break
}
l = l+1 # move to the next epoch
}
for(n in 1:10000){
p[g] = mean(E(c(C[1,n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
}
h[g] = l #gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment
}
mean(h) # Mean number of epochs needed
mean(p) # average Eout, over 100 experiments
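One discrepancy worth checking in the code above: E computes the inner product with t(w) %*% c(x, y), while err uses the elementwise product t(w) * c(xn1, xn2), which leaves a vector rather than a scalar inside the exponent. A gradient consistent with E would look like the following sketch (a suggested fix, not verified against the quoted target answers):

err <- function(xn1, xn2, yn) { # gradient of the cross-entropy error
  s <- as.numeric(t(w) %*% c(xn1, xn2)) # inner product, matching E above
  return(-yn * c(xn1, xn2) / (1 + exp(yn * s))) # standard logistic-loss gradient
}

Note also that the test set C and its labels d are generated once, before the experiment loop (and before f is defined), so Eout is not measured against each run's own target.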
Is there a way to assign vector elements to multiple subarrays in R, using sample() or split() (or a combination of both functions)?
Essentially what I need is a function that randomly assigns values to multiple subarrays
Here's my full specific code:
K <- 2 # number of subarrays
N <- 100
Hstar <- 10
perms <- 10000
probs <- rep(1/Hstar, Hstar)
K1 <- c(1:5)
K2 <- c(6:10)
specs <- 1:N
pop <- array(dim = c(c(perms, N), K))
haps <- as.character(1:Hstar)
for(j in 1:perms){
for(i in 1:K){
if(i == 1){
pop[j, specs, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
} else{
pop[j,, 1] <- sample(haps[K1], size = N, replace = TRUE, prob = probs[K1])
pop[j,, 2] <- sample(haps[K2], size = N, replace = TRUE, prob = probs[K2])
}
}
}
pop[j,, 1] is the first subarray in pop, while pop[j,, 2] is the second subarray in pop
If I have 20 subarrays, using sample() 20 times is tedious. I just want a way to assign values to any number of subarrays quickly and easily.
Any ideas?
It depends whether you want replacement (the possibility of duplicate/omitted elements). Regardless, it's a one-liner:
sample(x,length(x),replace=FALSE)
Not 100% clear on the whole multiple subarray thing, but my approach would be something like:
num.intervals <- 5
interval.size <- length(x)/num.intervals # need to make sure this is evenly divisible, I suppose
arr.master <- rep(NA, 0)
for (i in 1:num.intervals){
  arr.master <- rbind(arr.master, sample(x, interval.size, replace=TRUE))
}
Basically, just take samples and keep mashing them together? Would this accomplish your goal?
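If you want to avoid the explicit loop, the same mash-up can be written in one shot (a sketch, with the same caveat that interval.size must divide evenly):

arr.master <- t(replicate(num.intervals, sample(x, interval.size, replace = TRUE))) # one sampled row per interval, as in the loop above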
Do you want the total number of elements across all subarrays to equal the number of elements in the original array? If so, then it's just a random sorting problem (really easy): shuffle, then cut the result up into any number of subarrays; see the sketch below. If not, then you could fix the number of elements in all subarrays in advance, randomly sample a new vector of this size from the original, and then partition it into arbitrary subarrays.
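A minimal sketch of that "shuffle, then cut" idea, assuming the length of x is divisible by the number of subarrays K:

x <- 1:100
K <- 20 # number of subarrays
subarrays <- split(sample(x), rep(1:K, each = length(x) / K)) # a list of K random, non-overlapping subarrays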
Having the following matrix and vector:
a<-matrix(c(1,4,7,
2,5,8,
3,6,9), nrow = 3)
b <- c(1,1,1)
How do I sum recursively over each row of the matrix inside a function, using the last result to calculate the next operation, until a desired result is obtained, as shown:
b<-b+a[1,]
b<-b+a[2,]
b<-b+a[3,]
b<-b+a[1,]
b<-b+a[2,]
sum(b)>100 # sum recursively until sum(b) > 100 is obtained
This operation looks similar to this answer: Multiply recursively in R. However, it uses the results of previous operations to calculate the next ones.
Here's a recursive function to do what you're after,
# Sample Data
a<-matrix(c(1,4,7,
2,5,8,
3,6,9), nrow = 3)
b <- c(1,1,1)
We create a function that references itself with a value that increments modulo the number of rows
recAdd <- function(b, a, start = 1, size = NROW(a)) {
if(sum(b) > 100) return(b)
return(recAdd(b + a[start,], a, start = start %% size + 1, size))
}
> recAdd(b,a)
[1] 30 38 46
EDIT: Alternatively, here's a way with no recursion at all, which is much faster for large ratios of the target number to the sum of the matrix (but slower on data of this size). Basically, we get to take advantage of Euclidean division:
nonrecAdd <- function(b, a, target = 100) {
Remaining <- target - sum(b)
perloop <- sum(a)
nloops <- Remaining %/% perloop
Remaining <- Remaining %% perloop
if(Remaining > 0) {
cumulativeRowsums <- cumsum(rowSums(a))
finalindex <- which((Remaining %/% cumulativeRowsums) == 0)[1]
b + colSums(a) * nloops + colSums(a[1:finalindex,,drop = FALSE])
} else {
b + colSums(a) * nloops
}
}
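For the sample data above, the non-recursive version reproduces the recursive result:

> nonrecAdd(b, a)
[1] 30 38 46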
When I run
weights <- 1:50
profits <- 1:50
library(adagio)
knapsack(w = weights, p = profits, cap = 30)
I get the error
Error in F[, k] <- G :
number of items to replace is not a multiple of replacement length
In addition: Warning message:
In pmax(G, H) : an argument will be fractionally recycled
but when I run smaller sized vectors, like
weights <- 1:20
profits <- 1:20
knapsack(w = weights, p = profits, cap = 30)
it runs fine. Does knapsack() just slow down (and prevent running) for larger sets? I'm looking to use lengths in the thousands eventually.
This is an issue with passing elements with weight exceeding the total capacity. To see the issue, let's look at the first few lines of the knapsack function:
function (w, p, cap)
{
n <- length(w)
x <- logical(n)
F <- matrix(0, nrow = cap + 1, ncol = n)
G <- matrix(0, nrow = cap + 1, ncol = 1)
for (k in 1:n) {
F[, k] <- G
H <- c(numeric(w[k]), G[1:(cap + 1 - w[k]), 1] + p[k])
G <- pmax(G, H)
}
When iteratively filling the F matrix one column at a time, the algorithm creates a vector H with the following command (and then immediately computing pmax(G, H)):
H <- c(numeric(w[k]), G[1:(cap + 1 - w[k]), 1] + p[k])
numeric(w[k]) has length w[k], and when w[k] <= cap, G[1:(cap + 1 - w[k]), 1] + p[k] has length cap + 1 - w[k], so the entire vector H has length cap + 1, matching the size of G. On the other hand, when w[k] == cap + 1, we end up with an H vector of length cap + 2, which doesn't match the size of G and causes the trouble; and when w[k] > cap + 1, we get an error for mixing positive and negative indices.
Getting back to your example function call, you have weights up to 50 but only a capacity of 30, yielding an error:
weights <- 1:50
profits <- 1:50
knapsack(w = weights, p = profits, cap = 30)
# Error in F[, k] <- G :
# number of items to replace is not a multiple of replacement length
# In addition: Warning message:
# In pmax(G, H) : an argument will be fractionally recycled
However when you limit to elements with weight not exceeding the capacity, you get no errors:
knapsack(w = weights[weights <= 30], p = profits[weights <= 30], cap = 30)
# $capacity
# [1] 30
#
# $profit
# [1] 30
#
# $indices
# [1] 1 2 3 4 5 7 8
It would be ideal if the knapsack function gracefully removed any element whose weight exceeds the capacity (no such element can ever be part of a feasible solution) and returned a solution for the code you posted, but as a workaround you can simply remove those elements yourself from the input to the knapsack function.
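If you do filter the inputs this way, keep in mind that the returned indices refer to positions in the filtered vectors; a small sketch of mapping them back (the keep helper is mine):

keep <- which(weights <= 30) # elements that can fit at all
sol <- knapsack(w = weights[keep], p = profits[keep], cap = 30)
keep[sol$indices] # indices relative to the original weights/profits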
I received the same error (which is how I found this SO post). I think the adagio knapsack function doesn't like profits or weights that have fractional values. I used rnorm() to generate profits and weights so I could compare the results with a knapsack function I had written myself. Even with a capacity several times larger than all of the weights put together, I was getting the 'recycling' error. However, when I rounded off the rnorm() vectors before passing them as arguments to knapsack, there were no problems.
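A minimal sketch of that rounding workaround (the rnorm() parameters here are illustrative, not the ones I originally used):

library(adagio)
set.seed(1)
w <- round(abs(rnorm(20, mean = 10, sd = 3))) # integer weights
p <- round(abs(rnorm(20, mean = 10, sd = 3))) # integer profits
knapsack(w = w, p = p, cap = 100) # runs without the recycling error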