I'm computing ft for several values of k and storing the results in the list funcList. Now I want to define Xt, as a function of t, as the sum of all elements in funcList. This is what I've done:
n = 100
funcList = list()
ft = function(t) {(abs(t) <= 1)*(1-(t)^2)+(abs(t) > 1)*(0)}
for (k in 1:100){
funcList[[k]] = ft(t+k/n)*rnorm(100,0,1)
}
Xt = rowSums(sapply(funcList, ?))
However, I'm not sure at all how I should express Xt here and how to use the functions rowSums() and sapply().
I'm really not sure what you are trying to do.
Your code throws an error, since there is no value defined for t in your for loop.
Are you trying to do something like this?
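n = 100
funcList = vector("list", 100)
ft = function(t) {(abs(t) <= 1)*(1-(t)^2)+(abs(t) > 1)*(0)}
for (k in 1:100){
# local() freezes the current value of k; without it every function would look up k later and see k = 100
funcList[[k]] = local({kk <- k; function(t) {ft(t+kk/n)*rnorm(100,0,1)}})
}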
For a fixed t (say t = -1), you can then do:
result <- sapply(funcList,function(ft_i)ft_i(-1))
and afterwards call rowSums:
result2 <- rowSums(result)
But to be honest, that sounds like a really complicated thing to do.
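If the goal is X(t) = sum over k of ft(t + k/n) times an independent standard normal draw Z_k (an assumption about the intent, not something the question states), the list of closures is not needed at all: the weights ft(t + k/n) can be computed as one vector and combined with a matrix of normal draws. In this sketch Xt, n and m are illustrative names; m is the number of realizations returned, mirroring the 100 draws that rnorm(100,0,1) produces in the question.

```r
# ft as in the question: 1 - t^2 on [-1, 1], zero elsewhere
ft <- function(t) (abs(t) <= 1) * (1 - t^2)

# Hypothetical helper: m realizations of X(t) = sum_{k=1}^{n} ft(t + k/n) * Z_k,
# where the Z_k are independent standard normal draws (an assumed model)
Xt <- function(t, n = 100, m = 100) {
  k <- 1:n
  weights <- ft(t + k / n)               # length-n vector of weights
  Z <- matrix(rnorm(n * m), nrow = n)    # column j holds one set of Z_1..Z_n
  colSums(weights * Z)                   # weights recycle down each column
}

set.seed(1)
x <- Xt(-1)   # 100 realizations of X(t) at t = -1
```

This produces the same kind of quantity as rowSums() over the sapply() result, but in a single vectorized step and without storing 100 functions.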
I'm trying to write a function for CLV that stores the values for each year, indexed 0 to t, in a list and sums the list. What I have:
CLV_simple <- function (r, t, M, Alpha){
CLV = list()
for (i in 0:t){
CLV[i] <- M*(Alpha/(1+r))^i
return(sum(CLV))
}
}
CLV_simple(.10, 4, M, Alpha)
where CLV[0] = M*(Alpha/(1+r))^0,
CLV[1] = M*(Alpha/(1+r))^1, and so on up to CLV[4].
Alpha and M are defined as variables earlier; assume Alpha is .81 and M is 40.23.
The return value should be the sum over all years, i.e.:
(M*(Alpha/(1+.10))^0) + (M*(Alpha/(1+.10))^1) + (M*(Alpha/(1+.10))^2) + (M*(Alpha/(1+.10))^3) + (M*(Alpha/(1+.10))^4)
[1] 121.4646
When I run my function all I get is this:
CLV_simple(.10, 4, M, Alpha)
list()
So something isn't being saved to the list correctly, and I'm not sure why.
In your code,
return makes your code exit the function during the first iteration of the for loop. Move return to the end of the function body, after the for loop has finished.
Also, i is used as an index into CLV, and R indices start at 1, not 0: with i running over 0:t, the assignment CLV[0] <- ... silently does nothing. Index with i+1 instead.
One more thing: CLV should be initialized as c() rather than list(), since sum() cannot add up the elements of a list directly.
You can try a for loop like the one below:
CLV_simple <- function (r, t, M, Alpha){
CLV <- c()
for (i in 0:t){
CLV[i+1] <- M*(Alpha/(1+r))^i
}
return(sum(CLV))
}
or a simpler, vectorized version:
CLV_simple2 <- function(r,t,M,Alpha) sum(M*(Alpha/(1+r))^(0:t))
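Because each term is the previous one multiplied by q = Alpha/(1+r), the sum is a geometric series, so it also has a closed form, M * (1 - q^(t+1)) / (1 - q), valid whenever q != 1. The sketch below (CLV_closed is a hypothetical name) shows that form next to the vectorized sum for comparison.

```r
# Hypothetical closed-form version: sum_{i=0}^{t} M * q^i = M * (1 - q^(t+1)) / (1 - q)
CLV_closed <- function(r, t, M, Alpha) {
  q <- Alpha / (1 + r)           # common ratio of the geometric series
  stopifnot(q != 1)              # the formula divides by (1 - q)
  M * (1 - q^(t + 1)) / (1 - q)
}

# Inputs from the question: r = .10, t = 4, M = 40.23, Alpha = .81
CLV_closed(.10, 4, 40.23, .81)
sum(40.23 * (.81 / (1 + .10))^(0:4))   # vectorized sum, for comparison
```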
Is there some way to make this loop faster in R?
V=array(NA, dim=c(nrow(pixDF), n))
for(i in 1:n)
{
sdC<-sqrt(det(Cov[,i,]))
iC<-inv(Cov[,i,])
V[,i]<-apply(pixDF,1,function(x)(sdC*exp(-0.5*((x-Mean[i,])%*%iC%*%as.matrix((x-Mean[i,]))))))
}
Here pixDF is a matrix of doubles with 490000 rows and 4 columns, n = 5, Cov is a 4 x 5 x 4 array of doubles, and Mean is a 5 x 4 array of doubles.
This loop took about 30 min on my computer before editing.
Now it takes about 1 min.
As Ronak notes, it is hard to help without a reproducible example, but I think the apply() call can be avoided. Something like this could work:
V <- array(NA, dim = c(nrow(pixDF), n))
tpixDF <- t(pixDF)
for (i in 1:n) {
x <- Cov[, i, ]
sdC <- sqrt(det(x))
iC <- solve(x)
mi <- Mean[i, ]
k <- t(tpixDF - mi)
V[, i] <- sdC*exp(-0.5*rowSums(k %*% iC * k))
}
Also, as Roland mentions, inv() is probably equivalent to solve() here: solve(x) with a single argument returns the inverse of x.
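Base R's stats package also provides mahalanobis(x, center, cov), which computes the same quadratic form (row - center)' cov^-1 (row - center) for every row of x. The sketch below uses small random stand-in data (N, A and all generated values are assumptions, not the asker's data) and checks that the rowSums() approach above and a mahalanobis() version agree.

```r
set.seed(42)
N <- 1000; n <- 5                        # small stand-in sizes (the real pixDF has 490000 rows)
pixDF <- matrix(rnorm(N * 4), ncol = 4)
Mean  <- matrix(rnorm(n * 4), nrow = n)
Cov   <- array(NA_real_, dim = c(4, n, 4))
for (i in 1:n) {
  A <- matrix(rnorm(16), 4)
  Cov[, i, ] <- crossprod(A) + diag(4)   # symmetric positive definite 4 x 4
}

V_rs <- V_mh <- matrix(NA_real_, nrow = N, ncol = n)
for (i in 1:n) {
  x   <- Cov[, i, ]
  sdC <- sqrt(det(x))
  k   <- sweep(pixDF, 2, Mean[i, ])      # subtract Mean[i, ] from every row
  V_rs[, i] <- sdC * exp(-0.5 * rowSums(k %*% solve(x) * k))
  # mahalanobis() returns the squared distance (row - center)' x^-1 (row - center)
  V_mh[, i] <- sdC * exp(-0.5 * mahalanobis(pixDF, Mean[i, ], x))
}
all.equal(V_rs, V_mh)
```

Both versions do the same matrix work internally; mahalanobis() mainly makes the intent of the quadratic form easier to read.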
I know I can use expand.grid for this, but I am trying to learn actual programming. My goal is to take what I have below and use recursion to get all 2^n binary sequences of length n.
I can do this for n = 1, but I don't understand how I would use the same function in a recursive way to get the answer for higher dimensions.
Here is my code for n = 1:
binseq <- function(n){
binmat <- matrix(nrow = 2^n, ncol = n)
r <- 0 #row counter
for (i in 0:1) {
r <- r + 1
binmat[r,] <- i
}
return(binmat)
}
I know I probably have to use cbind in the return statement. My intuition says the return statement should be something like cbind(binseq(n-1), binseq(n)), but honestly, I'm completely lost at this point.
For n = 3, the recursive function should produce the same result as this:
binmat <- matrix(nrow = 8, ncol = 3)
r <- 0 # current row of binmat
for (i in 0:1) {
for (j in 0:1) {
for (k in 0:1) {
r <- r + 1
binmat[r,] <- c(i, j, k)}
}
}
binmat
The result should just be a matrix, with binmat filled recursively.
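One way to follow the cbind intuition is to note that the binary sequences of length n are the sequences of length n - 1 with a 0 prepended, followed by the same sequences with a 1 prepended. A minimal recursive sketch of that idea (it reuses the name binseq from the question, but the body is new):

```r
binseq <- function(n) {
  # base case: the two sequences of length 1
  if (n == 1) return(matrix(c(0, 1), ncol = 1))
  smaller <- binseq(n - 1)               # all 2^(n-1) sequences of length n-1
  # prepend 0 to every row, then prepend 1 to every row, and stack the halves
  rbind(cbind(0, smaller), cbind(1, smaller))
}

binseq(3)
```

The rows come out in the same order as the three nested loops, because the first column changes slowest, just like the outer loop variable i.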
I quickly wrote this function to generate all N^K permutations with repetition (sequences of length K over N given characters). I hope it will be useful.
gen_perm <- function(str=c(""), lst=5, levels = c("0", "1", "2")){
if (nchar(str) == lst){
cat(str, "\n")
return(invisible(NULL))
}
for (i in levels){
gen_perm(str = paste0(str,i), lst=lst, levels=levels)
}
}
# sample call
gen_perm(lst = 3, levels = c("x", "T", "a"))
I will return to your problem when I get more time.
UPDATE
I modified the code above to work for your problem. Note that the matrix being populated lives in the global environment, and the function uses the tmp variable to pass rows to the global environment. This was the easiest way for me to solve the problem; perhaps there are other ways.
levels <- c(0,1)
nc <- 3
m <- matrix(numeric(0), ncol = nc)
gen_perm <- function(row=numeric(), lst=nc, levels = c(0,1)){
if (length(row) == lst){
assign("tmp", row, .GlobalEnv)
with(.GlobalEnv, {m <- rbind(m, tmp); rownames(m) <- NULL})
return(invisible(NULL))
}
for (i in levels){
gen_perm(row=c(row,i), lst=lst, levels=levels)
}
}
gen_perm(lst=nc, levels=levels)
UPDATE 2
To get the expected output you provided, run
m <- matrix(numeric(0), ncol = 3)
gen_perm(lst = 3, levels = c(0,1))
m
Here, levels specifies the set of values used to build the permutations (binary in our case); m is the empty matrix to fill; gen_perm generates the rows and adds them to m; and lst is the length of each permutation (it matches the number of columns of the matrix).
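If you would rather not write to the global environment, the same depth-first recursion can return the rows instead of assigning them. A sketch of that design (gen_perm2 is a hypothetical name), where each call returns a matrix and the caller stacks the results with rbind:

```r
gen_perm2 <- function(row = numeric(), lst = 3, levels = c(0, 1)) {
  # a complete permutation becomes a one-row matrix
  if (length(row) == lst) return(matrix(row, nrow = 1))
  # extend the row with each level in turn and stack the resulting matrices
  do.call(rbind, lapply(levels, function(v) gen_perm2(c(row, v), lst, levels)))
}

m <- gen_perm2(lst = 3, levels = c(0, 1))
m
```

Because nothing is assigned globally, m no longer has to be created before the call, and independent calls cannot interfere with each other.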
I wrote the following code to find all the prime numbers in a randomly generated data set. Sadly, something seems to have gone wrong; could anybody help me?
set.seed(20171106)
n <- 10000
num <- sample(1:100000,n,replace=TRUE)
findPrime <- function(x){
apple<-c()
n<-length(x)
for(i in n){
if(any(x[i]%%(1:(x[i]-1))!=0)) apple <-c(apple,x[i])
}
return(apple)
}
To get the results, I run findPrime(num).
This is the warning message:
Warning message:
In if (x[i]%%(1:(x[i] - 1)) == 0) apple <- c(apple, x[i]) :
the condition has length > 1 and only the first element will be used
So how can I fix the problem?
if statements accept only a single logical value, but your condition produces a whole vector. I have rewritten your function using an ifelse expression wrapped inside sapply.
I hope this works for you.
findPrime <- function(x = 0){
primes <- c()
# Prime finder
primes <- sapply(X = x,FUN = function(x) {
ifelse(any(x %% (1:(x - 1)) != 0), T, F)}
)
# Select primes
primes <- num[primes]
return(primes)
}
findPrime(num)
I spotted another silly mistake: inside the function, change num to x in the select-primes step, invert the F/T outcomes, and start the divisors at 2 (every number is divisible by 1). It should look like this:
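findPrime <- function(x = 0){
# Prime finder: TRUE when x >= 2 has no divisor in 2..(x - 1)
primes <- sapply(X = x,FUN = function(x) {
if (x < 2) return(FALSE)  # 0 and 1 are not prime
if (x == 2) return(TRUE)  # 2:(x - 1) would be 2:1, which wrongly "divides" 2
ifelse(any(x %% (2:(x - 1)) == 0), FALSE, TRUE)}
)
# Select primes
primes <- x[primes]
return(primes)
}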
I have just tried it and it works fine.
Use the gmp package: its isprime() function returns 0 for non-prime numbers, 1 for numbers that are probably prime, and 2 for numbers that are definitely prime, so you can subset the data on that result.
For example, with the vector a = c(1:10):
a = c(1:10)
b = gmp::isprime(a)
c = cbind(a,b)
c = as.data.frame(c)
c = c[c$b==2,]
a1 = c$a
a1
In your code, the loop header for(i in n) should be for(i in 1:n): as written, the loop runs only once, with i = n.
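Since the values in num are positive integers no larger than 100000, a different technique, the sieve of Eratosthenes, avoids testing each number separately: mark all primes up to max(num) once, then use the result as a lookup table. This sketch assumes x holds positive integers; prime_sieve and findPrime_sieve are hypothetical names.

```r
# Sieve of Eratosthenes: is_prime[k] is TRUE exactly when k is prime, for k in 1..max_n
prime_sieve <- function(max_n) {
  is_prime <- rep(TRUE, max_n)
  is_prime[1] <- FALSE                   # 1 is not prime
  if (max_n >= 4) {
    for (p in 2:floor(sqrt(max_n))) {
      # strike out multiples of each prime p, starting at p^2
      if (is_prime[p]) is_prime[seq(p * p, max_n, by = p)] <- FALSE
    }
  }
  is_prime
}

# Keep the elements of x that are prime (x assumed to be positive integers)
findPrime_sieve <- function(x) {
  sieve <- prime_sieve(max(x))
  x[sieve[x]]
}

set.seed(20171106)
num <- sample(1:100000, 10000, replace = TRUE)
primes <- findPrime_sieve(num)
```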
I need to generate simulated data where the percent censored cannot be 0 or 1. That's why I use a while loop. The problem is that if I increase count to 10,000 (instead of 5), the program becomes very slow. I have to repeat this for 400 different scenarios, so it is extremely slow. I'm trying to figure out where I can vectorize my code, piece by piece. How can I avoid the while loop and still keep the condition?
Another approach is to keep the while loop, generate a list of 10,000 datasets that meet my criteria, and then apply the function to the list. Here I use the summary function as an example, but my real function uses both X_after and delta (i.e. mle(X_after, delta)). Is this a better option if I have to use a while loop?
Another concern is memory. How can I avoid using up memory in such a large simulation?
mu=1 ; sigma=3 ; n=10 ; p=0.10
dset <- function (mu,sigma, n, p) {
Mean <- array()
Median <- array()
Pct_cens_array <- array()
count = 0
while(count < 5) {
lod <- quantile(rlnorm(100000, log(mu), log(sigma)), p = p)
X_before <- rlnorm(n, log(mu), log(sigma))
X_after <- ifelse(X_before <= lod, lod, X_before)
delta <- ifelse(X_before <= lod, 1, 0)
pct_cens <- sum(delta)/length(delta)
# print(pct_cens)
if (pct_cens == 0 | pct_cens == 1 ) next
else {
count <- count +1
if (pct_cens > 0 & pct_cens < 1) {
sumStats <- summary(X_after)
Median[count] <- sumStats[3]
Mean [count]<- sumStats[4]
Pct_cens_array [count] <- pct_cens
print(list(pct_cens=pct_cens,X_after=X_after, delta=delta, Median=Median,Mean=Mean,Pct_cens_array=Pct_cens_array))
}
}
}
return(data.frame(Pct_cens_array=Pct_cens_array, Mean=Mean, Median=Median))
}
I've made a few small tweaks to your code without changing its overall style. It would be good to heed Yoong Kim's advice and break the code up into smaller pieces, to make it more readable and maintainable.
Your function now takes two "n" arguments: n_samples, the number of samples in each row, and n_iterations, the number of iterations (rows) you want.
You were growing the arrays Median and Mean inside the loop, which means repeatedly reallocating memory and copying data, and that slows everything down. I've preallocated X_after and moved the mean and median calculations after the loop to avoid this. (As a bonus, the means now come from a single vectorized rowMeans() call, and summary() is no longer called on every iteration.)
The calls to ifelse weren't really needed: pmax(X_before, lod) gives X_after, and X_before <= lod gives delta directly.
It is a little quicker to call rlnorm once, generating enough values for x and the lod, than to call it twice.
Here's the updated function.
dset2 <- function (mu, sigma, n_samples, n_iterations, p) {
X_after <- matrix(NA_real_, nrow = n_iterations, ncol = n_samples)
pct_cens <- numeric(n_iterations)
count <- 1
while(count <= n_iterations) {
random_values <- rlnorm(2L * n_samples, log(mu), log(sigma))
lod <- quantile(random_values[1:n_samples], p = p)
X_before <- random_values[(n_samples + 1L):(2L * n_samples)]
X_after[count, ] <- pmax(X_before, lod)
delta <- X_before <= lod
pct_cens[count] <- mean(delta)
if (pct_cens[count] > 0 && pct_cens[count] < 1) count <- count + 1
}
Median <- apply(X_after, 1, median)
Mean <- rowMeans(X_after)
data.frame(Pct_cens_array=pct_cens, Mean=Mean, Median=Median)
}
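Compare timings with, for example, the following (this assumes the original dset has been adapted to take n_samples and n_iterations arguments, since as posted it takes only mu, sigma, n and p):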
mu=1
sigma=3
n_samples=10L
n_iterations = 1000L
p=0.10
system.time(dset(mu,sigma, n_samples, n_iterations, p))
system.time(dset2(mu,sigma, n_samples, n_iterations, p))
On my machine, there is a factor of 3 speedup.
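To remove most of the per-iteration work as well, one option is batched rejection sampling: draw many candidate datasets at once as rows of a matrix, keep the rows whose censoring proportion is strictly between 0 and 1, and draw a further (smaller) batch only for the rows still missing. This sketch follows dset2's model (the lod comes from n_samples extra draws per dataset); dset_batch is a hypothetical name, and because the random numbers are consumed in a different order, its output matches dset2 only in distribution, not draw for draw.

```r
# Hypothetical batched variant of dset2: draw many datasets at once, keep the
# valid ones, and top up with further batches until n_iterations are collected.
dset_batch <- function(mu, sigma, n_samples, n_iterations, p) {
  meanlog <- log(mu); sdlog <- log(sigma)   # computed once, reused below
  kept <- NULL                              # accepted rows of X_after
  pct  <- numeric(0)                        # their censoring proportions
  while (length(pct) < n_iterations) {
    n_draw <- n_iterations - length(pct)
    # one row per candidate dataset: columns 1..n_samples feed the lod,
    # columns (n_samples + 1)..(2 * n_samples) are the observations
    R <- matrix(rlnorm(n_draw * 2L * n_samples, meanlog, sdlog), nrow = n_draw)
    lod <- apply(R[, 1:n_samples, drop = FALSE], 1, quantile, probs = p)
    X_before <- R[, (n_samples + 1L):(2L * n_samples), drop = FALSE]
    delta <- X_before <= lod                # lod recycles down columns, i.e. per row
    pc <- rowMeans(delta)
    ok <- pc > 0 & pc < 1                   # rejection step: keep valid rows only
    kept <- rbind(kept, pmax(X_before, lod)[ok, , drop = FALSE])
    pct <- c(pct, pc[ok])
  }
  data.frame(Pct_cens_array = pct, Mean = rowMeans(kept),
             Median = apply(kept, 1, median))
}

set.seed(1)
res <- dset_batch(mu = 1, sigma = 3, n_samples = 10L, n_iterations = 1000L, p = 0.10)
```

Memory use per batch is roughly n_draw * 2 * n_samples doubles, so very large runs can also be split into several calls whose results are combined afterwards.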
First rule I learnt with C programming: divide and conquer! I mean you should first split the work into several functions and call them from your loop, because this loop does too many different things.
And I am worried about your algorithm:
if (pct_cens == 0 | pct_cens == 1 ) next
else {count <- count +1
Is there any reason you use while instead of for?
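There is a difference between while and for: a for loop runs a number of times known in advance, whereas a while loop checks its condition before every iteration (so it may run zero times) and keeps going until the condition fails.
Finally, about your problem: trade memory for speed by storing values you reuse (in variables or arrays) instead of recomputing them.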
Example:
lod <- quantile(rlnorm(100000, log(mu), log(sigma)), p = p)
X_before <- rlnorm(n, log(mu), log(sigma))
log(mu) and log(sigma) are each computed twice: store them in variables once and reuse them. You save time at the cost of a little more memory.