Consider the following vector x:
> 1:9
[1] 1 2 3 4 5 6 7 8 9
and consider the following inputs:
start = 10
pmt = 2
This is the result (let's call the resulting vector res) I am looking to achieve (what's displayed are the actual formulas). Note that the result is a vector, not a dataframe; I just displayed it here in 2 dimensions.
In other words, to obtain res, you multiply start by the cumulative product for each cell of df up to the corresponding cell.
When the vector index is a multiple of 4 or 7, the start value gets updated.
This is what I have attempted:
res = numeric(9)
k = 1
for (i in 1:9) {
  res[i] = start * cumprod(df[k:i])[i]
  if (i %% 3 == 0) {
    start = res[i] - pmt
    k = k + 3
  } else {
    start = res[i]
  }
}
To put the problem into context, imagine you have a start value of 10 dollars, and you want to invest it over 9 months. However, you want to make a withdrawal at the end of every 3 months (i.e. at the beginning of month 4, month 7, ...). The vector x represents random values of returns.
Therefore, at the beginning of month 4, your start value is start*1*2*3 minus the withdrawal pmt.
The purpose here is computing the wealth value at the end of month 9.
The problem is that in reality the vectors have length 200 (200 months), and I need to redo this computation for 10,000 different vectors x. So looping 10,000 times over the above code takes forever to execute!
Would you have any suggestion as to how to compute this more efficiently? I hope the explanation is not too confusing!
Thank you!
If you work out your formula for res as a recurrence, then it is easier to write a function that you can give to Reduce. Here it is as a simple loop:
x <- 1:9
start <- 10
pmt <- 2
res <- numeric(length(x))
res[1] <- x[1] * start
for (i in seq_along(x)[-1]) {
res[i] <- (res[i-1] - (pmt * (!(i%%4) || !(i%%7)))) * x[i]
}
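With the example inputs above (x = 1:9, start = 10, pmt = 2), the loop gives
> res
[1]      10      20      60     232    1160    6960   48706  389632 3506688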
If you want to write it using Reduce, it would look like this:
Reduce(function(r, i) {
(r - (pmt * (!(i%%4) || !(i%%7)))) * x[i]
},
seq_along(x),
init = start,
accumulate = TRUE)[-1]
There is some weirdness with the start values and dropping the first element of the result because of the way that initial values are handled (and the fact that iteration is over indexes, not values, since comparisons must be done on the index). The loop here is probably more understandable.
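For the real task (200 months, 10,000 return vectors), one option is to wrap the loop in a function and apply it over the columns of a matrix of returns. A sketch, where wealth_after is a made-up name and the returns matrix is invented example data:
wealth_after <- function(x, start, pmt) {
  res <- numeric(length(x))
  res[1] <- x[1] * start
  for (i in seq_along(x)[-1]) {
    res[i] <- (res[i - 1] - pmt * (!(i %% 4) || !(i %% 7))) * x[i]
  }
  res[length(res)]  # wealth at the final month
}
returns <- matrix(runif(200 * 10000, 0.9, 1.1), nrow = 200)  # assumed example data
final <- apply(returns, 2, wealth_after, start = 10, pmt = 2)
The loop body itself is cheap; most of the win over your version comes from not recomputing cumprod at every step.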
I know you mentioned it being 1d, but I think this works well and you can convert it to 1d very easily -
start = 10
pmt = 2
library(data.table)
dt <- data.table(
month = 1:13
)
dt[,principalgrown := start*cumprod(month)]
#explained below#######
dt[,interestlost := 0]
for(i in seq(from = 4, to = (dim(dt)[1]), by = 3))
{
dt[month >= i,interestlost := interestlost + (prod(i:month)), by = month]
}
#######################
dt[,finalamount := principalgrown - (pmt*interestlost)]
The part within the #s is the trick. Where you calculate the month 7 value as ((1*2*3*start - pmt)*4*5*6 - pmt) * 7, I calculate it as 1*2*3*4*5*6*7*start - 4*5*6*7*pmt - 7*pmt. Here 1*2*3*4*5*6*7*start is principalgrown and -4*5*6*7*pmt - 7*pmt is -(pmt*interestlost).
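A quick sanity check of that decomposition, with the example inputs (start = 10, pmt = 2):
((1*2*3*10 - 2) * 4*5*6 - 2) * 7        # stepwise form: 48706
prod(1:7) * 10 - prod(4:7) * 2 - 7 * 2  # decomposed form: 48706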
What I look for is basically an R version of the answer to this question: Generating all permutation of numbers that sums up to N. First of all, the answer uses Java, which I have a really hard time reading. Second of all, the code uses a deque, which I can't figure out a way to implement in R.
I have found several algorithms to do this, but they have all been written in programming languages using structures not available in R such as deques, heaps or list-comprehensions.
What I actually need is a way of finding all the vectors v of length N-1 where:
sum(v * 1:(N-1)) == N
and I think I can manage that myself if only I find a way of obtaining all the ordered integer partitions.
As an example for N = 4 all the ordered integer partitions using numbers 1 to N-1 are:
1+1+1+1
1+1+2
1+3
2+2
What I effectively need is output of either the form:
c(1,1,1,1)
c(1,1,2)
c(1,3)
c(2,2)
Or of the form:
c(4,0,0)
c(2,1,0)
c(1,0,1)
c(0,2,0)
since I should be able to convert the former format to the latter by myself. The latter format is exactly the vectors v such that sum(v * 1:3) is 4. Any hint as to how to approach this problem using R would be greatly appreciated.
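For example, the correspondence can be checked directly:
sapply(list(c(4,0,0), c(2,1,0), c(1,0,1), c(0,2,0)),
       function(v) sum(v * 1:3))
# [1] 4 4 4 4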
EDIT:
My own attempt:
rek = function(mat, id1, id2){
if(id1 + id2 != length(mat) + 1){ #If next state not absorbing
mat[id1] = mat[id1] - 1
mat[id2] = mat[id2] - 1
mat[id1+id2] = mat[id1+id2] + 1
out = mat
id = which(mat > 0)
for(i in id){
for(j in id[id>=i]){
if(j == i & mat[i] == 1){
next
}
out = rbind(out, rek(mat,i,j))
}
}
return(out)
}
}
start = c(n, rep(0, n-2))
states = rbind(start, rek(start, 1, 1))
states = states[!duplicated(states), ] #only unique states.
This is incredibly inefficient. E.g. when n = 11, my states matrix has over 120,000 rows prior to removing duplicates, which leaves only 55 rows.
EDIT 2:
Using the parts() function described below I came up with:
temp = partitions::parts(n)
temp = t(temp)
for(i in 1:length(temp[,1])){
row = temp[i,]
if(any(row>(n-1))){#if absorbing state
next
}
counts = plyr::count(row[row>0])
newrow = rep(0,n-1)
id = counts$x
numbs = counts$freq
newrow[id] = numbs
states = rbind(states, newrow)
}
states = states[-1,]#removing the first row, added manually
which exactly gives me the vectors v such that sum(v * 1:(N-1)) is N.
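(As an aside, the same conversion can be done more compactly with tabulate(); to_counts is just a made-up helper name.)
# count how many times each part 1..(n-1) appears in a partition
to_counts <- function(p, n) tabulate(p[p < n], nbins = n - 1)
to_counts(c(1, 3), 4)  # 1 0 1
to_counts(c(2, 2), 4)  # 0 2 0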
If anyone is interested, this is to be used within coalescent theory, as a way to describe the possible relations between N individuals omitting when all are related. As an example with N = 4:
(4, 0, 0) -- No individuals are related
(2, 1, 0) -- Two individuals are related, the rest are not
(0, 2, 0) -- The individuals are pair-wise related
(1, 0, 1) -- Three individuals are related, the other individual is not.
Hope parts() from the package partitions can help.
library(partitions)
N <- 4
res <- unique(lapply(asplit(parts(N),2),function(x) sort(x[x>0])))[-1]
which gives
> res
[[1]]
[1] 1 3
[[2]]
[1] 2 2
[[3]]
[1] 1 1 2
[[4]]
[1] 1 1 1 1
If you would like to write a custom base R function, here is a recursive version
f <- function(n, vhead = n, v = c()) {
if (n == 0) return(list(v))
unlist(lapply(seq_len(min(n, vhead)), function(k) f(n - k, k, c(k,v))), recursive = FALSE)
}
then we can run
res <- Filter(function(x) length(x)>1,f(N))
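which for N = 4 gives
> res
[[1]]
[1] 1 1 1 1
[[2]]
[1] 1 1 2
[[3]]
[1] 2 2
[[4]]
[1] 1 3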
I'm trying to run a simulation which would run n_tests for n_products for n_years to estimate an increase in demand and subsequent increase in stored pallets (linear relation between demand and stored products assumed). To make things a bit spicier, the demand comes from 2 separate regions (A and B) but products are stored in one warehouse.
What I've done currently works, but is slow. 10 years, 200 tests and 25,000 products need 10 seconds to run.
The setup:
library(data.table)
n_products <- 25000
n_years <- 10
n_tests <- 200
pct_error <- 2
A_fcst <- runif(n_years, min = 1, max = 8)
B_fcst <- runif(n_years, min = 3, max = 6)
Populate initial DT and matrices:
yearly_demand_A <- matrix(0, n_years, n_tests)
yearly_demand_B <- matrix(0, n_years, n_tests)
for (i in 1:n_years){
yearly_demand_A[i,] <- rnorm(n_tests, A_fcst[i], pct_error*sqrt(i))
yearly_demand_B[i,] <- rnorm(n_tests, B_fcst[i], pct_error*sqrt(i))
}
yearly_pallets <- matrix(0, n_years, n_tests)
demand_x_pallets <- data.table(
  prod_code = 1:n_products,
  stock_qty = as.integer(runif(n_products, 1, 100)),
  pallet_qty = as.integer(runif(n_products, 10, 30)),
  demand_A = runif(n_products, 1, 40),
  demand_B = runif(n_products, 1, 40)
)
demand_x_pallets[,pallets := ceiling(stock_qty/pallet_qty)]
demand_x_pallets[,demand := demand_A + demand_B]
for (i in 1:n_tests){
yearly_pallets[1:n_years,i] <- number_of_pallets(yearly_demand_A[1:n_years,i], yearly_demand_B[1:n_years,i], demand_x_pallets)
}
And the function itself:
number_of_pallets <- function(fcst_A,fcst_B,d_x_p,year=0){
pallets <- vector("double",n_years)
new_profile <- copy(d_x_p) #if I don't create a copy, the same DT is passed and the number of pallets compounds
if (year == 0){ #if function called without year argument call it recursively
for(i in 1:(n_years)){
new_profile <- number_of_pallets(fcst_A[[i]],fcst_B[[i]],new_profile,i)
pallets[i] <- new_profile[,sum(pallets)]
}
}
else{ #calculate demand and pallet count for each product each year
d_x_p[,demand_A := demand_A * (100+fcst_A) / 100]
d_x_p[,demand_B := demand_B * (100+fcst_B) / 100]
d_x_p[,new_Dmnd := demand_A + demand_B]
d_x_p[,Dmnd_change := ifelse(demand==0,1,new_Dmnd/demand)]
d_x_p[,stock_qty := stock_qty * Dmnd_change]
d_x_p[,pallets := ceiling(stock_qty/pallet_qty)]
d_x_p[,demand := new_Dmnd]
return(d_x_p)
}
return(pallets)
}
Initially, I thought that copying the DT might be the reason it is slow, but removing that line from the function, apart from making it not work properly, doesn't make any difference. This is the best I have come up with so far after a few miserable failures, but I'm completely stuck now.
Any pointers on how to tackle it differently would be greatly appreciated.
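EDIT: writing the updates out by hand, I notice the Dmnd_change ratios telescope: the stock in year i is just the initial stock times demand_i/demand_0, where demand_i = demand_A * (cumulative growth A) + demand_B * (cumulative growth B). If that's right, the recursion can be dropped entirely. A sketch (number_of_pallets_vec is a placeholder name; it assumes demand is never zero, the case my ifelse guards):
number_of_pallets_vec <- function(fcst_A, fcst_B, d_x_p) {
  gA <- cumprod((100 + fcst_A) / 100)  # cumulative growth per year, region A
  gB <- cumprod((100 + fcst_B) / 100)  # cumulative growth per year, region B
  # n_products x n_years matrix of ratios demand_i / demand_0
  ratio <- (outer(d_x_p$demand_A, gA) + outer(d_x_p$demand_B, gB)) / d_x_p$demand
  # pallet count per product and year, summed over products
  colSums(ceiling(d_x_p$stock_qty * ratio / d_x_p$pallet_qty))
}
for (i in 1:n_tests) {
  yearly_pallets[, i] <- number_of_pallets_vec(yearly_demand_A[, i],
                                               yearly_demand_B[, i],
                                               demand_x_pallets)
}
No copy() is needed here since nothing is modified by reference.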
I'm new to R, so most of my code is most likely wrong. However, I was wondering how to use a while() loop within a for() loop. I'm trying to simulate rolling a pair of dice several times: if the total is 2, 3, 7, 11, or 12 then I stop. If the total is 4, 5, 6, 8, 9, or 10 then I continue to roll the dice until the initial total appears again or a 7 comes up. I'm trying to find the average number of rolls it takes to end the game.
count = 0
x = NULL
for (i in 1:10) {
x[i] = c(sample(1:6,1) +sample(1:6,1))
if(x[i] == c(2||3||7||11||12)) {
if(TRUE) {count = count +1}
} else { while(x[i] == c(4||5||6||8||9||10)) {
x[i +1] = c(sample(1:6,1)+sample(1:6,1))
if(x[i+1] == c(x[i]||7)) {
if(TRUE){count = count + x[i+1]}
}
}
}
}
print(count)
I think there are a few issues with your logic. I'm not quite sure what you're trying to do in your code, but this is my interpretation of your description of the problem. This only runs a single round of your game; it should work if you embed it in a for loop (just don't reset count or reset the random-number seed inside your loop). Then count will give you the total number of rolls, and you can divide by the number of rounds to get the average.
Setup:
count = 0
sscore <- c(2,3,7,11,12)
set.seed(101)
debug = TRUE
Running a single round:
x = sample(1:6,1) +sample(1:6,1) ## initial roll
count = count + 1
if (x %in% sscore) {
## don't need to do anything if we hit,
## as the roll has already been counted
if (debug) cat("hit",x[i],"\n")
} else {
## initialize while loop -- try one more time
y = c(sample(1:6,1)+sample(1:6,1))
count = count + 1
if (debug) cat("initial",x,"next",y,"\n")
while(!(y %in% c(x,7))) {
y = c(sample(1:6,1)+sample(1:6,1))
count = count+1
if (debug) cat("keep trying",y,"\n")
} ## end while
} ## end if-not-hit
print(count)
I tried embedding this in a for loop and got a mean of 3.453 for 1000 rounds, close to @PawelP's answer.
PS I hope this isn't homework, as I prefer not to answer homework questions ...
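For reference, here is a minimal sketch of that embedding (one_round is a made-up name; the repeat form is equivalent to the while loop above):
one_round <- function(sscore = c(2, 3, 7, 11, 12)) {
  x <- sample(1:6, 1) + sample(1:6, 1)  # initial roll
  count <- 1
  if (!(x %in% sscore)) {
    repeat {  # keep rolling until the initial total or a 7 shows
      y <- sample(1:6, 1) + sample(1:6, 1)
      count <- count + 1
      if (y %in% c(x, 7)) break
    }
  }
  count
}
mean(replicate(1000, one_round()))  # average rolls per round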
EDIT: I had a bug - I forgot to remove a negation in the if condition. Now the below seems to be 100% true to your description of the problem.
This is my implementation of the game you've described. It calculates the average number of rolls it took to end the game over a TOTAL_GAMES many games.
TOTAL_GAMES = 1000
counts = rep(0, TOTAL_GAMES)
x = NULL
for (i in 1:TOTAL_GAMES) {
x_start = c(sample(1:6,1) +sample(1:6,1))
counts[i] = counts[i] + 1
x = x_start
if(x %in% c(2, 3, 7, 11, 12)){
next
}
repeat {
x = c(sample(1:6,1)+sample(1:6,1))
counts[i] = counts[i] + 1
if(x %in% c(x_start, 7)){
break
}
}
}
print(mean(counts))
It seems that the average number of rolls is around 3.38
Here's one approach to this question - I made a function that runs a single trial, and another function which conducts a variable number of these trials and returns the cumulative average.
## Single trial
rollDice <- function(){
init <- sample(1:6,1)+sample(1:6,1)
rolls <- 1
if( init %in% c(2,3,7,11,12) ){
return(1)
} else {
Flag <- TRUE
while( Flag ){
roll <- sample(1:6,1)+sample(1:6,1)
rolls <- rolls + 1
if( roll %in% c(init,7) ){
Flag <- FALSE
}
}
}
return(rolls)
}
## Multiple trials
simAvg <- function(nsim = 100){
x <- replicate(nsim,rollDice())
Reduce("+",x)/nsim
}
##
## Testing
nTrial <- seq(1,1000,25)
Results <- sapply(nTrial, function(X){ simAvg(X) })
##
## Plot over varying number of simulations
plot(x=nTrial,y=Results,pch=20)
As @Ben Bolker pointed out, you had a couple of syntax errors with ||, which is understandable for someone new to R. Also, you'll probably hear it a thousand times, but for and while loops are pretty inefficient in R, so you generally want to avoid them if possible. In the case of the while loop in the above rollDice() function, it probably isn't a big deal because the probability of the loop executing a large number of times is very low. I used the functions Reduce and replicate to serve the role of a for loop in the second function. Good question though, it was fun to work on.
I have a working solution to my problem, but I will not be able to use it because it is so slow (my calculations predict that the whole simulation will take 2-3 years!). Thus I am looking for a better (faster) solution. This is (in essence) the code I am working with:
N=4
x <-NULL
for (i in 1:N) { #first loop
v <- sample(0:1, 1000000, replace = TRUE) #generate data
v <-as.data.frame(v) #convert to dataframe
v$t <-rep(1:2, each=250) #group
v$p <-rep(1:2000, each=500) #p.number
# second loop
for (j in 1:2000) { #second loop
#count rle for group 1 for each pnumber
x <- rbind(x, table(rle(v$v[v$t==1&v$p==j])))
#count rle for group 2 for each pnumber
x <- rbind(x, table(rle(v$v[v$t==2&v$p==j])))
} #end second loop
} #end first loop
#total rle counts for both group 1 & 2
y <-aggregate(x, list(as.numeric(rownames(x))), sum)
In words: The code generates a coin-flip simulation (v). A group factor is generated (1 & 2). A p.number factor is generated (1:2000). The run lengths are recorded for each p.number (1:2000) for both groups 1 & group 2 (each p.number has runs in both groups). After N loops (the first loop), the total run lengths are presented as a table (aggregate) (that is, the run lengths for each group, for each p.number, over N loops as a total).
I need the first loop because the data that I am working with comes in individual files (so I'm loading the file, calculating various statistics etc and then loading the next file and doing the same). I am much less attached to the second loop, but can't figure out how to replace it with something faster.
What can be done to the second loop to make it (hopefully, a lot) faster?
You are committing the cardinal sin of growing an object within a for() loop in R. Don't (I repeat don't) do this. Allocate sufficient storage for x at the beginning and then fill in x as you go.
x <- matrix(nrow = N * (2000 * 2), ncol = ??)
Then in the inner loop
x[ii, ] <- table(rle(....))
where ii is a loop counter that you initialise to 1 before the first loop and increment within the second loop:
x <- matrix(nrow = N * (2000 * 2), ncol = ??)
ii <- 1
for(i in 1:N) {
.... # stuff here
for(j in 1:2000) {
.... # stuff here
x[ii, ] <- table(rle(....))
## increment ii
ii <- ii + 1
x[ii, ] <- table(rle(....))
## increment ii
ii <- ii + 1
} ## end inner loop
} ## end outer loop
Also note that you are reusing index i in both for() loops, which will not work. i is just a normal R object and so both for() loops will be overwriting it as they progress. Use j for the second loop as I did above.
Try that simple optimisation first and see if that will allow the real simulation to complete in an acceptable amount of time. If not, come back with a new Q showing the latest code and we can think about other optimisations. The optimisation above is simple to do, optimising table() and rle() might take a lot more work. Noting that, you might look at the tabulate() function which does the heavy lifting in table(), which might be one avenue for optimising that particular step.
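For instance, a sketch of how tabulate() could stand in for table() here, using the question's v and j and an assumed cap max_len on the run lengths worth counting:
max_len <- 30  # assumption: no run of interest is longer than this
r <- rle(v$v[v$t == 1 & v$p == j])
runs0 <- tabulate(r$lengths[r$values == 0], nbins = max_len)  # runs of 0s, by length
runs1 <- tabulate(r$lengths[r$values == 1], nbins = max_len)  # runs of 1s, by length
A fixed-length result like c(runs0, runs1) would also settle the ncol = ?? above: ncol = 2 * max_len.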
If you just want to run rle and table for each combination of the values of v$t and v$p separately, there is no need for the second loop. Encoding group and p.number into the values makes each (t, p) combination distinct, so runs cannot cross group boundaries, and %% 2 recovers the original 0/1 values. It is much faster this way:
values <- v$v + v$t * 10 + v$p * 100
runlength <- rle(values)
runlength$values <- runlength$values %% 2
x <- table(runlength)
y <- aggregate(unclass(x), list(as.numeric(rownames(x))), sum)
The whole code will look like this. If N is as low as 4, the growing object x will not be a severe problem. But generally I agree with @GavinSimpson that it is not a good programming technique.
N=4
x <-NULL
for (i in 1:N) { #first loop
v <- sample(0:1, 1000000, replace = TRUE) #generate data
v <-as.data.frame(v) #convert to dataframe
v$t <-rep(1:2, each=250) #group
v$p <-rep(1:2000, each=500) #p.number
values <- v$v + N * 10 + v$t * 100 + v$p * 1000
runlength <- rle(values)
runlength$values <- runlength$values %% 2
x <- rbind(x, table(runlength))
} #end first loop
y <- aggregate(x, list(as.numeric(rownames(x))), sum) #total rle counts for both group 1 & 2
I am looking for a way to convert cron information into a list of timestamps, using R.
Is there an easy way to do so?
Given the crontab and a start date and an end date, I would like to obtain the list of trigger timestamps during these 2 dates.
I have not found any package that specifically deals with CRON info, but maybe somebody has already had this problem?
Thanks
I assume you mean the crontab(5) format. I know of no library parsing this information, but the following snippet should get you started:
splitre <- function(p, s) {
s <- as.character(s)
stopifnot(length(s) == 1)
m <- gregexpr(p, s)[[1]]
if (m[1] == -1) return(s);
return(substring(s, c(1, m + attr(m, "match.length")), c(m - 1, nchar(s))))
}
ranges <- function(desc, lo, hi, name) {
res <- integer(0)
for (range in splitre(",", desc)) {
m <- regexec("^(?:\\*|(?:(\\d+)(?:-(\\d+))?))(?:/(\\d+))?$", range)
m <- regmatches(range, m)[[1]]
m[m == ""] <- NA
m[1] <- NA
m <- as.integer(m)
if (is.na(m[2])) r <- lo:hi
else if (is.na(m[3])) r <- m[2]
else r <- m[2]:m[3]
if (!is.na(m[4])) {
stopifnot(m[4] > 0)
r <- r[rep(c(TRUE, rep(FALSE, m[4] - 1)), length.out = length(r))]
}
res <- c(res, r)
}
res <- data.frame(res)
names(res) <- name
return(res)
}
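# For example, a single field expands like this:
# ranges("1-10/2", 1, 31, "day.of.month")
#   day.of.month
# 1            1
# 2            3
# 3            5
# 4            7
# 5            9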
ct2df <- function(lines) {
res <- data.frame()
for (line in lines) {
if (regexpr("^ *(#|$)", line) == 1) next # skip comments and blank lines
parts <- splitre(" +", line)
stopifnot(length(parts) > 5)
j <- ranges(parts[1], 0, 59, "minute")
j <- merge(j, ranges(parts[2], 0, 23, "hour"))
j <- merge(j, ranges(parts[3], 1, 31, "day.of.month"))
j <- merge(j, ranges(parts[4], 1, 12, "month"))
j <- merge(j, ranges(parts[5], 0, 6, "day.of.week"))
res <- rbind(res, j)
}
return(res)
}
print(ct2df("* 1-2,5 1-10/2 */3 1 command"))
This is not perfect, as it won't handle names for months or day of week, and it won't handle the special case about day of month vs. day of week, which requires treatment of * as more than a simple range.
Note: The day of a command's execution can be specified by two fields - day of month, and day of week. If both fields are restricted (ie, aren't *), the command will be run when either field matches the current time. For example, 30 4 1,15 * 5 would cause a command to be run at 4:30 am on the 1st and 15th of each month, plus every Friday.
The resulting data frame can be turned into a list of timestamps, but I haven't written code for that. Perhaps someone else will, building on this post. The simple but slow method would be iterating over possible timestamps minute by minute, and for every timestamp seeing whether a row from the computed data frame matches that value. Faster solutions would iterate on a day-by-day basis, use that to work out day of week and day of month, and then take times from all rows matching that day.
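A sketch of that simple minute-by-minute method (cron_times is a made-up name; it keeps the plain AND semantics of the expanded data frame and ignores the day-of-month/day-of-week OR rule quoted above):
cron_times <- function(schedule, from, to) {
  stamps <- seq(from, to, by = "min")
  lt <- as.POSIXlt(stamps)
  # one key per minute; schedule months are 1-12, POSIXlt mon is 0-11
  keys <- paste(lt$min, lt$hour, lt$mday, lt$mon + 1, lt$wday)
  rows <- with(schedule, paste(minute, hour, day.of.month, month, day.of.week))
  stamps[keys %in% rows]
}
# assumed usage:
# sched <- ct2df("30 4 * * 1 command")
# cron_times(sched, as.POSIXct("2023-01-01"), as.POSIXct("2023-01-31"))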