Convert cron to timestamps in R

I am looking for a way to convert cron information into a list of timestamps, using R.
Is there an easy way to do so?
Given the crontab and a start date and an end date, I would like to obtain the list of trigger timestamps during these 2 dates.
I have not found any package that specifically deals with CRON info, but maybe somebody has already had this problem?
Thanks

I assume you mean the crontab(5) format. I know of no library parsing this information, but the following snippet should get you started:
splitre <- function(p, s) {
s <- as.character(s)
stopifnot(length(s) == 1)
m <- gregexpr(p, s)[[1]]
if (m[1] == -1) return(s);
return(substring(s, c(1, m + attr(m, "match.length")), c(m - 1, nchar(s))))
}
ranges <- function(desc, lo, hi, name) {
res <- integer(0)
for (range in splitre(",", desc)) {
m <- regexec("^(?:\\*|(?:(\\d+)(?:-(\\d+))?))(?:/(\\d+))?$", range)
m <- regmatches(range, m)[[1]]
m[m == ""] <- NA
m[1] <- NA
m <- as.integer(m)
if (is.na(m[2])) r <- lo:hi
else if (is.na(m[3])) r <- m[2]
else r <- m[2]:m[3]
if (!is.na(m[4])) {
stopifnot(m[4] > 0)
r <- r[rep(c(TRUE, rep(FALSE, m[4] - 1)), length.out = length(r))]
}
res <- c(res, r)
}
res <- data.frame(res)
names(res) <- name
return(res)
}
ct2df <- function(lines) {
res <- data.frame()
for (line in lines) {
if (regexpr("^ *(#|$)", line) == 1) next  # skip comments and blank lines
parts <- splitre(" +", line)
stopifnot(length(parts) > 5)
j <- ranges(parts[1], 0, 59, "minute")
j <- merge(j, ranges(parts[2], 0, 23, "hour"))
j <- merge(j, ranges(parts[3], 1, 31, "day.of.month"))
j <- merge(j, ranges(parts[4], 1, 12, "month"))
j <- merge(j, ranges(parts[5], 0, 6, "day.of.week"))
res <- rbind(res, j)
}
return(res)
}
print(ct2df("* 1-2,5 1-10/2 */3 1 command"))
This is not perfect, as it won't handle names for months or day of week, and it won't handle the special case about day of month vs. day of week, which requires treatment of * as more than a simple range.
Note: The day of a command's execution can be specified by two fields - day of month, and day of week. If both fields are restricted (ie, aren't *), the command will be run when either field matches the current time. For example, 30 4 1,15 * 5 would cause a command to be run at 4:30 am on the 1st and 15th of each month, plus every Friday.
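The rule quoted above can be sketched as a small predicate. This is only an illustration: day_matches and its arguments are hypothetical names, not part of the snippet above, and it assumes that only a literal * counts as an unrestricted field (some cron implementations treat forms such as */2 differently).

```r
# Hypothetical helper: does a given day satisfy the two day fields?
# dom_set / dow_set are the expanded integer sets, dom_spec / dow_spec the raw fields.
day_matches <- function(mday, wday, dom_set, dow_set, dom_spec, dow_spec) {
  dom_ok <- mday %in% dom_set
  dow_ok <- wday %in% dow_set
  if (dom_spec != "*" && dow_spec != "*") {
    dom_ok | dow_ok   # both fields restricted: either one may match
  } else {
    dom_ok & dow_ok   # otherwise the unrestricted field matches every day anyway
  }
}

# "30 4 1,15 * 5": the 1st and 15th of each month, plus every Friday (wday 5)
day_matches(5, 5, c(1, 15), 5, "1,15", "5")    # a Friday that is the 5th
day_matches(15, 1, c(1, 15), 5, "1,15", "5")   # a Monday that is the 15th
day_matches(2, 2, c(1, 15), 5, "1,15", "5")    # a Tuesday that is the 2nd
```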
The resulting data frame can be turned into a list of timestamps, but I haven't written code for that. Perhaps someone else will, building on this post. The simple but slow method is to iterate over the possible timestamps minute by minute and, for every timestamp, check whether a row of the computed data frame matches it. Faster solutions would iterate day by day, use the date to work out the day of week and day of month, and then take the times from all rows matching that day.
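A minimal sketch of the simple minute-by-minute method might look like the following. cron_times is a hypothetical name, the schedule is built by hand with the same column names that ct2df() produces, and UTC is assumed throughout. Like the data frame itself, it treats day of month and day of week as a plain "and", so it does not implement the special case mentioned above.

```r
# Hypothetical helper: every minute in [from, to] whose fields match
# at least one row of a data frame shaped like the output of ct2df().
cron_times <- function(sched, from, to, tz = "UTC") {
  ts <- seq(from, to, by = "1 min")        # candidate timestamps, one per minute
  lt <- as.POSIXlt(ts, tz = tz)
  cand <- data.frame(
    minute       = lt$min,
    hour         = lt$hour,
    day.of.month = lt$mday,
    month        = lt$mon + 1L,            # POSIXlt months are 0-based
    day.of.week  = lt$wday                 # 0 = Sunday, as in crontab
  )
  # Collapse the five fields into one string key per row, then compare keys.
  key <- function(d) do.call(paste, c(d[names(cand)], sep = "-"))
  ts[key(cand) %in% key(sched)]
}

# Hand-built schedule equivalent to "30 4 * * 1": 04:30 every Monday.
sched <- expand.grid(minute = 30, hour = 4, day.of.month = 1:31,
                     month = 1:12, day.of.week = 1)
from <- as.POSIXct("2024-01-01 00:00", tz = "UTC")
to   <- as.POSIXct("2024-01-14 23:59", tz = "UTC")
hits <- cron_times(sched, from, to)
```

For long date ranges this builds one candidate row per minute, so the day-by-day approach described above would scale better.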

Related

Nested for loop in R is giving me bracket error despite using correct amount of brackets

So I made a matrix with 4 rows to represent 4 individuals (each with an ID). I'm trying to use a nested for loop to incorporate a time increment in the first loop, then in the second loop add a row for every individual for the new time and incorporate a function in the 3rd column and each increase in time will add on the value of the function to the value in the same column from the function from the previous time step. I'm starting small with 4 individuals and 5 time steps, but for some reason I'm getting an error message about an unexpected '}', but I've gone through and double checked the brackets and parentheses multiple times. I'm not sure what the issue is with the for loop or if it's going to end up doing what I intend for it to do.
uptake <- function(x){
vmax <- x[1]
km <- x[2]
s <- x[3]
result <- vmax*(s/(km+s))
return(result)
}
agents <- matrix(0, nrow = 4, ncol = 6)
colnames(agents) <- c("Time", "ID", "Uptake rate (V)", "vmax", "km", "s")
agents[,1] <- 0
agents[,2] <- c(1:4)
agents[,4] <- 1.4
agents[,5] <- 17
agents[,6] <- 1.4
for (i in seq(1, 5, 1)){
for (j in 1:nrow(agents$Time = i-1)){
agents[j,] <-rbind(agents, c(i, agents[j,2], agents[j,3] +
uptake(agents[j,4:6]), agents[j,4],
agents[j,5], agents[j,6]))}}
and this is the error code I'm getting:
Error: unexpected '}' in:
" uptake(agents[j,4:6]), agents[j,4],
agents[j,5], agents[j,6]))}"
I appreciate any advice and insight!!
Is this what you are trying to do?
for (i in seq(1, 5, 1)) {
for (j in seq_len(sum(agents[,"Time"] == i - 1))) {
agents <- rbind(
agents,
c(
i,
agents[j,2],
agents[j,3] + uptake(agents[j,4:6]),
agents[j,4],
agents[j,5],
agents[j,6]
)
)
}
}
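Note that in the loop above j always runs over rows 1 to 4, which are the time-0 rows. If the intent is to add each step's uptake to the value from the previous time step, a sketch along these lines may be closer. It assumes the uptake column is shortened to "V" and that uptake takes its three parameters as separate arguments; both are changes made for illustration only.

```r
# Same formula as in the question, written to accept vectors.
uptake <- function(vmax, km, s) vmax * s / (km + s)

# Four agents at time 0; the scalars are recycled down the column.
agents <- cbind(Time = 0, ID = 1:4, V = 0, vmax = 1.4, km = 17, s = 1.4)

for (i in 1:5) {
  prev <- agents[agents[, "Time"] == i - 1, , drop = FALSE]  # rows of the previous step
  nxt <- prev
  nxt[, "Time"] <- i
  # accumulate: previous value plus this step's uptake
  nxt[, "V"] <- prev[, "V"] + uptake(prev[, "vmax"], prev[, "km"], prev[, "s"])
  agents <- rbind(agents, nxt)
}
```

Each pass appends one row per individual, so after 5 steps the matrix has 24 rows.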

Select values from a data frame indexed to another data frame in a many-to-one relationship in R

I am building a program for simulating sequences of wind vectors (in base R).
I have a data set of parameters ('pars') for six wind-generation mechanisms (I'll call them ellipses). There are 5 parameters for each ellipse, thus 30 columns of parameters, plus other parameters that indicate the proportion of time (frequency, indicated by f.0, f.1, ...) each ellipse is in operation. There are 24 rows in 'pars', each identified by an 'hour' variable. The following code generates a simulated 'pars' data frame:
pars <- as.data.frame(matrix(rnorm(24*42), 24, 42, dimnames=list(NULL, c(
'f.0', 'f.1', 'f.2', 'f.3', 'f.4', 'f.5', 'f.6',
'W.0', 'W.1', 'W.2', 'W.3', 'W.4', 'W.5', 'W.6',
'S.0', 'S.1', 'S.2', 'S.3', 'S.4', 'S.5', 'S.6',
'w.0', 'w.1', 'w.2', 'w.3', 'w.4', 'w.5', 'w.6',
's.0', 's.1', 's.2', 's.3', 's.4', 's.5', 's.6',
'r.0', 'r.1', 'r.2', 'r.3', 'r.4', 'r.5', 'r.6')
)))
jobFun <- function(n) {
m <- matrix(runif(7*n), ncol=7)
m <- sweep(m, 1, rowSums(m), FUN="/")
m
}
pars[1:24,c('f.0', 'f.1', 'f.2', 'f.3', 'f.4', 'f.5', 'f.6')] <- jobFun(24) # generate ellipse frequencies, summing to 1
pars$hour <- 0:23 # Add an 'hour' variable
pars$p0 <- with(pars, f.0) # change to make it zero if < zero!
pars$p1 <- with(pars, f.1 + p0)
pars$p2 <- with(pars, f.2 + p1)
pars$p3 <- with(pars, f.3 + p2)
pars$p4 <- with(pars, f.4 + p3)
pars$p5 <- with(pars, f.5 + p4)
pars$p6 <- with(pars, f.6 + p5)
I start by generating a sequence of POSIXct date-times for a single day, e.g, at 5 minute intervals ('sim'). For each date-time in 'sim', I need to select an ellipse and assign the parameters to the 'sim' data set. I have made additional columns in 'pars' with the cumulative probability of each ellipse, e.g., p0 = f.0, p1 = p0 + f.1, p2 = p1 + f.2, etc. I am going to select a different ellipse for each 5 minute time increment (then select the parameters corresponding to that ellipse). My difficulty lies in being unable to specify the appropriate value for p.
START <- ISOdate(2022, MONTH, 1, hour=0, min=0)
END <- START + (24*3600) - 1
tseq <- seq(from=START,to=END,by=300)
sim = data.frame(tseq)
sim$Ep <- runif(nrow(sim)) # Generate random vector Ep for ellipse picking
sim$Enum <- with(sim, ifelse( # number identifying ellipse to be used
Ep < pars$p0[which(pars$hour == hour(tseq))], 0, ifelse(
Ep < pars$p1[which(pars$hour == hour(tseq))], 1, ifelse(
Ep < pars$p2[which(pars$hour == hour(tseq))], 2, ifelse(
Ep < pars$p3[which(pars$hour == hour(tseq))], 3, ifelse(
Ep < pars$p4[which(pars$hour == hour(tseq))], 4, ifelse(
Ep < pars$p5[which(pars$hour == hour(tseq))], 5, 6)))))))
...
The result should be a vector (Enum) of integers between 0 and 6 identifying the ellipse to be used at each 5 minute time increment. My program only provides a correct answer at the 0th minute of each hour; there is something wrong with the statement
pars$p[which(pars$hour == hour(tseq))]
which ends up generating NA's for all the other 5 minute time increments in the hour. That is, there are 12 increments of 5 minutes in an hour, and the statement
which(pars$hour == hour(tseq))
brings up all 12 at once, instead of one at a time, which is what I need here. Maybe I need a 'for' loop? Any suggestions for fixing, and for making the above code more compact, will be appreciated.
The problem is that the logical subscripting is too complicated. All that is necessary is to change, e.g.,
pars$p0[which(pars$hour == hour(tseq))]
to
pars$p0[hour(tseq)+1]
and the value for p0 that is specific to the hour being simulated will be selected.
Spector (2008) "Data Manipulation with R" is helpful as usual.
Note that for the question above, the 'lubridate' package is necessary for the hour() function, and MONTH must be specified (e.g., MONTH=4) to run the code.
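As for making the code more compact, one possible sketch replaces the nested ifelse() calls with a count of thresholds. It is not the code from the question: the frequencies are a made-up stand-in for 'pars', the matrix cum plays the role of columns p0 to p6, and as.POSIXlt()$hour is used as a base-R alternative to lubridate's hour().

```r
set.seed(1)
# Stand-in for 'pars': 24 hourly rows of ellipse frequencies that sum to 1.
f <- matrix(runif(24 * 7), ncol = 7)
f <- f / rowSums(f)
cum <- t(apply(f, 1, cumsum))              # columns play the role of p0..p6

# One day at 5-minute steps (April 2022 chosen arbitrarily).
tseq <- seq(ISOdate(2022, 4, 1, hour = 0), by = 300, length.out = 288)
hr <- as.POSIXlt(tseq, tz = "GMT")$hour    # 0..23, one value per timestamp
Ep <- runif(length(tseq))

# Row hr + 1 of 'cum' holds the thresholds for that hour. Counting how many
# of p0..p5 are <= Ep gives the ellipse number 0..6 for each timestamp.
Enum <- rowSums(Ep >= cum[hr + 1, 1:6, drop = FALSE])
```

Indexing with hr + 1 yields one row of thresholds per timestamp, so every 5-minute increment gets its own comparison and no loop is needed.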

R generates NA_real vector in while loop, but not when code line is run separately, how to fix the loop?

I'm trying to "pseudo-randomize" a vector in R using a while loop.
I have a vector delays with the elements that need to be randomized.
I am using sample on a vector value to index randomly into delays. I cannot have more than two identical values in a row, so I am trying to use an if else statement. If the conditions are met, the value should be added to random, and removed from delays.
When I run the individual lines outside the loop they all work, but when I try to run the loop, one of the vectors is populated as NA_real, and that stops the logical operators from working.
I'm probably not great at explaining this, but can anyone spot what I'm doing wrong? :)
delay_0 <- rep(0, 12)
delay_6 <- rep(6, 12)
delays <- c(delay_6, delay_0)
value <- c(1:24)
count <- 0
outcasts <- c()
random <- c(1,2)
while (length(random) < 27) {
count <- count + 1
b <- sample(value, 1, replace = FALSE)
a <- delays[b]
if(a == tail(random,1) & a == head(tail(random,2),1)) {
outcast <- outcasts + 1
}
else {
value <- value[-(b)]
delays <- delays[-(b)]
random <- c(random,a)
}
}
Two problems with your code:
b can take a value that is greater than the number of elements in delays. I fixed this by using sample(1:length(delays), 1, replace = FALSE)
The loop continues when delays is empty. You could either change length(random) < 27 to length(random) < 26, I think, or add the condition length(delays) > 0.
The code:
delay_0 <- rep(0, 12)
delay_6 <- rep(6, 12)
delays <- c(delay_6, delay_0)
value <- c(1:24)
count <- 0
outcasts <- c()
random <- c(1, 2)
while (length(random) < 27 & length(delays) > 0) {
count <- count + 1
b <- sample(1:length(delays), 1, replace = FALSE)
a <- delays[b]
if (a == tail(random, 1) & a == head((tail(random, 2)), 1))
{
outcast <- outcasts + 1
}
else {
value <- value[-(b)]
delays <- delays[-(b)]
random <- c(random, a)
}
}
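The first problem can be reproduced in isolation. The short vectors below are made up for illustration; the point is only that indexing past the end of a vector gives NA rather than an error, and the NA then breaks the comparison inside if.

```r
delays <- c(6, 0, 6)        # three elements left
a <- delays[5]              # out-of-range index: the result is NA, not an error
is.na(a)
# `if (a == 6) ...` would now stop with
# "missing value where TRUE/FALSE needed"

# Sampling a position from the current vector always gives a valid index.
b <- sample(seq_along(delays), 1)
a_ok <- delays[b]
```

seq_along(delays) is used here instead of 1:length(delays) because it also behaves sensibly when delays is empty.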

How to speed up a loop-like function in R

In trying to avoid using the for loop in R, I wrote a function that returns an average value from one data frame given row-specific values from another data frame. I then pass this function to sapply over the range of row numbers. My function works, but it returns ~ 2.5 results per second, which is not much better than using a for loop. So, I feel like I've not fully exploited the vectorized aspects of the apply family of functions. Can anyone help me rethink my approach? Here is a minimally working example. Thanks in advance.
#Creating first dataframe
dates<-seq(as.Date("2013-01-01"), as.Date("2016-07-01"), by = 1)
n<-length(seq(as.Date("2013-01-01"), as.Date("2016-07-01"), by = 1))
df1<-data.frame(date = dates,
hour = sample(1:24, n,replace = T),
cat = sample(c("a", "b"), n, replace = T),
lag = sample(1:24, n, replace = T))
#Creating second dataframe
df2<-data.frame(date = sort(rep(dates, 24)),
hour = rep(1:24, length(dates)),
p = runif(length(rep(dates, 24)), min = -20, max = 100))
df2<-df2[order(df2$date, df2$hour),]
df2$cat<-"a"
temp<-df2
temp$cat<-"b"
df2<-rbind(df2,temp)
#function
period_mean<-function(x){
tmp<-df2[df2$cat == df1[x,]$cat,]
#This line extracts the row name index from tmp,
#in which the two dataframes match on date and hour
he_i<-which(tmp$date == df1[x,]$date & tmp$hour == df1[x,]$hour)
#My lagged period is given by the variable "lag". I want the average
#over the period from (hour - lag) to hour. Since df2 is sorted such that
#hours are consecutive, this method requires that I subset on only the
#relevant value for cat (hence the creation of tmp in the first line
#of the function)
p<-mean(tmp[(he_i - df1[x,]$lag):he_i,]$p)
print(x)
print(p)
return(p)
}
#Execute function
out<-sapply(1:length(row.names(df1)), period_mean)
EDIT I have subsequently learned that part of the reason my original problem was iterating so slowly is that my data classes between the two dataframes were not the same. df1$date was a date field, while df2$date was a character field. Of course, this wasn't apparent with the example I posted because the data types were the same by construction. Hope this helps.
Here's one suggestion:
getIdx <- function(i) {
date <- df1$date[i]
hour <- df1$hour[i]
cat <- df1$cat[i]
which(df2$date==date & df2$hour==hour & df2$cat==cat)
}
v_getIdx <- Vectorize(getIdx)
df1$index <- v_getIdx(1:nrow(df1))
b_start <- match("b", df2$cat)
out2 <- apply(df1[,c("cat","lag","index")], MAR=1, function(x) {
flr <- ifelse(x[1]=="a", 1, b_start)
x <- as.numeric(x[2:3])
mean(df2$p[max(flr, (x[2]-x[1])):x[2]])
})
We make a function (getIdx) to retrieve the rows from df2 that match the values from each row in df1, and then Vectorize the function.
We then run the vectorized function to get a vector of rownames. We set b_start to be the row where the "b" category starts.
We then iterate through the rows of df1 with apply. In the mean(...) function, we set the "floor" to be either row 1 (if cat=="a") or b_start (if cat=="b"), which eliminates the need to subset (what you were doing with tmp).
Performance:
> system.time(out<-sapply(1:length(row.names(df1)), period_mean))
user system elapsed
11.304 0.393 11.917
> system.time({
+ df1$index <- v_getIdx(1:nrow(df1))
+ b_start <- match("b", df2$cat)
+ out2 <- apply(df1[,c("cat","lag","index")], MAR=1, function(x) {
+ flr <- ifelse(x[1]=="a", 1, b_start)
+ x <- as.numeric(x[2:3])
+ mean(df2$p[max(flr, (x[2]-x[1])):x[2]])
+ })
+ })
user system elapsed
2.839 0.405 3.274
> all.equal(out, out2)
[1] TRUE
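A further option, which is not the method used above, is to drop the per-row function entirely: look up all matching rows with one match() call and get every window mean from prefix sums. The sketch below uses a smaller made-up data set with the same layout, and it assumes df2 is sorted by cat, then date, then hour. The window is clipped at the start of each category, as in the answer above.

```r
set.seed(42)
dates <- seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = 1)
# expand.grid varies the first argument fastest, so rows are ordered by cat, date, hour.
df2 <- expand.grid(hour = 1:24, date = dates, cat = c("a", "b"),
                   stringsAsFactors = FALSE)
df2$p <- runif(nrow(df2), min = -20, max = 100)
df1 <- data.frame(date = dates,
                  hour = sample(1:24, length(dates), replace = TRUE),
                  cat  = sample(c("a", "b"), length(dates), replace = TRUE),
                  lag  = sample(1:24, length(dates), replace = TRUE))

key <- function(d) paste(d$date, d$hour, d$cat)
idx <- match(key(df1), key(df2))   # matching row of df2 for each row of df1
flr <- match(df1$cat, df2$cat)     # first row of that row's category
lo  <- pmax(idx - df1$lag, flr)    # window start, clipped to the category
cs  <- c(0, cumsum(df2$p))         # prefix sums: sum(p[lo:idx]) = cs[idx + 1] - cs[lo]
out3 <- (cs[idx + 1] - cs[lo]) / (idx - lo + 1)
```

Because both data frames build their keys with the same paste() call, a mismatch in column types (such as Date versus character, as described in the EDIT) would show up as NA values in idx.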

Simplify loop computation

Consider the following vector x:
> 1:9
[1] 1 2 3 4 5 6 7 8 9
and consider the following inputs:
start = 10
pmt = 2
This is the result (let's call the resulting vector res) I am looking to achieve (what's displayed are the actual formulas). Note that the result is a vector, not a data frame; I just displayed it here in 2 dimensions.
In other words, to obtain res, you multiply start by the cumulative product of the cells of df up to the corresponding cell.
When the vector index is 4 or 7, the start value gets updated.
This is what I have attempted:
for(i in 1:9) {
res[i] = start * cumprod(df[k:i])[i]
if(i %% 3 == 0) {
start = res[i] - pmt
k = k + 3
} else {
start = res[i]
}
}
To put the problem into context, imagine you have a start value of money of 10 dollars, and you want to invest it over 9 months. However, you want to make a withdrawal at the end of each 3 months (i.e. at the beginning of month 4, month 7, ...). The vector x represent random values of returns.
Therefore, at the beginning of month 4, your start value is start*1*2*3 minus the withdrawal pmt.
The purpose here is computing the wealth value at the end of month 9.
The problem is that in reality, i = 200 (200 months), and I need to redo this computation for 10,000 different vectors x. So looping 10,000 times over the above code takes forever to execute!
Would you have any suggestion as to how to compute this more efficiently? I hope the explanation is not too confusing!
Thank you!
If you work out your formula for res as an iterative formula, then it is easier to write a function that you can give to Reduce. Here it is as a simple loop
x <- 1:9
start <- 10
pmt <- 2
res <- numeric(length(x))
res[1] <- x[1] * start
for (i in seq_along(x)[-1]) {
res[i] <- (res[i-1] - (pmt * (!(i%%4) || !(i%%7)))) * x[i]
}
If you want to write it as a Reduce function, it would look like this
Reduce(function(r, i) {
(r - (pmt * (!(i%%4) || !(i%%7)))) * x[i]
},
seq_along(x),
init = start,
accumulate = TRUE)[-1]
There is some weirdness with the start values and dropping the first element of the result because of the way that initial values are handled (and because the iteration is over indexes, not values, since comparisons must be done on the index). The loop here is probably more understandable.
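Note that the condition !(i%%4) || !(i%%7) also fires at index 8, a multiple of 4. If withdrawals are meant to happen every 3 months (at indexes 4, 7, 10, ...), as the context in the question suggests, a variant like the one below may be closer. It also handles many return vectors at once by storing them as the columns of a matrix, so the loop runs over months only. The matrix X and its second column are made up for illustration.

```r
start <- 10
pmt <- 2
X <- cbind(1:9, 9:1)    # one column per return vector (second column is made up)
n <- nrow(X)

# TRUE at months 4, 7, 10, ...: a withdrawal at the start of every third month
withdraw <- seq_len(n) %% 3 == 1 & seq_len(n) > 1

res <- matrix(0, n, ncol(X))
res[1, ] <- start * X[1, ]
for (i in 2:n) {
  # one step for all columns at once
  res[i, ] <- (res[i - 1, ] - pmt * withdraw[i]) * X[i, ]
}
res[n, ]                # final wealth for each return vector
```

With 200 months and 10,000 vectors this is 199 vectorized steps rather than 10,000 separate loops.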
I know you mentioned it being 1d, but I think this works well and you can convert it to 1d very easily -
start = 10
pmt = 2
library(data.table)
dt <- data.table(
month = 1:13
)
dt[,principalgrown := start*cumprod(month)]
#explained below#######
dt[,interestlost := 0]
for(i in seq(from = 4, to = (dim(dt)[1]), by = 3))
{
dt[month >= i,interestlost := interestlost + (prod(i:month)), by = month]
}
#######################
dt[,finalamount := principalgrown - (pmt*interestlost)]
The part within the #s is the trick. Where you calculate the month 7 value as ((1*2*3*start - pmt)*4*5*6 - pmt) * 7, I calculate it as 1*2*3*4*5*6*7*start - 4*5*6*7*pmt - 7*pmt. Here 1*2*3*4*5*6*7*start is principalgrown and - 4*5*6*7*pmt - 7*pmt is -(pmt*interestlost).
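The month 7 identity described above can be checked directly in base R with the values from the question (start = 10, pmt = 2):

```r
start <- 10
pmt <- 2

# Nested form: grow, withdraw, grow, withdraw, grow
nested <- ((1 * 2 * 3 * start - pmt) * 4 * 5 * 6 - pmt) * 7

# Expanded form: grown principal minus the grown value of each withdrawal
expanded <- prod(1:7) * start - prod(4:7) * pmt - 7 * pmt

c(nested, expanded)     # both are 48706
```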
