How do you plot a linear regression line in R?

How does one add a linear regression line to a scatter plot using base R, assuming the linear model has already been fitted and I already have its summary?
I already have a scatter plot comparing a and ix, and I am trying to add the regression lines from lm.a and lm.b to the plot. Should I use abline() or something else?
a <- c(21, 23, 25, 27, 29)
ix <- c(100, 300, 500, 600, 750)
ib <- c(0, 1, 0, 1, 1)
x <- data.frame(a, ix, ib)
lm.a <- with(x, lm(a ~ ix + ib + ix*ib))
summary(lm.a)
n1 <- lm.a$coefficients[1]
n2 <- lm.a$coefficients[2]
n3 <- lm.a$coefficients[3]
n4 <- lm.a$coefficients[4]

You're almost there: pass the fitted model to abline(). Here is a working example you can adapt:
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)
plot(bodymass, height)
abline(lm(height ~ bodymass)) # abline() draws the fitted line of an lm object
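For the model in the question, a single abline(lm.a) is not enough: lm.a contains the ix:ib interaction, so it describes two straight lines, one for ib = 0 and one for ib = 1, and abline() only uses the first two coefficients of a model with more than two (it warns that it is doing so). A minimal sketch, assuming the question's data: derive each group's intercept and slope from coef() and pass them to abline(a = , b = ). The model is written as a ~ ix * ib, which is the same model as a ~ ix + ib + ix*ib. lm.b is not defined in the question, so only lm.a is shown; the same approach applies to any other fitted model.

```r
a  <- c(21, 23, 25, 27, 29)
ix <- c(100, 300, 500, 600, 750)
ib <- c(0, 1, 0, 1, 1)
x  <- data.frame(a, ix, ib)

lm.a <- lm(a ~ ix * ib, data = x)   # same model as a ~ ix + ib + ix*ib
cf   <- coef(lm.a)                  # names: (Intercept), ix, ib, ix:ib

# ib = 0: intercept and slope are the first two coefficients
int0   <- cf[["(Intercept)"]]
slope0 <- cf[["ix"]]
# ib = 1: the ib and ix:ib terms shift the intercept and the slope
int1   <- int0 + cf[["ib"]]
slope1 <- slope0 + cf[["ix:ib"]]

plot(a ~ ix, data = x, pch = 19, col = ifelse(x$ib == 1, "blue", "red"))
abline(a = int0, b = slope0, col = "red")
abline(a = int1, b = slope1, col = "blue")
legend("topleft", legend = c("ib = 0", "ib = 1"), col = c("red", "blue"), lty = 1)
```

With five points and four coefficients, each group's line is essentially an exact fit to that group's points (the ib = 0 group has only two), so this plot is illustrative rather than informative about the data.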

Related

How to calculate confidence interval

We are supposed to find the 90% confidence interval for the cholesterol level of a 74-year-old man.
x <- c(58, 69, 43, 39, 63, 52, 47, 31, 74, 36)
y <- c(189, 235, 193, 177, 154, 191, 213, 165, 198, 181)
(where x is age and y is cholesterol level)
I used:
correlation <- cor.test(x, y, conf.level = 0.90)
and that gives me this:
data:  x and y
t = 1.2656, df = 8, p-value = 0.2413
alternative hypothesis: true correlation is not equal to 0
90 percent confidence interval:
 -0.1857867  0.7839057
sample estimates:
      cor
0.4084309
When I asked people in my class what values they were getting, all of them told me (203.2717, 205.5591). Where am I going wrong? cor.test is giving me (-0.1857867, 0.7839057).
The next part of the assignment asks us to calculate a 90% prediction interval for a 74-year-old. How would I do this in RStudio?
Thanks a lot!
df <- data.frame(
x = c(58, 69, 43, 39, 63, 52, 47, 31, 74, 36),
y = c(189, 235, 193, 177, 154, 191, 213, 165, 198, 181)
)
predict.lm(
lm(y~x, data = df),
newdata = data.frame(x = 74),
interval = "confidence",
level = 0.90
)
# fit lwr upr
# 1 204.42 178.99 229.85
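cor.test() reports a confidence interval for the population correlation coefficient, not for the cholesterol level at age 74, which is why its numbers are on a completely different scale. The regression-based intervals come from predict() on an lm fit: interval = "confidence" gives the interval for the mean cholesterol level of 74-year-olds (as above), and interval = "prediction" gives the wider interval for a single new 74-year-old, which is what the second part of the assignment asks for. The sketch below also recomputes both by hand from the usual formulas, yhat ± t·s·sqrt(h) and yhat ± t·s·sqrt(1 + h), to show where the extra width comes from; the helper variable names (h, tq, ci_manual, ...) are introduced here for illustration.

```r
df <- data.frame(
  x = c(58, 69, 43, 39, 63, 52, 47, 31, 74, 36),
  y = c(189, 235, 193, 177, 154, 191, 213, 165, 198, 181)
)
fit <- lm(y ~ x, data = df)
new <- data.frame(x = 74)

# 90% confidence interval for the mean cholesterol level of 74-year-olds
ci <- predict(fit, newdata = new, interval = "confidence", level = 0.90)
# 90% prediction interval for one new 74-year-old (adds the residual variance)
pred <- predict(fit, newdata = new, interval = "prediction", level = 0.90)

# The same intervals by hand
n    <- nrow(df)
s    <- summary(fit)$sigma                     # residual standard error
h    <- 1 / n + (74 - mean(df$x))^2 / sum((df$x - mean(df$x))^2)
tq   <- qt(0.95, df = n - 2)                   # two-sided 90% uses the 0.95 quantile
yhat <- unname(coef(fit)[1] + coef(fit)[2] * 74)
ci_manual   <- yhat + c(-1, 1) * tq * s * sqrt(h)
pred_manual <- yhat + c(-1, 1) * tq * s * sqrt(1 + h)

ci
pred
```

The prediction interval has the same centre (about 204.42) but is much wider, because it accounts for the scatter of individual patients around the regression line as well as the uncertainty in the line itself.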

Fit custom function to data

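I have data produced by the following special function:
$$y(t) = \frac{C\,(t - t_0)^{\alpha}}{\left[1 + (q - 1)\,\beta\,(t - t_0)^{\gamma}\right]^{1/(q - 1)}}$$
where t0 = 1, and alpha, q, gamma, C and beta are unknown parameters.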
The question is how to fit the above function to following data, in R?
mydata<-structure(list(x = 1:100, y = c(0, 0, 2, 1, 3, 4, 4, 3, 7, 8,
9, 11, 12, 11, 15, 15, 17, 21, 49, 43, 117, 75, 85, 97, 113,
129, 135, 147, 149, 149, 123, 129, 127, 122, 143, 157, 144, 139,
123, 117, 141, 138, 124, 134, 158, 151, 136, 133, 121, 117, 122,
125, 117, 111, 98, 94, 92, 89, 73, 87, 91, 88, 94, 90, 93, 76,
60, 96, 71, 80, 71, 63, 65, 47, 74, 63, 78, 68, 55, 48, 51, 45,
48, 50, 71, 48, 35, 51, 69, 62, 64, 66, 51, 59, 58, 34, 57, 56,
63, 50)), class = "data.frame", row.names = c(NA, -100L))
I defined the function as follows:
t0<<-1
fyy<-function(t,cc0,alpha0,qq0,beta0,gamma0){
ret<-cc0*((t-t0)^alpha0)/(((1+(qq0-1)*beta0*(t-t0)^gamma0))^(1/(qq0-1)))
return(ret)
}
but I don't know how to continue.
As @mhovd suggested, I used the nls function, but I got the following error:
fit <- nls(y ~ fyy(x, cc0, alpha0, beta0, gamma0, qq0),
           data = data.frame(mydata),
           start = list(cc0 = .01, alpha0 = 1, beta0 = .3, gamma0 = 2, qq0 = 1))
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
In the comments @masoud references a paper about the specific function in the question. It suggests fixing gamma0 and qq0, and if we do that we do get a solution: fm, shown in red in the plot. We have also shown an alternative parametric curve, fm2, in blue. It also has 3 optimized parameters but has a lower residual sum of squares (lower is better).
fyy <- function(t,cc0,alpha0,qq0,beta0,gamma0){
cc0 * ((t-t0)^alpha0) / (((1+(qq0-1)*beta0*(t-t0)^gamma0))^(1/(qq0-1)))
}
mydata0 <- subset(mydata, y > 0)
# fixed values
t0 <- 1
gamma0 <- 3
qq0 <- 1.2
st <- list(cc0 = 1, alpha0 = 1, beta0 = 1) # starting values
fm <- nls(y ~ fyy(x, cc0, alpha0, qq0, beta0, gamma0), mydata0,
lower = list(cc0 = 0.1, alpha0 = 0.1, beta0 = 0.00001),
start = st, algorithm = "port")
deviance(fm) # residual sum of squares
## [1] 61458.5
st2 <- list(a = 1, b = 1, c = 1)
fm2 <- nls(y ~ exp(a + b/x + c*log(x)), mydata0, start = st2)
deviance(fm2) # residual sum of squares
## [1] 16669.24
plot(mydata0, ylab = "y", xlab = "t")
lines(fitted(fm) ~ x, mydata0, col = "red")
lines(fitted(fm2) ~ x, mydata0, col = "blue")
legend("topright", legend = c("fm", "fm2"), lty = 1, col = c("red", "blue"))
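On the error in the question itself: the call fyy(x, cc0, alpha0, beta0, gamma0, qq0) passes arguments by position, but fyy is defined as function(t, cc0, alpha0, qq0, beta0, gamma0), so the starting value meant for beta0 lands in qq0, gamma0's lands in beta0, and qq0's lands in gamma0. With those starting values the base of the power term is negative for t >= 2, and in R a negative number raised to a non-integer power is NaN, which is the kind of value nls() complains about ("Missing value or an infinity produced when evaluating the model"). Separately, starting qq0 at exactly 1 puts it on the singularity of 1/(qq0 - 1); the answer above avoids both issues by passing arguments in the defining order and fixing qq0 = 1.2. A minimal check, evaluating fyy at a few arbitrary points t = 1:5 with the question's starting values:

```r
t0 <- 1
fyy <- function(t, cc0, alpha0, qq0, beta0, gamma0) {
  cc0 * ((t - t0)^alpha0) / ((1 + (qq0 - 1) * beta0 * (t - t0)^gamma0)^(1 / (qq0 - 1)))
}
tt <- 1:5

# Positional call in the question's order: beta0 -> qq0, gamma0 -> beta0, qq0 -> gamma0
bad <- fyy(tt, 0.01, 1, 0.3, 2, 1)
print(bad)   # NaN from t = 2 onwards: negative base raised to a non-integer power

# Named arguments bind to the intended parameters regardless of their order
good <- fyy(tt, cc0 = 0.01, alpha0 = 1, beta0 = 0.3, gamma0 = 2, qq0 = 1.2)
print(good)  # all finite
```

Using named arguments inside the nls() formula, e.g. fyy(x, cc0 = cc0, alpha0 = alpha0, ...), is a simple way to rule out this class of mistake.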

Simulate data from a Gompertz curve in R

I have a set of data that I have collected which consists of a time series, where each y-value is found by taking the mean of 30 samples of grape cluster weight.
I want to simulate more data from this, with the same number of x and y values, so that I can carry out some Bayesian analysis to find the posterior distribution of the data.
I have the data, and I know that the growth follows a Gompertz curve with formula:
y = a*exp(-exp(-(x - x0)/b)), with a = 88.8, b = 11.7 and x0 = 15.1.
The data I have is:
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165)
Any help would be appreciated. Thank you!
*Will edit when more information is given*
I am a little confused by your question. I have translated what you have written into R. Please elaborate so that I can help you:
gompertz <- function(x, x0, a, b){
a*exp(-exp(-(x-x0)/b))
}
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165) # means of 30 samples of grape cluster weights?
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112) # ?
#??
gompertz(x, x0 = 15.1, a = 88.8, b = 11.7)
gompertz(y, x0 = 15.1, a = 88.8, b = 11.7)

Simulating data for a Gompertz curve

I have a set of data that I have collected which consists of a time series, where each y-value is found by taking the mean of 30 samples of grape cluster weight.
The growth follows a Gompertz curve with formula y = a*exp(-exp(-(x-x0)/b)), with
a = 88.8
b = 11.7
x0 = 15.1.
The data:
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165)
x refers to the days from fruit set (i.e. 0 is when the time series starts).
The x values correspond to the days on which the measurements are taken (which depend on certain growth stages of the grapes).
I want to simulate more data from this, with the same number of x and y values, so that I can carry out some Bayesian analysis to find the posterior distribution of the data.
Effectively what I need is:
to simulate data which follows the Gompertz curve to create the posterior distribution. This data would technically be for "previous years" time series data.
to construct and test the fit of the predictive time-series model based on the distribution
If there is some skeleton code where it is possible to change around the parameters, then this could potentially be very helpful for me too.
Thanks
Let's inspect your data
x <- c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
y <- c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165)
and the fitted Gompertz curve
gFun <- function(x){
a <- 88.8
b <- 11.7
x0 <- 15.1
est <- a*exp(-exp(-(x-x0)/b))
return(est)
}
by visualisation:
library(ggplot2)
ggData <- data.frame(x = x, y = y)  # observed data as a data frame for ggplot
ggplot(ggData, aes(x = x, y = y)) +
  geom_point() +
  stat_function(fun = gFun, colour = "blue") +
  theme_bw()
This doesn't look like a good fit. However, simulating data y|x at the fixed x values in the vector above can be done by adding an error term. I've used a normal distribution with sd = 4 for illustration.
nSim <- 10
simData <- data.frame(x=c(0, rep(x[-1], each=nSim)) ) # x[-1] removes 0 from simulation
simData$y <- gFun(simData$x) + rnorm(n=nrow(simData), sd=4)
ggplot(simData, aes(x=x, y=y) ) +
geom_point(alpha=0.4) +
stat_function(fun=gFun, colour="blue") +
scale_x_continuous(limits=c(0, max(x)) ) +
theme_bw()
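The question also asks for skeleton code in which the parameters can be changed. A minimal base-R sketch, assuming independent normal errors around the Gompertz mean as in the answer above: simGompertz() is a hypothetical helper that returns one column of simulated weights per "previous year", on the same measurement days x, so each simulated series has the same number of x and y values as the observed one. Values are truncated at 0 with pmax() because weights cannot be negative; that truncation is my own assumption and slightly distorts the normal errors near x = 0.

```r
# Hypothetical helper: nYears noisy Gompertz series on the same measurement days
simGompertz <- function(x, a, b, x0, sd, nYears = 1) {
  mu <- a * exp(-exp(-(x - x0) / b))            # Gompertz mean at each day
  sims <- vapply(seq_len(nYears),
                 function(i) pmax(0, mu + rnorm(length(x), mean = 0, sd = sd)),
                 numeric(length(x)))            # length(x) rows, nYears columns
  colnames(sims) <- paste0("year", seq_len(nYears))
  sims
}

x <- c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
set.seed(42)
sims <- simGompertz(x, a = 88.8, b = 11.7, x0 = 15.1, sd = 4, nYears = 5)
matplot(x, sims, type = "l", lty = 1,
        xlab = "days from fruit set", ylab = "simulated cluster weight")
```

Changing a, b, x0 or sd in the call changes the simulated curves; since the answer notes the given parameters fit the observed data poorly, it may be worth re-estimating them before simulating.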

Simulate vectors conditional on custom distribution

I am measuring the duration of episodes per day (vector ep.dur, in minutes) over an observation period of T = 364 days. The vector ep.dur has length(ep.dur) = T = 364, with zeros on days when no episode occurred, and its values range between 0 and 1440.
The sum of the episode durations over the period T is a <- sum(ep.dur).
Now I have a vector den, with length(den) = 99. The vector den shows how many days are required for each 1% (1%, 2%, 3%, ...) of a to develop.
Now given den and a, I would like to simulate multiple ep.dur
Is this possible?
Clarification 1 (first comment of danas.zuokas): The elements of den represent durations, NOT exact days. For example, in Example 1, 1% (= 1195.08) of a is developed in 1 day, 2% in 2 days, 3% in 3 days, 4% in 4 days, 5% in 5 days, 6% in 5 days, and so on. The episodes can take place anywhere in T.
Clarification 2 (second comment of danas.zuokas): Unfortunately, no assumptions can be made about how the duration develops. That is why I have to simulate numerous ep.dur vectors. However, I can expand the den vector to a finer resolution (that is, 0.1% steps instead of 1% steps) if this is of any help.
Description of the algorithm
The algorithm should satisfy all information the den vector provides. I have imagined the algorithm going as following (Example 3):
Each 1% step of a is 335.46 min. den[1] tells us that 1% of a is developed in 1 day, so let's say we generate ep.dur[1] = 335.46. OK. We go to den[2]: 2% of a is developed in den[2] = 1 day. So ep.dur[1] cannot be 335.46 and is rejected (2% of a must still be reached on day 1). Let's say we had instead generated ep.dur[1] = 1440. den[1] is satisfied, den[2] is satisfied (at least 2% of the total duration is developed in den[2] = 1 day), and den[3] = 1 is also satisfied. Keeper? However, den[4] = 2 is not satisfied if ep.dur[1] = 1440, because it states that 4% of a (= 1341.84) is only reached on day 2. So ep.dur[1] is rejected. Now let's say that ep.dur[1] = 1200. den[1:3] are satisfied. Then we generate ep.dur[2] and so on, making sure that the generated ep.dur all satisfy the information provided by den.
Is this programmatically feasible? I really do not know where to start with this problem. I will offer a generous bounty once the bounty start period is over.
Example 1:
a<-119508
den<-c(1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 10, 11, 12, 13, 14, 15, 15,
16, 17, 18, 19, 20, 20, 21, 22, 23, 24, 25, 25, 26, 27, 28, 29,
30, 30, 31, 32, 33, 34, 35, 35, 36, 37, 38, 39, 40, 40, 41, 42,
43, 44, 45, 45, 46, 47, 48, 49, 50, 50, 51, 52, 53, 54, 55, 55,
56, 57, 58, 59, 60, 60, 61, 62, 63, 64, 65, 65, 66, 67, 68, 69,
70, 70, 71, 72, 73, 74, 75, 75, 76, 77, 78, 79, 80, 80, 81, 82,
83)
Example 2:
a<-78624
den<-c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11,
11, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 18, 19, 21, 22, 23,
28, 32, 35, 36, 37, 38, 43, 52, 55, 59, 62, 67, 76, 82, 89, 96,
101, 104, 115, 120, 126, 131, 134, 139, 143, 146, 153, 160, 165,
180, 193, 205, 212, 214, 221, 223, 227, 230, 233, 234, 235, 237,
239, 250, 253, 263, 269, 274, 279, 286, 288, 296, 298, 302, 307,
309, 315, 320, 324, 333, 337, 342, 347, 352)
Example 3
a<-33546
den<-c(1, 1, 1, 2, 4, 6, 8, 9, 12, 15, 17, 21, 25, 29, 31, 34, 37,
42, 45, 46, 51, 52, 56, 57, 58, 59, 63, 69, 69, 71, 76, 80, 81,
87, 93, 95, 102, 107, 108, 108, 112, 112, 118, 123, 124, 127,
132, 132, 132, 135, 136, 137, 150, 152, 162, 166, 169, 171, 174,
176, 178, 184, 189, 190, 193, 197, 198, 198, 201, 202, 203, 214,
218, 219, 223, 225, 227, 238, 240, 246, 248, 251, 254, 255, 257,
259, 260, 277, 282, 284, 285, 287, 288, 290, 294, 297, 321, 322,
342)
Example 4
a<-198132
den<-c(2, 3, 5, 6, 7, 9, 10, 12, 13, 14, 16, 17, 18, 20, 21, 23, 24,
25, 27, 28, 29, 31, 32, 34, 35, 36, 38, 39, 40, 42, 43, 45, 46,
47, 49, 50, 51, 53, 54, 56, 57, 58, 60, 61, 62, 64, 65, 67, 68,
69, 71, 72, 74, 75, 76, 78, 79, 80, 82, 83, 85, 86, 87, 89, 90,
91, 93, 94, 96, 97, 98, 100, 101, 102, 104, 105, 107, 108, 109,
111, 112, 113, 115, 116, 120, 123, 130, 139, 155, 165, 172, 176,
178, 181, 185, 190, 192, 198, 218)
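Whatever generator is used, it helps to have an explicit acceptance test. A minimal sketch, under my reading of the clarifications and the worked example: den[k] is the first day on which the cumulative duration reaches k% of a, each day holds between 0 and 1440 minutes, and the 364 days sum to a. denFromEpDur() and satisfiesDen() are hypothetical helpers, not part of either answer; pure rejection sampling with this check would almost never accept a random vector, so it is better suited to validating vectors built sequentially, as in the answers.

```r
# Hypothetical helper: the den vector implied by a candidate ep.dur.
# Assumes den[k] is the first day on which cumulative duration reaches k% of a.
denFromEpDur <- function(ep.dur, a, pct = 1:99) {
  cumDur <- cumsum(ep.dur)
  vapply(pct, function(k) {
    day <- which(cumDur >= a * k / 100)
    if (length(day) == 0) NA_integer_ else day[1]
  }, integer(1))
}

# Hypothetical acceptance check: length, daily cap, total and den all match
satisfiesDen <- function(ep.dur, den, a, nDays = 364, cap = 1440) {
  length(ep.dur) == nDays &&
    all(ep.dur >= 0 & ep.dur <= cap) &&
    isTRUE(all.equal(sum(ep.dur), a)) &&
    identical(denFromEpDur(ep.dur, a, seq_along(den)), as.integer(den))
}

# Toy check: 100 minutes every day, so k% of a (364*k minutes) is reached on day ceiling(3.64*k)
epToy  <- rep(100, 364)
denToy <- ceiling(364 * (1:99) / 100)
satisfiesDen(epToy, denToy, a = sum(epToy))   # TRUE
```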
As far as I understand what you're after, I would start by converting den to an rle object. (Here using data from your Example 3)
EDIT: Add 100% at day 364 to den
if(max(den)!=364) den <- c(den, 364)
(rleDen <- rle(den))
# Run Length Encoding
# lengths: int [1:92] 3 1 1 1 1 1 1 1 1 1 ... # 92 intervals
# values : num [1:92] 1 2 4 6 8 9 12 15 17 21 ...
percDur <- rleDen$lengths # Percentage of total duration in each interval
atDay <- rleDen$values # What day that percentage was reached
intWidth <- diff(c(0, atDay)) # Interval width
durPerDay <- 1440 # Max observation time per day
percPerDay <- durPerDay/a*100 # Max percentage per day
cumPercDur <- cumsum(percDur) # Cumulative percentage in each interval
maxPerInt <- pmin(percPerDay * diff(c(0, atDay), 1),
percDur + 1) # Max percent observation per interval
set.seed(1)
nsims <- 10 # Desired number of simulations
sampMat <- matrix(0, ncol = length(percDur), nrow = nsims) # Matrix to hold sim results
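To see what the rle() step produces, here is the same computation on only the first six entries of Example 3 (den_toy and the other *Toy names are introduced here for illustration). The run lengths give how many percentage points are reached on each distinct day, and the differences of the day values give how many days each interval spans.

```r
den_toy  <- c(1, 1, 1, 2, 4, 6)   # first six entries of Example 3
rleToy   <- rle(den_toy)
percToy  <- rleToy$lengths        # 3 1 1 1: 3% by day 1, then +1% by days 2, 4 and 6
dayToy   <- rleToy$values         # 1 2 4 6: day on which each cumulative percentage is reached
widthToy <- diff(c(0, dayToy))    # 1 1 2 2: number of days in each interval
cumsum(percToy)                   # 3 4 5 6: cumulative percentage at the end of each interval
```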
To allow for randomness while respecting the limit of at most 1440 minutes of observation per day, check whether there are any long intervals (i.e., intervals in which the jump in percentage cannot be completely achieved within that interval).
if(any(percDur > maxPerInt)){
longDays <- percDur > maxPerInt
morePerInt <- maxPerInt - percDur
perEnd <- c(which(diff(longDays,1) < 0), length(longDays))
# Group intervals into periods bounded by "long" days
# and determine if there are any long periods (i.e., where
# the jump in percentage can't be achieved in that period)
perInd <- rep(seq_along(perEnd), diff(c(0, perEnd)))
perSums <- tapply(percDur, perInd, sum)
maxPerPer <- tapply(maxPerInt, perInd, sum)
longPers <- perSums > maxPerPer
# If there are long periods, determine, starting with the last period, when the
# excess can be covered. Each group of periods is recorded in the persToWatch
# object
if(any(longPers)) {
maxLongPer <- perEnd[max(which(longPers))]
persToWatch <- rep(NA, length(maxLongPer))
for(kk in rev(seq_len(maxLongPer))) {
if(kk < maxLongPer && min(persToWatch, na.rm = TRUE) <= kk) next
theSums <- cumsum(morePerInt[order(seq_len(kk),
decreasing = TRUE)])
above0 <- which(rev(theSums) > 0)
persToWatch[kk] <- max(above0[which(!perInd[above0] %in% c(perInd[kk],
which(longPers)) & !above0 %in% which(longDays))])
}
}
}
Now we can introduce the randomness. The first component of the sampling determines the overall proportion of a that occurs in each interval. How much? Let runif decide. The upper and lower limits must reflect the maximum observation time per day and the excess amount of any long days and periods.
for(jj in seq_along(percDur[-1])) {
upperBound <- pmin(sampMat[, jj] + maxPerInt[jj],
cumPercDur[jj] + 1)
lowerBound <- cumPercDur[jj]
# If there are long days, determine the interval over which the
# excess observation time may be spread
if(any(percDur > maxPerInt) && any(which(longDays) >= jj)) {
curLongDay <- max(which(perInd %in% perInd[jj]))
prevLongDay <- max(0, min(which(!longDays)[which(!longDays) <= jj]))
curInt <- prevLongDay : curLongDay
# If there are also long periods, determine how much excess observation time there is
if(any(longPers) && maxLongPer >= jj) {
curLongPerHigh <- min(which(!is.na(persToWatch))[
which(!is.na(persToWatch)) >= jj])
curLongPerLow <- persToWatch[curLongPerHigh]
longInt <- curLongPerLow : curLongPerHigh
curExtra <- max(0,
cumPercDur[curLongPerHigh] -
sum(maxPerInt[longInt[longInt > jj]]) -
sampMat[, jj, drop = FALSE])
} else {
curExtra <- cumPercDur[curLongDay] -
(sum(maxPerInt[curInt[curInt > jj]]) +
sampMat[, jj, drop = FALSE])
}
# Set the lower limit for runif appropriately
lowerBound <- sampMat[, jj, drop = FALSE] + curExtra
}
# There may be tolerance errors when the observations are tightly
# packed
if(any(lowerBound - upperBound > 0)) {
if(all((lowerBound - upperBound) <= .Machine$double.eps*2*32)) {
upperBound <- pmax(lowerBound, upperBound)
} else {
stop("\nUpper and lower bounds are on the wrong side of each other\n",
jj,max(lowerBound - upperBound))
}
}
sampMat[, jj + 1] <- runif(nsims, lowerBound, upperBound)
}
Then append 100 percent to the end of the results and calculate the interval-specific percentages:
sampMat2 <- cbind(sampMat[, seq_along(percDur)], 100)
sampPercDiff <- t(apply(sampMat2, 1, diff))
The second component of the randomness determines the distribution of sampPercDiff over the interval widths intWidth. This still requires more thought in my opinion. For instance, how long does a typical episode last compared to the unit of time under consideration?
For each interval, determine if the random percentage needs to be allocated over multiple time units (in this case days). EDIT: Changed the following code to limit percentage increase when intWidth > 1.
library(foreach)
ep.dur<-foreach(ii = seq_along(intWidth),.combine=cbind)%do%{
if(intWidth[ii]==1){
ret<-sampPercDiff[, ii, drop = FALSE] * a / 100
dimnames(ret)<-list(NULL,atDay[ii])
ret
} else {
theDist<-matrix(numeric(0), ncol = intWidth[ii], nrow = nsims)
for(jj in seq_len(intWidth[ii]-1)){
theDist[, jj] <- floor(runif(nsims, 0, pmax(0,
min(sampPercDiff[, ii], floor(sampMat2[,ii + 1])-.Machine$double.eps -
sampMat2[,ii]) * a / 100 - rowSums(theDist, na.rm = TRUE))))
}
theDist[, intWidth[ii]] <- sampPercDiff[, ii] * a / 100 - rowSums(theDist,
na.rm = TRUE)
distOrder <- replicate(nsims, c(sample.int(intWidth[ii] - 1),
intWidth[ii]), simplify = FALSE)
ret <- lapply(seq_len(nrow(theDist)), function(x) {
theDist[x, order(distOrder[[x]])]
})
ans <- do.call(rbind, ret)
dimnames(ans) <- list(NULL, atDay[ii]-((intWidth[ii]:1)-1))
ans
}
}
The duration time is sampled randomly for each time unit (day) in the interval to which it is to be distributed. After breaking up the total duration into daily observed times, these are then assigned randomly to the days in the interval.
The sampled and distributed percentages are converted to minutes by multiplying by a and dividing by 100, which the foreach loop above already does.
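The answer notes that distributing each interval's share over its days still needs more thought. One alternative, sketched here under my own assumptions and not the answer's method: draw the days one at a time with runif(), choosing the bounds so the remaining days can still absorb the rest without any day exceeding 1440 minutes, then shuffle. splitMinutes() is a hypothetical helper. It enforces only the daily cap and the interval total; it does not enforce that the next whole percentage is reached only on the interval's last day, which the answer's code handles by capping the earlier days and keeping the remainder on the last day.

```r
# Hypothetical helper: split `total` minutes over `w` days, at most `cap` per day
splitMinutes <- function(total, w, cap = 1440) {
  stopifnot(w >= 1, total >= 0, total <= cap * w)
  out <- numeric(w)
  remaining <- total
  for (d in seq_len(w - 1)) {
    daysAfter <- w - d                          # days still to fill after day d
    lo <- max(0, remaining - cap * daysAfter)   # leave no more than the later days can hold
    hi <- min(cap, remaining)                   # respect the daily cap
    out[d] <- runif(1, lo, hi)
    remaining <- remaining - out[d]
  }
  out[w] <- remaining                           # last day takes the rest (<= cap by construction)
  out[sample.int(w)]                            # shuffle; sample(out) misbehaves when w == 1
}

set.seed(1)
splitMinutes(3000, 3)   # three daily durations summing to 3000, none above 1440
```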
ep.dur[1, 1 : 6]
# 1 2 3 4 5 6
# 1095.4475 315.4887 1.0000 578.9200 13.0000 170.6224
ncol(ep.dur)
# [1] 364
apply(ep.dur, 1, function(x) length(which(x == 0)))
# [1] 131 133 132 117 127 116 139 124 124 129
rowSums(ep.dur)/a
# [1] 1 1 1 1 1 1 1 1 1 1
plot(ep.dur[1, ], type = "h", ylab = "obs time")
I would most probably do this with a Ruby script, but it could be done in R too. I am not sure whether this is a homework problem or not. To answer your question "Can this be done programmatically?": yes, of course!
Given your problem, my solution is to define minimum and maximum limits within which to randomly pick a percentage that satisfies the conditions given by the den vector and the value of a.
Since the den vector only contains values up to 99%, we cannot be sure when 100% will be reached. This splits my solution into 3 parts: 1) the given den vector up to 98%, 2) the 99% step, and 3) beyond 99%. I could have moved the code common to these 3 parts into a separate function, but I haven't done so.
Since I use runif to generate random numbers, it is unlikely to return exactly the lower limit. Hence, I have defined a threshold value: if the generated value falls below it, I set it to 0. You can keep or remove this. Also, consider Example 4: the first 1% is reached on the 2nd day. This means the 1st day can contain at most 0.999999% of a, and the 1% is reached on the 2nd day. This is why the maximum limit is defined by subtracting a smallestdiff value, which can be changed.
FindMinutes=function(a,den){
if (a>1440*364){
stop("Invalid value for a: it cannot exceed 1440*364 minutes")
}
threshold=1E-7
smallestdiff=1E-6
sum_perc=0.0
start=1 #day 1
min=0 #minimum percentage value for a day
max=0 #maximum percentage value for a day
days=rep(c(0),364) #day vector with percentage of minutes - initialized to 0
maxperc=1440*100/a #maximum percentage per day, with respect to 1440 minutes/day
#############################################################
#############################################################
############ For the length of den vector ###################
for (i in 1:length(den)){
if (den[i]>start){
min=(i-1)-sum_perc
for(j in start:(den[i]-1)){#number of days in-between
if (j>start){ min=0 }
if (i-smallestdiff-sum_perc>=maxperc){
max=maxperc
if ((i-smallestdiff-sum_perc)/(den[i]-j)>=maxperc){
min=maxperc
}else{
if ((i-smallestdiff-sum_perc)/(den[i]-j-1)<maxperc){
min=maxperc-(i-smallestdiff-sum_perc)/(den[i]-j-1)
}else{
min=maxperc
}
}
}else{
max=i-smallestdiff-sum_perc
}
if ((r=runif(1,min,max))>=threshold){
days[j]=r
sum_perc=sum_perc+days[j]
}else{
days[j]=0.0
}
}
start=den[i]
}
}
#############################################################
#############################################################
#####################For the 99% ############################
min=99-sum_perc
for(j in start:den[length(den)]){
if (j>start){
min=0
}
max=100-sum_perc
if (100-sum_perc>=maxperc){
max=maxperc
if ((100-sum_perc)/(364+1-j)>=maxperc){
min=maxperc
}else{
if ((100-sum_perc)/(364-j)<maxperc){
min=maxperc-(100-sum_perc)/(364-j)
}else{
min=maxperc
}
}
}else{
max=100-sum_perc
}
if ((r=runif(1,min,max))>=threshold){
days[j]=r
sum_perc=sum_perc+days[j]
}else{
days[j]=0.0
}
}
#############################################################
#############################################################
##################### For the remaining 1%###################
min=0
for(j in den[length(den)]+seq_len(364-den[length(den)])){ #days after the 99% day, up to day 364 (empty if den ends at 364)
max=100-sum_perc
if (j==364){
min=max
days[j]=min
}else{
if (100-sum_perc>maxperc){
max=maxperc
if ((100-sum_perc)/(364+1-j)>=maxperc){
min=maxperc
}else{
if ((100-sum_perc)/(364-j)<maxperc){
min=maxperc-(100-sum_perc)/(364-j)
}else{
min=maxperc
}
}
}else{
max=100-sum_perc
}
if ((r=runif(1,min,max))>=threshold){
days[j]=r
}else{
days[j]=0.0
}
}
sum_perc=sum_perc+days[j]
if (sum_perc>=100.00){
break
}
}
return(days*a/100) #return as minutes vector corresponding to each 364 days
}#function
In my code, I randomly generate percentage values of episodes for each day according to the minimum and maximum values. The den conditions hold when you round the percentage values (days vector) to integers, but you might need extra tuning (checking the den vector further ahead and then re-tuning the minimum percentage value) if you want them accurate to a few decimal places. You can also check that sum(FindMinutes(a,den)) equals a. If you want to define den in terms of 0.1% steps, you can, but you need to change the corresponding equations (for min and max).
As a worst-case example, set a to the maximum value it can take and build the corresponding den vector:
a=1440*364
den<-c(0)
cc=1
for(i in 1:363){
if (trunc(i*1440*100/(1440*364))==cc){
den[cc]=i
cc=cc+1
}
}
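The loop above records, for each whole percentage cc, the first day i on which i full days of 1440 minutes reach cc% of the maximum a. Since that happens at the first i with i*100/364 >= cc, the same vector can be computed directly with ceiling(). The comparison below is a quick check of that equivalence; denLoop and denVec are names introduced here.

```r
# Loop from the answer, writing into denLoop
denLoop <- c(0)
cc <- 1
for (i in 1:363) {
  if (trunc(i * 1440 * 100 / (1440 * 364)) == cc) {
    denLoop[cc] <- i
    cc <- cc + 1
  }
}
# Vectorized equivalent: first day on which cc% of the maximum total is reached
denVec <- ceiling(364 * (1:99) / 100)
identical(as.numeric(denLoop), as.numeric(denVec))
```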
You can run this example by calling maxexamplemin=FindMinutes(a,den),
and you can check that every day gets the maximum of 1440 minutes, which is the only possible scenario here.
As an illustration, let me run your example 3:
a<-33546
den<-c(1, 1, 1, 2, 4, 6, 8, 9, 12, 15, 17, 21, 25, 29, 31, 34, 37, 42, 45, 46, 51, 52, 56, 57, 58, 59, 63, 69, 69, 71, 76, 80, 81, 87, 93, 95, 102, 107, 108, 108, 112, 112, 118, 123, 124, 127, 132, 132, 132, 135, 136, 137, 150, 152, 162, 166, 169, 171, 174, 176, 178, 184, 189, 190, 193, 197, 198, 198, 201, 202, 203, 214, 218, 219, 223, 225, 227, 238, 240, 246, 248, 251, 254, 255, 257, 259, 260, 277, 282, 284, 285, 287, 288, 290, 294, 297, 321, 322, 342)
rmin=FindMinutes(a,den)
sum(rmin)
# [1] 33546
rmin2=FindMinutes(a,den)
rmin3=FindMinutes(a,den)
plot(rmin,type="h")
lines(rmin2,col="red",type="h") # lines() keeps the axes of the first plot
lines(rmin3,col="blue",type="h")
This draws the 3 simulations superimposed on one plot.
