I have a set of programs. Each program contains many subprograms, one of which has the longest runtime. My goal is to calculate the average ratio of (longest runtime)/(entire program runtime), and I want to know the right way to do so.
> program   longest runtime   entire runtime   ratio
> 1         10 secs           50 secs          0.2
> 2         5 secs            40 secs          0.125
> 3         1 sec             10 secs          0.1
> 4         20 secs           80 secs          0.25
> 5         15 secs           20 secs          0.75
So I want to see what percentage of the entire runtime the longest runtime takes.
There are two ways to do so:
1: compute the ratio for each program and then calculate the average of the ratios.
(0.2 + 0.125 + 0.1 + 0.25 + 0.75) / 5 = 1.425 / 5 = 0.285
2: compute the sum of the longest runtimes and then divide it by the sum of the entire runtimes.
sum_longest = 41 secs
sum_entire = 200 secs
average = 41 / 200 = 0.205
Which way is correct?
I'd say that your latter answer (getting .205) is correct, because your first method does not take the weights (i.e. how long it takes each program to run) into account.
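One way to see why: the second method is exactly a weighted mean of the per-program ratios, with each program's entire runtime as the weight. A quick sketch in R using the numbers from the table above:

longest <- c(10, 5, 1, 20, 15)
entire  <- c(50, 40, 10, 80, 20)
ratio   <- longest / entire

mean(ratio)                        # 0.285  (plain average of the ratios, method 1)
weighted.mean(ratio, w = entire)   # 0.205  (same as sum(longest) / sum(entire))
sum(longest) / sum(entire)         # 0.205  (method 2)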
I am using the R package deSolve to solve systems of ordinary differential equations. In the 'systems dynamics' literature, delays of the inflowing and outflowing rates can be modelled using average delay times. For instance, the rate of a change of a stock Y at time t could be:
dy(t)/dt = inflow(t) - ( outflow(t) / D )
where the delay time D is, e.g. 4 time steps. The delay is assumed to be an average delay time.
However, another way of modelling delays would be to assume a more discrete-event case, where the outflow equals the amount that flowed into the stock D time units previously, thus:
dy(t)/dt = inflow(t) - inflow(t - D)
In deSolve, we can use the lagvalue and lagderiv functions with the dede solver function to specify delay differential equations which utilise lagged values of the state variables, but I cannot seem to find a way of asking deSolve to use lagged values of the inflow/outflow rates.
For example, take the simple model:
library(deSolve)

m <- function(t, y, p){
  with(as.list(c(y, p)), {
    inflow <- 100
    outflow <- y * .5
    dy <- inflow - outflow
    return(list(c(dy), inflow = inflow, outflow = outflow))
  })
}
fit <- ode(func = m, y = c(100), t = seq(0, 10, 1), p = c(), method = "euler")
time 1 inflow outflow
1 0 100.0000 100 50.00000
2 1 150.0000 100 75.00000
3 2 175.0000 100 87.50000
4 3 187.5000 100 93.75000
5 4 193.7500 100 96.87500
6 5 196.8750 100 98.43750
7 6 198.4375 100 99.21875
8 7 199.2188 100 99.60938
9 8 199.6094 100 99.80469
10 9 199.8047 100 99.90234
11 10 199.9023 100 99.95117
Using dede, I can make the outflow a lagged value of the state variable at D = 2 time steps previous:
m2 <- function(t, y, p){
  with(as.list(c(y, p)), {
    inflow <- 100
    if (t < D) outflow <- y * .5
    if (t >= D) outflow <- lagvalue(t - D, 1) * .5
    dy <- inflow - outflow
    return(list(c(dy), inflow = inflow, outflow = outflow))
  })
}
fit2 <- dede(func=m, y=c(100),t=seq(0,10,1),p=c(D=2))
time 1 inflow outflow
1 0 100.0000 100 50.00000
2 1 139.3469 100 69.67344
3 2 163.2120 100 81.60602
4 3 177.6870 100 88.84349
5 4 186.4665 100 93.23323
6 5 191.7915 100 95.89575
7 6 195.0213 100 97.51064
8 7 196.9803 100 98.49013
9 8 198.1684 100 99.08422
10 9 198.8891 100 99.44456
11 10 199.3262 100 99.66312
But now imagine I want the outflow to actually be the inflow D=2 time steps previous. I want something like:
**** Code will not run ****
m3 <- function(t, y, p){
  with(as.list(c(y, p)), {
    inflow <- 100
    if (t < D) outflow <- 0
    if (t >= D) outflow <- lagvalue(t - D, inflow)
    dy <- inflow - outflow
    return(list(c(dy), inflow = inflow, outflow = outflow))
  })
}
...
As far as I can see, deSolve does not allow this. Is there an easy way to allow it?
The reason I am interested in mixing a continuous and discrete event type model is in modelling supply chains, where the average time delay may not be accurate for certain products.
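One possible approach (a sketch only, not from the original post; the name m3_lagged and the extra auxiliary state are illustrative assumptions) is to track the cumulative inflow as a second state variable, so that its derivative is the inflow itself. deSolve's lagderiv can then return the lagged inflow rate:

library(deSolve)

# Sketch: y[1] is the stock, y[2] is the cumulative inflow (auxiliary state).
# Because d(y[2])/dt = inflow(t), lagderiv(t - D, 2) returns inflow(t - D).
m3_lagged <- function(t, y, p){
  with(as.list(c(y, p)), {
    inflow <- 100
    if (t <= D) outflow <- 0 else outflow <- lagderiv(t - D, 2)  # lagged inflow
    dy   <- inflow - outflow   # the stock
    dcum <- inflow             # auxiliary state: cumulative inflow
    return(list(c(dy, dcum), inflow = inflow, outflow = outflow))
  })
}

fit3 <- dede(func = m3_lagged, y = c(100, 0), times = seq(0, 10, 1), parms = c(D = 2))

The auxiliary state is never used directly; it only exists so that lagderiv has a state variable whose derivative is the inflow.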
I am trying to calculate, in R, velocity from acceleration in a data frame where the first value is fixed at 0. I would like to use v = u + at to fill in velocity[2:nrow(trial.data)], where t is a constant 0.002. The initial data frame looks like this:
trial.data <- data.table("acceleration" = sample(-5:5,5), "velocity" = c(0))
acceleration velocity
1 0 0
2 5 0
3 -1 0
4 3 0
5 4 0
I have tried using lag from the second row; however, this gives a value of zero there, with the correct value appearing in row 3, and the values that follow are also incorrect.
trial.data$velocity[2:nrow(trial.data)] =
(lag(trial.data$velocity,default=0)) + trial.data$acceleration * 0.002
acceleration velocity
1 0 0.000
2 5 0.000
3 -1 0.010
4 3 -0.002
5 4 0.006
Velocity is accumulated acceleration, so use cumsum:
trial.data <- data.table("acceleration" = c(0,5,-1,3,4))
u <- 0 #starting velocity
velocity <- c(u,u+cumsum(trial.data$acceleration)*0.002)
trial.data$velocity <- velocity[-length(velocity)]
Output:
> trial.data
acceleration velocity
1: 0 0.000
2: 5 0.000
3: -1 0.010
4: 3 0.008
5: 4 0.014
Note that the velocity vector had a final element (which happens to be 0.022) that was dropped when reading it into the data table, since otherwise the columns would be of unequal length. The code above starts with u = 0, but u could be changed to any other starting velocity and the code would still work as intended.
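The same result can also be written as a single data.table step using shift (a sketch of an equivalent formulation, not part of the answer above):

library(data.table)

trial.data <- data.table(acceleration = c(0, 5, -1, 3, 4))
u <- 0  # starting velocity

# shift() lags the acceleration by one row (fill = 0 for the first row),
# so each velocity accumulates only the accelerations of the rows before it.
trial.data[, velocity := u + cumsum(shift(acceleration, fill = 0)) * 0.002]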
I have a question about creating vectors. If I do a <- 1:10, "a" has the values 1,2,3,4,5,6,7,8,9,10.
My question is how do you create a vector with specific intervals between its elements. For example, I would like to create a vector with the values from 1 to 100, but counting in intervals of 5, so that I get a vector with the values 5, 10, 15, 20, ..., 95, 100.
I think in Matlab we can do 1:5:100; how do we do this in R?
I could try doing 5*(1:20), but is there a shorter way? (In this case I would need to know the whole length (100) and then divide by the size of the interval (5) to get the 20.)
In R the equivalent function is seq and you can use it with the option by:
seq(from = 5, to = 100, by = 5)
# [1] 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
In addition to by you can also have other options such as length.out and along.with.
length.out: If you want to get a total of 10 numbers between 0 and 1, for example:
seq(0, 1, length.out = 10)
# gives 10 equally spaced numbers from 0 to 1
along.with: It takes the length of the vector you supply as input and provides a vector from 1:length(input).
seq(along.with=c(10,20,30))
# [1] 1 2 3
However, instead of using the along.with option, it is recommended to use seq_along in this case. From the documentation for ?seq:
seq is generic, and only the default method is described here. Note that it dispatches on the class of the first argument irrespective of argument names. This can have unintended consequences if it is called with just one argument intending this to be taken as along.with: it is much better to use seq_along in that case.
seq_along: instead of seq(along.with = .), use
seq_along(c(10,20,30))
# [1] 1 2 3
Use the code:
x = seq(0, 100, 5)  # this means (starting number, ending number, interval); start from 5 instead of 0 to match the question exactly
The output will be:
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75
[17] 80 85 90 95 100
Sometimes we instead want to divide a vector into a given number of intervals. In this case, you can use a function where a is a vector and b is the number of intervals (let's suppose you want 4 intervals):
a <- 1:10
b <- 4
FunctionIntervalM <- function(a, b) {
  seq(from = min(a), to = max(a), by = (max(a) - min(a)) / b)
}

FunctionIntervalM(a, b)
#  1.00  3.25  5.50  7.75 10.00
Therefore you have 4 intervals:
1.00 - 3.25
3.25 - 5.50
5.50 - 7.75
7.75 - 10.00
You can also use the cut function:
cut(a, 4)
# (0.991,3.25] (0.991,3.25] (0.991,3.25] (3.25,5.5] (3.25,5.5] (5.5,7.75]
# (5.5,7.75] (7.75,10] (7.75,10] (7.75,10]
#Levels: (0.991,3.25] (3.25,5.5] (5.5,7.75] (7.75,10]
I have a file that contains two columns (Time, VA). The file is large, and I managed to read it in R (using read and subset, which is not practical for a large file). Now I want to do sampling based on the time, where each sample has a sample size and a sample shift. The sample size is fixed for the whole sampling process, e.g. sampleSize = 10 seconds. The sample shift is the start point for each new sample (after the first sample). For example, if sampleShift = 4 sec and sampleSize = 10 sec, the second sample will start at 5 sec and cover the next 10 sec (the sample size). For each sample I want to feed the VA values to a function for some calculation.
Sampling <- function(values){
  # Perform the sampling
  lastRowNumber <- # specify the last row manually
  sampleSize <- 10
  lastValueInFile <- lastRowNumber - sampleSize
  for (i in 1:(lastValueInFile)){
    EndOfShift <- 9 + i
    sample <- c(1:sampleSize)
    h <- 1
    for (j in i:EndOfShift){
      sample[h] <- values[j, 1]
      h <- h + 1
    }
    print(sample)
    # Perform the calculation on the extracted sample
    #-- Samp_Calculation <- SomFunctionDoCalculation(sample)
  }
}
The problems with my attempt are:
1) I have to specify the lastRowNumber manually for each file I read.
2) I was doing the sampling based on row numbers rather than the Time values, and the shift was by one row for each sample.
file sample:
Time VA
0.00000 1.000
0.12026 2.000
0.13026 2.000
0.14026 2.000
0.14371 3.000
0.14538 4.000
..........
..........
15.51805 79.002
15.51971 79.015
15.52138 79.028
15.52304 79.040
15.52470 79.053
.............
Any suggestion for a more professional way?
I've generated some test data as follows:
val <- data.frame (time=seq(from=0,to=15,by=0.01),VA=c(0:1500))
... then the function:
sampTime <- function (values,sampTimeLen)
{
# return a data frame for a random sample of the data frame -values-
# of length -sampTimeLen-
minTime <- values$time[1]
maxTime <- values$time[length(values$time)] - sampTimeLen
startTime <- runif(1,minTime,maxTime)
values[(values$time >= startTime) & (values$time <= (startTime+sampTimeLen)),]
}
... can be used as follows:
> sampTime(val,0.05)
time VA
857 8.56 856
858 8.57 857
859 8.58 858
860 8.59 859
861 8.60 860
... which I think is what you were looking for.
(EDIT)
Following the clarification that you want a sample from a specific time rather than a random time, this function should give you that:
sampTimeFrom <- function (values,sampTimeLen,startTime)
{
# return a data frame for sample of the data frame -values-
# of length -sampTimeLen- from a specific -startTime-
values[(values$time >= startTime) & (values$time <= (startTime+sampTimeLen)),]
}
... which gives:
> sampTimeFrom(val,0.05,0)
time VA
1 0.00 0
2 0.01 1
3 0.02 2
4 0.03 3
5 0.04 4
6 0.05 5
> sampTimeFrom(val,0.05,0.05)
time VA
6 0.05 5
7 0.06 6
8 0.07 7
9 0.08 8
10 0.09 9
11 0.10 10
If you want multiple samples, they can be delivered with sapply() like this:
> samples <- sapply(seq(from=0,to=0.15,by=0.05),function (x) sampTimeFrom(val,0.05,x))
> samples[,1]
$time
[1] 0.00 0.01 0.02 0.03 0.04 0.05
$VA
[1] 0 1 2 3 4 5
In this case the output will overlap, but making sampTimeLen very slightly smaller than the shift value (shown in the by= parameter of seq) will give you non-overlapping samples. Alternatively, one or both of the criteria in the function could be changed from >= and <= to > and <.
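For instance (an illustrative sketch of the non-overlap idea, reusing the val and sampTimeFrom objects above):

# Shift by 0.05 but sample only 0.049, so consecutive samples
# do not share their boundary row.
samples <- sapply(seq(from = 0, to = 0.15, by = 0.05),
                  function (x) sampTimeFrom(val, 0.049, x))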
I have a data frame with around 25000 records and 10 columns. I am using code to determine the change to the previous value in the same column (NewVal) based on another column (y) with a percent change already in it.
x=c(1:25000)
y=rpois(25000,2)
z=data.frame(x,y)
z[1,'NewVal']=z[1,'x']
So I ran this:
for(i in 2:nrow(z)){z$NewVal[i]=z$NewVal[i-1]+(z$NewVal[i-1]*(z$y[i]/100))}
This takes considerably longer than I expected it to. Granted I may be an impatient person - as a scathing letter drafted to me once said - but I am trying to escape the world of Excel (after I read http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html, which is causing me more problems as I have begun to mistrust data - that letter also mentioned my trust issues).
I would like to do this without using any of the functions from packages as I would like to know what the formula for creating the values is - or if you will, I am a demanding control freak according to that friendly missive.
I would also like to know how to get a moving average just like rollmean in caTools. Either that or how do I figure out what their formula is? I tried entering rollmean and I think it refers to another function (I am new to R). This should probably be another question - but as that letter said, I don't ever make the right decisions in my life.
The secret in R is to vectorise. In your example you can use cumprod to do the heavy lifting:
z$NewVal2 <- x[1] * cumprod(with(z, 1 + c(0, y[-1]/100)))
all.equal(z$NewVal, z$NewVal2)
[1] TRUE
head(z, 10)
x y NewVal NewVal2
1 25 4 25.00000 25.00000
2 24 3 25.75000 25.75000
3 23 0 25.75000 25.75000
4 22 1 26.00750 26.00750
5 21 3 26.78773 26.78773
6 20 2 27.32348 27.32348
7 19 2 27.86995 27.86995
8 18 3 28.70605 28.70605
9 17 4 29.85429 29.85429
10 16 2 30.45138 30.45138
On my machine, the loop takes just less than 3 minutes to run, while the cumprod statement is virtually instantaneous.
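On the rollmean part of the question, a simple moving average can also be vectorised with cumsum, without any packages (a sketch of one common formula, not a reproduction of any package's rollmean):

# Moving average of window k: difference of cumulative sums, divided by k.
moving_average <- function(x, k) {
  cs <- cumsum(c(0, x))
  (cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]) / k
}

moving_average(z$NewVal, 5)  # mean of each run of 5 consecutive values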
I got about an 800-fold improvement with Reduce:
system.time(z[, "NewVal"] <-Reduce("*", c(1, 1+z$y[-1]/100), accumulate=T) )
user system elapsed
0.139 0.008 0.148
> head(z, 10)
x y NewVal
1 1 1 1.000
2 2 1 1.010
3 3 1 1.020
4 4 5 1.071
5 5 1 1.082
6 6 2 1.103
7 7 2 1.126
8 8 3 1.159
9 9 0 1.159
10 10 1 1.171
> system.time(for(i in 2:nrow(z)){z$NewVal[i]=z$NewVal[i-1]+
(z$NewVal[i-1]*(z$y[i]/100))})
user system elapsed
37.29 106.38 143.16