I have the following model I want to calculate:
where
My data is the following:
D0= 100
variance = 3
T = 20
uf = 0.3
ue = 0.7
CRRA = 0.1
theta = 0.7
Q = 1
X1=1
# DIVIDENDS:
epsilon <- c(0,0, 6, rep(0,18))
DD_t <- D0
for (t in 2:(T+1)) {
  DD_t[t] <- epsilon[t] + DD_t[t-1]
}
# PRICE in t0 t1 t2
PP_t<- c(rep(0,21))
PP_t[1] <- DD_t[1] - 0.1*(3^2)*(18 + (1/0.3))
PP_t[2] <- DD_t[2] + (ue/uf)*theta^(1-1) - CRRA*(variance^2)*Q*(T-1-1 + (1/uf))
PP_t[3] <- DD_t[3] + (ue/uf)*theta^(2-1) - CRRA*(variance^2)*Q*(T-2-1 + (1/uf))
Up until here, everything is correct. Now I want to calculate P_3 to P_20, and this is where the following code unfortunately yields the wrong solution.
# PRICE in t3 t4 to t20
for (t in 3:(T-1)) {
  t1 <- (t + 1)
  XX_t <- 0
  for (k in 1:(t-1)) {
    XX_t <- XX_t + (theta^(k-1))*(PP_t[t1-k]-PP_t[t1-k-1])
  }
  XX_t <- (1-theta)*XX_t + theta^(t-1)*1
  PP_t[t1] <- DD_t[t1] + XX_t*(ue/uf) - CRRA*(variance^2)*Q*(T - t - 1 + (1/uf))
}
What this gives me is:
PP_t
[1] 80.80000 83.13333 89.33333 95.22667 98.26400 98.66093 97.36051 95.53206 94.15252 93.77109 94.47277 95.99211 97.89801 99.77274 101.33323 102.47560 103.25258
[18] 103.81069 104.31816 104.90794 0.00000
However, the correct solution looks like this:
[1] 80.80000 83.13333 89.33333 94.08333 96.66333 97.22033 96.46413 95.28555 94.43488 94.33895 95.07011 96.43089 98.09417 99.74021 101.15038 102.24238 103.05407
[18] 103.69603 104.29659 104.95801 105.73360
I cannot find the mistake in my code, unfortunately. It must be a slight one, since the values are not that far off.
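Since the correct values are known, one quick way to narrow the search (a debugging sketch, not part of the original post) is to compare the two outputs element by element and find the first index at which they diverge; the vectors below are the first six entries quoted above:

```r
# first six entries of my output and of the known-correct output
mine    <- c(80.80000, 83.13333, 89.33333, 95.22667, 98.26400, 98.66093)
correct <- c(80.80000, 83.13333, 89.33333, 94.08333, 96.66333, 97.22033)

# first index that differs beyond rounding error
which(abs(mine - correct) > 1e-4)[1]
# [1] 4
```

The first divergence is at PP_t[4], i.e. P_3, which is computed by the very first iteration of the t-loop (t = 3), so the bug must already be present there rather than only accumulating later.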
I'm using the stats::filter function in R in order to understand ARIMA simulation (as in the function stats::arima.sim) and estimation. I know that stats::filter applies a linear filter to a vector or time series, but I'm not sure how to "unfilter" my series.
Consider the following example: I want to use a recursive filter with value 0.7 to my series x = 1:5 (which is essentially generating an AR(1) with phi=0.7). I can do so by:
x <- 1:5
ar <-0.7
filt <- filter(x, ar, method="recursive")
filt
Time Series:
Start = 1
End = 5
Frequency = 1
[1] 1.0000 2.7000 4.8900 7.4230 10.1961
This essentially returns c(y1,y2,y3,y4,y5), where:
y1 <- x[1]
y2 <- x[2] + ar*y1
y3 <- x[3] + ar*y2
y4 <- x[4] + ar*y3
y5 <- x[5] + ar*y4
Now imagine I have the y = c(y1,y2,y3,y4,y5) series. How can I use the filter function to return me the original series x = 1:5?
I can write code to do it myself like this:
unfilt <- rep(NA, 5)
unfilt[1] <- filt[1]
for (i in 2:5) {
  unfilt[i] <- filt[i] - ar*filt[i-1]
}
unfilt
[1] 1 2 3 4 5
But I want to use the filter function to do so, instead of writing my own loop. How can I do that? I tried stats::filter(filt, -ar, method="recursive"), which returns [1] 1.0000 2.0000 3.4900 4.9800 6.7101, not what I want.
stats::filter used with the recursive option is a particular case of an ARMA filter.
a[1]*y[n] + a[2]*y[n-1] + … + a[na]*y[n-na+1] = b[1]*x[n] + b[2]*x[n-1] + … + b[nb]*x[n-nb+1]
You could implement this filter with the signal package, which allows more options than stats::filter:
a = c(1,-ar)
b = 1
filt_Arma <- signal::filter(signal::Arma(b = b, a = a),x)
filt_Arma
# Time Series:
# Start = 1
# End = 5
# Frequency = 1
# [1] 1.0000 2.7000 4.8900 7.4230 10.1961
identical(filt,filt_Arma)
# [1] TRUE
Reverting an ARMA filter can be done by switching b and a, provided that the inverse filter stays stable (which is the case here):
signal::filter(signal::Arma(b = a, a = b),filt)
# Time Series:
# Start = 2
# End = 6
# Frequency = 1
# [1] 1 2 3 4 5
This corresponds to switching numerator and denominator in the z-transform:
Y(z) = b(z)/a(z) X(z)
X(z) = a(z)/b(z) Y(z)
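As an aside (not part of the original answer): the inverse of a recursive (AR) filter is a convolution (MA) filter with coefficients c(1, -ar), so stats::filter itself can also undo the filtering; the only caveat is that the first value comes back as NA, because the convolution needs a predecessor that x[1] does not have:

```r
x <- 1:5
ar <- 0.7
filt <- stats::filter(x, ar, method = "recursive")

# invert the AR step: x[n] = y[n] - ar*y[n-1]
unfilt <- stats::filter(filt, c(1, -ar), method = "convolution", sides = 1)
unfilt
# [1] NA  2  3  4  5   (the first value is just filt[1] = 1)
```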
Create a simulated dataset of 100 observations, where x is a random normal variable with mean 0 and standard deviation 1, and y = 0.1 + 2*x + epsilon, where epsilon is also a random normal error with mean 0 and sd 1.
set.seed(1)
# simulate a data set of 100 observations
x <- rnorm(100)
y.1 <- 0.1 + 2*x + rnorm(100)
Now extract the first 5 observations.
y1.FirstFive <- (y.1[1:5]) # extract first 5 observations from y
x.FirstFive <- (x[1:5]) # extract first 5 observations from x
y1.FirstFive # extracted 5 observations from y1
[1] -1.7732743 0.5094025 -2.4821789 3.4485904 0.1044309
x.FirstFive # extracted 5 observations from x
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
Assuming the mean and sd of the sample that you calculated from the first five observations would not change, what is the minimum total number of additional observations you would need to be able to conclude that the true mean of the population is different from 0 at the p = 0.01 significance level?
alpha <- 0.01
mu <- 0
for (i in 5:2000) {
  # Recalculate the standard error and CI
  stand_err <- Sd_y1 / sqrt(i)
  ci <- sample_mean_y1 + c(qt(alpha/2, i-1), qt(1-alpha/2, i-1))*stand_err
  if (ci[2] < mu)
    break # condition met, exit loop
}
i
[1] 2000
Here I wrote a loop that iteratively increases n from the initial n = 5 to n = 2000, recomputes the confidence interval (given a fixed y-bar and sd), and stops when the interval excludes 0. However, I keep getting the wrong output: it is always the maximum of the range I give (here, 2000) instead of the specific minimum n needed to reject the null that mu_y = 0 at the p = 0.01 level. Any suggestions as to how to fix the code?
additional info: the sd of y1.FirstFive = 2.3 and mean of y1.FirstFive = -0.04
Assuming:
Sd_y1 = sd(y1.FirstFive)
sample_mean_y1 = mean(y1.FirstFive)
sample_mean_y1
[1] -0.03860587
As pointed out by @jblood94, you need to go for a larger sample size.
You don't need a for loop for this; most of these functions are vectorized, so you can do something like this:
n = 5:30000
stand_err = Sd_y1 / sqrt(n)
ub = sample_mean_y1 + qt(1-alpha/2, n-1)*stand_err
n[min(which(ub<0))]
[1] 23889
It's because n > 2000.
set.seed(1)
x <- rnorm(100)
y.1 <- 0.1 + 2*x + rnorm(100)
Sd_y1 <- sd(y.1[1:5])
sample_mean_y1 <- mean(y.1[1:5])
alpha <- 0.01
sgn <- 2*(sample_mean_y1 > 0) - 1
f <- function(n) qt(alpha/2, n - 1)*Sd_y1 + sgn*sample_mean_y1*sqrt(n)
upper <- 2
while (f(upper) < 0) upper <- upper*2
(n <- ceiling(uniroot(f, lower = upper/2, upper = upper, tol = 0.5)$root))
#> [1] 23889
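As a sanity check (a sketch using the same seed and the same fixed y-bar and sd as above), the upper confidence bound can be evaluated on either side of the reported answer; it is still non-negative at n = 23888 and first drops below 0 at n = 23889:

```r
set.seed(1)
x <- rnorm(100)
y.1 <- 0.1 + 2*x + rnorm(100)
Sd_y1 <- sd(y.1[1:5])
sample_mean_y1 <- mean(y.1[1:5])
alpha <- 0.01

# upper bound of the two-sided 99% CI as a function of n
ub <- function(n) sample_mean_y1 + qt(1 - alpha/2, n - 1) * Sd_y1 / sqrt(n)
c(ub(23888), ub(23889))  # the first is still >= 0, the second is < 0
```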
I have the following data, which shows the values for 5 different cohorts of patients (3 patients in each cohort):
dat <- data.frame(Cohort=c(1,1,1, 2,2,2, 3,3,3, 4,4,4, 5,5,5),
LEN_Dose=c(15,15,15, 25,25,25, 15,15,15, 10,10,10, 10,10,10),
DLT=c("N","N","N", "Y","Y","N", "Y","N","Y", "N","N","Y", "N","N","Y"))
I would like to modify the cohort levels to be +/- 0.2 of the main cohort number so they don't sit on top of one another in a graph. I can achieve what I want like this:
dat$Cohort <- dat$Cohort-0.2
dat$Cohort <- ifelse(duplicated(dat$Cohort), dat$Cohort+0.2, dat$Cohort)
dat$Cohort <- ifelse(duplicated(dat$Cohort), dat$Cohort+0.2, dat$Cohort) # have to run this twice as there are 3 patients
So the result is:
head(dat)
# Cohort LEN_Dose DLT
# 0.8 15 N
# 1.0 15 N
# 1.2 15 N
# 1.8 25 Y
# 2.0 25 Y
# 2.2 25 N
But I'm wondering if there's a better way to do this? E.g. some function where I input the base cohort level and it automatically works out the 3 values I need?
The point is to eventually graph the data using this graph:
ggplot(aes(x=Cohort, y=as.numeric(LEN_Dose)), data = dat) +
  ylab("Dose Level\n") +
  xlab("\nCohort") +
  ggtitle("\n") +
  scale_y_continuous(breaks = c(5, 10, 15, 25),
                     label = c("1.2mg/kg\n5mg", "1.2mg/kg\n10mg", "1.8mg/kg\n15mg", "1.8mg/kg\n25mg")) +
  scale_fill_manual(values = c("white", "darkred"),
                    name="Had DLT") +
  geom_line(colour="grey20", size=1) +
  geom_point(shape=23, size=6, aes(fill=DLT), stroke=1.1, colour="grey20") + # 21 for circles
  theme_classic() +
  theme(legend.box.margin=margin(c(0,0,0,-10))) +
  expand_limits(y=c(5,25))
EDIT: I have tried position = position_jitter, position = position_dodge, and all the other position types within ggplot itself, but they don't space the points equally or in any particular order, which is why I'm trying to modify the dataframe itself.
How about writing your own jitter function? Something like:
jitterit <- function(xTojitter = dat$Cohort, howMuchjitter = 0.2) {
  x <- xTojitter
  for (u in unique(x[!is.na(x)])) {
    idx <- which(x == u)  # positions of this value, computed once so later
                          # assignments cannot invalidate the mask
    n <- length(idx)
    if (n == 1) next                   # a single value needs no jitter
    if (n %% 2 == 1) {                 # odd count: middle value stays put
      middle <- ceiling(n / 2)
      x[idx[1:(middle - 1)]] <- u - howMuchjitter
      x[idx[(middle + 1):n]] <- u + howMuchjitter
    } else {                           # even count: half down, half up
      x[idx] <- rep(c(u - howMuchjitter, u + howMuchjitter), each = n / 2)
    }
  }
  return(x)
}
It will work for any kind of duplicated data (an even or odd number of duplicates):
jitterit(xTojitter = c(1,1,2,1,2,NA), howMuchjitter=0.2)
[1] 0.8 1.0 1.8 1.2 2.2 NA
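If each cohort only ever needs to be spread symmetrically around its base value in equal steps, a more compact base-R alternative is ave(), which applies a function within each group of equal values (a sketch using the first two cohorts from the question; the 0.2 step size is taken from the question):

```r
dat <- data.frame(Cohort = c(1, 1, 1, 2, 2, 2))

# offset each point in a group by step * (position - centre of the group)
spread <- function(v, step = 0.2) v + step * (seq_along(v) - (length(v) + 1)/2)
dat$Cohort <- ave(dat$Cohort, dat$Cohort, FUN = spread)
dat$Cohort
# [1] 0.8 1.0 1.2 1.8 2.0 2.2
```

Note that for an even-sized group this gives offsets of ±0.1, ±0.3, … rather than the flat ±0.2 produced by jitterit, so it is a drop-in replacement only when that symmetric spacing is acceptable.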
My question is pretty simple, but I'm a new R user...
So I have a function which takes an argument, and I want to put the results in a vector with a specific name for each call to the function.
My function
Window_length <- function(x, y) {
  first_interval   <- length(which(x <= -1.5))
  second_interval  <- length(which(x <= -1   & x > -1.5))
  third_interval   <- length(which(x <= -0.5 & x > -1))
  fourth_interval  <- length(which(x <= 0    & x > -0.5))
  fifth_interval   <- length(which(x <= 0.5  & x > 0))
  sixth_interval   <- length(which(x <= 1    & x > 0.5))
  seventh_interval <- length(which(x <= 1.5  & x > 1))
  eighth_interval  <- length(which(x <= 2    & x > 1.5))
  ninth_interval   <- length(which(x > 2))
  y <<- c(
    rep("1", first_interval),
    rep("2", second_interval),
    rep("3", third_interval),
    rep("4", fourth_interval),
    rep("5", fifth_interval),
    rep("6", sixth_interval),
    rep("7", seventh_interval),
    rep("8", eighth_interval),
    rep("9", ninth_interval))
}
So when I call Window_length, I want to put the results into a given variable for example :
Window_length(data, output_result)
In output_result I expect to have the "y" values.
Also, I'm sure my code is not perfect at all. If someone can help me optimize it a little, that would be nice.
I'm trying to do all of this because I need to make a plot of the data with ggplot. My values are between -4 and +3, and I want to create a plot with specific windows ( <-1.5 / -1.5:-1 / -1:-0.5 / -0.5:0 / 0:0.5 / 0.5:1 / 1:1.5 / 1.5:2 / >2 ).
My data :
data<- c(-3.7865964 -3.7865964 -3.1975372 -3.1975372 -3.169925 -3.1292830 -3.1292830 -2.6629650 -2.4739312 -2.4739312 -2.3536370 -2.3536370 -2.2446224 -2.2446224 -2.0000000 -1.8744691 -1.8744691 -1.7705182 -1.7655347 -1.7655347 -1.7472339 -1.7472339 -1.7062688 -1.7036070........... 1.8744691 1.8744691 2.0000000 2.2446224 2.2446224 2.3536370)
length(data)=21685
To_Be_Plot = data.frame(data,y)
fig1<-ggplot(To_Be_Plot, aes(x=y, y=data))+geom_boxplot()
expected results :
Thanks everyone
One solution, if I understood the issue correctly, would be to use the function cut:
x <- seq(-2.9, 3, l=5000)
FC <- sin(x*pi) + x^2/10 + 0.1*rnorm(5000)
dat <- data.frame(x, FC)
dat$windows <- cut(dat$x, breaks = seq(-3, 3, by=1))
ggplot(data=dat, aes(x, FC, color=windows)) +
  geom_boxplot() + theme_bw()
The resulting plot displays one boxplot per window.
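Applied to the exact breakpoints from the question, the nine intervals of Window_length collapse into a single cut() call (a sketch with a small made-up data vector; the labels 1:9 mirror the codes the original function produced):

```r
# breaks reproduce: x <= -1.5, (-1.5,-1], ..., (1.5,2], x > 2
breaks <- c(-Inf, seq(-1.5, 2, by = 0.5), Inf)
data <- c(-3.79, -1.2, 0.3, 2.5)   # illustrative values only
y <- cut(data, breaks = breaks, labels = 1:9)
y
# [1] 1 2 5 9
To_Be_Plot <- data.frame(data, y)  # ready for the boxplot above
```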
I am trying to simulate cell uptake in R, having ported a model from Berkeley Madonna. The model comprises several constants and differential equations to calculate amounts and concentrations. A portion of the code is listed below:
library(deSolve)
fb = 0.0510
Km = 23.5
Pdif = 0.429
Vmax = 270
Vol_cell = 9.33
Vol_media = 150
S = 10 #concentration of dosing media
yini = c(Amt_media=(S*Vol_media)-(S*fb*Vol_cell),
Amt_cell=S*fb*Vol_cell,
Amt_total=S*Vol_media,
Con_media=S-(S*fb),
Con_cell=S*fb)
Uptake = function(t, y, p){
  dy1 = (- (Pdif * y[1]) + (Pdif * y[2]) - ((Vmax * y[4])/(Km + y[4])))
  dy2 = (+ (Pdif * y[1]) - (Pdif * y[2]) + ((Vmax * y[4])/(Km + y[4])))
  dy3 = dy1 + dy2
  dy4 = dy1 / Vol_media
  dy5 = dy2 / Vol_cell
  list(c(dy1, dy2, dy3, dy4, dy5))
}
times1 = seq(from=0, to=15, by=0.01)
out1 = ode(y=yini, times=times1, func=Uptake, parms=NULL, method="rk4")
The rest of the code handles output to data frames and plotting. My question, then, is how to structure the code so that S can be a vector of several concentrations, with each concentration applied to the differential equations (essentially giving me an out1 for S1, an out2 for S2, etc., that can then be passed on to a data frame). In Berkeley Madonna this was achieved by writing over 35 differential equations, but I'd like to use a simplified approach in R if possible.
The only part where S is used is in the initialization of the yini values. Basically, we just need to move that part, and the part that runs ode with those values, into a new function. Then you can call that function for whatever values you want. For example:
#set up
library(deSolve)
fb <- 0.0510
Km <- 23.5
Pdif <- 0.429
Vmax <- 270
Vol_cell <- 9.33
Vol_media <- 150
Uptake <- function(t, y, p){
  dy1 = (- (Pdif * y[1]) + (Pdif * y[2]) - ((Vmax * y[4])/(Km + y[4])))
  dy2 = (+ (Pdif * y[1]) - (Pdif * y[2]) + ((Vmax * y[4])/(Km + y[4])))
  dy3 = dy1 + dy2
  dy4 = dy1 / Vol_media
  dy5 = dy2 / Vol_cell
  list(c(dy1, dy2, dy3, dy4, dy5))
}
times1 <- seq(from=0, to=15, by=0.01)
# function with S as a parameter
runConc <- function(S) {
  yini <- c(Amt_media=(S*Vol_media)-(S*fb*Vol_cell),
            Amt_cell=S*fb*Vol_cell,
            Amt_total=S*Vol_media,
            Con_media=S-(S*fb),
            Con_cell=S*fb)
  ode(y=yini, times=times1, func=Uptake, parms=NULL, method="rk4")
}
#run for concentrations 10,20,30
out <- lapply(c(10,20,30), runConc)
This will result in a list object with the results for each concentration. So out[[1]] is the result for S=10, out[[2]] is S=20, etc. We can see the first few lines of each of the results with
lapply(out, head, 3)
# [[1]]
# time Amt_media Amt_cell Amt_total Con_media Con_cell
# [1,] 0.00 1495.242 4.75830 1500 9.490000 0.510000
# [2,] 0.01 1488.103 11.89710 1500 9.442408 1.275145
# [3,] 0.02 1481.028 18.97216 1500 9.395241 2.033457
#
# [[2]]
# time Amt_media Amt_cell Amt_total Con_media Con_cell
# [1,] 0.00 2990.483 9.51660 3000 18.98000 1.020000
# [2,] 0.01 2976.550 23.44980 3000 18.88711 2.513377
# [3,] 0.02 2962.739 37.26072 3000 18.79504 3.993646
#
# [[3]]
# time Amt_media Amt_cell Amt_total Con_media Con_cell
# [1,] 0.00 4485.725 14.27490 4500 28.47000 1.53000
# [2,] 0.01 4465.153 34.84653 4500 28.33286 3.73489
# [3,] 0.02 4444.761 55.23920 4500 28.19690 5.92060
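To get from that list to a single data frame (as mentioned at the end of the question), each run can be tagged with its concentration and the pieces row-bound. The snippet below sketches the pattern with toy two-row stand-ins for the ode() matrices; in practice out would come from lapply(concs, runConc) as above:

```r
concs <- c(10, 20, 30)
# toy stand-ins for the matrices returned by ode()
out <- lapply(concs, function(S) cbind(time = c(0, 0.01), Amt_total = S * 150))

# add an S column to each result, then stack into one long data frame
df <- do.call(rbind, Map(function(m, S) cbind(as.data.frame(m), S = S), out, concs))
df$S
# [1] 10 10 20 20 30 30
```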