Delaying parameter values / inflow and outflow rates in deSolve - r

I am using the R package deSolve to solve systems of ordinary differential equations. In the 'system dynamics' literature, delays of the inflowing and outflowing rates can be modelled using average delay times. For instance, the rate of change of a stock Y at time t could be:
dy(t)/dt = inflow(t) - ( outflow(t) / D )
where the delay time D is, e.g. 4 time steps. The delay is assumed to be an average delay time.
However, another way of modelling delays would be to assume a more discrete event case, where the outflow is equal to the amount inflowing to the stock D time units previously, thus:
dy(t)/dt = inflow(t) - inflow(t - D)
In deSolve, we can use the lagvalue and lagderiv functions with the dede solver function to specify delay differential equations which utilise lagged values of the state variables, but I cannot seem to find a way of asking deSolve to use lagged values of the inflow/outflow rates.
For example, take the simple model:
library(deSolve)

m <- function(t, y, p) {
  with(as.list(c(y, p)), {
    inflow  <- 100
    outflow <- y * 0.5
    dy <- inflow - outflow
    return(list(c(dy), inflow = inflow, outflow = outflow))
  })
}
fit <- ode(func = m, y = c(100), times = seq(0, 10, 1), parms = c(), method = "euler")
time 1 inflow outflow
1 0 100.0000 100 50.00000
2 1 150.0000 100 75.00000
3 2 175.0000 100 87.50000
4 3 187.5000 100 93.75000
5 4 193.7500 100 96.87500
6 5 196.8750 100 98.43750
7 6 198.4375 100 99.21875
8 7 199.2188 100 99.60938
9 8 199.6094 100 99.80469
10 9 199.8047 100 99.90234
11 10 199.9023 100 99.95117
Using dede, I can make the outflow a lagged value of the state variable from D = 2 time steps earlier:
m2 <- function(t, y, p) {
  with(as.list(c(y, p)), {
    inflow <- 100
    if (t < D)  outflow <- y * 0.5
    if (t >= D) outflow <- lagvalue(t - D, 1) * 0.5
    dy <- inflow - outflow
    return(list(c(dy), inflow = inflow, outflow = outflow))
  })
}
fit2 <- dede(func = m2, y = c(100), times = seq(0, 10, 1), parms = c(D = 2))
time 1 inflow outflow
1 0 100.0000 100 50.00000
2 1 139.3469 100 69.67344
3 2 163.2120 100 81.60602
4 3 177.6870 100 88.84349
5 4 186.4665 100 93.23323
6 5 191.7915 100 95.89575
7 6 195.0213 100 97.51064
8 7 196.9803 100 98.49013
9 8 198.1684 100 99.08422
10 9 198.8891 100 99.44456
11 10 199.3262 100 99.66312
But now imagine I want the outflow to actually be the inflow from D = 2 time steps earlier. I want something like:
**** Code will not run ****
m3 <- function(t, y, p) {
  with(as.list(c(y, p)), {
    inflow <- 100
    if (t < D)  outflow <- 0
    if (t >= D) outflow <- lagvalue(t - D, inflow)   # lagvalue() expects a state index, not a rate
    dy <- inflow - outflow
    return(list(c(dy), inflow = inflow, outflow = outflow))
  })
}
...
As far as I can see, deSolve does not allow this. Is there an easy way to allow it?
The reason I am interested in mixing continuous and discrete-event style models is supply chain modelling, where the average time delay may not be accurate for certain products.
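One possible direction, rather than a verified solution, is to add an auxiliary state variable that accumulates the inflow; lagderiv() on that auxiliary state then returns the inflow D time units earlier. A minimal, untested sketch (the model m3 and fit3 below are hypothetical reworkings of the code above):
m3 <- function(t, y, p) {
  with(as.list(c(y, p)), {
    inflow <- 100
    # state 2 accumulates the inflow, so its lagged derivative is inflow(t - D)
    if (t < D)  outflow <- 0
    if (t >= D) outflow <- lagderiv(t - D, 2)
    dy1 <- inflow - outflow   # the stock
    dy2 <- inflow             # cumulative inflow (auxiliary state)
    return(list(c(dy1, dy2), inflow = inflow, outflow = outflow))
  })
}
fit3 <- dede(func = m3, y = c(100, 0), times = seq(0, 10, 1), parms = c(D = 2))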

Related

Error in using 'ddply' and 'glm' for k-value estimation (hyperbolic delay discounting) in R

I am trying to find the k-value (or discount rate) that best explains my participants' choices between immediate and delayed rewards (where a lower k value means they choose many immediate options and a higher k value means they are more "patient").
SS = Smaller Sooner; LL = Larger Later Reward; Delay = in Days; Choice = 0:SS, 1:LL; SV = Subjective Value.
So first I assign 5001 potential k values or discount rates to each trial (from -50 to 200 in steps of 0.05), which results in a data frame with 8,001,600 rows (50 participants * 32 trials per participant * 5001 potential values).
This is how the k-values were assigned to the data:
uniquek <- seq(-50, 200, 0.05)
DataSoc <- do.call(rbind, lapply(seq_along(uniquek), function(i) data.frame(i, SocialData)))
DataSoc$k <- rep(uniquek, each = nrow(SocialData))  # one k value per stacked copy of SocialData
Then I create an empty data frame (called 'data_simulation' here) with 3 columns (PPN_f, k, r_squared) each 8001600 rows long.
Then I try to use 'ddply' on the data frame to perform a logistic regression with glm for each participant and k combination, something like this:
library(plyr)
data_simulation <- ddply(DataSoc, .(PPN_f, k), function(x) {
  r_squared <- summary(glm(Choice ~ SV_diff, data = x, family = binomial()))$r_squared
  return(data.frame(r_squared))
}, .progress = "win")
Ideally, this would give me the r_squared value for each participant and k combination, after which I would find the k with the largest r_squared value for each participant and assign that k-value to the participant.
BUT the regression just isn't going through. Could you help solve this issue?
Here are the first 6 rows of my raw data for reference. Thank you for your help!
> head(SocialData)
PPN_f (Participant as factor) SS LL Delay Choice SS_SV LL_SV SV_diff
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5e7339dac6b16528d49937bc 1000 30000 60 1 1000 1000 0
2 5e7339dac6b16528d49937bc 1000 5000 60 0 1000 1000 0
3 5e7339dac6b16528d49937bc 1000 10000 60 1 1000 1000 0
4 5e7339dac6b16528d49937bc 1000 5000 30 0 1000 1000 0
5 5e7339dac6b16528d49937bc 1000 5000 5 1 1000 1000 0
6 5e7339dac6b16528d49937bc 1000 2500 14 0 1000 1000 0
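For reference, a minimal sketch of the per-group fit being attempted, using McFadden's pseudo-R-squared computed from the model deviances as a substitute, since summary.glm() does not return an r_squared element; it assumes the DataSoc, PPN_f, k, Choice and SV_diff objects defined above:
library(plyr)

# McFadden's pseudo-R^2 (1 - residual deviance / null deviance) per PPN_f x k group
data_simulation <- ddply(DataSoc, .(PPN_f, k), function(x) {
  fit <- glm(Choice ~ SV_diff, data = x, family = binomial())
  data.frame(r_squared = 1 - fit$deviance / fit$null.deviance)
})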

Friedman's test manual calculation

I have obtained a negative value from Friedman's test. The data are:
Full MIC ReliefF LCorrel InfoGain
equinox 69.939 80.178 78.794 75.205 62.268
lucene 78.175 84.103 79.017 82.044 75.564
mylyn 75.531 78.006 77.161 47.711 81.575
pde 70.282 82.686 81.884 75.07 79.476
jdt 71.675 93.202 95.387 85.878 82.818
Ranking is below
Full MIC ReliefF LCorrel InfoGain
equinox 2 5 4 3 1
lucene 2 5 3 4 1
mylyn 2 4 3 1 5
pde 1 5 4 2 3
jdt 1 4 5 3 2
Sum 8 23 19 13 12
The formula I used for Friedman's statistic:
F = 5/[5*5*(5+1)] * [8*8 + 23*23 + 19*19 + 13*13 + 12*12] - [5*5*(5+1)]
The value I get is -107.7666667.
How do I interpret that? The examples I have seen all have a positive result.
I know about the R code but want the manual calculation.
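For comparison, here is what the standard Friedman chi-square formula, chi^2 = 12 / (n*k*(k+1)) * sum(R_j^2) - 3*n*(k+1) with n = 5 datasets (blocks) and k = 5 methods (treatments), gives for the rank sums above; this is a worked check under the assumption that this is the intended formula:
R_sums <- c(Full = 8, MIC = 23, ReliefF = 19, LCorrel = 13, InfoGain = 12)
n <- 5   # blocks (datasets)
k <- 5   # treatments (methods)
12 / (n * k * (k + 1)) * sum(R_sums^2) - 3 * n * (k + 1)
# [1] 11.36   (this form is non-negative by construction for valid rank sums)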
This is how I generated the results in R, and it worked:
library(PMCMR)   # provides posthoc.friedman.nemenyi.test()

pacc_part        # the data
f1 <- friedman.test(pacc_part)
print(f1)
# Post-hoc tests are conducted only if the omnibus Friedman test p-value
# is 0.05 or less.
if (f1$p.value < 0.05) {
  n1 <- posthoc.friedman.nemenyi.test(pacc_part)
}
n1
# alternate representation of post-hoc test results
summary(n1)

Removing outliers from Excel using R code

The following data sheet is from an Excel file:
Part A B C D E F G H I J K L
XXX 0 1 1 2 0 1 2 3 1 2 1 0
YYY 0 1 2 2 0 30 1 1 0 1 10 0
....
I want to display those parts that contain outliers, using the rule
[median - t * MAD, median + t * MAD]
How can I code this in R, as a function, for a large amount of data?
You would want to calculate robust Z-scores based on the median and MAD (median absolute deviation) instead of the non-robust mean and SD. Then assess your data using Z, with Z = 0 meaning at the median, Z = 1 meaning one MAD away, and so on.
Let's assume we have the following data, where one set is outliers:
df <- rbind(data.frame(tag = 'normal',  res = rnorm(1000) * 2.71),
            data.frame(tag = 'outlier', res = rnorm(20) * 42))
then Z it:
df$z <- with(df, (res - median(res))/mad(res))
that gives us something like this:
> head(df)
tag res z
1 normal -3.097 -1.0532
2 normal -0.650 -0.1890
3 normal 1.200 0.4645
4 normal 1.866 0.6996
5 normal -6.280 -2.1774
6 normal 1.682 0.6346
Then cut it into Z-bands, e.g.:
df$band <- cut(df$z, breaks=c(-99,-3,-1,1,3,99))
That can be analyzed in a straightforward way:
> addmargins(xtabs(~band+tag, df))
tag
band normal outlier Sum
(-99,-3] 1 9 10
(-3,-1] 137 0 137
(-1,1] 719 2 721
(1,3] 143 1 144
(3,99] 0 8 8
Sum 1000 20 1020
As can be seen, the observations with the largest absolute Z values (those in the (-99,-3] and (3,99] bands) come mostly from the outlier group.
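Applied to the row-per-part layout in the question, a minimal sketch of a function using the [median - t*MAD, median + t*MAD] rule per part could look like this; flag_outlier_parts and the small parts data frame are made up for illustration:
# Flag parts (rows) that contain at least one value outside median +/- t*MAD
flag_outlier_parts <- function(dat, t = 3) {
  vals <- as.matrix(dat[, -1])            # numeric columns; part names are in column 1
  has_outlier <- apply(vals, 1, function(x) {
    m <- median(x); s <- mad(x)
    any(x < m - t * s | x > m + t * s)
  })
  dat[has_outlier, ]
}

parts <- data.frame(Part = c("XXX", "YYY"),
                    rbind(c(0, 1, 1, 2, 0, 1, 2, 3, 1, 2, 1, 0),
                          c(0, 1, 2, 2, 0, 30, 1, 1, 0, 1, 10, 0)))
flag_outlier_parts(parts)   # YYY is returned because of the 30 and 10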

Probability of account win/loss using Bayesian Statistics

I am trying to estimate the probability of winning or losing an account, and I'd like to do this using Bayesian Methods. I'm not really that familiar with these methods, but I think I understand the general idea.
I know some information about losses and wins. Wins are usually characterized by some combination of activities; losses are usually characterized by a different combination of activities. I'd like to be able to get some posterior probability of whether or not a new observation will be won or lost, based on the current number of activities associated with that account.
Here is an example of my data: (This is just a sample for simplicity)
Email Call Callback Outcome
14 9 2 1
3 2 4 0
16 14 2 0
15 1 3 1
5 2 2 0
1 1 0 0
10 3 5 0
2 0 1 0
17 8 4 1
3 15 2 0
17 1 3 0
10 7 5 0
10 2 3 0
8 0 0 1
14 10 3 0
1 9 3 1
5 10 3 1
13 5 1 0
9 4 4 0
So from here I know that 30% of the observations have an outcome of 1 (win) and 70% have an outcome of 0 (loss). Let's say that I want to use the other columns to get a probability of win/loss for a new observation which may have a small number of events (emails, calls, and callbacks) associated with it.
Now let's say that I want to use the counts/proportions of the different events as priors for a new observation. This is where I start getting tripped up. My thinking is to create two separate Dirichlet distributions, one for wins and one for losses, using the counts/proportions of events for each outcome as the priors. I'm just not sure how to do this in R. I think my course of action would be to estimate a Dirichlet distribution (since I have 3 variables) for each outcome using maximum likelihood. I've been trying to use the dirichlet.simul and dirichlet.mle functions from the sirt package in R, but I'm not sure whether I need to simulate data first.
Another issue is that once I have these distributions, it's unclear to me how to get a posterior distribution for a new observation. I've read several papers and can't seem to find a straightforward process for doing this (or maybe there are some holes in my understanding). Any pushes in the right direction would be greatly appreciated.
This is the code I've tried so far:
### FOR WON ACCOUNTS
library(sirt)   # dirichlet.simul() and dirichlet.mle()
set.seed(789)
N <- 6
probs <- c(0.535714286, 0.330357143, 0.133928571)
alpha <- matrix(probs, nrow = N, ncol = length(probs), byrow = TRUE)
x <- dirichlet.simul(alpha)
dirichlet.mle(x)
$alpha
[1] 0.3385607 0.2617939 0.1972898
$alpha0
[1] 0.7976444
$xsi
[1] 0.4244507 0.3282088 0.2473405
### FOR LOST ACCOUNTS
set.seed(789)
N2 <- 14
probs2 <- c(0.528037383, 0.308411215, 0.163551402)
alpha2 <- matrix(probs2, nrow = N2, ncol = length(probs2), byrow = TRUE)
x2 <- dirichlet.simul(alpha2)
dirichlet.mle(x2)
$alpha
[1] 0.3388486 0.2488771 0.2358043
$alpha0
[1] 0.8235301
$xsi
[1] 0.4114587 0.3022077 0.2863336
I'm not sure if this is a correct approach or how to get posteriors from here. I realize all the outputs look similar across won/lost accounts; I just used some simulated data to represent what I'm working with.
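One rough sketch of how two fitted Dirichlet distributions could be turned into a posterior win probability for a new account via Bayes' rule; the alpha vectors and the new account's activity proportions below are hypothetical placeholders, not values estimated from the data above:
# Dirichlet density written out by hand (on the log scale, then exponentiated)
ddirich <- function(x, alpha) {
  exp(lgamma(sum(alpha)) - sum(lgamma(alpha)) + sum((alpha - 1) * log(x)))
}

new_props  <- c(6, 2, 2) / 10            # proportions of email / call / callback for a new account
alpha_win  <- c(0.34, 0.26, 0.20)        # hypothetical dirichlet.mle() output for won accounts
alpha_loss <- c(0.34, 0.25, 0.24)        # hypothetical dirichlet.mle() output for lost accounts
prior_win  <- 0.3                        # base rate of wins in the data

lik_win  <- ddirich(new_props, alpha_win)
lik_loss <- ddirich(new_props, alpha_loss)
post_win <- lik_win * prior_win / (lik_win * prior_win + lik_loss * (1 - prior_win))
post_win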

How to generalize this algorithm (sign pattern match counter)?

I have this code in R :
corr <- function(x, y) {
  sx <- sign(x)
  sy <- sign(y)
  cond_a <- sx == sy && sx > 0 && sy > 0   # (+,+)
  cond_b <- sx < sy  && sx < 0 && sy > 0   # (-,+)
  cond_c <- sx > sy  && sx > 0 && sy < 0   # (+,-)
  cond_d <- sx == sy && sx < 0 && sy < 0   # (-,-)
  cond_e <- sx == 0  || sy == 0            # either return is exactly zero
  if (cond_a) return('a')
  else if (cond_b) return('b')
  else if (cond_c) return('c')
  else if (cond_d) return('d')
  else if (cond_e) return('e')
}
Its role is to be used in conjunction with the mapply function in R in order to count all the possible sign patterns present in a time series. In this case the pattern has a length of 2 and all the possible tuples are: (+,+), (+,-), (-,+), (-,-).
I use the corr function this way:
> with(dt['AAPL'], table(mapply(corr, Return[-1], Return[-length(Return)])) /length(Return)*100)
a b c d e
24.6129416 25.4466058 25.4863041 24.0174672 0.3969829
> dt["AAPL",list(date, Return)]
symbol date Return
1: AAPL 2014-08-29 -0.3499903
2: AAPL 2014-08-28 0.6496702
3: AAPL 2014-08-27 1.0987923
4: AAPL 2014-08-26 -0.5235654
5: AAPL 2014-08-25 -0.2456037
I would like to generalize the corr function to n arguments. This means that for every n I would have to write down all the conditions corresponding to all the possible n-tuples. Currently the best thing I can think of is writing a Python script that generates the code string with loops, but there must be a way to do this properly. Do you have an idea about how I could generalize this tedious condition writing? Maybe I could use expand.grid, but how would I do the matching then?
I think you're better off using rollapply(...) in the zoo package for this. Since you seem to be using quantmod anyway (which loads xts and zoo), here is a solution that does not use all those nested if(...) statements.
library(quantmod)

AAPL <- getSymbols("AAPL", auto.assign = FALSE)
AAPL <- AAPL["2007-08::2009-03"]   # AAPL during the crash...
Returns <- dailyReturn(AAPL)

get.patterns <- function(ret, n) {
  f <- function(x) {   # identifies which row of `patterns` matches sign(x)
    which(apply(patterns, 1, function(row) all(row == sign(x))))
  }
  returns  <- na.omit(ret)
  patterns <- expand.grid(rep(list(c(-1, 1)), n))
  labels   <- apply(patterns, 1, function(row) paste0("(", paste(row, collapse = ","), ")"))
  result   <- rollapply(returns, width = n, f, align = "left")
  data.frame(100 * table(labels[result]) / (length(returns) - (n - 1)))
}
get.patterns(Returns,n=2)
# Var1 Freq
# 1 (-1,-1) 22.67303
# 2 (-1,1) 26.49165
# 3 (1,-1) 26.73031
# 4 (1,1) 23.15036
get.patterns(Returns,n=3)
# Var1 Freq
# 1 (-1,-1,-1) 9.090909
# 2 (-1,-1,1) 13.397129
# 3 (-1,1,-1) 14.593301
# 4 (-1,1,1) 11.722488
# 5 (1,-1,-1) 13.636364
# 6 (1,-1,1) 13.157895
# 7 (1,1,-1) 12.200957
# 8 (1,1,1) 10.765550
The basic idea is to create a patterns matrix with 2^n rows and n columns, where each row represents one of the possible patterns (e.g. (1,1), (-1,1), etc.). Then pass the daily returns to this function n-wise using rollapply(...) and identify which row in patterns matches sign(x) exactly. Then use this vector of row numbers as an index into labels, which contains a character representation of the patterns, and finally use table(...) as you did.
This is general for an n-day pattern, but it ignores situations where any return is exactly zero, so the Freq column does not add up to exactly 100. As you can see, this doesn't happen very often.
It's interesting that even during the crash it was (very slightly) more likely to have two up days in succession, than two down days. If you look at plot(Cl(AAPL)) during this period, you can see that it was a pretty wild ride.
This is a little different approach, but it may give you what you're looking for and it allows you to use any size of n-tuple. The basic approach is to find the signs of the adjacent changes for each sequential set of n returns, convert the n-length sign changes into n-tuples of 1's and 0's (where 0 = negative return and 1 = positive return), and then calculate the decimal value of each n-tuple taken as a binary number. These numbers are clearly different for each distinct n-tuple. A zoo time series, together with the tseries package, provides several useful functions for these calculations: get.hist.quote() to retrieve stock prices, diff() to calculate returns, and rollapply() to compute the n-tuples and their sums. The code below does these calculations, converts the sum of the sign changes back to an n-tuple of binary digits, and collects the results in a data frame.
library(zoo)
library(tseries)
n <- 3 # set size of n-tuple
#
# get stock prices and compute % returns
#
dtz <- get.hist.quote("AAPL","2014-01-01","2014-10-01", quote="Close")
dtz <- merge(dtz, (diff(dtz, arithmetic=FALSE ) - 1)*100)
names(dtz) <- c("prices","returns")
#
# calculate the sum of the sign changes
#
dtz <- merge(dtz, rollapply(data = (sign(dtz$returns) + 1) / 2, width = n,
                            FUN = function(x, y) sum(x * y), y = 2^(0:(n - 1)),
                            align = "right"))
dtz <- fortify.zoo(dtz)
names(dtz) <- c("date","prices","returns", "sum_sgn_chg")
#
# convert the sum of the sign changes back to an n-tuple of binary digits
#
for (i in 1:nrow(dtz))
  dtz$sign_chg[i] <- paste(((as.numeric(dtz$sum_sgn_chg[i]) %/% 2^(0:(n - 1))) %% 2), collapse = "")
#
# report first part of result
#
head(dtz, 10)
#
# report count of changes by month and type
#
table(format(dtz$date,"%Y %m"), dtz$sign_chg)
An example of possible output is a table showing the count of changes by type for each month.
000 001 010 011 100 101 110 111 NANANA
2014 01 1 3 3 2 3 2 2 2 3
2014 02 1 2 4 2 2 3 2 3 0
2014 03 2 3 0 4 4 1 4 3 0
2014 04 2 3 2 3 3 2 3 3 0
2014 05 2 2 1 3 1 2 3 7 0
2014 06 3 4 3 2 4 1 1 3 0
2014 07 2 1 2 4 2 5 5 1 0
2014 08 2 2 1 3 1 2 2 8 0
2014 09 0 4 2 3 4 2 4 2 0
2014 10 0 0 1 0 0 0 0 0 0
So this shows that in month 1 (January 2014) there was one set of three days with pattern 000, indicating 3 down returns; 3 sets with pattern 001, indicating two down returns followed by one positive return; and so forth. Most months seem to have a fairly random distribution, but May and August show 7 and 8 sets of 3 consecutive positive returns, reflecting the fact that these were strong months for AAPL.
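As a toy illustration of the encoding and decoding used above (made-up returns, not AAPL data, with n = 3):
r    <- c(-0.4, 0.7, 1.1)                     # hypothetical returns: down, up, up
bits <- (sign(r) + 1) / 2                     # 0 = negative, 1 = positive  -> 0 1 1
code <- sum(bits * 2^(0:2))                   # weighted sum taken as binary -> 6
paste((code %/% 2^(0:2)) %% 2, collapse = "") # decode back                  -> "011"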
