CES Production Function Estimation using micEconCES in R

I'm currently trying to run some estimations using the micEconCES package in R by Henningsen/Henningsen (2011). My issue is that I am not very familiar with R, and I'm trying to use my own dataset to get the estimations with the package.
The authors of the paper created this data set for the estimations:
R> set.seed( 123 )
R> cesData <- data.frame( x1 = rchisq(200, 10), x2 = rchisq(200, 10), x3 = rchisq(200, 10), x4 = rchisq(200, 10) )
R> cesData$y2 <- cesCalc( xNames = c( "x1", "x2" ), data = cesData, coef = c( gamma = 1, delta = 0.6, rho = 0.5, nu = 1.1 ) )
R> cesData$y2 <- cesData$y2 + 2.5 * rnorm( 200 )
R> cesData$y3 <- cesCalc( xNames = c( "x1", "x2", "x3" ), data = cesData, coef = c( gamma = 1, delta_1 = 0.7, delta = 0.6, rho_1 = 0.3, rho = 0.5, nu = 1.1 ), nested = TRUE )
R> cesData$y3 <- cesData$y3 + 1.5 * rnorm( 200 )
R> cesData$y4 <- cesCalc( xNames = c( "x1", "x2", "x3", "x4" ), data = cesData, coef = c( gamma = 1, delta_1 = 0.7, delta_2 = 0.6, delta = 0.5, rho_1 = 0.3, rho_2 = 0.4, rho = 0.5, nu = 1.1 ), nested = TRUE )
R> cesData$y4 <- cesData$y4 + 1.5 * rnorm( 200 )
The first line sets the seed for the random number generator so that these examples can be replicated with exactly the same data set. The second line creates a data set with four input variables (called x1, x2, x3, and x4), each with 200 observations generated from random χ² distributions with 10 degrees of freedom. The third, fifth, and seventh commands use the function cesCalc, which is included in the micEconCES package, to calculate the deterministic output variables for the CES functions with two, three, and four inputs (called y2, y3, and y4, respectively) given a CES production function.

Now in my paper I'm trying to estimate the CES function for the U.S. at the aggregate level for the two-input case with capital and labor. So I gathered data from the World Bank database for 1990-2015, where I used Gross Fixed Capital Formation for capital and total Labor Force for labor.
The authors did, for example, a nonlinear estimation the following way:
R> cesNls <- nls( y2 ~ gamma * ( delta * x1^(-rho) + (1 - delta) * x2^(-rho) )^(-phi / rho), data = cesData, start = c( gamma = 0.5, delta = 0.5, rho = 0.25, phi = 1 ) )
R> print( cesNls )
Now I want to do the exact same thing for my own data set, which is called Data_Extract_From_World_Development_Indicators. So what I did first is:
ceslan <- cesCalc( xNames = c( "GrossFixedCapitalFormation", "LaborForce" ), data = Data_Extract_From_World_Development_Indicators, coef = c( gamma = 1, delta = 0.6, rho = 0.5, nu = 1.1 ) )
So I replicated
R> cesData$y2 <- cesCalc( xNames = c( "x1", "x2" ), data = cesData, coef = c( gamma = 1, delta = 0.6, rho = 0.5, nu = 1.1 ) )
All I did was change the name of the data set and replace x1 and x2 with my two variables for capital and labor.
Afterwards I tried to do the nonlinear estimation:
cesulan <- nls(y2 ~ gamma * (delta * GrossFixedCapitalFormation^(-rho) + (1-delta)*LaborForce^(-rho))^(-phi / rho), data = Data_Extract_From_World_Development_Indicators, start = c(gamma = 0.5, delta = 0.5, rho = 0.25, phi = 1) )
Now this is where my problem is: I don't know which variable is meant to be y2 in my dataset. I can see in the formula that y2 ~ gamma * ..., so it is regressed on the rest of the term, but I don't know what kind of value I need to plug in there. Does anyone have any advice?
Thanks in advance

In Henningsen & Henningsen (2011), the variable y2 is created with the function cesCalc. It is perturbed in order to test the introduced function cesEst. This variable is supposed to be your function's output (usually Gross Domestic Product, but not exclusively).
It must be a numeric, non-negative column of your data frame, of the same length as your explanatory variables x1 and x2.
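For example, a minimal sketch under that assumption (the output column is hypothetically named GDP here; substitute whatever output series your World Bank extract actually contains):
# Sketch: use an output series as the left-hand side of the nls() call.
# "GDP" is a hypothetical column name; replace it with the name of the
# output variable in your own data frame.
cesulan <- nls(
  GDP ~ gamma * ( delta * GrossFixedCapitalFormation^(-rho) +
                  (1 - delta) * LaborForce^(-rho) )^(-phi / rho),
  data  = Data_Extract_From_World_Development_Indicators,
  start = c( gamma = 0.5, delta = 0.5, rho = 0.25, phi = 1 )
)
summary( cesulan )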

Related

Using dede from the R deSolve/FME packages to fit data to a compartment model

I'm trying to fit the tetracycline data set from Bates & Watts to a compartment model which forms a system of first order differential equations. The system has an analytic solution but I want to use the dede function to estimate the parameters numerically.
I can get parameter estimates which are close to the ones published in Bates and Watts but I'm wondering if I have coded the problem correctly. Specifically, since Bates & Watts account for dead time in their solution, I'm concerned about whether I have coded the use of lagvalue() in the function called DiffEqns correctly.
My programming question relates to coding of the derivatives with lag time. They are currently coded as:
dy1 <- -theta1*y1lag
dy2 <- theta1*y1lag - theta2*y2lag
However, I wonder if the derivatives should be coded instead as:
dy1 <- -theta1*y1lag*y[1]
dy2 <- theta1*y1lag*y[1] - theta2*y2lag*y[2]
Thanks and regards,
# Analyze the tetracycline data set as a two-compartment model
# (see Bates & Watts, "Nonlinear Regression Analysis and Its Applications")
## Note: the differential equations for the compartment model are:
## dy1/dt = -theta1*y1
## dy2/dt = theta1*y1 - theta2*y2
## (see p. 169 in Bates & Watts)
# Load packages
library(FME)
# Create the tetracycline dataset (see p. 281 in Bates & Watts)
tetra <- structure(list(time = c(1, 2, 3, 4, 6, 8, 10, 12, 16),
                        conc = c(0.7, 1.2, 1.4, 1.4, 1.1, 0.8, 0.6, 0.5, 0.3)),
                   row.names = c(NA, 9L), class = "data.frame")
# Observe that: A) "conc" = data for y2; B) there is no data for y1; C) data start at time = 1 instead of time = 0
# Create a differential equation model with dead time
DiffEqns <- function(t, y, parms) {
  theta1 <- parms[1] # rate constant for y1
  theta2 <- parms[2] # rate constant for y2
  theta3 <- parms[3] # amount of y1 at time = 0
  theta4 <- parms[4] # parameter that accounts for dead time
  y1lag <- ifelse(t - theta4 < 0, 0, lagvalue(t - theta4, 1))
  y2lag <- ifelse(t - theta4 < 0, 0, lagvalue(t - theta4, 2))
  dy1 <- -theta1*y1lag
  dy2 <- theta1*y1lag - theta2*y2lag
  return(list(c(dy1, dy2), y1lag = y1lag, y2lag = y2lag))
}
# Find a numerical solution for the system of delay differential equations using dede() from deSolve
time <- seq(from = 0, to = 16, by = 0.1)
Cost <- function(P) {
  theta1 <- P[1]
  theta2 <- P[2]
  theta3 <- P[3]
  theta4 <- P[4]
  theta <- c(theta1, theta2, theta3, theta4)
  yinit <- c(y1 = theta3, conc = 0)
  out <- dede(y = yinit, times = time, func = DiffEqns, parms = theta)
  modCost(model = out, obs = tetra)
}
theta <- c(theta1 = 0.1, theta2 = 0.2, theta3 = 5, theta4 = 0.2) # starting values for the parameters
yinit <- c(y1 = theta[3], conc = 0)
CompModFit2 <- modFit(f = Cost, p = theta, lower = c(0,0,0,0))
FMEtheta <- coef(CompModFit2)
# Compare data to numerical model solution using parameters from modFit
dedeFitted <- dede(times = time, y = c(y1 = FMEtheta[3], conc = 0), func = DiffEqns, parms = FMEtheta)
plot(dedeFitted, obs=tetra)
# Parameters from FME are:
#    theta1    theta2     theta3    theta4
# 0.1193617 0.6974401 10.7188251 0.2206997
# Compare the FME parameters to the parameter estimates published in Bates & Watts:
# theta1 theta2 theta3 theta4
# 0.1488 0.7158 10.10  0.4123
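As a sanity check (my addition, not part of the original post), the system without the lag has a closed-form solution, so the fitted dede() output can be compared against a time-shifted analytic curve. Note that the lagged DDE is not exactly a time-shifted ODE, which is precisely the coding question above:
# Closed-form solution of the two-compartment system without dead time:
#   y1(t) = theta3 * exp(-theta1*t)
#   y2(t) = theta3*theta1/(theta2 - theta1) * (exp(-theta1*t) - exp(-theta2*t))
analytic <- function(t, theta1, theta2, theta3) {
  theta3 * theta1 / (theta2 - theta1) * (exp(-theta1 * t) - exp(-theta2 * t))
}
# Overlay on the observations, shifting time by theta4 to approximate the dead time:
plot(conc ~ time, data = tetra, ylim = c(0, 1.6))
lines(time, analytic(pmax(time - FMEtheta[4], 0), FMEtheta[1], FMEtheta[2], FMEtheta[3]))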

avoid negative values when solving an ODE

I am trying to model the behavior of a made-up network of 5 genes, but I have the problem that I get negative values, which makes no sense biologically.
Is there a way to limit the values to zero?
I managed to do it when I plot the graph, but I don't know how to use the ifelse in the main equation.
Thank you very much.
###################################################
###preliminaries
###################################################
library(deSolve)
library(ggplot2)
library(reshape2)
###################################################
### Initial values
###################################################
values <- c(A = 1,
B = 1,
D = 1,
E = 20,
R = 1)
###################################################
### Set of constants
###################################################
constants <- c(a = 1.2,
b = 0.5,
c = 1.2,
d = 1.5,
e = 0.3,
f = 0.5,
g = 1.5,
h = 0.9,
i = 1.3,
j = 1.3,
m = 0.8,
n = 0.6,
q = 1,
t = 0.0075,
u = 0.0009,
Pa = 100,
Pb = 0.05,
Pd = 0.1,
Pe = 10)
###################################################
### differential equations
###################################################
Dynamic_Model <- function(t, values, constants) {
  with(as.list(c(values, constants)), {
    dA <- Pa + a*D - j*A - R
    dB <- Pb + b*A + e*E - m*B
    dD <- Pd + d*B + f*E - g*A - n*D
    dE <- Pe - h*B + i*E - q*E
    dR <- t*A*B - u*D*E
    list(c(dA, dB, dD, dE, dR))
  })
}
###################################################
### time
###################################################
times <- seq(0, 200, by = 0.01)
###################################################
### Plotting
###################################################
out <- ode(y = values, times = times, func = Dynamic_Model, parms = constants)
out2 <- ifelse(out<0, 0, out)
out.df = as.data.frame(out2)
out.m = melt(out.df, id.vars='time')
p <- ggplot(out.m, aes(time, value, color = variable)) + geom_point(size = 0.5) + ggtitle("Dynamic Model")
print(p)
I agree completely with @Lutz Lehmann that the negative values are a result of the structure of the model.
The system of equations allows the derivatives to stay negative even when the states are already below zero, i.e. the states can decrease further. We don't have information about what the states represent, so the following is only a technical demonstration. Here a dimensionless Monod-type feedback function fb is implemented as a safeguard; it is normally close to one. The km value should be small enough to act only for state values close to zero, but not so small that it causes numerical errors. It can be formulated individually for each state, and other function types are also possible.
library(deSolve)
library(ggplot2)
library(reshape2)
values <- c(A = 1,
B = 1,
D = 1,
E = 20,
R = 1)
constants <- c(a = 1.2,
b = 0.5,
c = 1.2,
d = 1.5,
e = 0.3,
f = 0.5,
g = 1.5,
h = 0.9,
i = 1.3,
j = 1.3,
m = 0.8,
n = 0.6,
q = 1,
t = 0.0075,
u = 0.0009,
Pa = 100,
Pb = 0.05,
Pd = 0.1,
Pe = 10,
km = 0.001)
Dynamic_Model <- function(t, values, constants) {
  with(as.list(c(values, constants)), {
    fb <- function(x) x / (x + km) # Monod-type feedback, ~1 except near zero
    dA <- (Pa + a*D - j*A - R) * fb(A)
    dB <- (Pb + b*A + e*E - m*B) * fb(B)
    dD <- (Pd + d*B + f*E - g*A - n*D) * fb(D)
    dE <- (Pe - h*B + i*E - q*E) * fb(E)
    dR <- (t*A*B - u*D*E) * fb(R)
    list(c(dA, dB, dD, dE, dR))
  })
}
times <- seq(0, 200, by = 0.1)
out <- ode(y = values, times = times, func = Dynamic_Model, parms = constants)
plot(out)
Additional hints:
Removing negative values afterwards (out2 <- ifelse(out < 0, 0, out)) is just wrong.
Removing negative values in the model function, i.e. "use the ifelse in the main equation", would also be wrong, as it can lead to a severe violation of mass balance.
The time steps don't need to be very small; they are adapted automatically by the solver anyway. Too-small time steps make your model slow, and you get more output than needed.
Some of your parameters are quite large, so the model becomes very stiff.
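A quick way to see how the solver coped with that stiffness (a small addition on my part, using deSolve's built-in helper):
# Print solver statistics after the ode() call: steps taken, and which
# method lsoda actually switched to (stiff vs. non-stiff).
diagnostics(out)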

"One after the other" realisation of discrete random variables

I'm stuck with the following problem:
We are given n+1 discrete random variables:
X = {1,...,n} with P(x=i) = p_i
Y_i = {1,...,n_i} with P(y_i = j) = p_ij and i = 1,...,n
We do the following:
We draw from X and the result determines which Y_i we choose for the next step: If x = a, we use Y_a.
We draw from this Y_a.
Now my questions to this:
1. How do I get the Expected Value and the Variance of the whole?
2. Can this "process" be defined by a single random variable?
3. Assume we only know the EV and Var of all Y_i, but not all (or even any) of the probabilities. Can we still calculate the EV and Var of the whole process?
4. If 2) can be done, how to do this efficiently in R?
To give you an example of what I've tried:
X = {1,2} with P(x = 1) = 0.3 and P(x = 2) = 0.7
Y_1 = {2,3} with P(y_1 = 2) = 0.5 and P(y_1 = 3) = 0.5
Y_2 = {1,5,20} with P(y_2 = 1) = 0.3, P(y_2 = 5) = 0.6 and P(y_2 = 20) = 0.1
I have tried to combine these into a single random variable Z, but I'm not sure whether that can be done this way:
Z = {2,3,1,5,20} with probabilities (0.5*0.3, 0.5*0.3, 0.3*0.7, 0.6*0.7, 0.1*0.7)
The weighted EV is correct, but the "weighted" Var is different, if it is even correct to use the formula for the variance of a linear combination of independent random variables here. (Maybe the formula for the combined Var is just wrong.)
I used R and the package "discreteRV":
install.packages("discreteRV")
library(discreteRV)
#defining the RVs
Y_1 <- RV(outcomes = c(2, 3), probs = c(0.5, 0.5)) # occurs 30% of the time
Y_2 <- RV(outcomes = c(1, 5, 20), probs = c(0.3, 0.6, 0.1)) # occurs 70% of the time
Z <- RV(outcomes = c(2, 3, 1, 5, 20),
probs = c(0.5*0.3, 0.5*0.3, 0.3*0.7, 0.6*0.7, 0.1*0.7))
#calculating the EVs
E(Z)
E(Y_1)*0.3 + E(Y_2)*0.7
#calculating the VARs
V(Z)
V(Y_1)*(0.3)^2 + V(Y_2)*(0.7)^2
Thank you for your help.
Actually Z has a larger sample space, spanned by Y1 and Y2; it is a mixture rather than a linear superposition of two components. In other words, we should present Z as [Y1 with probability 0.3, Y2 with probability 0.7] rather than Z = 0.3*Y1 + 0.7*Y2.
Since we have
V(Z) = E(Z**2)-E(Z)**2
> E(Z**2) -E(Z)**2
[1] 20.7684
> V(Z)
[1] 20.7684
Expanding the term E(Z)**2 produces cross-product terms between Y1 and Y2, which is why V(Z) != V(Y_1)*(0.3)^2 + V(Y_2)*(0.7)^2.
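The clean way to get the mixture variance from the component moments alone, which also answers question 3, is the law of total variance (this sketch is my addition, not part of the original answer):
# Law of total variance: V(Z) = E[V(Y_X)] + V(E[Y_X])
p  <- c(0.3, 0.7)          # P(X = 1), P(X = 2)
EY <- c(E(Y_1), E(Y_2))    # conditional means: 2.5 and 5.3
VY <- c(V(Y_1), V(Y_2))    # conditional variances: 0.25 and 27.21
sum(p * VY) + (sum(p * EY^2) - sum(p * EY)^2)
# [1] 20.7684, which matches V(Z): the component EVs and Vars plus the
# mixture weights are sufficient.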

MLE error: initial value in 'vmmin' is not finite

We simulated a data set and created a model.
set.seed(459)
# seed mass
n <- 1000
seed.mass <- round(rnorm(n, mean = 250, sd = 75),digits = 1)
## Setting up the deterministic function
## (note: the return() closes before the division, so this actually
## returns exp(a + b*x); the trailing division is never applied)
detFunc <- function(a, b, x){
  return(exp(a + b*x)) / (1 + exp(a + b*x))
}
# logit inverse-link function for the binomial
inv.link <- function(z){
  p <- 1/(1 + exp(-z))
  return(p)
}
#setting a and b values
a <- -2.109
b <- 0.02
# Simulating data
germination <- (rbinom(n = n, size = 10,
                       p = inv.link(detFunc(x = seed.mass, a = a, b = b))))/10
## make data frame
mydata <- data.frame("predictor" = seed.mass, "response" = germination)
# plotting the data
tmp.x <- seq(0, 1e3, length.out = 500)
plot(germination ~ seed.mass,
     xlab = "seed mass (mg)",
     ylab = "germination proportion")
lines(tmp.x, inv.link(detFunc(x = tmp.x, a = a, b = b)), col = "red", lwd = 2)
When we check the model we created and infer the parameters, we get an error:
Error in optim(par = c(a = -2.109, b = 0.02), fn = function (p) : initial value in 'vmmin' is not finite
library(bbmle)
mod1 <- mle2(response ~ dbinom(size = 10,
                               p = inv.link(detFunc(x = predictor, a = a, b = b))),
             data = mydata,
             start = list(a = -2.109, b = 0.02))
We're stumped and can't figure out why we're getting this error.
Your problem is that you're trying to fit a binomial outcome (which must be an integer count) to a proportion.
You can use round(response*10) as your response, to put the proportion back on the count scale; round() is needed because (a/b)*b is not always exactly equal to a in floating-point math ... Specifically, with your setup
mod1 <- mle2(round(response*10) ~ dbinom(size = 10,
                                         p = inv.link(detFunc(x = predictor, a = a, b = b))),
             data = mydata,
             start = list(a = -2.109, b = 0.02))
works fine. coef(mod1) is {-1.85, 0.018}, plausibly close to the true values you started with (we don't expect to recover the true values exactly, except as the average of many simulations [and even then MLE is only asymptotically unbiased, i.e. for large data sets ...]).
The proximal problem is that trying to evaluate dbinom() with a non-integer value gives zero (with a warning), so its log is -Inf and the initial negative log-likelihood is not finite. The full output from your model fit would have been:
Error in optim(par = c(a = -2.109, b = 0.02), fn = function (p) :
initial value in 'vmmin' is not finite
In addition: There were 50 or more warnings (use warnings() to see the first 50)
It's always a good idea to check those additional warnings ... in this case they are all of the form
1: In dbinom(x = c(1, 1, 1, 0.8, 1, 1, 1, 1, 1, 1, 1, 0.8, ... :
non-integer x = 0.800000
which might have given you a clue ...
PS you can use qlogis() and plogis() from base R for your link and inverse-link functions ...
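For instance, a sketch of that suggestion (my rewrite, not code from the answer):
# plogis() is the base-R inverse-logit, a drop-in replacement for inv.link():
mod2 <- mle2(round(response*10) ~ dbinom(size = 10,
                                         p = plogis(detFunc(x = predictor, a = a, b = b))),
             data = mydata,
             start = list(a = -2.109, b = 0.02))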

Fitting a nonlinear function with "missing level" in mle2 (WARNING: ecologist with computer)

I am looking to optimize the fit of a model that describes the amount of litter collected in a network of 0.5 m^2 'litter traps' in a plot of mapped trees of known diameter and species. The model of choice has two components: allometric scaling of litter production and exponential decay with litter travel distance.
tree1.litter = alpha*gamma^2 * DBH^Beta/(2*pi) * exp(-gamma*z-delta*DBH)
However, our trap data contains input from multiple trees (this is the "missing level" referred to in title):
Obs.Litter = tree1.litter + tree2.litter + ... + treej.litter + error
So far I have had very mixed results, even on simulated data. It seems like with enough combinations of diameters and distances the parameters should be fairly well constrained. This analysis has been performed in an article I'm copy-catting. I've also tried the analysis on log(Obs.Litter), which I think is the way to go, but I am not sure that the way I've coded the log version would be expected to perform any better.
At this point I suppose I'm just looking for any sort of advice (code-based or conceptual) from someone more experienced with fitting nonlinear regressions or model-fitting problems with this type of "hidden process". Code for the data simulation and the various likelihoods is included below. I've had a bit more success estimating these parameters with a Bayesian hierarchical model in OpenBUGS, with informative priors only.
library(plyr)
########################
##Generate Data#########
########################
alpha = 5
Beta = 2
gamma = .2
delta = .02
n = 600 #Number of trees
N.trap = 45 #Number of litter traps
D = rlnorm(n, 2)+5 #generate diameters
Z = runif(n, 0, 25) #generate distances
trap.id = sort(sample(1:N.trap, size = n, replace = T)) #assign trees to traps
tree.lit = (2*pi)^-1*alpha*gamma^2*D^Beta * exp(-gamma*Z-delta*D) #generate litter
log.tree.lit = -log(2*pi) + log(alpha) + 2*log(gamma) + Beta*log(D) - gamma*Z - delta*D
litter = data.frame(D=D, Z = Z, trap.id = trap.id, tree.lit = tree.lit)
data = ddply(litter, .(trap.id), summarize, trap.lit = sum(tree.lit), n.trees=length(trap.id) )
trap.lit = data[,2]
#####################################
##### Maximum Likelihood Optimization
#####################################
library(bbmle)
log.Litter.Func <- function(alpha, Beta, gamma, delta, sigma, D, Z, N.trap, trap.id, Obs.Litter){
  log.Expected.Litter.tree = -log(2*pi) + log(alpha) + 2*log(gamma) + Beta*log(D) - gamma*Z - delta*D
  log.Expected.Litter.trap = rep(0, N.trap)
  for(i in 1:N.trap){
    # note: sum(exp(...)) is the trap total on the natural scale,
    # not its log, despite the variable name
    log.Expected.Litter.trap[i] <- sum(exp(log.Expected.Litter.tree[trap.id == i]))
  }
  -sum(dlnorm(log(Obs.Litter), log.Expected.Litter.trap, sdlog = sigma, log = TRUE))
}
Litter.Func <- function(alpha, Beta, gamma, delta, sigma, D, Z, N.trap, trap.id, Obs.Litter){
  Expected.Litter.tree = 1/(2*pi) * alpha * gamma^2 * D^Beta * exp(-gamma*Z - delta*D)
  Expected.Litter.trap = rep(0, N.trap)
  for(i in 1:N.trap){
    Expected.Litter.trap[i] <- sum(Expected.Litter.tree[trap.id == i])
  }
  -sum(dnorm(Obs.Litter, Expected.Litter.trap, sd = sigma, log = TRUE))
}
log.fit <- mle2(log.Litter.Func,
                start = list(alpha = 5, gamma = .2, Beta = 2, delta = .02, sigma = 1),
                #upper = list(alpha = 20, gamma = 1, Beta = 4, delta = .2, sigma = 20),
                #lower = list(alpha = .002, gamma = .002, Beta = .0002, delta = .000000002, sigma = .020),
                #method = "L-BFGS-B",
                data = list(D = D, Z = Z, N.trap = N.trap, trap.id = litter$trap.id, Obs.Litter = trap.lit))
fit <- mle2(Litter.Func,
            start = list(alpha = 5, gamma = .2, Beta = 2, delta = .02, sigma = 1),
            #upper = list(alpha = 20, gamma = 1, Beta = 4, delta = .2, sigma = 20),
            #lower = list(alpha = .002, gamma = .002, Beta = .0002, delta = .000000002, sigma = .020),
            #method = "L-BFGS-B",
            data = list(D = D, Z = Z, N.trap = N.trap, trap.id = litter$trap.id, Obs.Litter = trap.lit))
