Optimize / solve equation for unknown exponent - r

I have a data frame with the following variables: a and b as predictors and c as the outcome. My formula is:
c = (a^x) / (a^x + b^x)
How to solve for x?
Example data:
dat <- data.frame(a = runif(5, 1, 100), b = runif(5, 10, 20), c = runif(5, 0, 1))
Reply to comment:
What is your expected output? A single x-value from least squares fitting, or a column x?
A single x for the whole column: I want to minimize the summed error over all rows.

You can use the following code:
library(minpack.lm)
dataset <- data.frame(a = runif(5, 1, 100), b = runif(5, 10, 20), c = runif(5, 0, 1))
fun <- as.formula(c ~ a^x / (a^x + b^x))
# Fit the model using the minpack.lm package (Levenberg-Marquardt)
nls.out1 <- nlsLM(fun,
                  data = dataset,
                  start = list(x = 1),
                  algorithm = "LM",
                  control = nls.lm.control(maxiter = 500))
summary(nls.out1)
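The fitted exponent is then coef(nls.out1)["x"]. If you prefer base R, here is a minimal sketch that minimizes the same sum of squared row errors directly with optimize; the search interval c(-10, 10) is an assumption, not part of the original answer:
# Sum of squared row errors for a candidate exponent x
sse <- function(x, d) sum((d$c - d$a^x / (d$a^x + d$b^x))^2)
# One-dimensional minimization over an assumed search range
opt <- optimize(sse, interval = c(-10, 10), d = dataset)
opt$minimum  # should agree with coef(nls.out1) up to optimizer tolerance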

Related

How can I predict values in factorial experiments (2^k) with centre points in R?

How can I predict values in factorial experiments with centre points in R using FrF2 package with the predict function or using the broom package?
My code:
library(FrF2)
plan.person = FrF2(nfactors = 5, resolution = 5, replications = 2,
                   ncenter = 1, randomize = FALSE,
                   factor.names = list(
                     A = c(8, 5),
                     B = c(70, 30),
                     C = c(0.5, 0),
                     D = c(1000, 700),
                     E = c(70, 10)))
resp <- c(84.55, 66.34, -1, 69.18, 73.01, 64.52, 0.73, 47.61, 68.18, 59.87,
26, 72.57, 78.08, 73.81, 26, 59.38, 71.41, 88.64, 64.92, 4, 68.81,
80, 69.66, -1.36, 54.50, 79.24, 78.53, -1, 72.63, 89.97, 87.98,
-11, 65.68, 82.46)
newplan <- add.response(design = plan.person, response = resp)
model <- lm(newplan, use.center = TRUE)
# summary(model)
d <- within(newplan, {
  A <- as.numeric(as.character(A))
  B <- as.numeric(as.character(B))
  C <- as.numeric(as.character(C))
  D <- as.numeric(as.character(D))
  E <- as.numeric(as.character(E)) })
A = seq(5, 8, 1)
B = seq(30, 70, length.out = length(A))
C = seq(0, 0.5, length.out = length(A))
D = seq(700, 1000, length.out = length(A))
E = seq(10, 70, length.out = length(A))
data <- expand.grid(A = A, B = B,
                    C = C, D = D,
                    E = E)
data$p <- predict(model, newdata = data)
Because of the center point, the following message appears:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels):
   variable lengths differ (found for 'center')
"A two-level experiment with center points can detect, but not fit, quadratic effects."
(https://www.itl.nist.gov/div898/handbook/pri/section3/pri336.htm)
That is, R can't predict these values because you need to make additional assumptions about what the curve looks like to predict points not at your design points.
Note that computationally, you can get the software to work by adding a center term. The error is because this term is in the regression but not in the data set. You could add one with data$center <- FALSE (because none of the points in data are at the center), but this will not do the right thing, as it will not take the potential curvature into account when predicting non-central points, it would simply predict a twisted plane (that is, linear with interactions) with a single bump at the center.
Of course, it's also equivalent to just fitting the model with use.center=FALSE, as the center point doesn't affect the fit of the other points.
If you remove the center point, you can do the following after model <- lm(newplan, use.center = TRUE):
1. Filter the p-values < 0.05:
library(dplyr)
coe <- broom::tidy(model) %>%
  slice(-7) %>%  # remove the center term
  filter(p.value < 0.05)
m_beta <- coe$estimate
2. Build a grid:
A = seq(5, 8, 0.5)
B = seq(30, 70, length.out = length(A))
exp <- expand.grid(A = A, B = B) %>%
  mutate(bo = as.numeric(1)) %>%  # intercept column
  mutate(ult = A*B) %>%           # A:B interaction column
  select(bo, A, B, ult) %>%
  as.matrix()
3. Compute the predictions from the retained coefficients:
reg <- t(m_beta %*% t(exp))
exp <- cbind(exp, reg) %>%
  as.data.frame() %>%
  rename(reg = V5)
But I believe this only solves the computational problem, or simplifies it; the linear regression itself should probably be redone as well. Still, with this code you can explore the fit and see what other issues remain.
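If you do redo the regression, one possible sketch is to refit an ordinary lm on the decoded numeric data d built above and predict on a numeric grid. The formula with two-way interactions, the mid-range values for C, D, and E, and the assumption that the response column in d is named resp are all illustrative choices, not part of the original answer:
# Hypothetical refit on the decoded numeric design (d from the question);
# the formula and the held-constant values below are assumptions
model_num <- lm(resp ~ (A + B + C + D + E)^2, data = d)
grid <- expand.grid(A = seq(5, 8, 0.5),
                    B = seq(30, 70, length.out = 7),
                    C = 0.25, D = 850, E = 40)  # C, D, E fixed at mid-range
grid$p <- predict(model_num, newdata = grid)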

Create normally distributed variables with a defined correlation in R

I am trying to create a data frame in R with a set of variables that are normally distributed. First, we create the data frame with the following variables:
RootCause <- rnorm(500, 0, 9)
OtherThing <- rnorm(500, 0, 9)
Errors <- rnorm(500, 0, 4)
df <- data.frame(RootCause, OtherThing, Errors)
In the second part, we're asked to redo the above, but with a defined correlation of 0.5 between RootCause and OtherThing. I have tried reading through a couple of pages and articles explaining correlation commands in R, but I am struggling to comprehend them.
Easy answer
Draw another random variable OmittedVar and add it to both variables. Each variable is then the sum of a shared component and an independent component with equal variance (81 each), so the correlation is 81 / (81 + 81) = 0.5:
n <- 1000
OmittedVar <- rnorm(n, 0, 9)
RootCause <- rnorm(n, 0, 9) + OmittedVar
OtherThing <- rnorm(n, 0, 9) + OmittedVar
Errors <- rnorm(n, 0, 4)
cor(RootCause, OtherThing)
[1] 0.4942716
Other answer: use the multivariate normal function from the MASS package. You have to define the variance/covariance matrix that gives you the correlation you want (the Sigma argument here):
d <- MASS::mvrnorm(n = n, mu = c(0, 0),
                   Sigma = matrix(c(9, 4.5, 4.5, 9), nrow = 2, ncol = 2),
                   tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
cor(d[,1], d[,2])
[1] 0.5114698
Note:
Getting a correlation other than 0.5 depends on the process; if you want to change it from 0.5, adjust the details (the coefficient on OmittedVar in the first strategy, or Sigma in the second). But you'll have to look up the variance rules for the normal distribution.
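For instance, to keep the standard deviation of 9 from the original setup and still get correlation 0.5, set the off-diagonal of Sigma to 0.5 * 9 * 9 = 40.5 (this reworks the Sigma above, whose variances were 9 rather than 81):
Sigma2 <- matrix(c(81, 40.5, 40.5, 81), nrow = 2)  # cov = cor * sd1 * sd2
d2 <- MASS::mvrnorm(n = 1000, mu = c(0, 0), Sigma = Sigma2)
sd(d2[, 1]); cor(d2[, 1], d2[, 2])  # roughly 9 and 0.5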

Achieving t random variables with each different df and ncp in R?

I'm trying to generate 5 random t variates using rt(), each with a particular df (respectively, 1 to 5) and a particular ncp (respectively, seq(0, 1, l = 5)). So: 5 random t-variables, each having a different df and a different ncp.
To achieve this, I tried the code below with no success. What would be efficient R code to achieve what I described?
vec.rt = Vectorize(function(n, df, ncp) rt(n, df, ncp), c("n", "df", "ncp"))
vec.rt(n = 5, df = 1:5, ncp = seq(0, 1, l = 5))
Or
mapply(FUN = rt, n = 5 , df = 1:5, ncp = seq(0, 1, l = 5))
Note that for:
rt(n = 5, df = 1:5, ncp = seq(0, 1, l = 5))
R gives the following warning:
Warning message:
In if (is.na(ncp)) { :
the condition has length > 1 and only the first element will be used
Rephrasing your question helps to find an answer: you want a sample of length 1 (n = 1) from each of 5 random variables, each with its own parameters.
mapply(FUN = rt, n = 1 , df = 1:5, ncp = seq(0, 1, l = 5))
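If you later want several draws per (df, ncp) pair, mapply simplifies the result to a matrix with one column per pair; a small sketch extending the call above:
set.seed(1)
draws <- mapply(FUN = rt, n = 10, df = 1:5, ncp = seq(0, 1, l = 5))
dim(draws)  # 10 x 5: column j holds 10 draws with df = j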

R: Alleged "missing values" when no values are actually missing for MCMCglmm

I have data structured as follows:
A is the count of positive cases in a cohort.
B is the total count of the cohort minus A.
C is a binary variable.
D to F are normally distributed continuous variables.
G is a 6-level factor.
I am using the MCMCglmm package in R to analyse these data, to find which of the variables C to G affect A and B.
I have done this successfully using lme4's glmer function, but now I wish to add more random effects, which I have been advised are better handled by the MCMC-based variant. However, given the following function call
MCMCmod1 <- MCMCglmm(cbind(A, B) ~ C + D + E + F,
                     random = ~ G,
                     prior = prior,
                     family = "multinomial2",
                     data = g)
(I appreciate the family may not be correct in this case)
Where
prior = list(R = list(V = 1, n = 0, fix = 1),
             G = list(
               G1 = list(V = 1, n = 1),
               G2 = list(V = 1, n = 1),
               G3 = list(V = 1, n = 1),
               G4 = list(V = 1, n = 1),
               G5 = list(V = 1, n = 1)))
I get the following error:
Error in `[<-.data.frame`(`*tmp*`, , response.names, value = c(0, 0, 0, :
missing values are not allowed in subscripted assignments of data frames
However, when I check my variables, there are definitely no NAs.
Any ideas on the steps I can take to debug this?
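One debugging step worth sketching (an assumption-laden suggestion, not a confirmed fix): MCMCglmm builds a model frame from exactly the columns used in the call, so check the completeness and types of just those columns, since NAs, NaNs, or non-integer counts can hide in a column you did not inspect individually:
used <- c("A", "B", "C", "D", "E", "F", "G")  # columns assumed to match the call
colSums(is.na(g[used]))                       # NA/NaN count per modeled column
sum(!complete.cases(g[used]))                 # rows that would be dropped
all(is.finite(g$A) & is.finite(g$B))          # Inf also breaks the response
all(g$A %% 1 == 0 & g$B %% 1 == 0 & g$A >= 0) # multinomial counts must be whole and non-negative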

dlm package in R: What is causing this error: `tsp<-`(`*tmp*`, value = c(1, 200, 1))

I am using the dlm package in R to perform Kalman filtering on the following simulated data.
## Multivariate time-series of dimension 200 and length 3
obsTimeSeries <- cbind(rnorm(200, 1, 2), rnorm(200, 2, 2), rnorm(200, 3, 2))
tseries <- ts(obsTimeSeries, frequency = 1)
kalmanBuild <- function(par) {
  kalmanMod <- dlm(FF = diag(1, 200), GG = diag(1, 200),
                   V = exp(par[1]) * diag(1, 200),
                   W = exp(par[2]) * diag(1, 200),
                   m0 = rep(0, 200), C0 = 1e100 * diag(1, 200))
  kalmanMod
}
kalmanMLE <- dlmMLE(tseries, parm = rep(0, 2), build = kalmanBuild)
kalmanMod <- kalmanBuild(kalmanMLE$par)
kalmanFilt <- dlmFilter(tseries, kalmanMod)
The code up to kalmanMod works fine, but dlmFilter(tseries, kalmanMod) fails with the error `tsp<-`(`*tmp*`, value = c(1, 200, 1)).
I tried to locate the source of the error. The filtering itself seems to work fine (the means and variances are estimated correctly) until the very last part, where the code assigns tsp(ans$a) <- ytsp and the error occurs.
Has anyone else faced this problem? If so, what am I doing wrong?
Try changing your code to:
obsTimeSeries <- rbind(rnorm(200, 1, 2), rnorm(200, 2, 2), rnorm(200, 3, 2))
rather than:
obsTimeSeries <- cbind(rnorm(200, 1, 2), rnorm(200, 2, 2), rnorm(200, 3, 2))
Your time series was set up as 3 series at 200 time points. If you change it to rbind, you will have a ts with 200 series at 3 time points, which matches the 200-dimensional observation equation (FF, V, W) in kalmanBuild.
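A quick sanity check of the fix, just verifying that the dimensions line up with the dlm specification above:
tseries <- ts(rbind(rnorm(200, 1, 2), rnorm(200, 2, 2), rnorm(200, 3, 2)),
              frequency = 1)
dim(tseries)  # 3 x 200: 3 time points of a 200-dimensional observation,
              # matching the 200 x 200 FF, V, and W in kalmanBuild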
