How to use an equation with my data in R? - r

I am struggling with a portion of the data analysis for some research I have carried out. Other researchers have used an equation to estimate population growth rate that I would like to implement, but I am hitting a wall with trying to do so. Below is the equation:
Where N0 is the initial number of females in a cohort,
Ax in the number of females emerging on day X, Wx is a measure of mean female size on day x
per replicate, f(wx) is a function relating fecundity to female size, and D is the time (in days)
for a female to reproduce.
N0 (n=15) and D (7) are fixed numbers that I can put in the equation. f(wx) is a function that I have (y = 91.85x - 181.40). Below is a small sample of my data:
df <- data.frame(replicate = c('1','1','2','2','3','3','4','4'),
size = c(5.1, 4.9, 4.7, 4.6, 5.1,2.4,4.3,4.4),
day_emerging = c('6','7','6','7','6','8','7','6'))
I am sorry if this is a bad question for this site. I am just lost for how to handle this. I need R to be able to do the equation for different days. I'm not sure if that is actually possible with my current data format, because R will have to figure out how many females emerged on day x and then perform the other calculations for that day. So maybe this is impossible.
Thank you very much for any advice you can offer.

Here is a base R solution. Hope this is what you are after
dfs <- split(df,df$day_emerging)
p <- sum(sapply(dfs, function(v) nrow(v)*f(mean(v$size))))
q <- sum(sapply(dfs, function(v) nrow(v)*as.numeric(unique(v$day_emerging))*f(mean(v$size))))
res <- log(p/n)/(D + q/p)
such that
> res
[1] 0.5676656
DATA
n <- 15
D <- 7
f <- function(x) 91.85*x-181.4
df <- data.frame(replicate = c('1','1','2','2','3','3','4','4'),
size = c(5.1, 4.9, 4.7, 4.6, 5.1,2.4,4.3,4.4),
day_emerging = c('6','7','6','7','6','8','7','6'))

The answer to this is not particularly R-specific, but rather a skill in and of itself. What you want to do is translate a formal mathematical language into one that works in R (or Python or Matlab,etc).
This is a skill that's worth developing. In python-like psuedocode:
numerator = math.log((1 / n_0) * sum(A * f(w))
denominator = D + (sum(X * A * f(w)) / sum(A * f(w))
r_prime = numerator / denominator
As you can see, there's a lot of unknown variables that you'll have to set previously. Also, things f(w) will need to be defined as helper functions earlier in the script so they can be used. In general, you just want to be able to break down your equation into small parts that you can verify are correct.
It very much helps to do some unit testing with these things - package the equation as a function (or set of small functions that you'll use together) and feed it data that you've run through the equation and verified in another way - by hand, or by a more familiar package. This way, you only have to worry about expressing it in the correct syntax and will know when you've gotten everything correct.

Related

Data perturbation - How to perform it?

I am doing some projects related to statistics simulation using R based on "Introduction to Scientific Programming and Simulation Using R" and in the Students projects session (chapter 24) i am doing the "The pipe spiders of Brunswick" problem, but i am stuck on one part of an evolutionary algorithm, where you need to perform some data perturbation according to the sentence bellow:
"With probability 0.5 each element of the vector is perturbed, independently
of the others, by an amount normally distributed with mean 0 and standard
deviation 0.1"
What does being "perturbed" really mean here? I dont really know which operation I should be doing with my vector to make this perturbation happen and im not finding any answers to this problem.
Thanks in advance!
# using the most important features, we create a ML model:
m1 <- lm(PREDICTED_VALUE ~ PREDICTER_1 + PREDICTER_2 + PREDICTER_N )
#summary(m1)
#anova(m1)
# after creating the model, we perturb as follows:
#install.packages("perturb") #install the package
library(perturb)
set.seed(1234) # for same results each time you run the code
p1_new <- perturb(m1, pvars=c("PREDICTER_1","PREDICTER_N") , prange = c(1,1),niter=200) # your can change the number of iterations to any value n. Total number of iteration would come to be n+1
p1_new # check the values of p1
summary(p1_new)
Perturbing just means adding a small, noisy shift to a number. Your code might look something like this.
x = sample(10, 10)
ind = rbinom(length(x), 1, 0.5) == 1
x[ind] = x[ind] + rnorm(sum(ind), 0, 0.1)
rbinom gets the elements to be modified with probability 0.5 and rnorm adds the perturbation.

Struggling with simple constraints in constrOptim

I have a function in R that I wish to maximise subject to some simple constraints in optim or constrOptim, but I'm struggling to get my head around ci and uito fit my constraints.
My function is:
negexpKPI <- function(alpha,beta,spend){
-sum(alpha*(1-exp(-spend/beta)))
}
where alpha and beta are fixed vectors, and spend is a vector of spends c(sp1,sp2,...,sp6) which I want to vary in order to maximise the output of negexpKPI. I want to constrain spend in three different ways:
1) Min and max for each sp1,sp2,...,sp6, i.e
0 < sp1 < 10000000
5000 < sp2 < 10000000
...
2) A total sum:
sum(spend)=90000000
3) A sum for some individual components:
sum(sp1,sp2)=5000000
Any help please? Open to any other methods that would work but would prefer base R if possible.
According to ?constrOptim:
The feasible region is defined by ‘ui %*% theta - ci >= 0’. The
starting value must be in the interior of the feasible region, but
the minimum may be on the boundary.
So it is just a matter of rewriting your constraints in matrix format. Note, an identity constraint is just two inequality constraints.
Now we can define in R:
## define by column
ui = matrix(c(1,-1,0,0,1,-1,1,-1,
0,0,1,-1,1,-1,1,-1,
0,0,0,0,0,0,1,-1,
0,0,0,0,0,0,1,-1,
0,0,0,0,0,0,1,-1,
0,0,0,0,0,0,1,-1), ncol = 6)
ci = c(0, -1000000, 5000, -1000000, 5000000, 90000000, -90000000)
Additional Note
I think there is something wrong here. sp1 + sp2 = 5000000, but both sp1 and sp2 can not be greater than 1000000. So there is no feasible region! Please fix your question first.
Sorry, I was using sample data that I hadn't fully checked; the true optimisation is for 40 sp values with 92 constraints which would if I'd replicated here in full would have made the problem more difficult to explain. I've added a few extra zeroes to make it feasible now.

Finding Sample Size

I am attempting to use several methods (Wald, Wilson, Clopper-Pearson, Jeffreys, etc.) to calculate sample sizes for confidence intervals. I have been unable to find, in R, how to calculate these. Is there a better way to calculate these besides brute force? Does R have a package that will output all to compare?
I have been unsuccessful with the likes of n.clopper.pearson{GenBinomApps} and some of these require lots of by-hand computations. I have done this for the Wald method:
#Variables
z <- 1.95996
d <- .05
p <- 0.5
q <- 1 - p
#Wald
n_wald <- (z^2 * (p*q))/(d^2)
n_wald
But, I have not been able to find away, besides guess and check methods, to produce the others in R.
I was able to answer my own question with help from the comments:
n_wald <- ciss.wald(p, d, alpha = 0.05)
n_wilson <- ciss.wilson(p, d, alpha = 0.05)
n_agricoull <- ciss.agresticoull(p, d, alpha = 0.05)
These were from the binomSamSize package. Still struggling with an optimization for the clopper-pearson and jeffries if anyone can provide direction there, but these commands calculated sample size easily.

Is there a way to simplify functions in R that utilize loops?

For example, I’m currently working on a function that allows you to see how much money you might have if you invested in the stock market. It’s currently using a loop structure, which is really irritating me, because I know there probably is a better way to code this and leverage vectors in R. I’m also creating dummy vectors before running the function, which seems a bit strange too.
Still a beginner at R (just started!), so any helpful guidance is highly appreciated!
set.seed(123)
##Initial Assumptions
initialinvestment <- 50000 # e.g., your starting investment is $50,000
monthlycontribution <- 3000 # e.g., every month you invest $3000
months <- 200 # e.g., how much you get after 200 months
##Vectors
grossreturns <- 1 + rnorm(200, .05, .15) # approximation of gross stock market returns
contribution <- rep(monthlycontribution, months)
wealth <- rep(initialinvestment, months + 1)
##Function
projectedwealth <- function(wealth, grossreturns, contribution) {
for(i in 2:length(wealth))
wealth[i] <- wealth[i-1] * grossreturns[i-1] + contribution[i-1]
wealth
}
##Plot
plot(projectedwealth(wealth, grossreturns, contribution))
I would probably write
Reduce(function(w,i) w * grossreturns[i]+contribution[i],
1:months,initialinvestment,accum=TRUE)
but that's my preference for using functionals. There is nothing wrong with your use of a for loop here.

Least square minimization

I hope this is the right place for such a basic question. I found this and this solutions quite articulated, hence they do not help me to get the fundamentals of the procedure.
Consider a random dataset:
x <- c(1.38, -0.24, 1.72, 2.25)
w <- c(3, 2, 4, 2)
How can I find the value of μ that minimizes the least squares equation :
The package manipulate allows to manually change with bar the model with different values of μ, but I am looking for a more precise procedure than "try manually until you do not find the best fit".
Note: If the question is not correctly posted, I would welcome constructive critics.
You could proceed as follows:
optim(mean(x), function(mu) sum(w * (x - mu)^2), method = "BFGS")$par
# [1] 1.367273
Here mean(x) is an initial guess for mu.
I'm not sure if this is what you want, but here's a little algebra:
We want to find mu to minimise
S = Sum{ i | w[i]*(x[i] - mu)*(x[i] - mu) }
Expand the square, and rearrange into three summations. bringing things that don't depend on i outside the sums:
S = Sum{i|w[i]*x[i]*x[i])-2*mu*Sum{i|w[i]*x[i]}+mu*mu*Sum{i|w[i]}
Define
W = Sum{i|w[i]}
m = Sum{i|w[i]*x[i]} / W
Q = Sum{i|w[i]*x[i]*x[i]}/W
Then
S = W*(Q -2*mu*m + mu*mu)
= W*( (mu-m)*(mu-m) + Q - m*m)
(The second step is 'completing the square', a simple but very useful technique).
In the final equation, since a square is always non-negative, the value of mu to minimise S is m.

Resources