Let's say I want to predict a dependent variable D, where:
D<-rnorm(100)
I cannot observe D, but I know the values of three predictor variables:
I1<-D+rnorm(100,0,10)
I2<-D+rnorm(100,0,30)
I3<-D+rnorm(100,0,50)
I want to predict D by using the following regression equation:
I1*w1 + I2*w2 + I3*w3 ≈ D
However, I do not know the correct values of the weights (w), so I would like to fine-tune them by repeating my estimate.
In the first step I use equal weights:
w1= .33, w2=.33, w3=.33
and I estimate D using these weights:
EST = I1 * .33 + I2 * .33 + I3 * .33
I receive feedback, which is a difference score between D and my estimate (diff=D-EST)
I use this feedback to modify my original weights and fine-tune them to eventually minimize the difference between D and EST.
My question is:
Is the difference score sufficient for being able to fine-tune the weights?
What are some ways of manually fine-tuning the weights? (E.g., can I look at the correlation between diff and I1, I2, I3 and use that as a weight?)
The following command,
coefficients(lm(D ~ I1 + I2 + I3))
will give you the ideal weights to minimize diff.
Your defined diff will not tell you enough to manually manipulate the weights correctly as there is no way to isolate the error component of each I.
The correlation between D and the I's is not sufficient either: it only tells you the strength of each predictor, not its weight. If the noise in your I's is truly independent (from each other and from D, a strong assumption, but true here since each I uses its own rnorm call), you could try adjusting one weight at a time and watching how diff responds, but using a linear regression model is the simplest way to do it.
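If you do want to fine-tune by hand, one option is to nudge the weights in the direction the feedback suggests, i.e. gradient descent on the mean squared error. A minimal sketch, re-simulating the D, I1, I2, I3 from the question (set.seed, the learning rate and the iteration count are arbitrary choices, not part of the question):
set.seed(1)
D  <- rnorm(100)
I1 <- D + rnorm(100, 0, 10)
I2 <- D + rnorm(100, 0, 30)
I3 <- D + rnorm(100, 0, 50)
X  <- cbind(I1, I2, I3)
w  <- c(1/3, 1/3, 1/3)   # start with equal weights
lr <- 1e-4               # small hand-tuned step size
for (i in 1:2000) {
  EST  <- drop(X %*% w)
  diff <- D - EST                                   # the feedback signal
  w    <- w + lr * drop(t(X) %*% diff) / length(D)  # step along the gradient
}
w
coefficients(lm(D ~ 0 + I1 + I2 + I3))   # regression weights (no intercept) for comparison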
Can anybody tell me how to plot the maximum likelihood values L(θ̂_M, M) versus M for a suitable range of M values, for the count data provided in frogs, and then how to estimate the total number of frogs living in the pond and the probability of appearance, in R?
These were the questions asked (attached as an image in the original post); I have answered (a) and (b). I have the pmf of my model and have found the likelihood and log-likelihood of my binomial model, and you can see how much code I have written so far (my solutions to (a), (b) and (c) are below). Please help!
# load the required packages (ggplot2 is part of the tidyverse)
library(tidyverse)
# load the data (provides the data frame frogs)
load("~/Statistical Modelling and Inference/aut2020.RData")
# assign a variable to the data
data <- frogs
# n is the number of count observations
n <- length(frogs$counts)
n
# MLE of theta for a given (known) M
theta_hat <- function(M) sum(frogs$counts) / (n * M)
# log-likelihood of the binomial model y_i ~ Binomial(M, theta)
loglik <- function(theta, M, y) {
  sum(lchoose(M, y)) + sum(y) * log(theta) + (n * M - sum(y)) * log(1 - theta)
}
The data look like this (shown in the original post as screenshots of the R script and of the data once read in).
Since you have already found the likelihood function in your answer (a), you can see that it is a function of M and theta, both unknown.
After estimating theta you have the ML estimator; call it theta_hat.
In the data frame frogs you have all the count observations y_i (known). So, using the known data and the ML estimate theta_hat, the likelihood can be plotted for some (reasonable) range of values of M (you might need to try different ranges): plot L(theta_hat, M) as a function of M. Bear in mind that the estimate theta_hat will change as you change M, so take that into account. The point where L(theta_hat, M) is maximized gives your ML estimates for theta and M.
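A minimal sketch of that plot, assuming frogs$counts holds the observed counts y_i and the model y_i ~ Binomial(M, theta); the range of M values below is an arbitrary choice you may need to widen:
y <- frogs$counts
n <- length(y)
# log-likelihood with theta replaced by its MLE for the given M
profile_loglik <- function(M) {
  th <- sum(y) / (n * M)                       # theta_hat for this M
  sum(dbinom(y, size = M, prob = th, log = TRUE))
}
M_range <- max(y):(max(y) + 200)               # M can never be smaller than max(y)
ll <- sapply(M_range, profile_loglik)
plot(M_range, ll, type = "l",
     xlab = "M (number of frogs)", ylab = "log-likelihood")
M_hat <- M_range[which.max(ll)]
c(M = M_hat, theta = sum(y) / (n * M_hat))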
Let's say that there is a variable A that has a Normal distribution N(μ,σ).
I have two probabilities, P(A>a) and P(A<b), where a<b, and each probability is given as a percentage (an example follows below).
With this information, can R find the standard deviation? I don't know which commands to use (qnorm, dnorm, ...) to get the standard deviation.
What I tried to do, knowing that a = 100, b = 200, P(A>a) = 5% and P(A<b) = 15%:
Use the standardized Normal distribution with μ = 0, σ = 1 (but I don't know how to set this up in R so that I get what I want).
Look up the probabilities in the normal distribution table and calculate Z, but it didn't work.
Is there a way R can find the standard deviation with just this information?
Your problem as stated is impossible; check that your inequalities and values are correct.
You give the example that P(A > 100) = 5%, which means that P(A < 100) = 95%, which in turn means that P(A < 200) must be greater than 95% (all the probability between 100 and 200 adds to that 95%), but you also say that P(A < 200) = 15%. No set of parameters can give you a probability that is both greater than 95% and equal to 15%.
Once you fix the problem definition to something that works, there are a couple of options. Using Ryacas you may be able to solve directly (2 equations and 2 unknowns), but since this is based on the integral of the normal I don't know whether it would work or not.
Another option would be to use optim or similar functions to find (approximate) a solution. Create an objective function that takes 2 parameters, the mean and sd of the normal, and computes the sum of the squared differences between the stated percentages and those computed from the current guesses. The objective function will be 0 at the "correct" mean and standard deviation and positive everywhere else. Then pass this function to optim to find the minimum.
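A minimal sketch of that optim idea, using made-up but consistent targets (P(A > 100) = 5% and P(A < 200) = 99%), since the original 5%/15% pair is impossible:
a <- 100; b <- 200
p_gt_a <- 0.05   # target P(A > a)
p_lt_b <- 0.99   # target P(A < b)
# sum of squared differences between stated and implied probabilities
obj <- function(par) {
  mu <- par[1]; sigma <- par[2]
  if (sigma <= 0) return(Inf)
  ((1 - pnorm(a, mu, sigma)) - p_gt_a)^2 + (pnorm(b, mu, sigma) - p_lt_b)^2
}
fit <- optim(c(0, 100), obj, control = list(reltol = 1e-12))
fit$par   # approximate (mu, sigma)
# check against the closed-form solution via qnorm
sigma <- (b - a) / (qnorm(p_lt_b) - qnorm(1 - p_gt_a))
mu    <- a - sigma * qnorm(1 - p_gt_a)
c(mu, sigma)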
I have 3 random variables, x, y, z (all random effects).
x is nested in y, but y is crossed with z.
I use the following function in lme4, but it does not work.
fit <- lmer(A ~ 1 + (1 | x/y) + (1 | y*z) + (1 | x/y*z), data = mydata)
Can anyone help me? Many thanks.
I'm afraid this is still very unclear. More context would be useful. My guess is that you want
A ~ 1 + (1|y)+ (1|z) + (1|y:z) + (1|y:x)
or equivalently
A ~ 1 + (1|y*z) + (1|y:x)
but it's almost impossible to know for sure.
the first two random effects terms give among-y and among-z variances
the third term gives the variance among combinations of y and z -- you will only want this if you have multiple observations for each {y,z} combination
the last term gives the effect of x nested within y.
The expression A ~ 1 + (1|y/x) + (1|z/y) should give you the same results, because a/b expands in general to a + a:b (order matters for / but not for :), but it's less clear.
Crossed random effects are generally denoted by (1|y) + (1|z), or by (1|y*z) (which expands to (1|y) + (1|z) + (1|y:z)) if as discussed above there are multiple observations per {y,z} combination.
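For what it's worth, a minimal sketch of fitting the first form above; the simulated data frame d and its variable names are placeholders standing in for your real data:
library(lme4)
# simulated stand-in data: x nested in y, y crossed with z
set.seed(1)
d <- expand.grid(y = factor(1:6), z = factor(1:4), rep = 1:3)
d$x <- factor(paste0(d$y, ".", sample(1:2, nrow(d), replace = TRUE)))  # x labels only occur within one y
d$A <- rnorm(6)[as.integer(d$y)] + rnorm(4)[as.integer(d$z)] + rnorm(nrow(d))
fit <- lmer(A ~ 1 + (1 | y) + (1 | z) + (1 | y:z) + (1 | y:x), data = d)
summary(fit)   # variance components for y, z, y:z and y:x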
I have a number of melting curves, for which I want to determine the slope of the steepest part between the minimum (valley) and maximum (peak) using R code (the slope at the inflection point corresponds to the melting point). The solutions I can imagine are either to determine the slope at every point and then find the maximum positive value, or to fit a 4-parameter Weibull-type curve using the drc package to determine the inflection point (basically corresponding to the 50% response point between minimum and maximum). In the latter case the tricky part is that this fitting has to be restricted, for each curve, to the temperature range between the minimum (valley) and maximum (peak) fluorescence response. These temperature ranges are different for each curve.
Grateful for any feedback!
The diff function accomplishes the equivalent of numerical differentiation on equally spaced values (up to a constant factor), so finding the maximum (or minimum) value can be used to identify the location of steepest ascent (or descent):
z <- exp(-seq(0,3, by=0.1)^2 )
plot(z)
plot(diff(z))
z[ which(abs(diff(z))==max(abs(diff(z))) )]
# [1] 0.6126264
# could have also tested for min() instead of max(abs())
plot(z)
abline( v = which(abs(diff(z))==max(abs(diff(z))) ) )
abline( h = z[which(abs(diff(z))==max(abs(diff(z))) ) ] )
With an x-difference of 1, the slope is just the difference at that point:
diff(z) [ which(abs(diff(z))==max(abs(diff(z))) ) ]
[1] -0.08533397
... but I question whether that is really of much interest. I would have thought that getting the index (which would be the melting point subject to an offset) would be the value of interest.
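If the temperature steps are not equally spaced, or you want to restrict each curve to its valley-to-peak range first, here is a small sketch along the same lines (curves, curve, temp and fluo are assumed, hypothetical names for your data frame and columns):
steepest <- function(temp, fluo) {
  i_valley <- which.min(fluo)
  i_peak   <- which.max(fluo)
  idx   <- seq(min(i_valley, i_peak), max(i_valley, i_peak))   # valley-to-peak window
  slope <- diff(fluo[idx]) / diff(temp[idx])                   # handles unequal spacing
  k     <- which.max(slope)                                    # steepest positive slope
  c(melt_temp = mean(temp[idx][k:(k + 1)]), max_slope = slope[k])
}
# apply to each curve separately
do.call(rbind, lapply(split(curves, curves$curve), function(d) steepest(d$temp, d$fluo)))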
I'm trying to estimate the rate of convergence of a sequence.
background:
u^(n+1) = G u^n, where G is an iteration matrix (coming from the heat equation).
Fixing dx = 0.1 and setting dt = dx*dx/2.0 to satisfy the stability constraint,
I then do a number of iterations up to time T = 0.1, and calculate the error (analytical solution is known) using max-norm.
This gives me a sequence of global errors, which from the theory should be of the form O(dt) + O(dx^2).
Now, I want to confirm that we have O(dt).
How should I do this?
Relaunch the same code with dt/2 and witness the error being halved.
I think Alexandre C.'s suggestion might need a little refinement (no pun intended) because the global error estimate depends on both Δt and Δx.
So if Δx were too coarse, refining Δt by halving might not produce the expected reduction of halving the error.
A better test might then be to simultaneously reduce Δt by quartering and Δx by halving. Then the global error estimate leads us to expect the error reduced by quartering.
Incidentally, it is common to plot the global error against the step sizes ("scales") on a log-log graph to estimate the order of convergence.
With greater resources (of time and computer runs) independently varying the time and space discretizations would allow a two-parameter fit (of the same sort of log-log model).
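A small sketch of that log-log estimate in R, with placeholder error values purely to illustrate the fit (use the errors measured from your own runs):
dt  <- c(5e-3, 2.5e-3, 1.25e-3, 6.25e-4)   # successively halved time steps
err <- c(1.9e-3, 9.6e-4, 4.9e-4, 2.4e-4)   # placeholder global errors
# slope of log(err) vs log(dt) estimates the observed order of convergence
fit <- lm(log(err) ~ log(dt))
coef(fit)[2]   # a value close to 1 would confirm O(dt)
# pairwise estimate between successive refinements
log(err[-length(err)] / err[-1]) / log(dt[-length(dt)] / dt[-1])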
I suck at physics, but simple problems like this, even I can do.
Well, what exactly are you having trouble with?
Calculating the rate of convergence:
If you have a sequence a[n], you first need the value it converges to: L = Limit[a[n], n -> Infinity].
Now you can find the rate of convergence: μ = Limit[(a[n + 1] - L)/(a[n] - L), n -> Infinity].
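A tiny R illustration of that ratio, using a made-up linearly convergent sequence a[n] = L + 0.5^n:
L <- 2
a <- L + 0.5^(1:20)
ratios <- (a[-1] - L) / (a[-length(a)] - L)
tail(ratios, 3)   # all about 0.5, the rate of convergence mu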
Finding the combined uncertainty with analytical solution
Using the equation: Uc = Sqrt[(D[a, t] Δt)^2 + (D[a, x] Δx)^2]