Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
Let's say that there is a variable A that has a Normal distribution N(μ,σ).
I have two probabilities, P(A>a) and P(A<b), where a<b, and each probability is given as a percentage (see the example below).
With this information, can R find the standard deviation? I don't know which commands to use (qnorm, dnorm, ...) to get the standard deviation.
What I tried to do, knowing that a = 100, b = 200, P(A>a) = 5% and P(A<b) = 15%:
Use the standardized normal distribution with μ = 0, σ = 1 (but I don't know how to express this in R to get what I want).
Look up the probability in a normal distribution table and calculate Z..., but that didn't work.
Is there a way R can find the standard deviation with just this information?
Your problem as stated is impossible; check that your inequalities and values are correct.
You give the example that P(A > 100) = 5%, which means that P(A < 100) = 95%, which in turn means that P(A < 200) must be greater than 95% (all the probability between 100 and 200 adds to that 95%). But you also say that P(A < 200) = 15%. There is no pair of numbers that can give you a probability that is both greater than 95% and equal to 15%.
Once you fix the problem definition to something that works, there are a couple of options. Using Ryacas you may be able to solve directly (2 equations and 2 unknowns), but since this is based on the integral of the normal I don't know whether it would work or not.
Another option would be to use optim or similar functions to find an (approximate) solution. Create an objective function that takes 2 parameters, the mean and sd of the normal, and computes the sum of the squared differences between the stated percentages and those computed from the current guesses. The objective function will be 0 at the "correct" mean and standard deviation and positive everywhere else. Then pass this function to optim to find the minimum.
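For example, here is a minimal sketch of that optim approach, assuming the (made-up, consistent) targets P(A > 100) = 25% and P(A < 200) = 97.5%; the function name obj and the starting values are only illustrative:

obj <- function(par) {
  mu <- par[1]; sd <- par[2]
  if (sd <= 0) return(Inf)                # keep the search in valid territory
  e1 <- (1 - pnorm(100, mu, sd)) - 0.25   # P(A > 100) should be 25%
  e2 <- pnorm(200, mu, sd) - 0.975        # P(A < 200) should be 97.5%
  e1^2 + e2^2                             # 0 at the "correct" mean and sd
}
fit <- optim(c(100, 50), obj)             # starting guesses for the mean and sd
fit$par                                   # approximate mean and standard deviation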
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
I have a number of melting curves, for which I want to determine the slope of the steepest part between the minimum (valley) and maximum (peak) using R code (the slope at the inflection point corresponds to the melting point). The solutions I can imagine are either to determine the slope at every point and then find the maximum positive value, or to fit a four-parameter Weibull-type curve using the drc package and determine the inflection point (basically corresponding to the 50% response point between minimum and maximum). In the latter case the tricky part is that the fitting has to be restricted, for each curve, to the temperature range between the minimum (valley) and maximum (peak) fluorescence response. These temperature ranges differ between curves.
Grateful for any feedback!
The diff function accomplishes the equivalent of numerical differentiation on equally spaced values (up to a constant factor), so finding maximum (or minimum) values of the differences can be used to identify the location of steepest ascent (or descent):
z <- exp(-seq(0,3, by=0.1)^2 )
plot(z)
plot(diff(z))
z[ which(abs(diff(z))==max(abs(diff(z))) )]
# [1] 0.6126264
# could have also tested for min() instead of max(abs())
plot(z)
abline( v = which(abs(diff(z))==max(abs(diff(z))) ) )
abline( h = z[which(abs(diff(z))==max(abs(diff(z))) ) ] )
With an x-difference of 1, the slope is just the difference at that point:
diff(z) [ which(abs(diff(z))==max(abs(diff(z))) ) ]
[1] -0.08533397
... but I question whether that is really of much interest. I would have thought that getting the index (which would be the melting point subject to an offset) would be the value of interest.
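For a real melting curve on a temperature grid, a rough sketch of the same idea, restricted to the valley-to-peak range as the question asks; the data frame mc and its columns temp and fluor are hypothetical names:

# mc is a hypothetical data frame with columns temp and fluor for one melting curve
i_min <- which.min(mc$fluor)                        # valley
i_max <- which.max(mc$fluor)                        # peak
idx   <- seq(min(i_min, i_max), max(i_min, i_max))  # restrict to the valley-to-peak range
slope <- diff(mc$fluor[idx]) / diff(mc$temp[idx])   # slope of each segment
k     <- which.max(slope)                           # steepest ascending segment
c(melting_temp = mean(mc$temp[idx][k + 0:1]), max_slope = slope[k])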
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
Let's say I want to predict a dependent variable D, where:
D<-rnorm(100)
I cannot observe D, but I know the values of three predictor variables:
I1<-D+rnorm(100,0,10)
I2<-D+rnorm(100,0,30)
I3<-D+rnorm(100,0,50)
I want to predict D by using the following regression equation:
I1 * w1 + I2 * w2 + I3 * w3 = ~D
However, I do not know the correct values of the weights (w); I would like to fine-tune them by repeatedly updating my estimate:
in the first step I use equal weights:
w1= .33, w2=.33, w3=.33
and I estimate D using these weights:
EST = I1 * .33 + I2 * .33 + I3 * .33
I receive feedback, which is a difference score between D and my estimate (diff=D-EST)
I use this feedback to modify my original weights and fine-tune them to eventually minimize the difference between D and EST.
My question is:
Is the difference score sufficient for being able to fine-tune the weights?
What are some ways of manually fine-tuning the weights? (e.g. can I look at the correlation between diff and I1, I2, I3 and use that as a weight?)
The following command,
coefficients(lm(D ~ I1 + I2 + I3))
will give you the ideal weights to minimize diff.
Your defined diff will not tell you enough to manually manipulate the weights correctly as there is no way to isolate the error component of each I.
The correlation between D and the I's is not sufficient either as it only tells you the strength of the predictor, not the weight. If your I's are truly independent (both from each other, all together and w.r.t. D - a strong assumption, but true when using rnorm for each), you could try manipulating one at a time and notice how it affects diff, but using a linear regression model is the simplest way to do it.
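To illustrate, here is a rough sketch of the adjust-one-weight-at-a-time idea on the simulated data from the question; the step size, the number of passes, and the helper name sse are made-up choices for illustration, not a recommended procedure:

set.seed(1)
D  <- rnorm(100)
I1 <- D + rnorm(100, 0, 10)
I2 <- D + rnorm(100, 0, 30)
I3 <- D + rnorm(100, 0, 50)
sse <- function(w) sum((D - (I1 * w[1] + I2 * w[2] + I3 * w[3]))^2)  # size of the diff feedback
w <- c(.33, .33, .33)                        # equal starting weights
for (pass in 1:200) {                        # crude coordinate search
  for (j in 1:3) {
    for (delta in c(0.01, -0.01)) {          # nudge one weight at a time
      trial    <- w
      trial[j] <- trial[j] + delta
      if (sse(trial) < sse(w)) w <- trial    # keep the change only if diff shrinks
    }
  }
}
w                                            # hand-tuned weights
coefficients(lm(D ~ I1 + I2 + I3))           # regression weights (plus an intercept) for comparison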
The prob package numerically evaluates characteristic functions for base R distributions. For almost all distributions there are existing formulas. For a few cases, though, no closed-form solution is known. Case in point: the Weibull distribution (but see below).
For the Weibull characteristic function I essentially compute two integrals and put them together:
fr <- function(x) cos(t * x) * dweibull(x, shape, scale)   # real-part integrand
fi <- function(x) sin(t * x) * dweibull(x, shape, scale)   # imaginary-part integrand
Rp <- integrate(fr, lower = 0, upper = Inf)$value
Ip <- integrate(fi, lower = 0, upper = Inf)$value
Rp + (0+1i) * Ip                                           # real part + i * imaginary part
Yes, it's clumsy, but it works surprisingly well! ...ahem, most of the time. A user reported recently that the following breaks:
cfweibull(56, shape = 0.5, scale = 1)
Error in integrate(fr, lower = 0, upper = Inf) :
the integral is probably divergent
Now, we know that the integral isn't divergent, so it must be a numerical problem. With some fiddling I could get the following to work:
fr <- function(x) cos(56 * x) * dweibull(x, 0.5, 1)
integrate(fr, lower = 0.00001, upper = Inf, subdivisions=1e7)$value
[1] 0.08024055
That's OK, but it isn't quite right, plus it takes a fair bit of fiddling, which doesn't scale well. I've been investigating this, looking for a better solution. I found a recently published "closed form" for the characteristic function with shape > 1 (see here), but it involves Wright's generalized confluent hypergeometric function, which isn't implemented in R (yet). I looked into the archives for alternatives to integrate, and there's a ton of stuff out there which doesn't seem very well organized.
As part of that searching it occurred to me to translate the region of integration to a finite interval via the inverse tangent, and voila! Check it out:
cfweibull3 <- function(t, shape, scale = 1) {
    if (shape <= 0 || scale <= 0)
        stop("shape and scale must be positive")
    # substitute x = tan(u): maps [0, Inf) onto [0, pi/2), with dx = du / cos(u)^2
    fr <- function(x) cos(t * tan(x)) * dweibull(tan(x), shape, scale) / (cos(x))^2
    fi <- function(x) sin(t * tan(x)) * dweibull(tan(x), shape, scale) / (cos(x))^2
    Rp <- integrate(fr, lower = 0, upper = pi/2, stop.on.error = FALSE)$value
    Ip <- integrate(fi, lower = 0, upper = pi/2, stop.on.error = FALSE)$value
    Rp + (0+1i) * Ip
}
> cfweibull3(56, shape=0.5, scale = 1)
[1] 0.08297194+0.07528834i
Questions:
Can you do better than this?
Can people who are expert in numerical integration routines shed some light on what's happening here? I have a sneaking suspicion that for large t the cosine fluctuates rapidly, which causes problems...?
Are there existing R routines/packages which are better suited for this type of problem, and could somebody point me to a well-placed position (on the mountain) to start the climb?
Comments:
Yes, it is bad practice to use t as a function argument.
I calculated the exact answer for shape > 1 using the published result with Maple, and the brute-force-integrate-by-the-definition-with-R kicked Maple's ass. That is, I get the same answer (up to numerical precision) in a small fraction of a second and an even smaller fraction of the price.
Edit:
I was going to write down the exact integrals I'm looking for but it seems this particular site doesn't support MathJAX so I'll give links instead. I'm looking to numerically evaluate the characteristic function of the Weibull distribution for reasonable inputs t (whatever that means). The value is a complex number but we can split it into its real and imaginary parts and that's what I was calling Rp and Ip above.
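In plain text, the quantity is just the real and imaginary parts of the usual characteristic function integral: with f(x) the Weibull density for the given shape and scale,

phi(t) = integral from 0 to Inf of exp(i*t*x) * f(x) dx
       = integral of cos(t*x) * f(x) dx  +  i * integral of sin(t*x) * f(x) dx
       = Rp + i*Ip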
One final comment: Wikipedia has a formula listed (an infinite series) for the Weibull c.f. and that formula matches the one proved in the paper I referenced above, however, that series has only been proved to hold for shape > 1. The case 0 < shape < 1 is still an open problem; see the paper for details.
You may be interested in this paper, which discusses different integration methods for highly oscillatory integrals -- which is essentially what you are trying to compute:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.6944
Another possible suggestion: instead of an infinite upper limit you may want to use a finite one. If you specify the precision you want, then based on the CDF of the Weibull you can easily estimate how much of the tail you can truncate. And with a fixed limit you can choose the number of subdivisions almost exactly (e.g. so that you have a few (4-8) points per period).
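A small sketch of that suggestion; the tail tolerance, the small positive lower limit (borrowed from the fiddling in the question, to dodge the density's singularity at 0 when shape < 1), and the subdivisions-per-period factor are all just illustrative choices:

cfweibull_trunc <- function(t, shape, scale = 1, tol = 1e-8) {
  upper  <- qweibull(1 - tol, shape, scale)         # truncate where the tail mass drops below tol
  period <- 2 * pi / abs(t)                         # period of cos(t*x) and sin(t*x)
  subs   <- max(200, ceiling(10 * upper / period))  # allow roughly 10 subdivisions per period
  fr <- function(x) cos(t * x) * dweibull(x, shape, scale)
  fi <- function(x) sin(t * x) * dweibull(x, shape, scale)
  Rp <- integrate(fr, lower = 1e-5, upper = upper, subdivisions = subs)$value
  Ip <- integrate(fi, lower = 1e-5, upper = upper, subdivisions = subs)$value
  Rp + 1i * Ip
}
cfweibull_trunc(56, shape = 0.5, scale = 1)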
I had the same problem as Jay - not with the Weibull distribution but with the integrate function. I found my answer to Jay's question 3 in a comment on this question:
Divergent Integral in R is solvable in Wolfram
The R package pracma contains several functions for solving integrals numerically. Besides routines for integrating certain specific mathematical functions, it provides a more general function, integral, which helped in my case. Example code is given below.
On question 2: the first answer to the linked question (above) states that R does not print the complete error message from the C source (the function may just be converging too slowly). So I would agree with Jay that the fast fluctuation of the cosine may be the problem; in my case, and in the example below, it was.
Example Code
# load Practical Numerical Math Functions package
library(pracma)
# define function
testfun <- function(r) cos(r*10^6)*exp(-r)
# Integrate it numerically with the basic 'integrate'.
out1 = integrate(testfun, 0, 100)
# "Error in integrate(testfun, 0, 100) : the integral is probably divergent"
# Integrate it numerically with 'integral' from 'pracma' package
# using 'Gauss-Kronrod' method and 10^-8 as relative tolerance. I
# did not try the other ones.
out2 = integral(testfun, 0, 100, method = 'Kronrod', reltol = 1e-8)
Two remarks
The integral function does not break the way the integrate function does, but it may take quite a long time to run. I do not know (and did not try) whether the user can limit the number of iterations.
Even if the integral function finishes without errors, I am not sure how correct the result is. Numerically integrating a function that fluctuates rapidly around zero seems tricky, since one does not know exactly where on the fluctuating function the values are evaluated (e.g. twice as many positive as negative values, with the positive values close to local maxima and the negative values far off). I am no expert on numerical integration; I only know some basic fixed-step integration methods from my numerics lectures. So maybe the adaptive methods used in integral deal with this problem in some way.
I'm attempting to answer questions 1 & 3. That being said I am not contributing any original code. I did a google search and hopefully this is helpful. Good luck!
Source: http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf (p.6)
#Script
library(ggplot2)
## sampling from a Weibull distribution with parameters shape=2.1 and scale=1.1
x.wei<-rweibull(n=200,shape=2.1,scale=1.1)
## theoretical quantiles from a Weibull population with known parameters shape=2 and scale=1
x.teo<-rweibull(n=200,shape=2, scale=1)
#Figure
qqplot(x.teo,x.wei,main="QQ-plot distr. Weibull") ## QQ-plot
abline(0,1) ## a 45-degree reference line is plotted
Is this of any use?
http://www.sciencedirect.com/science/article/pii/S0378383907000452
Muraleedharan et al. (2007) Modified Weibull distribution for maximum and significant wave height simulation and prediction, Coastal Engineering, Volume 54, Issue 8, August 2007, Pages 630–638
From the abstract: "The characteristic function of the Weibull distribution is derived."
Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
The I in PID (Proportional Integral Derivative) is the sum of the last few previous errors, weighted only by its gain.
Using error(-1) to mean the previous error, error(-2) to mean the error before that, etc., 'I' can be described as:
I = (error(-1) + error(-2) + error(-3) + error(-4) etc...) * I_gain
Why, when PID was designed, was 'I' not instead designed to taper off in importance into the past, for example:
I = (error(-1) + (error(-2) * 0.9) + (error(-3) * 0.81) + (error(-4) * 0.729) + etc...) * I_gain
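(In recursive form, using S as a name for the running sum, this is just: S = error(-1) + 0.9 * S from the previous step, and I = S * I_gain.)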
edit: reworded
The integral term is the sum of ALL the past errors. You simply add the error to the "integrator" at each time step. If this needs to be limited, clamp it to a min or max value if it goes out of range. Then copy this accumulated value to your output and add the proportional and derivative terms and clamp the output again if necessary.
The Derivative term is the difference in the present and previous error (the rate of change in the error). P of course is just proportional to the error.
err = reference - new_measurement    # current error
I += kI * err                        # accumulate the integral term
Derivative = err - old_err           # rate of change of the error
output = I + kD * Derivative + kP * err
old_err = err
And there you have it. Limits omitted of course.
Once the controller reaches the reference value, the error will become zero and the integrator will stop changing. Noise will naturally make it bounce around a bit, but it will stay at the steady state value required to meet your objective, while the P and D terms do most of the work to reduce transients.
Notice that in a steady state condition, the I term is the ONLY thing providing any output. If the control has reached the reference and this requires a non-zero output, it is provided solely by the integrator since the error will be zero. If the I term used weighted errors, it would start to decay back to zero and not sustain the output as needed.