Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I apologize if my question is simple. I tried to find the answer but I didn't find much info.
I use the scale() function in R to scale my data. What I don't understand is that when I plot the scaled data using matplot(), they don't look symmetric: the axis ranges over -1, -0.5, 0, 0.5, 1, 1.5. As far as I know, scaling gives the data mean zero and standard deviation s, so my data should deviate by s from the mean on both sides, but here I have a deviation of 1.5 above the mean and only 1 below it. Why?
Your data are not symmetric around their mean. scale() only subtracts the mean and divides by the standard deviation; it does not change the shape of the distribution, so skewed data stay skewed.
Compare the following:
x <- runif(1000) # symmetric around 0.5
y <- rexp(1000) # not symmetric around 1 at all
summary(scale(x))
summary(scale(y))
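As a rough check of the point above, the sketch below (with an arbitrary seed, which is an assumption and not part of the original answer) confirms that scale() gives both samples mean 0 and standard deviation 1, while the range of the skewed sample stays lopsided:

```r
set.seed(1)       # arbitrary seed, only for reproducibility
x <- runif(1000)  # roughly symmetric around 0.5
y <- rexp(1000)   # right-skewed

sx <- as.numeric(scale(x))
sy <- as.numeric(scale(y))

# both scaled samples have mean 0 and sd 1 (up to rounding error)
c(mean = mean(sx), sd = sd(sx))
c(mean = mean(sy), sd = sd(sy))

# but only the symmetric sample has a roughly symmetric range
range(sx)
range(sy)
```

Because the raw exponential values cannot go below 0 and their mean and standard deviation are both close to 1, the scaled minimum cannot fall much below -1, whereas the long right tail produces scaled values well above 1.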
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have the distribution of a parameter (natural gas mixture composition) expressed in percent. How can I test such data for its distribution parameters (it should be a gamma, normal, or lognormal distribution) and generate random compositions based on those parameters in R?
This might be a better question for CrossValidated, but:
it is not generally a good idea to choose from among a range of possible distributions according to goodness of fit. Instead, you should choose according to the qualitative characteristics of your data, something like this:
Frustratingly, this chart doesn't actually have the best choice for your data (composition, continuous, bounded between 0 and 1 [or 0 and 100]), which is a Beta distribution (although there are technical issues if you have values of exactly 0 or 100 in your sample).
In R:
## some arbitrary data
z <- c(2,8,40,45,56,58,70,89)
## fit (beta values must be in (0,1), not (0,100), so divide by 100)
(m <- MASS::fitdistr(z/100,"beta",start=list(shape1=1,shape2=1)))
## sample 1000 new values
z_new <- 100*rbeta(n=1000, shape1=m$estimate["shape1"],
                   shape2=m$estimate["shape2"])
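The fit above starts from shape1 = shape2 = 1. One possible refinement, offered as a sketch and not as part of the original answer, is to start from method-of-moments estimates of the Beta parameters; the names mu, v, k, and start are illustrative only:

```r
# same arbitrary data as above, rescaled to (0,1)
z <- c(2, 8, 40, 45, 56, 58, 70, 89) / 100

# method-of-moments estimates for a Beta(shape1, shape2) distribution
mu <- mean(z)
v  <- var(z)
k  <- mu * (1 - mu) / v - 1   # only valid when v < mu * (1 - mu)
start <- list(shape1 = mu * k, shape2 = (1 - mu) * k)
start

# use them as starting values for the maximum-likelihood fit, if MASS is installed
if (requireNamespace("MASS", quietly = TRUE)) {
  m <- MASS::fitdistr(z, "beta", start = start)
  print(m)
}
```

Starting values close to the final estimates tend to make the optimizer less likely to wander into invalid (negative) shape values, though with well-behaved data either starting point should lead to much the same fit.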
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
Hello Everyone!
I am fairly new to R programming and have a question about the distance (or offset) of data points from their best-fit curve.
The given figure shows some points and a best-fit curve for those points.
As we can see, some points are very far from the best-fit curve. I want to write code that gives me the distance (or offset) of every point from the curve, and then display all the points that are far from the curve.
I have the equation of the curve and all the data points. The curve has an exponential equation.
The uploaded image is just an approximation of the real figure. I drew it only as an example.
If someone can tell me which method or functions should be used here, it would be a big help.
Thank You.
In many situations in R you will fit the data with a function such as lm, loess, or glm, and the fitted model object stores the residuals along with the result (for example, residuals(fit)).
If you do have your own equation, take the x-values of the data points, calculate the equation's y-values at those x-values, and subtract them from the corresponding data y-values.
e.g. a toy example
# decay function
x <- 1:50
start <- 80
decay <- 0.95
equation_y <- start * (decay^x)
plot(x, equation_y, type="l")
# simulated data points
data_y <- equation_y + rnorm(50, sd=3)
points(x, data_y, col="red")
# the differences (data minus curve, so points above the curve are positive)
data_y - equation_y
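To list the points that lie far from the curve, one option is to compare the absolute offsets with a cutoff. The sketch below assumes a cutoff of two standard deviations of the offsets, which is an arbitrary choice and not something the question specifies; the names res, threshold, and far are illustrative:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- 1:50
equation_y <- 80 * 0.95^x                   # curve values at the data x-values
data_y <- equation_y + rnorm(50, sd = 3)    # simulated data points

res <- data_y - equation_y                  # vertical offsets from the curve
threshold <- 2 * sd(res)                    # assumed cutoff; choose what suits your data
far <- abs(res) > threshold                 # TRUE for points far from the curve

# table of the far-away points and their offsets
data.frame(x = x[far], y = data_y[far], offset = res[far])
```

On an existing plot, points(x[far], data_y[far], pch = 19) would mark the flagged points. Note that these are vertical offsets at each x, not perpendicular distances to the curve.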
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
How can I create a normal probability plot of residuals in R so that there are normal probability values on y-axis?
Normally you'll make the normal probability plot with qqnorm and qqline.
Example:
fit <- lm(resp ~ dep1 + dep2)
qqnorm(fit$residuals, datax=TRUE)
qqline(fit$residuals, datax=TRUE)
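You can plot the residuals against their normal probabilities with plot and pnorm:
plot(fit$residuals, pnorm(fit$residuals))
(with prob. on the y-axis; note that pnorm defaults to sd = 1, so standardize the residuals first, e.g. with rstandard(fit), if they are not on a unit scale)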
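If the goal is a conventional normal probability plot whose y-axis is labelled with probabilities, one approach is to plot against the theoretical normal quantiles and relabel the axis. This is only a sketch: normal_prob_plot is a hypothetical helper, not a base R function, and the simulated resp, dep1, and dep2 merely stand in for real data:

```r
# hypothetical helper: residuals on x, normal probability scale on y
normal_prob_plot <- function(res,
                             probs = c(0.01, 0.05, 0.1, 0.25, 0.5,
                                       0.75, 0.9, 0.95, 0.99)) {
  res  <- sort(res)
  theo <- qnorm(ppoints(length(res)))         # theoretical normal quantiles
  plot(res, theo, yaxt = "n",
       xlab = "Residual", ylab = "Normal probability")
  axis(2, at = qnorm(probs), labels = probs)  # label quantiles by probability
  invisible(data.frame(residual = res, prob = pnorm(theo)))
}

# simulated stand-in data (assumption, for illustration only)
set.seed(1)
dep1 <- rnorm(40)
dep2 <- rnorm(40)
resp <- 1 + 2 * dep1 - dep2 + rnorm(40)
fit <- lm(resp ~ dep1 + dep2)
```

Calling normal_prob_plot(residuals(fit)) then draws the plot; points falling close to a straight line suggest approximately normal residuals.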
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have a number of melting curves, for which I want to determine the slope of the steepest part between the minimum (valley) and maximum (peak) using R code (the slope in the inflection point corresponds to the melting point). The solutions I can imagine are either to determine the slope in every point and then find the maximum positive value, or by fitting a 4-parameter Weibull-type curve using the drc package to determine the inflection point (basically corresponding to the 50% response point between minimum and maximum). In the latter case the tricky part is that this fitting has to be restricted for each curve to the temperature range between the minimum (valley) and maximum (peak) fluorescence response. These temperature ranges are different for each curve.
Grateful for any feedback!
The diff function does the equivalent of numerical differentiation on equally spaced values (up to a constant factor), so the maximum (or minimum) of diff(z) identifies the location of the steepest ascent (or descent):
z <- exp(-seq(0,3, by=0.1)^2 )
plot(z)
plot(diff(z))
z[ which(abs(diff(z))==max(abs(diff(z))) )]
# [1] 0.6126264
# could have also tested for min() instead of max(abs())
plot(z)
abline( v = which(abs(diff(z))==max(abs(diff(z))) ) )
abline( h = z[which(abs(diff(z))==max(abs(diff(z))) ) ] )
With an x-difference of 1, the slope is just the difference at that point:
diff(z) [ which(abs(diff(z))==max(abs(diff(z))) ) ]
# [1] -0.08533397
... but I question whether that is really of much interest. I would have thought that getting the index (which would be the melting point subject to an offset) would be the value of interest.
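For an actual melting curve, the same idea can be applied with the real temperature spacing and restricted to the stretch between the valley and the peak. The sketch below uses a made-up curve (a logistic rise with a small downward drift, chosen only so that a valley and a peak exist); temp, fluo, and the other names are assumptions, not data from the question:

```r
# made-up melting curve: temperature and fluorescence (illustration only)
temp <- seq(50, 90, by = 0.5)
fluo <- plogis((temp - 70.2) / 2) - 0.01 * (temp - 50)

valley <- which.min(fluo)   # index of the minimum fluorescence
peak   <- which.max(fluo)   # index of the maximum fluorescence
idx    <- valley:peak       # assumes the valley comes before the peak

# finite-difference slopes between neighbouring points, in fluorescence per degree
slopes <- diff(fluo[idx]) / diff(temp[idx])
mids   <- (temp[idx][-1] + temp[idx][-length(idx)]) / 2  # interval midpoints

steepest  <- which.max(slopes)
max_slope <- slopes[steepest]   # steepest positive slope
t_steep   <- mids[steepest]     # temperature where it occurs
c(temperature = t_steep, slope = max_slope)
```

Dividing by diff(temp) gives the slope in real units even when the temperatures are not equally spaced. With noisy measurements, smoothing the curve first (for example with smooth.spline) is probably worthwhile before differencing.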
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I have 5 values, for example like this:
3
11
8
5
8
I want to calculate the average difference between them.
If I had just two values, say 3 and 11, the difference would be 8.
But how do I do this when I have more values (for example five as in my example above)?
I can not show the answer in detail because I can not format math in this board. Please refer to the mathematics subboard for math related question.
Not exactly sure what you are after, but it might be the standard deviation.
The standard deviation is a measure of how far the numbers typically deviate from their mean.
You might be looking for Variance or Standard Deviation.
Variance -> A measure of the dispersion of a set of data points around their mean value. Variance is the mathematical expectation of the squared deviation from the mean.
Estimating the variance
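For the five values in the question, a short R sketch can show both readings of "average difference": the sample variance and standard deviation suggested in the answers, and the literal mean of all pairwise absolute differences. Which of these the asker wants is not stated, so treat the choice as an assumption:

```r
v <- c(3, 11, 8, 5, 8)

# sample variance: squared deviations from the mean, divided by n - 1
var(v)   # 9.5
sd(v)    # square root of the variance, about 3.08

# literal "average difference": mean absolute difference over all 10 pairs
pairwise <- as.numeric(dist(v))
mean(pairwise)   # 3.8
```

With only two values, 3 and 11, the pairwise version reduces to the single difference of 8 mentioned in the question.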