How to simply extract specific values of regression curve? - r

I would like to extract several predicted y-values for given x-values from this graph:
I know that it is possible to get the x and y coordinates of the curve with the following code:
coordinate <- ggplot_build(curve)$data[[2]][,c("x","y")]
head(coordinate,n = 6L)
# x y
1 0.1810660 32845.225
2 0.4810660 27635.136
3 0.7553301 23904.792
4 1.3295942 18316.923
5 1.8288582 15092.595
6 5.0312446 8018.707
Is there a function that directly returns the predicted value of y for a given x value that does not appear in coordinate, for example 3.5?

As Gregor mentions, you should fit the model separately from the plot and predict from it.
Otherwise, the best you can do to "simply" obtain a value is an interpolating spline through the extracted coordinates:
sfun = splinefun(coordinate)
sfun(3.5)
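Fitting the model outside the plot could look like the minimal sketch below. The data frame df, its columns and the choice of loess are assumptions for illustration, because the data and the smoother behind the original plot are not shown.

```r
# Hypothetical data standing in for the data behind the plot
set.seed(1)
df <- data.frame(x = seq(0.1, 10, length.out = 50))
df$y <- 30000 * exp(-0.4 * df$x) + rnorm(50, sd = 200)

# Fit the smoother directly instead of reading it back from ggplot
fit <- loess(y ~ x, data = df)

# Predicted y at x values that need not appear in the data
pred <- predict(fit, newdata = data.frame(x = c(3.5, 4.2)))
pred
```

Using the same model and arguments as the plotting layer should keep these predictions consistent with the drawn curve.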

Related

Is a histogram appropriate for this kind of data, and how do I make the graph I want in RStudio?

Hi, I'm trying to make a graph that has the sample mean on the x-axis and the relative frequency on the y-axis.
Here is an example to make this concrete.
If I pick 1 sample from c(1,2,3,4,5), the possible results are 1, 2, 3, 4 and 5, and the relative frequency of each is 1/5.
So in this case my graph will show 1, 2, 3, 4, 5 on the x-axis and 0.2 on the y-axis for each (because they are all 1/5).
If I pick 2 samples from c(1,2,3,4,5), the cases would be (1,2), (1,3), (1,4), (1,5), (2,3), ... and so on (10 cases in total),
so the sample means would be (1+2)/2 = 1.5, (1+3)/2 = 2, ... etc.
In this case the x values will be 1.5, 2, ... etc. and the y values will be 1/10, 1/10, ...
So my question is: is a histogram appropriate for this graph?
I want a plot with the sample mean on the x-axis and the relative frequency on the y-axis, with a line connecting the dots.
Sorry for the long question, and thanks for reading!
Yes, it's entirely appropriate to plot a histogram of sample means. This is an example of a sampling distribution.
To do this, you would create an object that contains the sample means, and then just plot a histogram of that object as you would with any other histogram. The value of the sample means would be on the x axis, and frequency or relative frequency on the y axis. You would have to choose an appropriate bin number and breaks vector for your purpose, but it's the same as any other histogram.
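For the small example in the question, the sampling distribution can be enumerated exactly. The sketch below assumes sampling without replacement, as in the (1,2), (1,3), ... listing, and the variable names are chosen only for illustration.

```r
pop <- c(1, 2, 3, 4, 5)

# All 10 samples of size 2 without replacement, one per column
samples <- combn(pop, 2)
sample_means <- colMeans(samples)

# Relative frequency of each distinct sample mean
rel_freq <- table(sample_means) / length(sample_means)

# Dots connected by a line: sample mean on x, relative frequency on y
plot(as.numeric(names(rel_freq)), as.numeric(rel_freq), type = "b",
     xlab = "Sample mean", ylab = "Relative frequency")
```

Several pairs share the same mean (for example (1,4) and (2,3)), so the distinct means do not all have the same relative frequency.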

predict x value with given y using loess

I have a dataset from a biological experiment:
x = c(0.488, 0.977, 1.953, 3.906, 7.812, 15.625, 31.250, 62.500, 125.000, 250.000, 500.000, 1000.000)
y = c(0.933, 1.036, 1.112, 1.627, 2.646, 5.366, 11.115, 2.355, 1.266, 0, 0, 0)
plot(log(x),y)
x represents a concentration and y represents the response in our assay.
The plot can be found here: 1
How can I predict the x-value (concentration) of a pre-defined y-value (in my case 1.5)?
After a loess smoothing I can predict the y-value at a defined x-value. See the example:
smooth_data <- loess(y~log(x))
predict(smooth_data, 1.07) # which gives 1.5
Using the predict function, both x = 1.07 and x = 5.185 result in y = 1.5
Is there a convenient way to get the estimates from the loess regression at y = 1.5 without manually typing some x values into the predict function?
Any suggestions?
I guess your x and y values are pairs, so f(0.488) = 0.933 and so on?
This is more of a math problem, in my opinion :).
If you could define a function that describes your graph, it would be pretty easy.
You could also draw a straight line between each pair of neighbouring points, and for every segment that crosses your y value you could read off the corresponding x value. But straight lines wouldn't be very precise.
If you have enough pairs you could also train a neural network. That might give you the best results, but it takes some time and a lot of pairs to train well.
Could you clarify your question a bit and tell us what you are looking for? A way to do it, or a code example?
I hope this helps you at least a little bit :)
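The straight-line idea can be sketched as follows, using the data from the question on the log(x) scale. The variable names are illustrative, and the result is only as accurate as linear interpolation between the measured points.

```r
x <- c(0.488, 0.977, 1.953, 3.906, 7.812, 15.625, 31.250, 62.500,
       125.000, 250.000, 500.000, 1000.000)
y <- c(0.933, 1.036, 1.112, 1.627, 2.646, 5.366, 11.115, 2.355,
       1.266, 0, 0, 0)
lx <- log(x)
target <- 1.5

# A segment crosses the target when y - target changes sign across it
d <- y - target
idx <- which(d[-length(d)] * d[-1] < 0)

# Linear interpolation inside each crossing segment (log(x) scale)
crossings <- lx[idx] +
  (target - y[idx]) * (lx[idx + 1] - lx[idx]) / (y[idx + 1] - y[idx])
crossings        # on the log scale
exp(crossings)   # back on the concentration scale
```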
Since your function is not monotonic, there is no true inverse, but if you split it into two functions - one for x < maximum and one for x > maximum - you can just create two inverse functions and solve for whatever values of y you want.
smooth_data <- loess(y~log(x))
X = seq(0,6.9,0.1)
P = predict(smooth_data, X)
M = which.max(P)
Inverse1 = approxfun(X[1:M] ~ P[1:M])
Inverse2 = approxfun(X[M:length(X)] ~ P[M:length(X)])
Inverse1(1.5)
[1] 1.068267
predict(smooth_data, 1.068267)
[1] 1.498854
Inverse2(1.5)
[1] 5.185876
predict(smooth_data, 5.185876)
[1] 1.499585
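An alternative that avoids building explicit inverse functions is to locate sign changes of the fitted curve minus the target on a grid and refine each one with uniroot. This is a sketch under the assumption that the loess fit is refitted on an explicit log-scale variable (called lx here) so that newdata can be passed by name; the grid size of 200 is arbitrary.

```r
x <- c(0.488, 0.977, 1.953, 3.906, 7.812, 15.625, 31.250, 62.500,
       125.000, 250.000, 500.000, 1000.000)
y <- c(0.933, 1.036, 1.112, 1.627, 2.646, 5.366, 11.115, 2.355,
       1.266, 0, 0, 0)
lx <- log(x)
fit <- loess(y ~ lx)
target <- 1.5

# Fitted curve minus the target, as a function of log(x)
f <- function(v) predict(fit, newdata = data.frame(lx = v)) - target

# Bracket every crossing on a grid spanning the observed range
grid <- seq(min(lx), max(lx), length.out = 200)
d <- f(grid)
idx <- which(d[-length(d)] * d[-1] < 0)

# Refine each bracket with a one-dimensional root finder
roots <- vapply(idx, function(i) {
  uniroot(f, lower = grid[i], upper = grid[i + 1], tol = 1e-8)$root
}, numeric(1))
roots  # values of log(x) at which the fitted curve equals the target
```

Because every bracket is handled separately, this also works when the curve crosses the target more than twice.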

How do I select data inside a density curve in R?

I have a two-variable data set that I have to plot (na and ob). I applied the kde2d kernel density estimate and plotted 1 to 4 sigma density curves (confidencebound). I need to select those points that are inside the 2 sigma curves (letting out all those between 1 and 2 sigmas), but not just in the plot: I need to select them from the data set and put them in a new list. Could you please help me with this?
kde_BPT <- kde2d(na,ob, n=1000, lims=c(-2,2,-1.5,1.5))
confidencebound <- quantile(kde_BPT$z, probs=c(0.685,0.955,0.9975,0.99995), na.rm = TRUE)
The data are too large to paste here. I have put the plot here in case it helps. I need to know which data points (of any colour) are in the area between the 1 and 2 sigma contour curves.
The plot
Thanks for your help.

R: area under curve of ogive?

I have an algorithm that uses an x,y plot of sorted y data to produce an ogive.
I then derive the area under the curve to derive %'s.
I'd like to do something similar using kernel density estimation. I like how the upper/lower bounds are smoothed out using kernel densities (i.e. the min and max will extend slightly beyond my hard coded input).
Either way... I was wondering if there is a way to treat an ogive as a type of cumulative distribution function and/or use kernel density estimation to derive a cumulative distribution function given y data?
I apologize if this is a confusing question. I know there is a way to derive a cumulative frequency graph (i.e. ogive). However, I can't determine how to derive a % given this cumulative frequency graph.
What I don't want is an ecdf. I know how to do that, and I am not quite trying to capture an ecdf. But, rather integration of an ogive given two intervals.
I'm not exactly sure what you have in mind, but here's a way to calculate the area under the curve for a kernel density estimate (or more generally for any case where you have the y values at equally spaced x-values (though you can, of course, generalize to variable x intervals as well)):
library(zoo)
# Kernel density estimate
# Set n to higher value to get a finer grid
set.seed(67839)
dens = density(c(rnorm(500,5,2),rnorm(200,20,3)), n=2^5)
# How to extract the x and y values of the density estimate
#dens$y
#dens$x
# x interval
dx = median(diff(dens$x))
# mean height for each pair of y values
h = rollmean(dens$y, 2)
# Area under curve
sum(h*dx) # 1.000943
# Cumulative area
# cumsum(h*dx)
# Plot density, showing points at which density is calculated
plot(dens)
abline(v=dens$x, col="#FF000060", lty="11")
# Plot cumulative area under curve, showing mid-point of each x-interval
plot(dens$x[-length(dens$x)] + 0.5*dx, cumsum(h*dx), type="l")
abline(v=dens$x[-length(dens$x)] + 0.5*dx, col="#FF000060", lty="11")
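To get the percentage of area between two bounds, the cumulative trapezoid areas can be turned into an interpolated function. This is only a sketch: the bounds 0 and 10, the grid size n = 512 and the name cdf are assumptions for illustration.

```r
set.seed(67839)
y <- c(rnorm(500, 5, 2), rnorm(200, 20, 3))
dens <- density(y, n = 512)

# Cumulative trapezoid-rule area at each grid point, starting from 0
dx <- diff(dens$x)
cum_area <- c(0, cumsum(dx * (head(dens$y, -1) + tail(dens$y, -1)) / 2))

# Smooth cumulative distribution function by linear interpolation
cdf <- approxfun(dens$x, cum_area, yleft = 0, yright = max(cum_area))

# Percentage of the total area between two bounds
pct <- 100 * (cdf(10) - cdf(0)) / max(cum_area)
pct
```

Dividing by max(cum_area) normalises away the small numerical error that makes the total area differ slightly from 1.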
UPDATE to include ecdf function
To address your comments, look at the two plots below. The first is the empirical cumulative distribution function (ECDF) of the mixture of normal distributions that I used above. Note that the plot of this data looks the same below as it does above. The second is a plot of the ECDF of a plain vanilla normal distribution, mean=0, sd=1.
set.seed(67839)
x = c(rnorm(500,5,2),rnorm(200,20,3))
plot(ecdf(x), do.points=FALSE)
plot(ecdf(rnorm(1000)))

Generating multidimensional data

Does R have a package for generating random numbers in multi-dimensional space? For example, suppose I want to generate 1000 points inside a cuboid or a sphere.
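I have some functions for hypercube and n-sphere selection that generate data frames with Cartesian coordinates for an arbitrary number of dimensions. The hypercube is filled uniformly; the n-sphere is filled uniformly in 2 dimensions, but not in higher ones, because the angles and the radius are drawn the same way regardless of nrDim: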
GenerateCubiclePoints <- function(nrPoints,nrDim,center=rep(0,nrDim),l=1){
x <- matrix(runif(nrPoints*nrDim,-1,1),ncol=nrDim)
x <- as.data.frame(
t(apply(x*(l/2),1,'+',center))
)
names(x) <- make.names(seq_len(nrDim))
x
}
The points lie in a cube/hypercube of nrDim dimensions with the given center, where l is the length of one side.
For an n-sphere with nrDim dimensions, you can do something similar, where r is the radius :
GenerateSpherePoints <- function(nrPoints,nrDim,center=rep(0,nrDim),r=1){
#generate the polar coordinates!
x <- matrix(runif(nrPoints*nrDim,-pi,pi),ncol=nrDim)
x[,nrDim] <- x[,nrDim]/2
#recalculate them to cartesians
sin.x <- sin(x)
cos.x <- cos(x)
cos.x[,nrDim] <- 1 # see the formula for n.spheres
y <- sapply(1:nrDim, function(i){
if(i==1){
cos.x[,1]
} else {
cos.x[,i]*apply(sin.x[,1:(i-1),drop=F],1,prod)
}
})*sqrt(runif(nrPoints,0,r^2))
y <- as.data.frame(
t(apply(y,1,'+',center))
)
names(y) <- make.names(seq_len(nrDim))
y
}
In 2 dimensions, these give the point clouds plotted by this code:
T1 <- GenerateCubiclePoints(10000,2,c(4,3),5)
T2 <- GenerateSpherePoints(10000,2,c(-5,3),2)
op <- par(mfrow=c(1,2))
plot(T1)
plot(T2)
par(op)
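Uniform angles combined with a square-root radius fill a disc evenly, but in 3 or more dimensions they cluster points near the poles and the center. A commonly used alternative is to normalise Gaussian vectors for the direction and draw the radius as r * U^(1/d). The sketch below uses a hypothetical helper name, runif_ball, which is not part of the original answer.

```r
# Hypothetical helper: n points uniform inside a d-dimensional ball
runif_ball <- function(n, d, center = rep(0, d), r = 1) {
  z <- matrix(rnorm(n * d), ncol = d)   # isotropic Gaussian vectors
  z <- z / sqrt(rowSums(z^2))           # uniform directions on the unit sphere
  rad <- r * runif(n)^(1 / d)           # radius density proportional to rad^(d - 1)
  pts <- sweep(z * rad, 2, center, "+") # shift to the requested center
  as.data.frame(pts)
}

set.seed(1)
ball <- runif_ball(1000, 3, center = c(1, 2, 3), r = 2)
head(ball)
```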
Also check out the copula package. This will generate data within a cube/hypercube with uniform margins, but with correlation structures that you set. The generated variables can then be transformed to represent other shapes, but still with relations other than independent.
If you want more complex shapes but are happy with points that are uniform and independent within the shape, then you can just do rejection sampling: generate data within a cube that contains your shape, test whether each point is within your shape, reject those that are not, and keep doing this until there are enough points.
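A minimal sketch of rejection sampling follows. The helper name rejection_sample, its arguments and the ellipsoid used as the target shape are all assumptions for illustration.

```r
# Hypothetical helper: keep drawing from the bounding box until n points pass
# `inside` takes a matrix of candidates (one per row), returns a logical vector
rejection_sample <- function(n, inside, lower, upper) {
  d <- length(lower)
  kept <- matrix(numeric(0), ncol = d)
  while (nrow(kept) < n) {
    # lower/upper are recycled per coordinate because the matrix is filled by row
    cand <- matrix(runif(n * d, lower, upper), ncol = d, byrow = TRUE)
    kept <- rbind(kept, cand[inside(cand), , drop = FALSE])
  }
  kept[seq_len(n), , drop = FALSE]
}

# Example shape: the ellipsoid x^2/4 + y^2 + z^2 <= 1
in_ellipsoid <- function(p) p[, 1]^2 / 4 + p[, 2]^2 + p[, 3]^2 <= 1

set.seed(1)
pts <- rejection_sample(500, in_ellipsoid,
                        lower = c(-2, -1, -1), upper = c(2, 1, 1))
```

The tighter the bounding box fits the shape, the fewer candidates are thrown away.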
A couple of years ago, I made a package called geozoo. It is available on CRAN.
install.packages("geozoo")
library(geozoo)
It has many different functions to produce objects in N-dimensions.
p = 4
n = 1000
# Cube with points on its faces.
# A 3D version would be a box with solid walls and a hollow interior.
cube.face(p)
# Hollow sphere
sphere.hollow(p, n)
# Solid cube
cube.solid.random(p, n)
cube.solid.grid(p, 10) # evenly spaced points
# Solid Sphere
sphere.solid.random(p, n)
sphere.solid.grid(p, 10) # evenly spaced points
One of my favorite ones to watch animate is a cube with points along its edges, because it was one of the first objects that I made. It also gives you a sense of distance between vertices.
# Cube with points along its edges.
cube.dotline(4)
Also, check out the website: http://streaming.stat.iastate.edu/~dicook/geometric-data/. It contains pictures and downloadable data sets.
Hope it meets your needs!
Cuboid:
df <- data.frame(
x = runif(1000),
y = runif(1000),
z = runif(1000)
)
head(df)
x y z
1 0.7522104 0.579833314 0.7878651
2 0.2846864 0.520284731 0.8435828
3 0.2240340 0.001686003 0.2143208
4 0.4933712 0.250840233 0.4618258
5 0.6749785 0.298335804 0.4494820
6 0.7089414 0.141114804 0.3772317
Sphere:
df <- data.frame(
radius = runif(1000),
inclination = 2*pi*runif(1000),
azimuth = 2*pi*runif(1000)
)
head(df)
radius inclination azimuth
1 0.1233281 5.363530 1.747377
2 0.1872865 5.309806 4.933985
3 0.2371039 5.029894 6.160549
4 0.2438854 2.962975 2.862862
5 0.5300013 3.340892 1.647043
6 0.6972793 4.777056 2.381325
Note: edited to include code for sphere
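Drawing radius, inclination and azimuth uniformly concentrates points near the center and the poles once they are converted to Cartesian coordinates. A sketch of a variant that is uniform in volume for the unit ball, assuming the convention that inclination is measured from the z-axis and lies in [0, pi]:

```r
set.seed(42)
n <- 1000
sph <- data.frame(
  radius      = runif(n)^(1 / 3),       # volume grows with radius^3
  inclination = acos(runif(n, -1, 1)),  # cos(inclination) is uniform on [-1, 1]
  azimuth     = 2 * pi * runif(n)
)

# Convert to Cartesian coordinates
xyz <- with(sph, data.frame(
  x = radius * sin(inclination) * cos(azimuth),
  y = radius * sin(inclination) * sin(azimuth),
  z = radius * cos(inclination)
))
head(xyz)
```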
Here is one way to do it.
Say we want to generate a set of 3D points of the form y = (y_1, y_2, y_3).
Sample x from a multivariate Gaussian with mean zero and covariance matrix R (with unit variances, so that R is a correlation matrix and each x_i is standard normal):
(x_1, x_2, x_3) ~ Multivariate_Gaussian(u = [0, 0, 0], R = [[r_11, r_12, r_13], [r_21, r_22, r_23], [r_31, r_32, r_33]])
You can find a function that generates multivariate Gaussian samples in an R package (for example, mvrnorm in MASS).
Take the Gaussian cdf of each component: (phi(x_1), phi(x_2), phi(x_3)), where phi is the standard Gaussian cdf, i.e. phi(x_1) = Pr[X <= x_1]. By the probability integral transform, (phi(x_1), phi(x_2), phi(x_3)) = (u_1, u_2, u_3) will each be uniformly distributed on [0,1].
Then take the inverse cdf of each uniformly distributed marginal u_1, u_2, u_3:
(F_1^{-1}(u_1), F_2^{-1}(u_2), F_3^{-1}(u_3)) = (y_1, y_2, y_3), where F_i is the marginal cdf of the i-th coordinate of the distribution you are trying to sample from.
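The steps above can be sketched with mvrnorm from the MASS package. The correlation matrix and the three target marginals (uniform, exponential, beta) are arbitrary assumptions chosen only to show the mechanics.

```r
library(MASS)

set.seed(1)
# Assumed correlation matrix (unit diagonal, positive definite)
R <- matrix(c(1.0, 0.7, 0.2,
              0.7, 1.0, 0.4,
              0.2, 0.4, 1.0), nrow = 3, byrow = TRUE)

# Step 1: correlated standard normals
x <- mvrnorm(1000, mu = c(0, 0, 0), Sigma = R)

# Step 2: the Gaussian cdf maps each column to a uniform on [0, 1]
u <- pnorm(x)

# Step 3: inverse cdf of each desired marginal
y <- cbind(
  y1 = qunif(u[, 1], min = -1, max = 1),
  y2 = qexp(u[, 2], rate = 2),
  y3 = qbeta(u[, 3], shape1 = 2, shape2 = 5)
)
head(y)
```

The dependence between columns comes from R, while each column keeps the marginal distribution chosen in step 3.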
