Data frames using conditional probabilities to extract a certain range of values - r

I would like some help answering the following question:
Dr Barchan makes 600 independent recordings of Eric’s coordinates (X, Y, Z), selects the cases where X ∈ (0.45, 0.55), and draws a histogram of the Y values for these cases.
By construction, these values of Y follow the conditional distribution of Y given X ∈ (0.45,0.55). Use your function sample3d to mimic this process and draw the resulting histogram. How many samples of Y are displayed in this histogram?
We can argue that the conditional distribution of Y given X ∈ (0.45, 0.55) approximates the conditional distribution of Y given X = 0.5 — and this approximation is improved if we make the interval of X values smaller.
Repeat the above simulations selecting cases where X ∈ (0.5 − δ, 0.5 + δ), using a suitably chosen δ and a large enough sample size to give a reliable picture of the conditional distribution of Y given X = 0.5.
I know that for the first paragraph we want the values of x, y, z generated by sample3d(600), with the x's restricted to the range 0.45 to 0.55. Is there a way to code this (maybe with an if statement) so that I keep the rows whose x is in this range and discard the rest of the 600? Also, does anyone have any hints for the conditional probability part in the third paragraph?
sample3d = function(n)
{
  df = data.frame()
  while(n>0)
  {
    X = runif(1,-1,1)
    Y = runif(1,-1,1)
    Z = runif(1,-1,1)
    a = X^2 + Y^2 + Z^2
    if( a < 1 )
    {
      b = (X^2+Y^2+Z^2)^(0.5)
      vector = data.frame(X = X/b, Y = Y/b, Z = Z/b)
      df = rbind(vector,df)
      n = n- 1
    }
  }
  df
}
sample3d(n)
Any help would be appreciated, thank you.

Your function produces a data frame. The part of the question that asks for the values in a given range can be solved by filtering the data frame. Notice that you're looking for an open interval (the endpoints aren't included), so use strict inequalities.
df <- sample3d(600)
df[df$X > 0.45 & df$X < 0.55,]
Pay attention to the trailing comma: it selects rows and keeps all columns.
You can use a dplyr solution as well, but don't use the helper between(), since it tests a closed interval (endpoints included) and you need an open one.
library(dplyr)
filter(df, X > 0.45 & X < 0.55)
For the remainder of your assignment, see what you can figure out; if you run into a specific problem, Stack Overflow can help you.
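A minimal base R sketch of the whole exercise, for illustration only. It assumes a vectorised stand-in for sample3d() (the same reject-then-normalise idea as in the question, but not the asker's code), and the values delta = 0.01 and 200000 draws are arbitrary choices, not something the assignment prescribes.

```r
# Hypothetical vectorised stand-in for the question's sample3d():
# draw points in the cube, keep those inside the unit ball,
# then project them onto the unit sphere.
sample3d <- function(n) {
  out <- matrix(numeric(0), ncol = 3)
  while (nrow(out) < n) {
    p <- matrix(runif(3 * n, -1, 1), ncol = 3)
    r2 <- rowSums(p^2)
    keep <- r2 < 1 & r2 > 0
    out <- rbind(out, p[keep, , drop = FALSE] / sqrt(r2[keep]))
  }
  out <- out[seq_len(n), , drop = FALSE]
  data.frame(X = out[, 1], Y = out[, 2], Z = out[, 3])
}

set.seed(1)

# Part 1: 600 recordings, keep rows with X in the open interval (0.45, 0.55)
df <- sample3d(600)
sel <- df[df$X > 0.45 & df$X < 0.55, ]
nrow(sel)  # number of Y values displayed in the histogram
hist(sel$Y, main = "Y given X in (0.45, 0.55)", xlab = "Y")

# Part 2: narrower window around 0.5 and a much larger sample
delta <- 0.01                 # assumed value, pick your own
big <- sample3d(200000)       # assumed sample size
near <- big[abs(big$X - 0.5) < delta, ]
hist(near$Y, breaks = 40,
     main = "Y given X in (0.5 - delta, 0.5 + delta)", xlab = "Y")
```

For points uniform on the unit sphere, X is uniform on (-1, 1), so roughly 5% of the 600 draws (about 30) should land in (0.45, 0.55). That is why the sample size has to grow as the interval shrinks.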

Related

Plotting an 'n' sized vector between a given function with given interval in R

Let me make my question clear because I don't know how to ask it properly (therefore I don't know if it was answered already or not), I will go through my whole problem:
There is a given function (which is the right side of an explicit first order differential equation if it matters):
f = function(t,y){
-2*y+3*t
}
Then there's a given interval from 'a' to 'b', this is the range the function is calculated in with 'n' steps, so the step size in the interval (dt) is:
dt=abs(a-b)/n
In this case 'a' is always 0 and 'b' is always positive, so 'b' is always greater than 'a' but I tried to be generic.
The initial condition:
yt0=y0
The calculation that determines the vector:
yt=vector("numeric",n)
for (i in 1:(n-1))
{
yt[1]=f(0,yt0)*dt+yt0
yt[i+1]=(f(dt*i,yt[i]))*dt+yt[i]
}
The created vector is 'n' long, but this is an approximate solution to the differential equation between the interval ranging from 'a' to 'b'. And here comes my problem:
When I try plotting it alongside the exact solution (using deSolve), it is not accurate. The values of the vector are accurate, but it does not know that these values belong to an approximate function that's between the interval range 'a' to 'b' .
That's why the graphs of the exact and approximate solution are not matching at all. I feel pretty burnt out, so I might not describe my issue properly, but is there a solution to this? To make it realise that its values are between 'a' and 'b' on the 'x' axis and not between '1' and 'n'?
I thank you all for the answers in advance!
The deSolve lines I used (regarding 'b' is greater than 'a'):
df = function(t, y, params) list(-2*y+3*t)
t = seq(a, b, length.out = n)
ddf = as.data.frame(ode(yt0, t, df, parms=NULL))
I tried to reconstruct the comparison between an "approximate" solution using a loop (which is in fact the Euler method) and a solution from the deSolve package. deSolve uses the lsoda solver by default, which is more precise than Euler's method, but it is of course also an approximation (default relative and absolute tolerances are 1e-6).
As the question lacked concrete values and the plotting code, it was not clear where the original problem was, but the following example may help to reformulate the question. I assume that the problem may be a confusion between t (absolute time) and dt in the two approaches. Compare the line marked "original code" with the one marked "suggestion":
library(deSolve)
f = function(t, y){
-2 * y + 3 * t
}
## some values
y0 <- 0.1
a <- 3
b <- 5
n <- 100
## Euler method using a loop
dt <- abs(a-b)/n
yt <- vector("numeric", n)
yt[1] <- f(a, y0) * dt + y0 # written before the loop; the first step starts at t = a
for (i in 1:(n-1)) {
#yt[i+1] = (f( dt * i, yt[i])) * dt + yt[i] # original code
yt[i+1] <- (f(a + dt * i, yt[i])) * dt + yt[i] # suggestion
}
## lsoda integration with package deSolve
df <- function(t, y, params) list(-2*y + 3*t)
t <- seq(a, b, length.out = n)
ddf = as.data.frame(ode(y0, t, df, parms=NULL))
## Plot of both solutions
plot(ddf, type="l", lwd=5, col="orange", ylab="y", las=1)
lines(t, yt, lwd=2, lty="dashed", col="blue")
legend("topleft", c("deSolve", "for loop"),
lty=c("solid", "dashed"), lwd=c(5, 2), col=c("orange", "blue"))
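A self-contained base R sketch of the same idea without deSolve, offered as an illustration rather than a replacement for the code above. It assumes the same example values (a = 3, b = 5, y0 = 0.1, n = 100), stores the initial value as the first element, and uses one shared time grid so the Euler values and the x axis line up. The reference curve is the closed-form solution y(t) = (y0 - 1.5a + 0.75) exp(-2(t - a)) + 1.5t - 0.75, which can be checked by substituting it into y' = -2y + 3t.

```r
f <- function(t, y) -2 * y + 3 * t

# Assumed example values, as in the answer above
a <- 3; b <- 5; y0 <- 0.1; n <- 100

# n steps need n + 1 grid points; dt matches the grid spacing exactly
t  <- seq(a, b, length.out = n + 1)
dt <- (b - a) / n

# Explicit Euler: yt[i] is the approximation at time t[i]
yt <- numeric(n + 1)
yt[1] <- y0
for (i in seq_len(n)) {
  yt[i + 1] <- yt[i] + dt * f(t[i], yt[i])
}

# Closed-form solution with y(a) = y0
exact <- (y0 - 1.5 * a + 0.75) * exp(-2 * (t - a)) + 1.5 * t - 0.75

plot(t, exact, type = "l", lwd = 5, col = "orange", ylab = "y", las = 1)
lines(t, yt, lwd = 2, lty = "dashed", col = "blue")
legend("topleft", c("exact", "Euler"),
       lty = c("solid", "dashed"), lwd = c(5, 2), col = c("orange", "blue"))

max(abs(yt - exact))  # largest pointwise error of the Euler values
```

Plotting against t rather than against the index 1..n is what places the approximate values between a and b on the x axis.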

Plotting function with 3 parameters (4d) in R

I want to see how three variables x, y, and z respond to a function f using R.
I've searched for R solutions (e.g. rgl using 4d plots) but none seem to allow the input of a function as the fourth variable while allowing manipulation of x, y, and z across their full range of values.
# First I create three variables that each have a domain 0 to 4
x
y
z
# Then I create a function from those three variables
f <- sqrt(x^2 + y^2 + z^2)
EDIT: I originally stated that I wanted x, y, and z to be seq(0, 4, 0.01) but in fact I only want them to range from 0 to 4, and do so independently of other variables. In other words, I want to plot the function across a range of values letting x move independently of y and z and so forth, rather than plotting a 3-D line. The result should be a 3-D surface.
I want to:
a) see how the function f responds to all possible combinations of x, y, and z across a range of x, y, and z values 0 to 4, and
b) find what maxima/minima exist especially when holding one variable constant.
This is rather a mathematical question. Unfortunately, our computer screens are not really made for 4D, and neither are our brains, so what you ask won't be possible as is. You want to show a dense set of data (a cube between 0 and 4), and we cannot display what is "inside" the cube.
Coming back to R, you can always display a slice of it, for example by fixing z and plotting sqrt(x^2 + y^2 + z^2) against x and y. Here are two examples:
# Points where the function should be evaluated
x <- seq(0, 4, 0.01)
y <- seq(0, 4, 0.01)
z <- seq(0, 4, 0.01)
# Compute the distance from origin
distance <- function(x,y,z) {
sqrt(x^2 + y^2 + z^2)
}
# Matrix to store the results
slice=matrix(0, nrow=length(x),ncol=length(y))
# Fill the matrix with a slice at z=3
i=1
for (y_val in y)
{
slice[,i]=distance(x,y_val,3)
i=i+1
}
# Plot with the plot3D library
library(plot3D)
persp3D(z = slice, theta = 100, phi = 50)
# Plot with the raster library
library(raster)
plot(raster(slice,xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y)))
If you change the value of z, you will not really change the shape (the surface just gets "flatter" for bigger z). Note that since the function is symmetric in x, y and z, the same plots are produced if you hold x or y constant instead.
For your last question about the maximum, you can re-use the slice matrix and do:
max_ind=which(slice==max(slice),arr.ind = TRUE)
x[max_ind[,1]]
y[max_ind[,2]]
(see Get the row and column name of the minimum element of a matrix)
But again, the math shows that the maximum is always obtained by maximizing x, y and z, since the function simply measures the distance from the origin.
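The slice-and-maximise steps above can also be sketched in base R only, using outer() instead of the explicit loop. This is an illustrative variant, not the answerer's code; the coarser grid step of 0.1 and the fixed value z = 3 are assumptions chosen to keep the example fast.

```r
# Coarser grid than in the answer (assumed step 0.1)
x <- seq(0, 4, by = 0.1)
y <- seq(0, 4, by = 0.1)

distance <- function(x, y, z) sqrt(x^2 + y^2 + z^2)

# slice[i, j] = distance(x[i], y[j], 3); z is passed through by outer()
slice <- outer(x, y, distance, z = 3)

# Base graphics views of the slice
persp(x, y, slice, theta = 30, phi = 30, zlab = "f(x, y, 3)")
contour(x, y, slice, xlab = "x", ylab = "y")

# Location and value of the maximum on this slice
idx <- which(slice == max(slice), arr.ind = TRUE)
best <- c(x = x[idx[1, "row"]], y = y[idx[1, "col"]], f = max(slice))
best
```

Repeating this for several fixed z values, or fixing x or y instead, gives a set of slices that together describe the function over the cube.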

predict x value with given y using loess

I have a dataset from a biological experiment:
x = c(0.488, 0.977, 1.953, 3.906, 7.812, 15.625, 31.250, 62.500, 125.000, 250.000, 500.000, 1000.000)
y = c(0.933, 1.036, 1.112, 1.627, 2.646, 5.366, 11.115, 2.355, 1.266, 0, 0, 0)
plot(log(x),y)
x represents a concentration and y represents the response in our assay.
The plot can be found here: 1
How can I predict the x-value (concentration) of a pre-defined y-value (in my case 1.5)?
After a loess smoothing I can predict the y-value at a defined x-value. See the example:
smooth_data <- loess(y~log(x))
predict(smooth_data, 1.07) # which gives 1.5
Using the predict function, both x = 1.07 and x = 5.185 result in y = 1.5
Is there a convenient way to get the estimates from the loess regression at y = 1.5 without manually typing some x values into the predict function?
Any suggestions?
I guess your x and y values are pairs, so f(0.488) = 0.933 and so on?
This is more of a math problem in my opinion :).
If you could define a function that describes your graph, it would be pretty easy.
You could also draw a straight line between each pair of neighbouring points, and for every line that crosses your y value you could read off the corresponding x value. But straight lines wouldn't be very precise.
If you have enough pairs you could also train a neural network. That might give you the best results, but it takes some time and a lot of pairs to train well.
Could you clarify your question a bit and tell us what you are looking for? A way to do it, or a code example?
I hope this helps you at least a little bit :)
Since your function is not monotonic, there is no true inverse, but if you split it into two functions - one for x < maximum and one for x > maximum - you can just create two inverse functions and solve for whatever values of y you want.
smooth_data <- loess(y~log(x))
X = seq(0,6.9,0.1)
P = predict(smooth_data, X)
M = which.max(P)
Inverse1 = approxfun(X[1:M] ~ P[1:M])
Inverse2 = approxfun(X[M:length(X)] ~ P[M:length(X)])
Inverse1(1.5)
[1] 1.068267
predict(smooth_data, 1.068267)
[1] 1.498854
Inverse2(1.5)
[1] 5.185876
predict(smooth_data, 5.185876)
[1] 1.499585
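Since the model was fitted as y ~ log(x), the values returned by the inverse functions are on the log scale, not concentrations. A short sketch of the same approach with the back-transformation added; the names inv_rising, inv_falling, log_conc and conc are made up for this example.

```r
x <- c(0.488, 0.977, 1.953, 3.906, 7.812, 15.625, 31.250,
       62.500, 125.000, 250.000, 500.000, 1000.000)
y <- c(0.933, 1.036, 1.112, 1.627, 2.646, 5.366, 11.115,
       2.355, 1.266, 0, 0, 0)

smooth_data <- loess(y ~ log(x))

# Grid on the log(x) scale, as in the answer above
X <- seq(0, 6.9, 0.1)
P <- predict(smooth_data, X)
M <- which.max(P)

# Piecewise inverses: fitted response -> log(x) on each side of the peak
inv_rising  <- approxfun(P[1:M], X[1:M])
inv_falling <- approxfun(P[M:length(X)], X[M:length(X)])

log_conc <- c(inv_rising(1.5), inv_falling(1.5))
conc <- exp(log_conc)  # back on the original concentration scale
conc
```

A finer grid for X would reduce the interpolation error of the inverse at the cost of more predict() evaluations.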

Generating values of x,y for equally spaced x in an interval

I am new to working in R and I would like to generate values of x,y to plot for lowess smoothing. I would like to generate equally spaced x values in an interval for a given function.
For example, I would like to generate the values for the function:
f(x) = 5x^3 - 2x^2 -2x +1
in the interval of [-5,5].
(p.s. my background is in biology so I don't understand the technical things as well as I would like!)
You mean something like this?
f1 <- function(x) 5*x^3 - 2*x^2 - 2*x + 1
seqx <- seq(-5,5, by = 0.1)
plot(seqx, f1(seqx), pch = 20)
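A slightly fuller sketch that also runs the smoother mentioned in the question. It uses the polynomial exactly as written there, f(x) = 5x^3 - 2x^2 - 2x + 1; the number of points (101) and the lowess span f = 0.2 are assumptions picked for illustration.

```r
# Polynomial from the question
f1 <- function(x) 5 * x^3 - 2 * x^2 - 2 * x + 1

# 101 equally spaced x values on [-5, 5] (spacing 0.1)
x <- seq(-5, 5, length.out = 101)
y <- f1(x)

plot(x, y, pch = 20)

# lowess() returns a list with components x and y that lines() can draw
fit <- lowess(x, y, f = 0.2)
lines(fit, col = "red", lwd = 2)
```

With seq(), by fixes the spacing and length.out fixes the number of points; either gives an equally spaced grid.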

How to find 10 values, exponentially distributed, which sum to a value, x

I have a value, for example 2.8. I want to find 10 numbers which are on an exponential curve, which sum to this value.
That is, I want to end up with 10 numbers which sum to 2.8, and which, when plotted, look like the curve below (exponential decay). These 10 numbers should be equally spaced along the curve - that is, the 'x-step' between the values should be constant.
This value of 2.8 will be entered by the user, and therefore the way I calculate this needs to be some kind of algorithm that I can program (hence asking this on SO not Math.SE).
I have no idea where to start with this at all - any ideas?
You want to have 10 x values equally distributed, i.e. x_k = a + k * b. They shall fulfill sum(exp(-x_k)) = v with v being your target value (the 2.8). This means exp(-a) * sum(exp(-b)^k) = v.
Obviously, there is a solution for each choice of b if v is positive. Set b to an arbitrary value, and calculate a from it.
E.g. for v = 2.8 and b = 0.1, you get a = -log(v / sum(exp(-b)^k)) = -log(2.8/sum(0.90484^k)) = -log(2.8/6.6425) = -log(0.421526) = 0.86387.
So for this example, the x values would be 0.86387, 0.96387, ..., 1.76387 and the y values 0.421526, 0.381412, 0.345116, 0.312274, 0.282557, 0.255668, 0.231338, 0.209324, 0.189404, 0.171380.
Update:
As it has been clarified that the curve can be scaled arbitrarily and the xs are preferred to be 1, 2, 3, ..., 9, this is much simpler.
Assuming the curve function is r*exp(-x), the values for x = 1, ..., 9 are r*exp(-1), ..., r*exp(-9). Their sum is r*sum(exp(-x)) = r*0.58190489. So to reach a certain value (2.8) you just have to adjust r accordingly:
r = 2.8/sum(exp(-x)) = 4.81178294
And you get the values 1.770156, 0.651204, 0.239565, 0.088131, 0.032422, 0.011927, 0.004388, 0.001614, 0.000594. Note that these are nine values (x = 1 to 9); for ten values, extend x to 10 and recompute r in the same way.
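Both calculations in this answer can be sketched in a few lines of R. The variable names are made up for the example, and the rescaled version uses ten points x = 1, ..., 10, which is an assumption about what was intended.

```r
v <- 2.8  # target sum

# Fixed curve exp(-x): choose the spacing b, then solve for the offset a
b <- 0.1
k <- 0:9
a <- -log(v / sum(exp(-b * k)))
y_shifted <- exp(-(a + b * k))

# Rescaled curve r * exp(-x) on x = 1, ..., 10 (ten points; an assumption)
x <- 1:10
r <- v / sum(exp(-x))
y_scaled <- r * exp(-x)

# Both sets of ten values add up to the target
c(sum(y_shifted), sum(y_scaled))
```

Any positive v works the same way, since only the offset a or the scale r changes.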
If I understand your question correctly, you want to find x which solves the equation
a + a*x + a*x^2 + ... + a*x^9 = 2.8
It can be solved as
x = RootOf(a*x^10 - 2.8*x + 2.8 - a)
(just sum the numbers as a geometric progression)
The equation under RootOf will always have one real root different from 1 for 2.8 or any other positive number. You can solve it using some root-finding algorithm (1 is always a root, but it does not solve the original task). For the constant a you can choose any number you like.
After computing x you can easily calculate the 10 numbers as a*x^k for k = 0, ..., 9.
I'm going to generalize and assume you want N numbers summing to V.
Since your numbers are equally spaced on an exponential you can write your sum as
a + a*x + a*x^2 + ... + a*x^(N-1) = V
Where the first point has value a, and the second a*x etc.
You can take out a factor of a and get:
a ( 1 + x + x^2 + ... + x^(N-1) ) = V
If we're free to pick x (with x != 1) then we can solve for a easily
a = V / ( 1 + x + x^2 + ... + x^(N-1) )
= V*(x-1)/(x^N-1)
Substituting that back into
a, a*x, a*x^2, ..., a*x^(N-1)
gives the required sequence
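A minimal R sketch of this recipe, using the geometric-series closed form a = V*(x - 1)/(x^N - 1). The function name geometric_values and the ratio x = 0.5 are illustrative choices, not part of the answer.

```r
# N values a, a*x, ..., a*x^(N-1) that sum to V, for a chosen ratio x != 1
geometric_values <- function(V, N, x) {
  stopifnot(x > 0, x != 1)
  a <- V * (x - 1) / (x^N - 1)  # first value, from the geometric series sum
  a * x^(0:(N - 1))
}

vals <- geometric_values(V = 2.8, N = 10, x = 0.5)
vals
sum(vals)  # equals V up to floating-point rounding
```

A ratio between 0 and 1 gives a decaying sequence; a ratio above 1 gives a growing one with the same sum.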
