Finding variance of a subset of data from a scatterplot - r

I have a scatterplot of x versus y. I have drawn an abline down the middle of the plot. I want to calculate the variance of the points on the left of the abline and I want to calculate the variance of the points on the right of the abline. This is most likely a relatively simple problem, but I'm struggling to find a solution. Any advice is appreciated. Thanks in advance.
x = rnorm(100,mean=12,sd=2)
y = rnorm(100,mean=20,sd=5)
data = as.data.frame(cbind(x,y))
plot(x=x,y=y,type="p")
abline(v=12,col="red")

In your sample code you have a vertical line v = 12. Your data points (x, y) are split into two groups as x < 12 and x >= 12. It is straightforward to do something like:
var(y[x < 12])
var(y[x >= 12])
But we can also use a single call to tapply:
tapply(y, x < 12, FUN = var)
More generally if you have a line y = a * x + b, where a is slope and b is intercept, your data points (x, y) will be split into two groups: y < a * x + b (below the line) and y >= a * x + b (above the line), so that you may use
tapply(y, y < a * x + b, FUN = var)
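As a minimal sketch of the general case, the snippet below labels each point by its side of a sloped line and computes the variance of y per side. The slope `a = 1.5`, intercept `b = 2` and the seed are assumptions chosen only for illustration; they are not from the question.

```r
set.seed(42)                         # assumed seed, only for reproducibility
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

a <- 1.5                             # hypothetical slope
b <- 2                               # hypothetical intercept

# Label every point by the side of the line y = a * x + b it falls on
side <- ifelse(y < a * x + b, "below", "above")

# Variance of y within each group, returned as a named vector
var_by_side <- tapply(y, side, FUN = var)
var_by_side
```

Using readable labels instead of the raw logical vector makes the names of the result self-explanatory, which helps when the grouping rule is more involved than a vertical line.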

Related

R: Interpolate x values given y and z in a matrix using interp2 and fzero

Brief summary
I have a matrix of values representing a topological surface. I'm trying to calculate the x values for an exact contour z for each column y.
In Depth
I'm fitting experimental data to a 3-dimensional non-linear decay model, z(x,y).
Any code for my analysis needs to fit a number of different datasets. I've managed to parameterize the model successfully, and calculate z given each whole value of x and y. I then need to extract the values of specific contours.
The ranges of the variables are x: (0 < x < 80); y: (0 < y < 100); z: (0 < z < 1).
require(pracma)
# Truncated matrix with demo data, normally created by fitting parameterized algorithm to 81x100 matrix
M <- matrix(c(0,1,1,1,1,1,1,1,1,
0,0.843,1,1,1,1,1,1,1,
0,0.484,0.907,1,1,1,1,1,1,
0,0.218,0.459,0.721,0.978,1,1,1,1,
0,0.082,0.185,0.313,0.461,0.621,0.781,0.925,1,
0,0.029,0.066,0.113,0.171,0.242,0.323,0.414,0.511,
0,0.010,0.022,0.038,0.058,0.082,0.112,0.146,0.186,
0,0.003,0.008,0.013,0.020,0.028,0.037,0.049,0.062,
0,0.001,0.003,0.004,0.007,0.009,0.012,0.016,0.021,
0,0.000,0.001,0.001,0.002,0.003,0.004,0.006,0.007,
0,0.000,0.000,0.001,0.001,0.001,0.001,0.002,0.002),ncol = 11)
# Extract Contours ----
X <- seq(from = 0, to = 80, by = 10) # normally by = 1 on full dataset
Y <- seq(from = 0, to = 100, by = 10)
z <- 0.8
FindZ <- function(x) interp2(X, Y, M, x, y)-z
x_out <- matrix()
for(i in 1:11){
y <- i*10 # adjusted for truncated dataset. Normally in increments of 1
x <- fzero(FindZ, mean(X))
x_out[i] <- x$x
print(x$x)
next}
I get the following error:
Error in if (fb == 0) return(list(x = b, fval = fb)) :
missing value where TRUE/FALSE needed
I realize that not every row y will contain values exceeding the z contour I'm looking for. To work around this, I have tried using tryCatch() on my larger dataset. When I do this, I get nothing up to, say, loop iteration y[20]; it then runs successfully for approximately 15 iterations, and I get the same error for the remainder. The problem is that I also get the error on rows that do contain values > z, where the interpolation should succeed. I have also switched the code around, seeking solutions for y given x and z, and got almost exactly the same result.
I don't want to subset the data as I don't want to lose the absolute x,y positions as they correspond to physical measurements. If there is no z value, I'd be expecting to place a 0 for the column using replace().
Ideally, I want a matrix of values of (x,y) that if I fed back into my original function, they would calculate to my value z. Any pointers would be great.
I think you're trying to reinvent the wheel here. What you are attempting to do can be achieved efficiently using the isoband package, which can calculate the x, y co-ordinates of an exact z value given a matrix of z values with vectors of x, y co-ordinates:
result <- as.data.frame(isoband::isobands(Y, X, M, z, z)[[1]])
head(result)
#> x y id
#> 1 44.08998 80.00000 1
#> 2 44.08998 80.00000 1
#> 3 42.44618 70.00000 1
#> 4 40.00000 61.31944 1
#> 5 39.13242 60.00000 1
#> 6 35.27704 50.00000 1
We can show this by plotting your grid with color representing the z values, and draw our result as a line.
library(ggplot2)
ggplot(reshape2::melt(M), aes(Y[Var2], X[Var1])) +
geom_tile(aes(fill = value)) +
geom_path(data = result, aes(x , y), col = "red")
Created on 2022-08-08 by the reprex package (v2.0.1)
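If you need exactly one x per column of y rather than the full contour path, a plain linear interpolation down each column may be enough. The sketch below is not part of isoband: `cross_x` is a hypothetical helper, and it assumes you want the first crossing in each column and `NA` where a column never reaches the level. The three columns are copied from the demo matrix (Y = 0, 40 and 50).

```r
X <- seq(from = 0, to = 80, by = 10)

# Three columns copied from the demo matrix M (Y = 0, 40, 50)
cols <- cbind(
  y0  = c(0, 1, 1, 1, 1, 1, 1, 1, 1),
  y40 = c(0, 0.082, 0.185, 0.313, 0.461, 0.621, 0.781, 0.925, 1),
  y50 = c(0, 0.029, 0.066, 0.113, 0.171, 0.242, 0.323, 0.414, 0.511)
)

# Hypothetical helper: first x at which zcol crosses `level`,
# found by linear interpolation between neighbouring grid points
cross_x <- function(x, zcol, level) {
  d <- zcol - level
  n <- length(d)
  # a crossing lies between i and i + 1 when the sign of d changes there
  i <- which(d[-n] * d[-1] <= 0 & d[-n] != d[-1])[1]
  if (is.na(i)) return(NA_real_)     # column never reaches the level
  x[i] + (level - zcol[i]) * (x[i + 1] - x[i]) / (zcol[i + 1] - zcol[i])
}

x_at_level <- apply(cols, 2, cross_x, x = X, level = 0.8)
x_at_level
```

For the Y = 40 column this gives about 61.32, which agrees with the isoband point at x = 40 in the output above. Columns without a crossing come back as `NA`, which you could swap for 0 afterwards if that is the placeholder you prefer.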

How to create a 3D surface in R?

I am trying to make a 3D surface plot in R. In this the values for the z and x axes should be within a given range, and the value of y should depend on both x and z as described in the function.
z <- maxGiraffeNumber <- c(100:2400)
x <- Tourism <- c(1:100)
y <- Rain <- 100/(17/((z/365)*0.3))*(100-y)
surface3d(x,y,z, col=colors)
Running this code gives me the following error:
Error in rgl.surface(x = 1:100, y = 100:2400, z = c(14.7789550384904, :
'y' length != 'x' rows * 'z' cols
Thank you for your help
The problem is stated fairly well in the error. To produce a surface plot you need a vector of x co-ordinates of any length and a vector of y co-ordinates of any length, but your z values need to cover every (x, y) co-ordinate, which means that length(z) == length(x) * length(y). What you have instead is x of length 100, y of length 2301 and z of length 2301.
If you have a function you want to apply to every possible combination of x and y, you can use outer.
I'll give an example of producing a surface with something similar to the code you have created here, but it's probably not exactly what you were looking for, since it's not clear exactly what you are trying to do.
library(rgl)
f <- function(x, y) 100 / (17/((x / 365) * 0.3)) * (100 - y)
y <- Rain <- c(1:100)
x <- Tourism <- c(1:100)
z <- maxGiraffeNumber <- outer(Rain, Tourism, f)
surface3d(Tourism, Rain, maxGiraffeNumber, col = "red")
This opens a rotatable 3D surface in a pop-up window.
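To see why outer satisfies the length requirement, here is a small sketch without any plotting. The function `f` and the short vectors are hypothetical stand-ins, chosen only to keep the output readable.

```r
f <- function(x, y) x^2 + y          # hypothetical function of two variables
x <- 1:4
y <- 1:3

# outer() evaluates f at every (x, y) pair:
# rows follow x, columns follow y
z <- outer(x, y, f)

dim(z)                               # length(x) rows, length(y) columns
length(z) == length(x) * length(y)   # the condition a surface plot needs
z[2, 3] == f(x[2], y[3])             # element [i, j] is f(x[i], y[j])
```

Note that the function passed to outer has to be vectorised, since it is called once with the full expanded vectors rather than once per pair.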

Calculating the distance along a line that each point would intersect at

I would like to fit a line through two points from a random distribution of points, then calculate the location along that line at which each point intersects it orthogonally. I am not interested in the residual distance of each point from the line (points above/below the line are treated equally); I am only interested in calculating the location along the line where that point would intersect (e.g. points at different distances from the line but at the same orthogonal location would have the same value). The data aren't connected to the line explicitly, as the abline is drawn from the location of only 2 points, so I can't extract these values in a classic residual type way. I don't think this is difficult, but I can't wrap my head around how to calculate it and it's really bugging me!
I have explored the dist2d function, but that calculates the orthogonal distance of each point to the line. Is there a way to use that value to then calculate the hypotenuse from the data point to some fixed constant point on the line, and then in turn calculate the adjacent distance from that constant? I would really appreciate any help!
#here is some example starter code here to visualise what I mean
#get random data
r = rnorm(100)
t = rnorm(100)
#bind and turn into a df
data = cbind(r,t)
data = as.data.frame(data)
head(data)
#plot
plot(data)
#want to draw abline between 2 points
#isolate points of interest
#here randomly select first two rows
d = data[c(1:2),]
head(d)
#calculate abline through selected points
lm = lm(t ~ r, d)
abline(lm)
#draw points to see which ones they cut through
points(d$r, d$t, bg = "red", pch = 21)
The code below works.
# Create dataframe
data = data.frame(x = rnorm(100), y = rnorm(100))
plot(data, xlim=c(-3, 3), ylim=c(-3, 3))
# Select two points
data$x1_red <- data[1,1]; data$y1_red <- data[1,2]; data$x2_red <- data[2,1]; data$y2_red <- data[2,2];
points(data$x1_red, data$y1_red, bg = "red", pch = 21); points(data$x2_red, data$y2_red, bg = "red", pch = 21);
# Draw a red line through the two selected points
# Get its slope (m_red) and intercept (b_red)
data$m_red <- (data[2,2] - data[1,2]) / (data[2,1] - data[1,1])
data$b_red <- data$y1_red - data$m_red * data$x1_red
abline(data$b_red, data$m_red, col='red')
# Calculate the orthogonal slope
data$m_blue <- (-1/data$m_red)
abline(0, data$m_blue, col='blue')
# Solve for each point's b-intercept (if using the blue slope)
# y = m_blue * x + b
# b = y - m_blue * x
data$b <- data$y - data$m_blue * data$x
# Solve for where each point (using the m_blue slope) intersects the red line (x' and y')
# y' = m_blue * x' + b
# y' = m_red * x' + b_red
# Set those equations equal to each other and solve for x'
data$x_intersect <- (data$b_red - data$b) / (data$m_blue - data$m_red)
# Then solve for y'
data$y_intersect <- data$m_blue * data$x_intersect + data$b
# Calculate the distance between the point and where it intersects the red line
data$dist <- sqrt( (data$x - data$x_intersect)^2 + (data$y - data$y_intersect)^2 )
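If what you need is the position along the red line rather than the distance to it, one option is a vector projection. This is a sketch under my own assumptions: position is measured from the first selected point, positive in the direction of the second, and the names `p1`, `p2`, `u` and `along` are introduced here for illustration.

```r
set.seed(1)                                   # assumed seed, for reproducibility
pts <- data.frame(x = rnorm(100), y = rnorm(100))

p1 <- unlist(pts[1, ])                        # first selected point
p2 <- unlist(pts[2, ])                        # second selected point

# Unit vector pointing along the line from p1 to p2
u <- (p2 - p1) / sqrt(sum((p2 - p1)^2))

# Signed position of each point's orthogonal projection,
# measured along the line from p1 (dot product with u)
along <- (pts$x - p1[["x"]]) * u[["x"]] + (pts$y - p1[["y"]]) * u[["y"]]

# Coordinates of the foot of the perpendicular, if needed
foot_x <- p1[["x"]] + along * u[["x"]]
foot_y <- p1[["y"]] + along * u[["y"]]
```

Two points at different distances from the line but at the same orthogonal location get the same `along` value. This form also avoids dividing by the slope, so it still works when the line is vertical or horizontal.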

Plotting function with 3 parameters (4d) in R

I want to see how three variables x, y, and z respond to a function f using R.
I've searched for R solutions (e.g. rgl using 4d plots) but none seem to allow the input of a function as the fourth variable while allowing manipulation of x, y, and z across their full range of values.
# First I create three variables that each have a domain 0 to 4
x
y
z
# Then I create a function from those three variables
f <- sqrt(x^2 + y^2 + z^2)
EDIT: I originally stated that I wanted x, y, and z to be seq(0, 4, 0.01) but in fact I only want them to range from 0 to 4, and do so independently of other variables. In other words, I want to plot the function across a range of values letting x move independently of y and z and so forth, rather than plotting a 3-D line. The result should be a 3-D surface.
I want to:
a) see how the function f responds to all possible combinations of x, y, and z across a range of x, y, and z values 0 to 4, and
b) find what maxima/minima exist especially when holding one variable constant.
This is more of a mathematical question. Unfortunately, neither our computer screens nor our brains are really made for 4D, so what you ask won't be possible as is. You want to show a dense set of data (a cube between 0 and 4), and we cannot display what is "inside" the cube.
To come back to R, you can always display a slice of it, for example fixing z and plot sqrt(x^2 + y^2 + z^2) for x and y. Here you have two examples:
# Points where the function should be evaluated
x <- seq(0, 4, 0.01)
y <- seq(0, 4, 0.01)
z <- seq(0, 4, 0.01)
# Compute the distance from origin
distance <- function(x,y,z) {
sqrt(x^2 + y^2 + z^2)
}
# Matrix to store the results
slice=matrix(0, nrow=length(x),ncol=length(y))
# Fill the matrix with a slice at z=3
i=1
for (y_val in y)
{
slice[,i]=distance(x,y_val,3)
i=i+1
}
# Plot with plot3D library
require(plot3D)
persp3D(z = slice, theta = 100,phi=50)
# Plot with raster library
library(raster)
plot(raster(slice,xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y)))
If you change the z value, you will not really change the shape (you just make it "flatter" for bigger z). Note that because the function is symmetric in x, y and z, the same plots are produced if you keep x or y constant instead.
For your last question about the maximum, you can re-use the slice matrix and do:
max_ind=which(slice==max(slice),arr.ind = TRUE)
x[max_ind[,1]]
y[max_ind[,2]]
(see Get the row and column name of the minimum element of a matrix)
But again, with math we can see from your equation that the maximum will always be obtained by maximizing x, y and z, since the function simply measures the distance from the origin.
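As a compact variant of the slice-and-search steps above, the loop can be replaced by outer. This is only a sketch: the coarser step of 0.5 is my assumption to keep the matrix small, and the slice is again fixed at z = 3.

```r
distance <- function(x, y, z) sqrt(x^2 + y^2 + z^2)

x <- seq(0, 4, 0.5)                  # coarser grid than 0.01, assumed for brevity
y <- seq(0, 4, 0.5)

# Evaluate the function on the whole (x, y) grid with z held at 3
slice <- outer(x, y, distance, z = 3)

# Grid positions of the largest and smallest values in the slice
max_ind <- which(slice == max(slice), arr.ind = TRUE)
min_ind <- which(slice == min(slice), arr.ind = TRUE)

c(x = x[max_ind[1, 1]], y = y[max_ind[1, 2]], f = max(slice))
c(x = x[min_ind[1, 1]], y = y[min_ind[1, 2]], f = min(slice))
```

As expected for a distance from the origin, the maximum sits at the corner x = 4, y = 4 and the minimum at x = 0, y = 0, where the value is just the fixed z.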

Octave: testing functions over a Range in Lists or like with For-loops?

I want to see the values of the function y(x) with different values of x, where y(x) < 5:
y = abs(x)+abs(x+2)-5
How can I do it?
fplot(@(x) abs(x)+abs(x+2)-5, [-10 10])
hold all
fplot(@(x) 5, [-10 10])
legend({'y=abs(x)+abs(x+2)-5' 'y=5'})
You can just create a vector of values x, then look at the vector y, or plot it:
% Create a vector with 20 equally spaced entries in the range [-1,1].
x = linspace(-1,1,20);
% Calculate y, and see the result
y = abs(x) + abs(x + 2) - 5
% Plot.
plot(x, y);
% View the minimum and maximum.
min(y)
max(y)
If you limit x to the range [-6 4], y is guaranteed to be less than or equal to 5. In MATLAB, you can then plot the function using FPLOT (like Amro suggested) or LINSPACE and PLOT (like Peter suggested):
y = @(x) abs(x)+abs(x+2)-5; % Create a function handle for y
fplot(y,[-6 4]); % FPLOT chooses the points at which to evaluate y
% OR
x = linspace(-6,4,100); % Create 100 equally-spaced points from -6 to 4
plot(x,y(x)); % PLOT will plot the points you give it
% create a vector x with values between 1 and 5
x = 1:0.1:5;
% create a vector y with your function
y = abs(x) + abs(x+2) - 5;
% find all elements in y that are under 5
y(find(y<5))
