Functions for Finding Out the Midpoint (Interpolation) - r

I am interested in solving this kind of interpolation/midpoint problem.
Suppose I have data like this (randomly generated, so it might not exactly fit the setup):
x1 = runif(100)
x2 = x1+ runif(100)
y1 = runif(100)
y2 = y1 + runif(100)
z1 = y2 - (runif(100)/3)
interpolation_data = data.frame(x1, x2, y1, y2, z1)
> head(interpolation_data)
          x1        x2        y1        y2        z1
1 0.98851249 1.8413195 0.5362105 0.6490267 0.4130697
2 0.87617261 0.9825167 0.5628463 0.9246420 0.7357799
3 0.05901131 0.4298880 0.2463369 1.1585020 1.0614156
4 0.26843176 0.6694523 0.9467490 1.1860874 1.0859276
5 0.86021705 1.4767606 0.9956568 1.2048599 1.1932450
6 0.19368972 0.2269496 0.7618794 0.9763212 0.8594290
I want to write code to calculate a value of z2 for each row. I did this myself:
interpolation_data$z2 = (interpolation_data$x1 + ((interpolation_data$z1 - interpolation_data$y1) * (interpolation_data$x2 - interpolation_data$x1)))/(interpolation_data$y2 - interpolation_data$y1)
Are there any built-in ways in R to do this? I tried looking for something and nothing came up that exactly matched this kind of problem.
Thank you!
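For reference, a minimal sketch of row-wise linear interpolation, assuming the goal is to find the x-value z2 at which the straight line through (x1, y1) and (x2, y2) reaches the height z1. Under that assumption the division by (y2 - y1) applies only to the product term and not to x1, so the parenthesization differs from the expression in the question. The helper name interp_x is hypothetical; base R's approx() gives the same result for a single pair of points but is not vectorized across rows.

```r
# Hypothetical helper: x-coordinate at which the line through (x1, y1) and
# (x2, y2) takes the value z1. Works element-wise on whole columns.
interp_x <- function(x1, x2, y1, y2, z1) {
  x1 + (z1 - y1) * (x2 - x1) / (y2 - y1)
}

# Same kind of random data as in the question
set.seed(1)
x1 <- runif(100); x2 <- x1 + runif(100)
y1 <- runif(100); y2 <- y1 + runif(100)
z1 <- y2 - runif(100) / 3
interpolation_data <- data.frame(x1, x2, y1, y2, z1)

interpolation_data$z2 <- with(interpolation_data,
                              interp_x(x1, x2, y1, y2, z1))

# Hand-checkable case: line through (10, 0) and (20, 2), target height 0.5
manual  <- interp_x(10, 20, 0, 2, 0.5)
builtin <- approx(x = c(0, 2), y = c(10, 20), xout = 0.5)$y
```

Both manual and builtin evaluate to 12.5 here. Note that approx() returns NA by default when the target lies outside the two given points, whereas the formula extrapolates.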

Related

Is it mathematically possible to solve this problem?

x <- abs(rnorm(8))
C <- (x[1]*x[2]*x[3])^(1/3)
y <- log(x/C)
Is it mathematically possible to determine x[1:3] given you only have y? Here, x and y are always vectors of length 8. I should note that x is known for some of my dataset, which could be useful to find a solution for the other portion of the data where x is unknown. All of my code is implemented in R, so R code would be appreciated if this is solvable!
Defining f as
f <- function(x) {
C <- (x[1]*x[2]*x[3])^(1/3)
log(x/C)
}
we first note that if k is any scalar constant then f(x) and f(k*x) give the same result so if we have y = f(x) we can't tell whether y came from x or from k*x. That is, y could have come from any scalar multiple of x; therefore, we cannot recover x from y.
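The scale invariance can be checked numerically. This is a small sketch; the seed and the value of k are arbitrary choices made for illustration.

```r
f <- function(x) {
  C <- (x[1] * x[2] * x[3])^(1/3)
  log(x / C)
}

set.seed(42)            # arbitrary seed
x <- abs(rnorm(8))
k <- 5                  # any positive scalar

# C scales by k as well, so the factor cancels inside log(x / C)
same <- all.equal(f(x), f(k * x))
same
```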
Linear formulation
Although we cannot recover x we can determine x up to a scalar multiple. Define the matrix A:
ones <- rep(1, 8)
a <- c(1, 1, 1, 0, 0, 0, 0, 0)
A <- diag(8) - outer(ones, a) / 3
in which case f(x) equals:
A %*% log(x)
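The reason is that log(x / C) equals log(x) minus the mean of the first three components of log(x), and subtracting that mean from every component is exactly what A does. A quick check, using an arbitrary seed:

```r
f <- function(x) {
  C <- (x[1] * x[2] * x[3])^(1/3)
  log(x / C)
}

ones <- rep(1, 8)
a <- c(1, 1, 1, 0, 0, 0, 0, 0)
A <- diag(8) - outer(ones, a) / 3

set.seed(42)            # arbitrary seed
x <- abs(rnorm(8))

# c() drops the 8 x 1 matrix to a plain vector before comparing
linear_ok <- all.equal(c(A %*% log(x)), f(x))
linear_ok
```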
Inverting formula
From this formula, given y and solving for x, the value of x would equal
exp(solve(A) %*% y) ## would equal x if A were invertible
if A were invertible, but unfortunately it is not. For example, rowSums(A) equals zero, i.e. A %*% ones is the zero vector, which shows that the columns of A are linearly dependent and therefore that A is not invertible.
all.equal(rowSums(A), rep(0, 8))
## [1] TRUE
Rank and nullspace
Note that A is a projection matrix. This follows from the fact that it is idempotent, i.e. A %*% A equals A.
all.equal(A %*% A, A)
## [1] TRUE
Consistent with this, its eigenvalues are all 0 or 1:
zapsmall(eigen(A)$values)
## [1] 1 1 1 1 1 1 1 0
From the eigenvalues we see that A has rank 7 (the number of nonzero eigenvalues) and the dimension of the nullspace is 1 (the number of zero eigenvalues).
Another way to see this: because A is a projection matrix, its rank equals its trace, which is 7, so its nullspace must have dimension 8-7=1.
sum(diag(A)) # rank of A
## [1] 7
Taking scalar multiples of x spans a one-dimensional space (in terms of log(x), multiplying x by k adds the constant log(k) to every component), so from the fact that the nullspace has dimension 1 it must be the entirety of the values that map into the same y.
Key formula
Now, replacing solve in the line marked ## above with the generalized inverse, ginv, we have this key formula for our approximation to x, given that y = f(x) for some x:
library(MASS)
exp(ginv(A) %*% y) # approximation to x accurate up to scalar multiple
or equivalently if y = f(x)
exp(y - mean(y))
While these do not give x, they do determine x up to a scalar multiple. That is, if x' is the value produced by the above expressions, then x equals k * x' for some scalar constant k.
For example, using x and y from the question:
exp(ginv(A) %*% y)
## [,1]
## [1,] 1.2321318
## [2,] 0.5060149
## [3,] 3.4266146
## [4,] 0.1550034
## [5,] 0.2842220
## [6,] 3.7703442
## [7,] 1.0132635
## [8,] 2.7810703
exp(y - mean(y)) # same
## [1] 1.2321318 0.5060149 3.4266146 0.1550034 0.2842220 3.7703442 1.0132635
## [8] 2.7810703
exp(y - mean(y))/x
## [1] 2.198368 2.198368 2.198368 2.198368 2.198368 2.198368 2.198368 2.198368
Note
Note that y - mean(y) can be written as
B <- diag(8) - outer(ones, ones) / 8
B %*% y
and if y = f(x) then y must be in the range of A so we can verify that:
all.equal(ginv(A) %*% A, B %*% A)
## [1] TRUE
It is not true that the matrix ginv(A) equals B. It is only true that they act the same on the range of A which is all that we need.
No, it's not possible. You have three unknowns. That means you need three independent pieces of information (equations) to solve for all three. y gives you only one piece of information. Knowing that the x's are positive imposes a constraint, but doesn't necessarily allow you to solve. For example:
x1 + x2 + x3 = 6
Doesn't allow you to solve. x1 = 1, x2 = 2, x3 = 3 is one solution, but so is x1 = 1, x2 = 1, x3 = 4. There are many other solutions. [Imposing your "all positive" constraint would rule out solutions such as x1 = 100, x2 = 200, x3 = -294, but in general would leave more than one remaining solution.]
x1 + x2 + x3 = 6,
x1 + x2 - x3 = 0
Constrains x3 to be 3, but allows arbitrary solutions for x1 and x2, subject to x1 + x2 = 3.
x1 + x2 + x3 = 6,
x1 + x2 - x3 = 0,
x1 - x2 + x3 = 2
Gives the unique solution x1 = 1, x2 = 2, x3 = 3.
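The three-equation system can be solved directly with base R's solve(). In this short sketch the names M and b are chosen for illustration, and the rank of the first two rows shows why two equations are not enough.

```r
# Rows are the three equations; columns are x1, x2, x3
M <- rbind(c(1,  1,  1),
           c(1,  1, -1),
           c(1, -1,  1))
b <- c(6, 0, 2)

sol <- solve(M, b)
sol                              # 1 2 3

# With only the first two equations the coefficient matrix has rank 2,
# so one degree of freedom remains among the three unknowns
rank_two_eq <- qr(M[1:2, ])$rank
rank_two_eq                      # 2
```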

merge/cbind model matrices

This is a simplified version of my current problem. I need to create a model.matrix from 2 model matrices, without losing the info in "assign". For example, consider data and formula
y<-rnorm(100); x1<-rnorm(100); x2<-rnorm(100); x3<-rnorm(100)
f1 <- y ~ x1 + x2 + x3
and 2 model matrices X1 and X2 created using
trms<-terms.formula(f1)
trms2<-drop.terms(trms, dropx = 2)
trms3<-drop.terms(trms, dropx = -2)
X1<-model.matrix(trms2)
X2<-model.matrix(trms3)
Is there an easy way to create from X1 and X2 a matrix X with 1 intercept column and with attr(,"assign") that would have been obtained from f1?
I'm not completely sure if this is what you are trying to do, but cbind() seems to work fine in this case.
X <- cbind(X1, X2)
X <- X[, !duplicated(colnames(X))]
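You can then rebuild the assign attribute. Simply concatenating attr(X1,"assign") and attr(X2,"assign") does not work, because each matrix numbers its own terms starting from 1, so the indices collide (here the result would have only three entries for four columns). Instead, match the column names against the term labels of the full formula (this assumes numeric predictors, so that each term contributes exactly one column):
attr(X, "assign") <- match(colnames(X), c("(Intercept)", attr(trms, "term.labels"))) - 1L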
If this is not what you were trying to do, let us know.
If I understand the question correctly, how about something simple and direct like:
X3 <- cbind(X1[,1:2], X2[,2], X1[,3])
attr(X3,"assign") <- c(0,1,2,3)
colnames(X3) <- c("Intercept",attr(trms, "term.labels"))
head(X3)
  Intercept          x1         x2         x3
1         1 -1.28372461 -0.2598796  0.3028496
2         1  0.56880875  0.2803302  0.7593734
3         1 -0.32480770 -1.6705911 -1.1750247
4         1 -1.02761734 -0.1405454 -0.6805033
5         1  0.84218452 -0.1224962 -1.3882420
6         1  0.07221231  0.5587801 -0.9042751
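Whichever way the pieces are combined, the result can be checked against the matrix built directly from f1. This is a minimal sketch with an arbitrary seed, and it assumes all predictors are numeric so that each term contributes exactly one column.

```r
set.seed(1)   # arbitrary seed
y <- rnorm(100); x1 <- rnorm(100); x2 <- rnorm(100); x3 <- rnorm(100)
f1 <- y ~ x1 + x2 + x3

trms <- terms(f1)
X1 <- model.matrix(drop.terms(trms, dropx = 2))    # (Intercept), x1, x3
X2 <- model.matrix(drop.terms(trms, dropx = -2))   # (Intercept), x2

# Hand-assembled matrix with columns in the order of f1
X3 <- cbind(X1[, 1:2], X2[, 2], X1[, 3])
attr(X3, "assign") <- 0:3
colnames(X3) <- c("Intercept", attr(trms, "term.labels"))

# Reference: model matrix built from the full formula
X_full <- model.matrix(f1)

values_ok <- all.equal(unname(X3), unname(X_full), check.attributes = FALSE)
assign_ok <- identical(attr(X3, "assign"), attr(X_full, "assign"))
c(values_ok, assign_ok)
```

With factor predictors or interactions a term can span several columns, so a hard-coded assign vector would no longer be appropriate.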

Plotting inequalities in r

I have a number of values that come from my data, e.g. x1, x2, x3 etc., and I'm trying to plot shaded regions that are dependent on functionals of these values. These regions are defined by inequalities of the form:
f(x1, x2) <= A <= f(x3, x4)
f(x5, x6) <= A + B <= f(x7, x8)
and I would like to have A on the x-axis, B on the y-axis.
I've tried putting the values in a data frame and playing around with ggplot, but my R skills are lacking, and beyond simple line plots, bar charts, etc. I'm a bit lost. In particular, I can't figure out the syntax for plotting inequalities as regions.
Thanks
Edit for example:
       x1    x2    x3    x4    x5    x6
Plot1  0.2   0.3   0.24  0.14  0.17  0.31
Plot2  0.14  0.35  0.30  0.11  0.21  0.39
with the hope of plotting two separate graphs (one for Plot1, one for Plot2) with the regions defined by:
max(x2 + x3, x4 + x5) <= A <= min(1 - x1, 1 - x6)
max(x3, x5) <= A + B <= min(x2 + x3, x5 + x6)
I'm not aware of any built-in R function to do this. You can, however, make a data.frame with a grid and compute for each grid point whether the inequalities are satisfied. Then you can plot with geom_tile. A working example:
## Equation 1: f1(A,B) <= A < f2(A,B)
## Equation 2: f3(A,B) < A + B < f4(A,B)
library(ggplot2)
grid <- expand.grid( A = seq(-2,2, length.out = 100), B = seq(-1, 1, length.out = 100))
f1 <- function(A, B) B
f2 <- function(A, B) 2*(A^2 + B^2)
f3 <- function(A, B) A*B
f4 <- function(A, B) 2*A+B^3
grid$inside_eq1 <- (f1(grid$A, grid$B) <= grid$A) & (grid$A < f2(grid$A, grid$B))
grid$inside_eq2 <- (f3(grid$A, grid$B) < grid$A + grid$B) & (grid$A + grid$B < f4(grid$A, grid$B))
grid$inside <- grid$inside_eq1 & grid$inside_eq2
ggplot(grid) +
geom_tile(aes(x = A, y = B, color = inside, fill = inside))
Edit: You asked in the comments how to draw a border around the region. I think a general solution is a bit complex, but in your case it's not so hard. The bounds in your inequalities do not depend on A and B; they are constants, so each inequality defines a half-plane (or a strip between two parallel lines), which is convex. Since the intersection of convex sets is again convex, the region in which the inequalities are fulfilled is also convex. This leads to a solution that makes use of the convex hull of the region:
## if you want a border for the region (works only if the region is convex!)
f1 <- function() 0.1
f2 <- function() 1.8
f3 <- function() -1.2
f4 <- function() 0.3
grid$inside_eq1 <- (f1() <= grid$A) & (grid$A < f2())
grid$inside_eq2 <- (f3() < grid$A + grid$B) & (grid$A + grid$B < f4())
grid$inside <- grid$inside_eq1 & grid$inside_eq2
hull <- chull(grid$A[grid$inside], grid$B[grid$inside])
ggplot(grid, aes(x = A, y = B)) +
geom_tile(aes(color = inside, fill = inside)) +
geom_path(data = grid[grid$inside, ][c(hull, hull[1]), ], size = 2)
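Applied to the Plot1 values from the question, the same grid approach might look like the sketch below. The grid ranges and resolution are arbitrary choices, the variable names are illustrative, and the plot object is built only if ggplot2 is installed.

```r
# Plot1 values from the question
p <- c(x1 = 0.2, x2 = 0.3, x3 = 0.24, x4 = 0.14, x5 = 0.17, x6 = 0.31)

# Constant bounds: lo_A <= A <= hi_A and lo_AB <= A + B <= hi_AB
lo_A  <- max(p["x2"] + p["x3"], p["x4"] + p["x5"])
hi_A  <- min(1 - p["x1"], 1 - p["x6"])
lo_AB <- max(p["x3"], p["x5"])
hi_AB <- min(p["x2"] + p["x3"], p["x5"] + p["x6"])

# Evaluate both inequalities on a regular grid of (A, B) points
grid <- expand.grid(A = seq(0, 1, length.out = 201),
                    B = seq(-1, 1, length.out = 201))
grid$inside <- grid$A >= lo_A & grid$A <= hi_A &
  (grid$A + grid$B) >= lo_AB & (grid$A + grid$B) <= hi_AB

# Build the plot; call print(plt) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plt <- ggplot2::ggplot(grid, ggplot2::aes(x = A, y = B, fill = inside)) +
    ggplot2::geom_tile()
}
```

For Plot1 the bounds work out to 0.54 <= A <= 0.69 and 0.24 <= A + B <= 0.48, so the shaded region lies at negative B. Repeating the computation with the Plot2 row gives the second graph.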

Simulating dataset in R

I am trying to simulate a dataset and am a bit stuck on the following, as I'm a bit new to R. Here's the code I have so far:
set.seed(10)
n <- 300
x1a <- rnorm(100,1,2)
x1b <- rnorm(100,0,1)
x1c <- rnorm(100,1,0.5)
x1 <- c(x1a,x1b,x1c)
x2a <- rnorm(100,1,2)
x2b <- rnorm(100,1,1)
x2c <- rnorm(100,0,0.5)
x2 <- c(x2a,x2b,x2c)
This is what I wanted to create:
Dataset with 300 observations and three variables: x1, x2 and g.
g has 3 levels, with lvl 1 having observations 1-100, lvl2 having obs 101-200, and lvl 3 having obs 201-300.
x1 and x2 have the following distributions:
the first 100 obs have mean 1 for x1, mean 1 for x2, sd of 2 for both;
the second 100 obs have mean 0 for x1, mean 1 for x2, sd of 1 for both;
the third 100 obs have mean 1 for x1, mean 0 for x2, sd of 0.5 for both.
I was able to create the first two vectors, x1 and x2, in my code above. However, when creating g, I'm not sure where to begin or how to incorporate it into the dataset. I am also not sure how to combine everything into a single dataset with the full 300 observations and all three variables.
Any suggestions?
Thanks!
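One possible way to finish this, as a minimal sketch: build g with rep() and combine the three vectors in a data frame. The name sim_data is chosen here for illustration.

```r
set.seed(10)

# Three blocks of 100 observations each, as described above
x1 <- c(rnorm(100, 1, 2), rnorm(100, 0, 1), rnorm(100, 1, 0.5))
x2 <- c(rnorm(100, 1, 2), rnorm(100, 1, 1), rnorm(100, 0, 0.5))

# Group label: level 1 for rows 1-100, 2 for 101-200, 3 for 201-300
g <- factor(rep(1:3, each = 100))

sim_data <- data.frame(x1, x2, g)
str(sim_data)

# Per-group means and standard deviations as a sanity check
aggregate(cbind(x1, x2) ~ g, data = sim_data, FUN = mean)
aggregate(cbind(x1, x2) ~ g, data = sim_data, FUN = sd)
```

The group summaries should be close to, but not exactly equal to, the target means and standard deviations, since the values are random draws.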

Difference Constraints

I'm studying for my algorithms final and I came across a practice problem that I can't seem to figure out. Here's the problem:
Consider the following set of difference constraints:
x1 - x4 <= 1
x2 - x1 <= 2
x2 - x4 <= 0
x3 - x1 <= 1
x4 - x1 <= -1
x4 - x3 <= -2
What is the solution that will be provided by the shortest path based algorithm for this set of constraints?
The answer that I got when I first did this was that x1 = 0, x2 = 0, x3 = -2, and x4 = -1. Unfortunately, this doesn't work because of the third constraint.
Can someone walk me through how to get the right answer using the shortest path based algorithm for this question? Thanks!
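A minimal sketch of the shortest-path approach, assuming the standard construction: each constraint xj - xi <= w becomes an edge from i to j with weight w, a virtual source is connected to every vertex with weight 0 (equivalent to starting every distance at 0), and Bellman-Ford relaxes all edges repeatedly. The function and variable names are illustrative.

```r
# Each row encodes x[to] - x[from] <= w, i.e. an edge from -> to with weight w
edges <- data.frame(
  from = c(4, 1, 4, 1,  1,  3),
  to   = c(1, 2, 2, 3,  4,  4),
  w    = c(1, 2, 0, 1, -1, -2)
)

solve_difference_constraints <- function(edges, n) {
  d <- rep(0, n)  # the virtual source reaches every vertex at cost 0
  # The graph has n + 1 vertices including the source, so n passes suffice
  for (pass in seq_len(n)) {
    for (e in seq_len(nrow(edges))) {
      cand <- d[edges$from[e]] + edges$w[e]
      if (cand < d[edges$to[e]]) d[edges$to[e]] <- cand
    }
  }
  # If any edge can still be relaxed there is a negative cycle: infeasible
  if (any(d[edges$from] + edges$w < d[edges$to])) {
    stop("constraints are infeasible (negative cycle)")
  }
  d
}

x <- solve_difference_constraints(edges, n = 4)
x   # -1 -2  0 -2

# Every constraint x[to] - x[from] <= w should hold
feasible <- all(x[edges$to] - x[edges$from] <= edges$w)
```

For the constraints in the question this gives x1 = -1, x2 = -2, x3 = 0, x4 = -2, which satisfies all six inequalities, including x2 - x4 <= 0.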
