How would I recreate the following plot in R?
I'm struggling with generating data, given that the Z-coordinate is determined with the following equation:
where
A vectorized solution would be good, and a 3D, interactive plot would be even better.
I have the following:
## generate the data points from a multivariate normal
library(MASS)
library(ggplot2)
Sigma <- matrix(c(10,3,3,2),2,2)
set.seed(1)
df <- data.frame(mvrnorm(n=100,mu=c(10,10),Sigma=Sigma)) # X1=x, X2=y
theta0 <- seq(-5,5,by=0.5)
theta1 <- seq(-5,5,by=0.5)
z = NULL
m <- theta1
Here's a prototype to get you up and running using the rgl package. If you want interactivity other than rotating, then something else needs to be pursued. Also, some of the stuff below is hard-coded (variable names and df), so that could be improved.
library(MASS)
library(ggplot2)
Sigma <- matrix(c(10,3,3,2),2,2)
set.seed(1)
df <- data.frame(mvrnorm(n=100,mu=c(10,10),Sigma=Sigma)) # X1=x, X2=y
theta0 <- seq(-5,5,by=0.5)
theta1 <- seq(-5,5,by=0.5)
# Produce J
f <- Vectorize(function(t0, t1) { sum((t0 + t1*df$X1 - df$X2)^2)})
z <- outer(theta0, theta1, f)
# Get the rgl library and plot
library(rgl)
persp3d(theta0, theta1, z, col="lightgray", smooth=TRUE)
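If you would rather avoid Vectorize (which still calls the function once per grid point), the same surface can be computed with matrix operations. This is only a sketch: it assumes, as in f above, that J is the plain residual sum of squares, the names grid and pred are introduced here for illustration, and base persp() stands in for rgl to give a static plot.

```r
library(MASS)

set.seed(1)
Sigma <- matrix(c(10, 3, 3, 2), 2, 2)
df <- data.frame(mvrnorm(n = 100, mu = c(10, 10), Sigma = Sigma))  # X1 = x, X2 = y

theta0 <- seq(-5, 5, by = 0.5)
theta1 <- seq(-5, 5, by = 0.5)

# All (theta0, theta1) combinations; theta0 varies fastest
grid <- expand.grid(t0 = theta0, t1 = theta1)

# n x G matrix of predictions t0 + t1 * x, one column per grid point
pred <- cbind(1, df$X1) %*% rbind(grid$t0, grid$t1)

# Sum of squared residuals per grid point, reshaped so z[i, j] = J(theta0[i], theta1[j])
z <- matrix(colSums((pred - df$X2)^2), nrow = length(theta0))

# Static surface; the resulting z can be passed to persp3d() in the same way
persp(theta0, theta1, z, theta = 30, phi = 30)
```

The resulting z has the same layout as the one produced by outer(theta0, theta1, f), so it can be dropped into the persp3d call unchanged.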
I was trying to do some manual calculations of knn regression and came across this unusual error. The predicted values done by hand do not match with the ones I got from the 'knnreg' function in the 'caret' package. So I used another package (FNN) as a second check and discovered that my manual calculations do agree with the ones from the FNN package. So I'm really confused now. Here is an example code:
# caret vs. FNN packages
# issue in predictions
library(caret)
library(FNN)
library(dbscan)
n <- 100
x <- rnorm(n)
y <- 2 + 3*x + rnorm(n, sd = 0.5)
x <- as.matrix(x)
# using caret
knn_caret <- knnreg(x, y, k = 5)
yhat_caret <- predict(knn_caret, newdata = x)
# using FNN
knn_FNN <- knn.reg(train = x, y = y, k = 5)
yhat_FNN <- knn_FNN$pred
# manual calculation using the neighbors.
# choose a point
i <- 3
nn <- kNN(x, k = 5) # kNN() comes from the dbscan package
neighbors <- nn$id[i, ]
mean(y[neighbors]) # manual calculation
yhat_FNN[i] # FNN package
yhat_caret[i] # caret package
Any pointers to a mistake in my code, or any other thoughts on this issue, would be greatly appreciated.
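One possible explanation, offered as an assumption rather than something confirmed in this thread: when predicting on the training data, a point can count as its own nearest neighbour (distance 0), whereas a neighbour search within a single data set, or a leave-one-out prediction, excludes the point itself. The base-R sketch below shows how the two conventions give different means for the same i; knn_mean is a hypothetical helper, not a function from caret, FNN or dbscan.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)
k <- 5

# Pairwise distances between all training points
D <- as.matrix(dist(x))

# Mean response of the k nearest neighbours of point i,
# optionally allowing the point to be its own neighbour
knn_mean <- function(i, include_self) {
  d <- D[i, ]
  if (!include_self) d[i] <- Inf   # push the point itself to the end of the ordering
  mean(y[order(d)[seq_len(k)]])
}

i <- 3
with_self    <- knn_mean(i, include_self = TRUE)   # point i plus its 4 nearest others
without_self <- knn_mean(i, include_self = FALSE)  # the 5 nearest other points
c(with_self = with_self, without_self = without_self)
```

Comparing these two numbers with yhat_caret[i] and yhat_FNN[i] should indicate which convention each package follows for this call.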
In R, I need to create a raster of probabilities from 4 rasters (distance from road, slope, grass cover and tree cover). For each of these I have created a formula to calculate a weight. Unfortunately I cannot share the data. The function below is what I have tried so far, but it is not working yet. It gives the error: Non-numeric argument to mathematical function. Any recommendations?
probabilities_raster <- function(tc, gc, road, slp){
  # Create structure to hold data
  propxy_raster <- raster(ncol=100, nrow=100)
  ncell(propxy_raster)
  treecover <- (dnorm(tc, mean=0.7, sd=0.1))/(dnorm(0.7, mean=0.7, sd=0.1)) # not working
  grasscover <- (dnorm(gc, mean=0.3, sd=0.1))/(dnorm(0.3, mean=0.3, sd=0.1)) # not working
  road <- pnorm(-2+4*road) # not working
  slope <- exp(-10*slp) # this one works
  # Calculate weight
  weight <- treecover * grasscover * road * slope
  propxy_raster <- weight
  return(propxy_raster)
}
raster_1 <- probabilities_raster(tc=raster_treecover, gc=raster_grasscover, road=raster_road, slp=raster_slope)
Here is a minimal, self-contained reproducible example. Minimal is also important, because your question really should be:
"How can I use dnorm with a RasterLayer?"
library(raster)
tc <- raster()
values(tc) <- runif(ncell(tc))
x <- dnorm(tc, mean=0.7, sd=0.1)
#Error in dnorm(tc, mean = 0.7, sd = 0.1) :
# Non-numeric argument to mathematical function
I think what you are looking for is
x <- calc(tc, function(i) dnorm(i, 0.7, 0.1))
And with "terra" that would be
library(terra)
tc <- rast()
values(tc) <- runif(ncell(tc))
x <- app(tc, \(i) dnorm(i, 0.7, 0.1))
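Putting this together, the function from the question could be written with terra roughly as follows. This is a sketch under assumptions: the four inputs are single-layer SpatRasters on the same grid, make_layer is a hypothetical helper that generates random test layers, and the weights are the formulas from the question.

```r
library(terra)

# Weight raster from four aligned SpatRaster layers (formulas taken from the question)
probabilities_raster <- function(tc, gc, road, slp) {
  # dnorm() and pnorm() have no raster methods, so apply them cell-wise with app()
  treecover  <- app(tc, \(v) dnorm(v, mean = 0.7, sd = 0.1) / dnorm(0.7, mean = 0.7, sd = 0.1))
  grasscover <- app(gc, \(v) dnorm(v, mean = 0.3, sd = 0.1) / dnorm(0.3, mean = 0.3, sd = 0.1))
  roadw      <- app(road, \(v) pnorm(-2 + 4 * v))
  # exp() and arithmetic work on rasters directly
  slope      <- exp(-10 * slp)
  treecover * grasscover * roadw * slope
}

# Hypothetical helper: a small random layer with values in [0, 1]
make_layer <- function() {
  r <- rast(nrows = 10, ncols = 10)
  values(r) <- runif(ncell(r))
  r
}

set.seed(1)
w <- probabilities_raster(make_layer(), make_layer(), make_layer(), make_layer())
w
```

With the raster package, the same structure should work by replacing app() with calc().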
I am using the package ks for kernel density estimation. Here's an easy example:
n <- 70
x <- rnorm(n)
library(ks)
f_kde <- kde(x)
I am actually interested in the respective exceeding probabilities of my input data, which can be easily returned by ks having f_kde:
p_kde <- pkde(x, f_kde)
This is done in ks with a numerical integration using Simpson's rule. Unfortunately, this is only implemented for the 1D case. In the bivariate case, ks has no method for returning the probabilities:
y <- rnorm(n)
f_kde <- kde(data.frame(x,y))
# does not work, but it's what I am looking for:
p_kde <- pkde(data.frame(x,y), f_kde)
I couldn't find any package or help on Stack Overflow to solve this issue in R (some suggestions for Python exist, but I would like to keep it in R). Any line of code or package recommendation is appreciated. Even though I am mostly interested in the bivariate case, any ideas for a multivariate case are appreciated as well.
kde supports multidimensional kernel density estimates, so we can use kde itself to approximate pkde.
To do this, we evaluate kde on a grid with small enough steps dx and dy using the eval.points parameter: this gives the local density estimate on each dx*dy square.
We verify that the sum of the estimates multiplied by the area of the squares is almost equal to 1:
library(ks)
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
xy <- cbind(x,y)
xmin <- -10
xmax <- 10
dx <- .1
ymin <- -10
ymax <- 10
dy <- .1
pts.x <- seq(xmin, xmax, dx)
pts.y <- seq(ymin, ymax, dy)
pts <- as.data.frame(expand.grid(x = pts.x, y = pts.y))
f_kde <- kde(xy,eval.points=pts)
pts$est <- f_kde$estimate
sum(pts$est)*dx*dy
[1] 0.9998778
You can now query the pts data frame for the cumulative probability over the area of your choice:
library(data.table)
setDT(pts)
# cumulative density
pts[x < 1 & y < 2 , .(pkde=sum(est)*dx*dy)]
pkde
1: 0.7951228
# average density around a point
tolerance <-.1
pts[abs(x-1)<tolerance & abs(y-2)<tolerance, .(kde = mean(est))]
kde
1: 0.01465478
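To get a feel for how accurate this kind of grid summation is, it can be checked against a density whose CDF is known exactly. The sketch below is an illustration, not part of the ks API: it swaps the kde estimate for the product of two standard normal densities and uses cell centres (ctr.x, ctr.y, names introduced here) so that no evaluation point falls exactly on the query boundary.

```r
dx <- 0.1
dy <- 0.1

# Centres of the dx * dy cells covering [-10, 10] x [-10, 10]
ctr.x <- seq(-10 + dx / 2, 10 - dx / 2, by = dx)
ctr.y <- seq(-10 + dy / 2, 10 - dy / 2, by = dy)
grid <- expand.grid(x = ctr.x, y = ctr.y)

# Known density standing in for f_kde$estimate
grid$dens <- dnorm(grid$x) * dnorm(grid$y)

# Total mass, should be close to 1
total <- sum(grid$dens) * dx * dy

# Grid approximation of P(X < 1, Y < 2) and the exact value for independent normals
approx_cdf <- sum(grid$dens[grid$x < 1 & grid$y < 2]) * dx * dy
exact_cdf  <- pnorm(1) * pnorm(2)

c(total = total, approx = approx_cdf, exact = exact_cdf)
```

If the approximation is close here, most of the remaining difference in the kde-based result would come from the density estimate and the placement of the grid points, not from the summation itself.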
I calculate the cosine similarity with cosine() from the package 'lsa'. Here with three test vectors:
d <- data.frame(c(-1,1,0,-1,1,1,-1,1,0),c(-1,1,1,1,-1,1,-1,0,1),c(0,0,1,0,-1,-1,0,1,-1))
colnames(d) <- c("vector1","vector2","vector3")
d_dist <- cosine(as.matrix(d))
Now, I want to do dimensionality reduction with cmdscale and after that plot it as a scatterplot:
fit <- cmdscale(d_dist,k=2)
x <- fit[,2]
y <- fit[,1]
plot(x,y)
But I always get the warning In cmdscale(d_dist, k = 2): only 0 of the first 2 eigenvalues are > 0 [translated from German] and an empty fit object.
What am I doing wrong? Thank you so much for your help!
The input to cmdscale should be a distance (dissimilarity) matrix, but cosine() returns similarities, so convert it first. E.g.:
d_dist <- 1-d_dist
fit <- cmdscale(d_dist,k=2)
x <- fit[,2]
y <- fit[,1]
plot(x,y)
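For a version that runs without the lsa package, the column-wise cosine similarity can be computed in base R before converting it to a distance. This is a sketch: the similarity is reimplemented here with crossprod() as a stand-in for lsa::cosine(), and 1 - similarity is just one common choice of dissimilarity.

```r
d <- data.frame(
  vector1 = c(-1, 1, 0, -1, 1, 1, -1, 1, 0),
  vector2 = c(-1, 1, 1, 1, -1, 1, -1, 0, 1),
  vector3 = c(0, 0, 1, 0, -1, -1, 0, 1, -1)
)
m <- as.matrix(d)

# Cosine similarity between columns: dot products divided by the product of norms
norms <- sqrt(colSums(m^2))
sim <- crossprod(m) / outer(norms, norms)

# Turn similarities (1 = identical) into dissimilarities (0 = identical)
d_dist <- 1 - sim

# Classical MDS on the dissimilarities
fit <- cmdscale(as.dist(d_dist), k = 2)

# Label the points so the three vectors can be told apart
plot(fit[, 1], fit[, 2], type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(fit[, 1], fit[, 2], labels = rownames(fit))
```

Passing the similarity matrix directly makes cmdscale treat 1 (the diagonal) as a distance between a point and itself, which is why no positive eigenvalues were found.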
I am wondering: is it ever necessary to redefine your own columns when kriging? The error below seems to indicate this:
Warning: singular model in variogram fit
> sk1 <- krige(formula=Zs~1, locations=~Xs+Ys, data=sampled, newdata=pred.grid, model=fit.sph, beta=0)
Error in `[.data.frame`(object, , -coord.numbers, drop = FALSE) :
undefined columns selected
Is there a problem that I'm not seeing? Or, do I need to define my own columns? Thanks.
The following program is completely reproducible and runnable from here down:
library(gstat)
x <- seq(0,2000,by=20)
y <- seq(0,2000,by=20)
x = sample(x,10,replace=T)
y = sample(y,10,replace=T)
z = sample(0.532:3.7,10,replace=T)
samples = data.frame(x,y,z)
# detrend the samples:
print(mean(samples$z))
#create object of class gstat
h <- gstat(formula=z~1, locations=~x+y, data=samples)
samples.vgm <- variogram(h) # create method of class "gstatVariogram"
plot(samples.vgm,main='Variogram of Samples NOT detrended') # plot method for class "gstatVariogram"
# DETREND
z = samples$z
x = samples$x
y = samples$y
trend <- lm(z~x+y)
c = trend$coefficients[[1]]
a = trend$coefficients[[2]]
b = trend$coefficients[[3]]
#z_prime = z - (a*x + b*y +c)
# SUBTRACT THE PREDICTED LINE
Xs <- c()
Ys <- c()
Zs <- c()
print('started the loop')
for (i in 1:nrow(samples)){
  i = samples[i,]
  x=i$x
  y=i$y
  z=i$z
  z_prime = z - (a*x+b*y+c)
  Xs <- c(Xs,x)
  Ys <- c(Ys,y)
  Zs <- c(Zs,z_prime)
}
sampled <- data.frame(Xs=Xs,Ys=Ys,Zs=Zs)
print(sampled)
print('the length of sampled is')
print(length(sampled[[1]]))
# "result" is the new dataset with Z's detrended
# print(levelplot(Zs~Xs+Ys,sampled))
# define the domain or kriging estimation
x <- seq(0,2000,by=20)
y <- seq(0,2000,by=20)
# make data frame with prediction locations
pred.grid <- data.frame(x=rep(x,times=length(y)),y=rep(y,each=length(x)))
#create object of class gstat
g <- gstat(formula=Zs~1, locations=~Xs+Ys, data=sampled)
sampled.vgm <- variogram(g) # create method of class "gstatVariogram"
plot(sampled.vgm,main='Variogram of Samples hopefully detrended') # plot method for class "gstatVariogram"
vg.sph <- vgm(psill=1.0,model='Sph', range = 500)
fit.sph <- fit.variogram(sampled.vgm, model = vg.sph)
sk1 <- krige(formula=Zs~1, locations=~Xs+Ys, data=sampled, newdata=pred.grid, model=fit.sph, beta=0)
Add library(gstat) to the top of your code, then it's reproducible.
To answer your question directly: you receive the undefined columns selected error because your newdata does not have the correct column names. The coordinate column names in newdata need to match those in data, which are Xs and Ys in this case. Redefine pred.grid to have columns Xs and Ys to solve your problem. I tested, and your code runs.
pred.grid <- data.frame(Xs=rep(x,times=length(y)),Ys=rep(y,each=length(x)))
As for other comments: Warning: singular model in variogram fit means that no model could be fitted to the sample semivariogram. If you take a look at the plot of your data (below), it's very clear that no model function will be able to fit this. In your case, it is because you have only one point per bin (11 points total), so there's really not enough data to fit the semivariogram. Even with fewer bins, there still would not be enough data to support fitting a semivariogram model.
Changing your number of samples to 500,
x = sample(x,500,replace=T)
y = sample(y,500,replace=T)
z = sample(0.532:3.7,500,replace=T)
it becomes very clear that the data you are generating are uncorrelated such that samples closer to one another in x-y space are not more similar than samples farther away (pure nugget semivariogram). Is this what you wanted?
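As a side note, the detrending loop in the question computes the residuals of lm(z ~ x + y) one row at a time, so it can be replaced by a single call to residuals(). The sketch below uses base R only; grid_vals is a name introduced here, and the seed is arbitrary. It also builds pred.grid with the Xs and Ys column names that krige needs.

```r
set.seed(42)
grid_vals <- seq(0, 2000, by = 20)

# Same sampling scheme as in the question
samples <- data.frame(
  x = sample(grid_vals, 10, replace = TRUE),
  y = sample(grid_vals, 10, replace = TRUE),
  z = sample(0.532:3.7, 10, replace = TRUE)
)

# Fit the linear trend and keep the residuals as the detrended values
trend <- lm(z ~ x + y, data = samples)
sampled <- data.frame(
  Xs = samples$x,
  Ys = samples$y,
  Zs = unname(residuals(trend))
)

# Prediction grid whose coordinate names match those in sampled
pred.grid <- expand.grid(Xs = grid_vals, Ys = grid_vals)

head(sampled)
```

The sampled and pred.grid objects can then be passed to gstat, variogram and krige as in the question.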