I'm trying to do bilinear interpolation from a grid to a set of irregular points, the same task as in "How can I get the value of a kernel density estimate at specific points?". However, when I try the accepted answer there, I get a dimension mismatch error:
n <- 100
x <- rnorm(n)
y <- 3 + 2* x * rexp(n) + rnorm(n)
# add some outliers
y[sample(1:n,20)] <- rnorm(20,20,20)
DF <- data.frame(x,y)
# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x,y)
# create a new data frame of that 2d density grid
# (needs checking that I haven't stuffed up the order here of z?)
gr <- data.frame(with(dens, expand.grid(x,y)), as.vector(dens$z))
names(gr) <- c("x", "y", "z")
newdata=data.frame(xgr=x, ygr=y)
dens <- fields::interp.surface(gr, newdata)
# the interp.surface output is 2x as long as dim(newdata)[1]
length(dens)
dim(newdata)[1]
Any help appreciated, and apologies if this is trivial!
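There's no need to create the gr object. interp.surface expects a list with components x, y and z (z being the matrix of grid values), which is exactly what kde2d returns, so just pass dens to interp.surface directly. Because the code above overwrote dens, recompute it first:
dens <- MASS::kde2d(x, y)
newdata <- data.frame(x = x, y = y)
newdata$z <- fields::interp.surface(dens, newdata)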
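For intuition about what interp.surface computes, here is a minimal base-R sketch of bilinear interpolation on a regular grid such as the list(x, y, z) returned by kde2d. bilinear_at() is a hypothetical helper written for illustration, not a fields function; it returns NA for points outside the grid.

```r
library(MASS)  # for kde2d()

# Hypothetical helper (not part of fields): bilinear interpolation of a
# regular grid list(x, y, z), as returned by kde2d(), at points (px, py).
bilinear_at <- function(grid, px, py) {
  gx <- grid$x
  gy <- grid$y
  z <- grid$z
  # index of the grid cell (lower-left corner) containing each point
  i <- findInterval(px, gx, rightmost.closed = TRUE)
  j <- findInterval(py, gy, rightmost.closed = TRUE)
  out <- rep(NA_real_, length(px))
  ok <- i >= 1 & i < length(gx) & j >= 1 & j < length(gy)  # inside the grid
  i <- i[ok]
  j <- j[ok]
  # fractional position of each point inside its cell, in [0, 1]
  tx <- (px[ok] - gx[i]) / (gx[i + 1] - gx[i])
  ty <- (py[ok] - gy[j]) / (gy[j + 1] - gy[j])
  # weighted average of the four corner values
  out[ok] <- (1 - tx) * (1 - ty) * z[cbind(i, j)] +
    tx * (1 - ty) * z[cbind(i + 1, j)] +
    (1 - tx) * ty * z[cbind(i, j + 1)] +
    tx * ty * z[cbind(i + 1, j + 1)]
  out
}

set.seed(1)
x <- rnorm(100)
y <- 3 + 2 * x * rexp(100) + rnorm(100)
dens <- kde2d(x, y)
newdata <- data.frame(x = x, y = y)
newdata$z <- bilinear_at(dens, newdata$x, newdata$y)
```

Since interp.surface also uses bilinear weights, its output for points inside the grid should be essentially the same as this sketch's.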
For example, let's say:
x <- rnorm(20)
y <- rnorm(20) + 1
n <- seq(1,20,1)
data <- data.frame(n, x, y)
Is it possible to plot y~x with the indexed value of each pair at the top of the plot?
Can it be done with the base graphics, not ggplot?
It may be simple, but I am struggling to find help via Google; I suspect I'm using the wrong search terms.
Any help is much appreciated!
plot(x,y)
text(x = x, y = y, n, pos = 3)
# Add the labels in n at the (x, y) coordinates
# pos = 3 places each label just above its point
# See ?text for more
If you want to place all the indices on the same line above the plot region, give text() a single y value above the top of the plot. However, you will first need to set par(xpd = TRUE) so that text can be drawn outside the plot region.
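A minimal sketch of that approach, using the x, y and n from the question: par("usr") gives the plot-region limits, so a y value slightly above usr[4] puts every label on one line just above the box. The 5% offset and cex = 0.7 are arbitrary choices.

```r
set.seed(1)
x <- rnorm(20)
y <- rnorm(20) + 1
n <- seq(1, 20, 1)

plot(x, y)
usr <- par("usr")                           # c(x1, x2, y1, y2) of the plot region
label_y <- usr[4] + 0.05 * diff(usr[3:4])   # 5% above the top edge
old_par <- par(xpd = TRUE)                  # allow drawing outside the plot region
text(x = x, y = label_y, labels = n, cex = 0.7)
par(old_par)                                # restore the previous clipping setting
```

An alternative that leaves xpd alone is mtext(n, side = 3, at = x, cex = 0.7), which writes the labels in the top margin at the points' x positions.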
Yes, you can add labels. Try this code:
x <- rnorm(20)
y <- rnorm(20) + 1
n <- seq(1,20,1)
data <- data.frame(n, x, y)
plot(y~x)
with(data, text(y~x, labels = row.names(data)))
I have a three-column data frame with latitude, longitude, and underground measurements as the columns. I am trying to figure out how to interpolate values between the points I have (which are irregularly spaced) and then create a smooth surface plot of the entire area. I have tried the 'surface3d' function in the 'rgl' package, but my result looks like a single giant spike. I have been able to plot the data with 'plot3d', but I need to take it a step further and fill in the blank spaces by interpolation. Any ideas or suggestions? I'm also open to using other packages; rgl just seemed like the best fit at the time.
EDIT: here's an excerpt from my data (measurements of aquifer depth):
  lat_dd_NAD83 long_dd_NAD83 lev_va_ft
1     37.01030     -101.5006    288.49
2     37.03977     -101.6633    191.68
3     37.05201     -100.4994    159.34
4     37.06567     -101.3292    174.07
5     37.06947     -101.4561    285.08
6     37.10098     -102.0134    128.94
Just to add a small but (maybe) important note about interpolation.
Using the very nice "akima" package, you can easily interpolate your data:
library(akima)
library(rgl)
# library(deldir)
# Create some fake data
x <- rnorm(100)
y <- rnorm(100)
z <- x^2 + y^2
# # Triangulate it in x and y
# del <- deldir(x, y, z = z)
# triangs <- do.call(rbind, triang.list(del))
#
# # Plot the resulting surface
# plot3d(x, y, z, type = "n")
# triangles3d(triangs[, c("x", "y", "z")], col = "gray")
n_interpolation <- 200
spline_interpolated <- interp(x, y, z,
xo=seq(min(x), max(x), length = n_interpolation),
yo=seq(min(y), max(y), length = n_interpolation),
linear = FALSE, extrap = TRUE)
x.si <- spline_interpolated$x
y.si <- spline_interpolated$y
z.si <- spline_interpolated$z
persp3d(x.si, y.si, z.si, col = "gray")
[Figure: spline-interpolated surface (200 steps)]
With this package you can easily change the number of interpolation steps, etc. You will need at least 10 points (the more the better) to get a reasonable spline interpolation with this package. The linear version works well regardless of the number of points.
P.S. Thanks to user 2554330; I didn't know about deldir, which is really useful in some cases.
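Applied to the excerpt in the question, a minimal sketch might look like the following. It maps longitude to x and latitude to y, and uses linear = TRUE because the six rows shown are fewer than the roughly 10 points suggested above for the spline version. The akima call is wrapped in requireNamespace() so the grid-building part runs even without akima installed; the 200-step resolution is simply the value used above.

```r
# The six rows from the question's excerpt (aquifer depth, ft)
wells <- data.frame(
  lat_dd_NAD83  = c(37.01030, 37.03977, 37.05201, 37.06567, 37.06947, 37.10098),
  long_dd_NAD83 = c(-101.5006, -101.6633, -100.4994, -101.3292, -101.4561, -102.0134),
  lev_va_ft     = c(288.49, 191.68, 159.34, 174.07, 285.08, 128.94)
)
n_interpolation <- 200
# longitude on the horizontal (x) axis, latitude on the vertical (y) axis
xo <- with(wells, seq(min(long_dd_NAD83), max(long_dd_NAD83), length.out = n_interpolation))
yo <- with(wells, seq(min(lat_dd_NAD83), max(lat_dd_NAD83), length.out = n_interpolation))

if (requireNamespace("akima", quietly = TRUE)) {
  # piecewise-linear interpolation; with extrap left at its default,
  # grid cells outside the convex hull of the wells come back as NA
  surf <- with(wells, akima::interp(long_dd_NAD83, lat_dd_NAD83, lev_va_ft,
                                    xo = xo, yo = yo, linear = TRUE))
  # then, as above: rgl::persp3d(surf$x, surf$y, surf$z, col = "gray")
}
```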
You could use the deldir package to get a Delaunay triangulation of your points, then convert it to the form of data required by triangles3d for plotting. I don't know how effective this would be on a really large dataset, but it seems to work on 100 points:
library(deldir)
library(rgl)
# Create some fake data
x <- rnorm(100)
y <- rnorm(100)
z <- x^2 + y^2
# Triangulate it in x and y
del <- deldir(x, y, z = z)
triangs <- do.call(rbind, triang.list(del))
# Plot the resulting surface
plot3d(x, y, z, type = "n")
triangles3d(triangs[, c("x", "y", "z")], col = "gray")
EDITED to add:
The version of rgl on R-forge now has a function to make this easy. You can now produce a plot similar to the one above using
library(deldir)
library(rgl)
plot3d(deldir(x, y, z = z))
There is also a function to construct mesh3d objects from the deldir() output.
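Each flat triangle drawn this way is a piecewise-linear interpolant: inside a triangle, the height at (x, y) is a weighted average of the three vertex z values, with barycentric coordinates as the weights (this is also what akima's interp() does with linear = TRUE, which interpolates linearly within triangles). A minimal base-R sketch for a single triangle; tri_linear() is a hypothetical helper, not a deldir or rgl function.

```r
# Hypothetical helper: linear (barycentric) interpolation at points (px, py)
# inside one triangle with vertices (tx[k], ty[k]) and heights tz[k], k = 1..3.
tri_linear <- function(px, py, tx, ty, tz) {
  det <- (ty[2] - ty[3]) * (tx[1] - tx[3]) + (tx[3] - tx[2]) * (ty[1] - ty[3])
  w1 <- ((ty[2] - ty[3]) * (px - tx[3]) + (tx[3] - tx[2]) * (py - ty[3])) / det
  w2 <- ((ty[3] - ty[1]) * (px - tx[3]) + (tx[1] - tx[3]) * (py - ty[3])) / det
  w3 <- 1 - w1 - w2
  inside <- w1 >= 0 & w2 >= 0 & w3 >= 0   # all weights non-negative
  ifelse(inside, w1 * tz[1] + w2 * tz[2] + w3 * tz[3], NA_real_)
}

# Vertices of one triangle, with heights taken from the plane z = 2 + 3x - y
tx <- c(0, 1, 0)
ty <- c(0, 0, 1)
tz <- 2 + 3 * tx - ty
tri_linear(c(0.25, 1), c(0.25, 1), tx, ty, tz)   # returns 2.5 and NA
```

Because the interpolant is linear within each triangle, it reproduces a plane exactly: the value at (0.25, 0.25) equals 2 + 3(0.25) - 0.25 = 2.5, while (1, 1) lies outside the triangle.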
Is there any way to create lines in R connecting two points?
I am aware of the lines() function, but it creates a line segment; what I am looking for is a line of infinite length.
Here's an example of Martha's suggestion:
set.seed(1)
x <- runif(2)
y <- runif(2)
# helper: fit a line through the two points with lm() and draw it with abline()
segmentInf <- function(xs, ys){
fit <- lm(ys~xs)
abline(fit)
}
plot(x,y)
segmentInf(x,y)
#define x and y values for the two points
x <- rnorm(2)
y <- rnorm(2)
slope <- diff(y)/diff(x)
intercept <- y[1]-slope*x[1]
plot(x, y)
abline(intercept, slope, col="red")
# repeat the above as many times as you like to satisfy yourself
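One case the slope/intercept approach does not cover is a vertical line: diff(x) is 0, so the slope is infinite (or NaN if the two points coincide). A minimal sketch that falls back to abline(v = ...) in that case; line_through() is a hypothetical helper name, and it does not check for two identical points.

```r
# Hypothetical helper: draw the infinite line through (x[1], y[1]) and
# (x[2], y[2]) on the current plot; returns intercept and slope invisibly.
line_through <- function(x, y, ...) {
  dx <- diff(x)
  if (dx == 0) {
    # vertical line: no finite slope, so use abline(v = ...)
    abline(v = x[1], ...)
    return(invisible(c(intercept = NA_real_, slope = Inf)))
  }
  slope <- diff(y) / dx
  intercept <- y[1] - slope * x[1]
  abline(a = intercept, b = slope, ...)
  invisible(c(intercept = intercept, slope = slope))
}

plot(c(0, 2, 1), c(1, 5, 0), xlim = c(-1, 3), ylim = c(-1, 6))
fit <- line_through(c(0, 2), c(1, 5), col = "red")   # y = 1 + 2x
line_through(c(1, 1), c(0, 3), col = "blue")          # vertical line x = 1
```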
Use the segments() function.
#example
x1 <- stats::runif(5)
x2 <- stats::runif(5)+2
y <- stats::rnorm(10)
plot(c(x1,x2), y)
segments(x1, y[1:5], x2, y[6:10], col= 'blue')
I'm trying to (partially) reproduce the cluster plot available through s.class(...) in the ade4 package using ggplot, but this question is actually much more general.
NB: This question refers to "star plots", but really only discusses spider plots.
df <- mtcars[,c(1,3,4,5,6,7)]
pca <-prcomp(df, scale.=T, retx=T)
scores <-data.frame(pca$x)
library(ade4)
km <- kmeans(df,centers=3)
plot.df <- cbind(scores$PC1, scores$PC2)
s.class(plot.df, factor(km$cluster))
The essential feature I'm looking for is the "stars", i.e. a set of lines radiating from a common point (here, the cluster centroids) to a number of other points (here, the points in the cluster).
Is there a way to do that using the ggplot package? If not directly through ggplot, does anyone know of an add-in that works? For example, there are several variations on stat_ellipse(...), which is not part of the ggplot package.
This answer is based on @agstudy's response and the suggestions made in @Henrik's comment. Posting because it's shorter and more directly applicable to the question.
Bottom line is this: star plots are readily made with ggplot using geom_segment(...). Using df, pca, scores, and km from the question:
library(ggplot2)
# build ggplot dataframe with points (x,y) and corresponding groups (cluster)
gg <- data.frame(cluster=factor(km$cluster), x=scores$PC1, y=scores$PC2)
# calculate group centroid locations
centroids <- aggregate(cbind(x,y)~cluster,data=gg,mean)
# merge centroid locations into ggplot dataframe
gg <- merge(gg,centroids,by="cluster",suffixes=c("",".centroid"))
# generate star plot...
ggplot(gg) +
geom_point(aes(x=x,y=y,color=cluster), size=3) +
geom_point(data=centroids, aes(x=x, y=y, color=cluster), size=4) +
geom_segment(aes(x=x.centroid, y=y.centroid, xend=x, yend=y, color=cluster))
The star segments are identical to those obtained with s.class(...) (the ellipses that s.class draws by default are not reproduced).
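If you'd rather not merge(), ave() can attach each point's cluster centroid directly, since it returns the group mean aligned row by row. A minimal sketch assuming the question's mtcars setup (set.seed() is added so kmeans is reproducible); it draws the stars with base segments() so it runs without ggplot2, but the same x.centroid/y.centroid columns can be fed to the geom_segment() call above.

```r
set.seed(42)  # kmeans starts are random
df <- mtcars[, c(1, 3, 4, 5, 6, 7)]
scores <- data.frame(prcomp(df, scale. = TRUE, retx = TRUE)$x)
km <- kmeans(df, centers = 3)

gg <- data.frame(cluster = factor(km$cluster), x = scores$PC1, y = scores$PC2)
# ave() replaces each value with its group mean (FUN = mean by default)
gg$x.centroid <- ave(gg$x, gg$cluster)
gg$y.centroid <- ave(gg$y, gg$cluster)

cols <- as.integer(gg$cluster)
plot(gg$x, gg$y, col = cols, pch = 19, xlab = "PC1", ylab = "PC2")
segments(gg$x.centroid, gg$y.centroid, gg$x, gg$y, col = cols)
```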
The difficulty here is creating the data, not the plot itself. You should go through the package's code and extract what is useful to you. This should be a good start:
dfxy <- plot.df
df <- data.frame(dfxy)
x <- df[, 1]
y <- df[, 2]
fac <- factor(km$cluster)
f1 <- function(cl) {
n <- length(cl)
cl <- as.factor(cl)
x <- matrix(0, n, length(levels(cl)))
x[(1:n) + n * (unclass(cl) - 1)] <- 1
dimnames(x) <- list(names(cl), levels(cl))
data.frame(x)
}
wt = rep(1, length(fac))
dfdistri <- f1(fac) * wt
w1 <- unlist(lapply(dfdistri, sum))
dfdistri <- t(t(dfdistri)/w1)
## create a data.frame
cstar=2
ll <- lapply(seq_len(ncol(dfdistri)),function(i){
z1 <- dfdistri[,i]
z <- z1[z1>0]
x <- x[z1>0]
y <- y[z1>0]
z <- z/sum(z)
x1 <- sum(x * z)
y1 <- sum(y * z)
hx <- cstar * (x - x1)
hy <- cstar * (y - y1)
dat <- data.frame(x=x1, y=y1, xend=x1 + hx, yend=y1 + hy,center=factor(i))
})
dat <- do.call(rbind,ll)
library(ggplot2)
ggplot(dat,aes(x=x,y=y))+
geom_point(aes(shape=center)) +
geom_segment(aes(yend=yend,xend=xend,color=center,group=center))
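The f1() / dfdistri step is the least obvious part: f1() builds a 0/1 indicator matrix with one column per cluster, and dividing each column by its sum turns it into weights of 1/n_k for the points in cluster k, so sum(x * z) inside lapply() is just the cluster mean. A small sketch of the same idea using base R's model.matrix() and sweep() on made-up toy values:

```r
fac <- factor(c(2, 1, 2, 3, 1, 2))
x <- c(1, 4, 3, 10, 5, 8)

# 0/1 indicator matrix, one column per level (the layout f1() builds by hand)
ind <- model.matrix(~ fac - 1)
# divide each column by its count, so column k holds weights 1/n_k
w <- sweep(ind, 2, colSums(ind), "/")
# weighted column sums are the per-cluster means of x
centroids_x <- colSums(w * x)
centroids_x  # 4.5 for level 1, 4 for level 2, 10 for level 3, as tapply(x, fac, mean)
```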
I've created a nice plot using scatter3d() and Rcmdr. That plot contains two nice surface smooths. Now I'd like to add to this plot one more surface, the truth (i.e. the surface defined by the function generating my observations minus the noise component).
Here is my code so far:
library(car)
set.seed(1)
n <- 200 # number of observations (x,y,z) to be generated
sd <- 0.3 # standard deviation for error term
x <- runif(n) # generate x component
y <- runif(n) # generate y component
r <- sqrt(x^2+y^2) # used to compute z values
z_t <- sin(x^2+3*y^2)/(0.1+r^2) + (x^2+5*y^2)*exp(1-r^2)/2 # calculate values of true regression function
z <- z_t + rnorm(n, sd = sd) # overlay normally distributed 'noise'
dm <- data.frame(x=x, y=y, z=z) # data frame containing (x,y,z) observations
dm_t <- data.frame(x=x,y=y, z=z_t) # data frame containing (x,y) observations and the corresponding value of the *true* regression function
# Create 3D scatterplot of:
# - Observations (this includes 'noise')
# - Surface given by Additive Model fit
# - Surface given by bivariate smoother fit
scatter3d(dm$x, dm$y, dm$z, fit=c("smooth","additive"), bg="white",
axis.scales=TRUE, grid=TRUE, ellipsoid=FALSE, xlab="x", ylab="z", zlab="y")
The solution given in another thread is to then define a function:
my_surface <- function(f, n=10, ...) {
ranges <- rgl:::.getRanges()
x <- seq(ranges$xlim[1], ranges$xlim[2], length=n)
y <- seq(ranges$ylim[1], ranges$ylim[2], length=n)
z <- outer(x,y,f)
surface3d(x, y, z, ...)
}
f <- function(x, y)
sin(x^2+3*y^2)/(0.1+r^2) + (x^2+5*y^2)*exp(1-r^2)/2
my_surface(f, alpha=0.2)
However, this yields an error (translated from German, my system language; apologies):
Error in outer(x, y, f) :
dims [product 100] do not match the length of object [200]
I then tried an alternative approach:
x <- seq(0,1,length=20)
y <- x
z <- outer(x,y,f)
surface3d(x,y,z)
This does add a surface to my plot, but it doesn't look right at all (the observations are not even close to it). Here's what the supposed true surface looks like (this is obviously wrong):
Thanks!
I think the problem may in fact be scaling. Here I created some points that lie on the plane z = x + y and then tried to plot that plane using my method above:
library(car)
n <- 50
x <- runif(n)
y <- runif(n)
z <- x+y
scatter3d(x,y,z, surface = FALSE)
f <- function(x,y)
x + y
x_grid <- seq(0,1, length=20)
y_grid <- x_grid
z_grid <- outer(x_grid, y_grid, f)
surface3d(x_grid, y_grid, z_grid)
This gives me the following plot:
Maybe one of you can help me out with this?
The scatter3d function in car rescales data before plotting it, which makes it incompatible with essentially all rgl plotting functions, including surface3d.
You can get a plot something like what you want by using rgl functions throughout, e.g. plot3d(x, y, z) in place of scatter3d; of course, it will then have rgl-style axes rather than car-style axes.
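Separately, the error in the question comes from f() itself: it uses the global r (length n = 200) instead of computing r from its arguments. outer() calls f with 10 x 10 = 100 values, so the result has length 200 and cannot be reshaped into a 10 x 10 matrix. In the 20 x 20 attempt, 400 is a multiple of 200, so r is silently recycled and that surface is wrong even apart from the car rescaling. A minimal sketch that computes r inside the function, with the all-rgl drawing suggested above left as comments:

```r
# True regression function, now computing r from its own arguments
f_true <- function(x, y) {
  r <- sqrt(x^2 + y^2)
  sin(x^2 + 3 * y^2) / (0.1 + r^2) + (x^2 + 5 * y^2) * exp(1 - r^2) / 2
}

x_grid <- seq(0, 1, length.out = 20)
y_grid <- x_grid
# 20 x 20 matrix with z_grid[i, j] = f_true(x_grid[i], y_grid[j])
z_grid <- outer(x_grid, y_grid, f_true)

# Drawing everything with rgl, so points and surface share one coordinate system:
# library(rgl)
# plot3d(x, y, z)
# surface3d(x_grid, y_grid, z_grid, alpha = 0.3)
```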