Interpolating a path/curve within R - r

Within R, I want to interpolate an arbitrary path with constant distance
between interpolated points.
The test-data looks like that:
require("rgdal", quietly = TRUE)
require("ggplot2", quietly = TRUE)
r <- readOGR(".", "line", verbose = FALSE)
coords <- as.data.frame(r#lines[[1]]#Lines[[1]]#coords)
names(coords) <- c("x", "y")
print(coords)
x y
-0.44409 0.551159
-1.06217 0.563326
-1.09867 0.310255
-1.09623 -0.273754
-0.67283 -0.392990
-0.03772 -0.273754
0.63633 -0.015817
0.86506 0.473291
1.31037 0.998899
1.43934 0.933198
1.46854 0.461124
1.39311 0.006083
1.40284 -0.278621
1.54397 -0.271321
p.orig <- ggplot(coords, aes(x = x, y = y)) + geom_path(colour = "red") +
geom_point(colour = "yellow")
print(p.orig)
I tried different methods, none of them were really satisfying:
aspline (akima-package)
approx
bezierCurve
with the tourr-package I couldn't get started
aspline
aspline from the akima-package does some weird stuff when dealing with arbitrary paths:
plotInt <- function(coords) print(p.orig + geom_path(aes(x = x, y = y),
data = coords) + geom_point(aes(x = x, y = y), data = coords))
N <- 50 # 50 points to interpolate
require("akima", quietly = TRUE)
xy.int.ak <- as.data.frame(with(coords, aspline(x = x, y = y, n = N)))
plotInt(xy.int.ak)
approx
xy.int.ax <- as.data.frame(with(coords, list(x = approx(x, n = N)$y,
y = approx(y, n = N)$y)))
plotInt(xy.int.ax)
At first sight, approx looks pretty fine; however, testing it with real data gives me
problems with the distances between the interpolated points. Also a smooth, cubic interpolation would be a nice thing.
bezier
Another approach is to use bezier-curves; I used the following
implementation
source("bez.R")
xy.int.bz <- as.data.frame(with(coords, bezierCurve(x, y, N)))
plotInt(xy.int.bz)

How about regular splines using the same method you used for approx? Will that work on the larger data?
xy.int.sp <- as.data.frame(with(coords, list(x = spline(x)$y,
y = spline(y)$y)))

Consider using xspline or grid.xspline (the first is for base graphics, the second for grid):
plot(x,y, type='b', col='red')
xspline(x,y, shape=1)
You can adjust the shape parameter to change the curve, this example just plots the x spline, but you can also have the function return a set of xy coordinates that you would plot yourself.

Related

Calculate 2d spline path in R, with customizable number of interpolation points

I'd like calculate (not plot) 2d spline paths in R. There's an old question on that topic that suggests xspline(): Calculate a 2D spline curve in R
xspline() somewhat works for my purpose, but has important limitations:
I cannot customize the number of interpolation points
I need to call plot.new(), even if I don't want it to draw anything
I only have a single parameter (shape) to customize the spline; I'd like to be able to try a few more different types, if possible
Reproducible example:
library(ggplot2)
# control points
x <- c(.1, .5, .7, .8)
y <- c(.9, .6, .5, .1)
plot.new() # necessary for xspline(); would be great if it could be avoided
# how do I set the number of interpolation points?
# how do I modify the exact path (beyond shape parameter)?
path <- xspline(x, y, shape = 1, draw = FALSE)
# plot path (black) and control points (blue) with ggplot
ggplot(data = NULL, aes(x, y)) +
geom_point(data = as.data.frame(path), size = 0.5) +
geom_point(data = data.frame(x, y), size = 2, color = "blue")
Created on 2021-08-14 by the reprex package (v2.0.0)
Are there any easily available alternatives to xspline()?
It's not clear from your example, but base R's spline function might meet your needs. We can wrap it in a function to make it easier to use the output:
f <- function(x, y, n, method = "natural") {
new_x <- seq(min(x), max(x), length.out = n)
data.frame(x = new_x, y = spline(x, y, n = n, method = method)$y)
}
So the co-ordinates for 10 evenly spaced points along the curve can be obtained like this:
f(x, y, 10)
#> x y
#> 1 0.1000000 0.9000000
#> 2 0.1777778 0.8042481
#> 3 0.2555556 0.7173182
#> 4 0.3333333 0.6480324
#> 5 0.4111111 0.6052126
#> 6 0.4888889 0.5976809
#> 7 0.5666667 0.6222222
#> 8 0.6444444 0.6013374
#> 9 0.7222222 0.4303155
#> 10 0.8000000 0.1000000
And we can show the shape of the curve like this:
ggplot(data = NULL, aes(x, y)) +
geom_point(data = f(x, y, 100), size = 0.5) +
geom_point(data = data.frame(x, y), size = 2, color = "blue")
You can change the method argument to get different shapes - the options are listed in ?spline
EDIT
To use spline on paths, simply create splines on x and y separately. These can be as a function of another variable t, or this can be left out if you want to assume equal time spacing on the path:
f2 <- function(x, y, t = seq_along(x), n, method = "natural") {
new_t <- seq(min(t), max(t), length.out = n)
new_x <- spline(t, x, xout = new_t, method = method)$y
new_y <- spline(t, y, xout = new_t, method = method)$y
data.frame(t = new_t, x = new_x, y = new_y)
}
x <- rnorm(10)
y <- rnorm(10)
ggplot(data = NULL, aes(x, y)) +
geom_point(data = f2(x, y, n = 1000), size = 0.5) +
geom_point(data = data.frame(x, y), size = 2, color = "blue")
Created on 2021-08-14 by the reprex package (v2.0.0)
The xsplinePoints()from the {grid} package allows you to convert a xsplineGrob object to xy coordinates. One solution might be to wrap these functions to return points along an x-spline.
library(grid)
splines <- function(x, y, shape = 1, ..., density = 1) {
# Density controls number of points, though not the exact number
xs <- xsplineGrob(x * density, y * density, shape = shape, ...,
default.units = "inches")
# xsplinePoints seem to always return inches
xy <- xsplinePoints(xs)
# Drop units
xy <- lapply(xy, convertUnit, unitTo = "inches", valueOnly = TRUE)
data.frame(x = xy$x / density, y = xy$y / density)
}
I presume xsplinePoints() makes some behind the scenes calculation based on the size of the graphics device where smaller devices need less points.
The idea behind the density parameter is to let you (indirectly) control how many points are returned by artifically inflating the dimensions before handing the data to grid, and then deflating before returning to the user.
To compare with your example:
library(ggplot2)
# control points
x <- c(.1, .5, .7, .8)
y <- c(.9, .6, .5, .1)
plot.new()
path <- xspline(x, y, shape = 1, draw = FALSE)
# plot path (black) and control points (blue) with ggplot
ggplot(data = NULL, aes(x, y)) +
geom_point(data = as.data.frame(path), size = 0.5) +
geom_point(data = data.frame(x, y), size = 2, color = "blue") +
# density = 1 (red) and density = 3 (green)
geom_point(data = splines(x, y), colour = "red") +
geom_point(data = splines(x, y, density = 3), colour = "green")
Created on 2021-08-14 by the reprex package (v1.0.0)

geom_smooth() with median instead of mean

I am building a plot with ggplot. I have data where y is mostly independent of X, but I randomly have a few extreme values of Y at low values of X. Like this:
set.seed(1)
X <- rnorm(500, mean=5)
y <- rnorm(500)
y[X < 3] <- sample(c(0, 1000), size=length(y[X < 3]),prob=c(0.9, 0.1),
replace=TRUE)
I want to make the point that the MEDIAN y-value is still constant over X values. I can see that this is basically true here:
mean(y[X < 3])
median(y[X < 3])
If I make a geom_smooth() plot, it does mean, and is very affected by outliers:
ggplot(data=NULL, aes(x=X, y=y)) + geom_smooth()
I have a few potential fixes. For example, I could first use group_by/summarize to make a dataset of binned medians and then plot that. I would rather NOT do this because in my real data I have a lot of facetting and grouping variables, and it would be a lot to keep track of (non-ideal). A lot plot definitely looks better, but log does not have nice interpretation in my application (median does have nice interpretation)
ggplot(data=NULL, aes(x=X, y=y)) + geom_smooth() +
scale_y_log10()
Finally, I know about geom_quantile but I think I'm using it wrong. Is there a way to add an error bar? Also- this geom_quantile plot looks way too smooth, and I don't understand why it is sloping down. Am I using it wrong?
ggplot(data=NULL, aes(x=X, y=y)) +
geom_quantile(quantiles=c(0.5))
I realize that this problem probably has a LOT of workarounds, but if possible I would love to use geom_smooth and just provide an argument that tells it to use a median. I want geom_smooth for a side-by-side comparison with consistency. I want to put the mean and median geom_smooths side-by-side to show "hey look, super strong pattern between Y and X is driven by a few large outliers, if we look only at median the pattern disappears".
Thanks!!
You can create your own method to use in geom_smooth. As long as you have a function that produces an object on which the predict generic works to take a data frame with a column called x and translate into appropriate values of y.
As an example, let's create a simple model that interpolates along a running median. We wrap it in its own class and give it its own predict method:
rolling_median <- function(formula, data, n_roll = 11, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
y <- zoo::rollmedian(y, n_roll, na.pad = TRUE)
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed")
}
predict.rollmed <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
Now we can use our method in geom_smooth:
ggplot(data = NULL, aes(x = X, y = y)) +
geom_smooth(formula = y ~ x, method = "rolling_median", se = FALSE)
Now of course, this doesn't look very "flat", but it is way flatter than the line calculated by the loess method of the standard geom_smooth() :
ggplot(data = NULL, aes(x = X, y = y)) +
geom_smooth(formula = y ~ x, color = "red", se = FALSE) +
geom_smooth(formula = y ~ x, method = "rolling_median", se = FALSE)
Now, I understand that this is not the same thing as "regressing on the median", so you may wish to explore different methods, but if you want to get geom_smooth to plot them, this is how you can go about it. Note that if you want standard errors, you will need to have your predict function return a list with members called fit and se.fit
Here's a modification of #Allan's answer that uses a fixed x window rather than a fixed number of points. This is useful for irregular time series and series with multiple observations at the same time (x value). It uses a loop so it's not very efficient and will be slow for larger data sets.
# running median with time window
library(dplyr)
library(ggplot2)
library(zoo)
# some irregular and skewed data
set.seed(1)
x <- seq(2000, 2020, length.out = 400) # normal time series, gives same result for both methods
x <- sort(rep(runif(40, min = 2000, max = 2020), 10)) # irregular and repeated time series
y <- exp(runif(length(x), min = -1, max = 3))
data <- data.frame(x = x, y = y)
# ggplot(data) + geom_point(aes(x = x, y = y))
# 2 year window
xwindow <- 2
nwindow <- xwindow * length(x) / 20 - 1
# rolling median
rolling_median <- function(formula, data, n_roll = 11, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
y <- zoo::rollmedian(y, n_roll, na.pad = TRUE)
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed")
}
predict.rollmed <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
# rolling time window median
rolling_median2 <- function(formula, data, xwindow = 2, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
ys <- rep(NA, length(x)) # for the smoothed y values
xs <- setdiff(unique(x), NA) # the unique x values
i <- 1 # for testing
for (i in seq_along(xs)){
j <- xs[i] - xwindow/2 < x & x < xs[i] + xwindow/2 # x points in this window
ys[x == xs[i]] <- median(y[j], na.rm = TRUE) # y median over this window
}
y <- ys
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed2")
}
predict.rollmed2 <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
# plot smooth
ggplot(data) +
geom_point(aes(x = x, y = y)) +
geom_smooth(aes(x = x, y = y, colour = "nwindow"), formula = y ~ x, method = "rolling_median", se = FALSE, method.args = list(n_roll = nwindow)) +
geom_smooth(aes(x = x, y = y, colour = "xwindow"), formula = y ~ x, method = "rolling_median2", se = FALSE, method.args = list(xwindow = xwindow))
Created on 2022-01-05 by the reprex package (v2.0.1)

Filling parts of a contour plot in R

I have made a contour plot in R with the following code:
library(mvtnorm)
# Define the parameters for the multivariate normal distribution
mu = c(0,0)
sigma = matrix(c(1,0.2,0.2,3),nrow = 2)
# Make a grid in the x-y plane centered in mu, +/- 3 standard deviations
xygrid = expand.grid(x = seq(from = mu[1]-3*sigma[1,1], to = mu[1]+3*sigma[1,1], length.out = 100),
y = seq(from = mu[2]-3*sigma[2,2], to = mu[2]+3*sigma[2,2], length.out = 100))
# Use the mvtnorm library to calculate the multivariate normal density for each point in the grid
distribution = as.matrix(dmvnorm(x = xygrid, mean = mu, sigma = sigma))
# Plot contours
df = as.data.frame(cbind(xygrid, distribution))
myPlot = ggplot() + geom_contour(data = df,geom="polygon",aes( x = x, y = y, z = distribution))
myPlot
I want to illustrate cumulative probability by shading/colouring certain parts of the plot, for instance everything in the region {x<0, y<0} (or any other self defined region).
Is there any way of achieving this in R with ggplot?
So you are able to get the coordinates used to draw the circles in the plot using ggplot_build. Subsequently you could try to use these coordinates in combination with geom_polygon to shade a particular region. My best try:
library(dplyr)
data <- ggplot_build(myPlot)$data[[1]]
xCoor <- 0
yCoor <- 0
df <- data %>% filter(group == '-1-001', x <= xCoor, y <= yCoor) %>% select(x,y)
# Insert the [0,0] coordinate in the right place
index <- which.max(abs(diff(rank(df$y))))
df <- rbind( df[1:index,], data.frame(x=xCoor, y=yCoor), df[(index+1):nrow(df),] )
myPlot + geom_polygon(data = df, aes(x=x, y=y), fill = 'red', alpha = 0.5)
As you can see it's not perfect because the [x,0] and [0,y] coordinates are not included in the data, but it's a start.

How to plot loess surface with ggplot

i have this code and i create a loess surface of my dataframe.
library(gstat)
library(sp)
x<-c(0,55,105,165,270,65,130,155,155,225,250,295,
30,100,110,135,160,190,230,300,30,70,105,170,
210,245,300,0,85,175,300,15,60,90,90,140,210,
260,270,295,5,55,55,90,100,140,190,255,285,270)
y<-c(305,310,305,310,310,260,255,265,285,280,250,
260,210,240,225,225,225,230,210,215,160,190,
190,175,160,160,170,120,135,115,110,85,90,90,
55,55,90,85,50,50,25,30,5,35,15,0,40,20,5,150)
z<-c(870,793,755,690,800,800,730,728,710,780,804,
855,813,762,765,740,765,760,790,820,855,812,
773,812,827,805,840,890,820,873,875,873,865,
841,862,908,855,850,882,910,940,915,890,880,
870,880,960,890,860,830)
dati<-data.frame(x,y,z)
x.range <- as.numeric(c(min(x), max(x)))
y.range <- as.numeric(c(min(y), max(y)))
meuse.loess <- loess(z ~ x * y, dati, degree=2, span = 0.25,
normalize=F)
meuse.mar <- list(x = seq(from = x.range[1], to = x.range[2], by = 1), y = seq(from = y.range[1],
to = y.range[2], by = 1))
meuse.lo <- predict(meuse.loess, newdata=expand.grid(meuse.mar), se=TRUE)
Now I want to plot meuse.lo[[1]] with ggplot2 function... but i don't know how to convert meuse.lo[[1]] in a dataframe with x,y (grid's coordinates) and z (interpolated value) columns. Thanks.
Your problem here is that loess() returns a matrix if you use grid.expand() to generate the new data for loess().
This is mentioned in the help for ?loess.predict:
If newdata was the result of a call to expand.grid, the predictions (and s.e.'s if requested) will be an array of the appropriate dimensions.
Now, you can still use grid.expand() to compute the new data, but force this function to return a data frame and dropping the attributes.
From ?grid.expand:
KEEP.OUT.ATTRS: a logical indicating the "out.attrs" attribute (see below) should be computed and returned.
So, try this:
nd <- expand.grid(meuse.mar, KEEP.OUT.ATTRS = FALSE)
meuse.lo <- predict(meuse.loess, newdata=nd, se=TRUE)
# Add the fitted data to the `nd` object
nd$z <- meuse.lo$fit
library(ggplot2)
ggplot(nd, aes(x, y, col = z)) +
geom_tile() +
coord_fixed()
The result:
ggplot2 is probably not the best choice for 3d graphs. However here is an easy solution with rgl
library(rgl)
plot3d(x, y, z, type="s", size=0.75, lit=FALSE,col="red")
surface3d(meuse.mar[[1]], meuse.mar[[2]], meuse.lo[[1]],
alpha=0.4, front="lines", back="lines")

How to plot the intersection of a hyperplane and a plane in R

I have a set of (2-dimensional) data points that I run through a classifier that uses higher order polynomial transformations. I want to visualize the results as a 2 dimensional scatterplot of the points with the classifier superimbosed on top, preferably using ggplot2 as all other visualizations are made by this. Pretty much like this one that was used in the ClatechX online course on machine learning (the background color is optional).
I can display the points with colors and symbols and all, that's easy but I can't figure out how to draw anything like the classifiers (the intersection of the classifiing hyperplane with the plane representing my threshold). The only thing I found was stat_function and that only takes a function with a single argument.
Edit:
The example that was asked for in the comments:
sample data:
"","x","y","x","x","y","value"
"1",4.17338115745224,0.303530843229964,1.26674990184152,17.4171102853774,0.0921309727918932,-1
"2",4.85514814266935,3.452660451876,16.7631779801937,23.5724634872656,11.9208641959486,1
"3",3.51938610081561,3.41200957307592,12.0081790673332,12.3860785266141,11.6418093267617,1
"4",3.18545089452527,0.933340128976852,2.97310914874565,10.1470974014319,0.87112379635852,-16
"5",2.77556006214581,2.49701633118093,6.93061880335166,7.70373365857888,6.23509055818427,-1
"6",2.45974169578403,4.56341833807528,11.2248303614692,6.05032920997851,20.8247869282818,1
"7",2.73947941488586,3.35344674880616,9.18669833727041,7.50474746458339,11.2456050970786,-1
"8",2.01721803518012,3.55453519499861,7.17027250203368,4.06916860145595,12.6347204524838,-1
"9",3.52376445778646,1.47073399974033,5.1825201951431,12.4169159539591,2.1630584979922,-1
"10",3.77387718763202,0.509284208528697,1.92197605658768,14.2421490273294,0.259370405056702,-1
"11",4.15821685106494,1.03675272315741,4.31104264382058,17.2907673804804,1.0748562089743,-1
"12",2.57985028671101,3.88512040604837,10.0230289934507,6.65562750184287,15.0941605694935,1
"13",3.99800728890114,2.39457673509605,9.5735352407471,15.9840622821066,5.73399774026327,1
"14",2.10979392635636,4.58358959294856,9.67042948411309,4.45123041169019,21.0092935565863,1
"15",2.26988795562647,2.96687697409652,6.73447830932721,5.15239133109813,8.80235897942413,-1
"16",1.11802248633467,0.114183261757717,0.127659454208164,1.24997427994995,0.0130378172656312,-1
"17",0.310411276295781,2.09426849964075,0.650084557879535,0.0963551604515758,4.38596054858751,-1
"18",1.93197490065359,1.72926536411978,3.340897280049,3.73252701675543,2.99035869954433,-1
"19",3.45879891654477,1.13636834081262,3.93046958599847,11.9632899450912,1.29133300600123,-1
"20",0.310697768582031,0.730971727753058,0.227111284709427,0.0965331034018534,0.534319666774291,-1
"21",3.88408110360615,0.915658151498064,3.55649052359657,15.0860860193904,0.838429850404852,-1
"22",0.287852146429941,2.16121324687265,0.622109872005114,0.0828588582043242,4.67084269845782,-1
"23",2.80277011333965,1.22467750683427,3.4324895146344,7.85552030822994,1.4998349957458,-1
"24",0.579150241101161,0.57801398797892,0.334756940497835,0.335415001767533,0.334100170299295-,1
"25",2.37193428212777,1.58276639413089,3.7542178708388,5.62607223873297,2.50514945839009,-1
"26",0.372461311053485,2.51207412336953,0.935650421453748,0.138727428231681,6.31051640130279,-1
"27",3.56567220995203,1.03982002707198,3.70765737388213,12.7140183088242,1.08122568869998,-1
"28",0.634770628530532,2.26303249713965,1.43650656059435,0.402933750845047,5.12131608311011,-1
"29",2.43812176748179,1.91849716124125,4.67752968967431,5.94443775306852,3.68063135769073,-1
"30",1.08741064323112,3.01656032912433,3.28023980783858,1.18246190701233,9.0996362192467,-1
"31",0.98,2.74,2.6852,0.9604,7.5076,1
"32",3.16,1.78,5.6248,9.9856,3.1684,1
"33",4.26,4.28,18.2328,18.1476,18.3184,-1
The code to generate a classifier:
perceptron_train <- function(data, maxIter=10000) {
set.seed(839)
X <- as.matrix(data[1:5])
Y <- data["value"]
d <- dim(X)
X <- cbind(rep(1, d[1]), X)
W <- rep(0, d[2] + 1)
count <- 0
while (count < maxIter){
H <- sign(X %*% W)
indexs <- which(H != Y)
if (length(indexs) == 0){
break
} else {
i <- sample(indexs, 1)
W <- W + 0.1 * (X[i,] * Y[i,])
}
count <- count + 1
point <- as.data.frame(data[i,])
plot_it(data, point, W, paste("plot", sprintf("%05d", count), ".png", sep=""))
}
W
}
The code to generate the plot:
plot_it <- function(data, point, weights, name = "plot.png") {
line <- weights_to_line(weights)
point <- point
png(name)
p = ggplot() + geom_point(data = data, aes(x, y, color = value, size = 2)) + theme(legend.position = "none")
p = p + geom_abline(intercept = line[2], slope = line[1])
print(p)
dev.off()
}
This was solved using material from the question and answers from Issues plotting a fitted SVM model's decision boundary using ggplot2's stat_contour(). I skipped the call to geom_point for the grid-entires and some of the aesthetical definitions like scale_fill_manual and scale_colour_manual. Removing the dots for the grid entries solved the problem with the vanishing contour-line in my case.
train_and_plot_svm <- function(train, kernel = "sigmoid", type ="C", cost, gamma) {
fit <- svm(as.factor(value) ~ x + y, data = train, kernel = kernel, type = type, cost = cost)
grid <- expand.grid (x = seq(from = -0.1, to = 15, length = 100), y = seq(from = -0.1, to = 15, length = 100))
decisionValues <- as.vector(attributes(predict(fit, grid, decision.values = TRUE))$decision)
p <- predict(fit, grid)
grid$value <- p
grid$z <- decisionValues
p <- ggplot() + stat_contour(data = grid, aes(x = x, y = y, z = z), breaks = c(0))
p <- p + geom_point(data = train, aes(x, y, colour = as.factor(value)), alpha = 0.7)
p <- p + xlim(0,15) + ylim(0,15) + theme(legend.position="none")
}
Note that this function doesn't return the result of the svm training but the ggplot2 object.
This is, what I got:

Resources