Plot two size vector length with xyplot - r

Please note that this is a simplified version (and therefore duplicate of my earlier post):
https://stackoverflow.com/questions/18358694/xyplot-2-separate-data-frame-lengths
It may well be that it contained too much information, however quite basic was asked.
So here again:
I would like to plot 2 columns of different length with xyplot (only xyplot please).
The data:
Data <- data.frame(var1=rnorm(10,0,1),prob=seq(0.023,0.365,length=10))
Long <- data.frame(var2=rnorm(20,2,3))
How I would plot the Long (var2) vector of length 20 onto the plot of "Data" where (prob~var1) is plotted first.

You can use approx() inside your call to xyplot to interpolate the values of Data$prob after passing Long$var2 as a separate variable in the call. Notice the custom prepanel plot to adjust the limits.
lattice::xyplot(prob ~ var1, data = Data, z = Long$var2,
xlab = "Var1, Var2",
prepanel = function(x, y, z, ...) {
list(xlim = range(x, z),
ylim = range(y))
},
panel = function(x, y, z, ...) {
b <- approx(y, n = length(z))$y
panel.xyplot(x, y, ...)
panel.xyplot(z, b, col = "orange", ...)
})
EDIT: Actually, it is much cleaner to just reshape the data first.
dd <- rbind(data.frame(var = "var1",
val = Data$var1,
prob = Data$prob),
data.frame(var = "var2",
val = Long$var2,
prob = approx(Data$prob, n = 20)$y))
xyplot(prob ~ val, data = dd, groups = var, auto.key = TRUE)

Related

Use lattice xyplot for grouped data with points and lines

I wish to make a single-panel graph in lattice that shows data (y) from several groups (g) with superimposed lines showing predicted values (y_pred). I generate example data below:
d <- data.frame(x = rep(1:100,2), g = factor(rep(c('a','b'), each = 100)))
d$y_pred <- -0.1*x + 0.001*x^2
d$y_pred <- with(d, ifelse(g=='a', y_pred+2,y_pred))
d$y <- d$y_pred + rnorm(nrow(d),0,1)
Using 'type=c('p','l'), distribute.type=TRUE does not work, nor does my attempt at making a panel:
xyplot(y + y_pred ~ x, data = d,
groups = g,
panel = panel.superpose,
panel.groups=function(...){
panel.xyplot(x, y, type='p')
panel.xyplot(x, y_pred, type='l')
}
)
What should I do here?
Ok, you can do this with latticeExtra
dat <- xyplot(y ~ x, data=d,
groups = g,
type="p"
)
dat
dat + layer(panel.xyplot(x=x, y=y_pred,
groups = g,
type="l",
subscripts=TRUE),
data=d)
But this is really finicky and non-robust. The 'dat + layer()' code will render properly if it is the first thing I run after starting the program, but after that it will frequently render with several of lines for different groups missing.

geom_smooth() with median instead of mean

I am building a plot with ggplot. I have data where y is mostly independent of X, but I randomly have a few extreme values of Y at low values of X. Like this:
set.seed(1)
X <- rnorm(500, mean=5)
y <- rnorm(500)
y[X < 3] <- sample(c(0, 1000), size=length(y[X < 3]),prob=c(0.9, 0.1),
replace=TRUE)
I want to make the point that the MEDIAN y-value is still constant over X values. I can see that this is basically true here:
mean(y[X < 3])
median(y[X < 3])
If I make a geom_smooth() plot, it does mean, and is very affected by outliers:
ggplot(data=NULL, aes(x=X, y=y)) + geom_smooth()
I have a few potential fixes. For example, I could first use group_by/summarize to make a dataset of binned medians and then plot that. I would rather NOT do this because in my real data I have a lot of facetting and grouping variables, and it would be a lot to keep track of (non-ideal). A lot plot definitely looks better, but log does not have nice interpretation in my application (median does have nice interpretation)
ggplot(data=NULL, aes(x=X, y=y)) + geom_smooth() +
scale_y_log10()
Finally, I know about geom_quantile but I think I'm using it wrong. Is there a way to add an error bar? Also- this geom_quantile plot looks way too smooth, and I don't understand why it is sloping down. Am I using it wrong?
ggplot(data=NULL, aes(x=X, y=y)) +
geom_quantile(quantiles=c(0.5))
I realize that this problem probably has a LOT of workarounds, but if possible I would love to use geom_smooth and just provide an argument that tells it to use a median. I want geom_smooth for a side-by-side comparison with consistency. I want to put the mean and median geom_smooths side-by-side to show "hey look, super strong pattern between Y and X is driven by a few large outliers, if we look only at median the pattern disappears".
Thanks!!
You can create your own method to use in geom_smooth. As long as you have a function that produces an object on which the predict generic works to take a data frame with a column called x and translate into appropriate values of y.
As an example, let's create a simple model that interpolates along a running median. We wrap it in its own class and give it its own predict method:
rolling_median <- function(formula, data, n_roll = 11, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
y <- zoo::rollmedian(y, n_roll, na.pad = TRUE)
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed")
}
predict.rollmed <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
Now we can use our method in geom_smooth:
ggplot(data = NULL, aes(x = X, y = y)) +
geom_smooth(formula = y ~ x, method = "rolling_median", se = FALSE)
Now of course, this doesn't look very "flat", but it is way flatter than the line calculated by the loess method of the standard geom_smooth() :
ggplot(data = NULL, aes(x = X, y = y)) +
geom_smooth(formula = y ~ x, color = "red", se = FALSE) +
geom_smooth(formula = y ~ x, method = "rolling_median", se = FALSE)
Now, I understand that this is not the same thing as "regressing on the median", so you may wish to explore different methods, but if you want to get geom_smooth to plot them, this is how you can go about it. Note that if you want standard errors, you will need to have your predict function return a list with members called fit and se.fit
Here's a modification of #Allan's answer that uses a fixed x window rather than a fixed number of points. This is useful for irregular time series and series with multiple observations at the same time (x value). It uses a loop so it's not very efficient and will be slow for larger data sets.
# running median with time window
library(dplyr)
library(ggplot2)
library(zoo)
# some irregular and skewed data
set.seed(1)
x <- seq(2000, 2020, length.out = 400) # normal time series, gives same result for both methods
x <- sort(rep(runif(40, min = 2000, max = 2020), 10)) # irregular and repeated time series
y <- exp(runif(length(x), min = -1, max = 3))
data <- data.frame(x = x, y = y)
# ggplot(data) + geom_point(aes(x = x, y = y))
# 2 year window
xwindow <- 2
nwindow <- xwindow * length(x) / 20 - 1
# rolling median
rolling_median <- function(formula, data, n_roll = 11, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
y <- zoo::rollmedian(y, n_roll, na.pad = TRUE)
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed")
}
predict.rollmed <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
# rolling time window median
rolling_median2 <- function(formula, data, xwindow = 2, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
ys <- rep(NA, length(x)) # for the smoothed y values
xs <- setdiff(unique(x), NA) # the unique x values
i <- 1 # for testing
for (i in seq_along(xs)){
j <- xs[i] - xwindow/2 < x & x < xs[i] + xwindow/2 # x points in this window
ys[x == xs[i]] <- median(y[j], na.rm = TRUE) # y median over this window
}
y <- ys
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed2")
}
predict.rollmed2 <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
# plot smooth
ggplot(data) +
geom_point(aes(x = x, y = y)) +
geom_smooth(aes(x = x, y = y, colour = "nwindow"), formula = y ~ x, method = "rolling_median", se = FALSE, method.args = list(n_roll = nwindow)) +
geom_smooth(aes(x = x, y = y, colour = "xwindow"), formula = y ~ x, method = "rolling_median2", se = FALSE, method.args = list(xwindow = xwindow))
Created on 2022-01-05 by the reprex package (v2.0.1)

Line instead of Dot (R) Not so easy

If you could help me it would be great :
So i'm doing a double curve (SDT) graph, and i have a bit of a problem : here my graph :
First time I have this problem ... Really have no clue how to solve it, well I just think my data is not ordered but how can I order it easily ?
Here's me code (but really nothing special) :
x = TDSindice2$Hit
mean = mean(x)
sd = sd(x)
y = dnorm(x,mean,sd)
plot(x,y, col = "red")
x = TDSindice2$Fa
mean = mean(x)
sd = sd(x)
y = dnorm(x,mean,sd)
par(new=TRUE)
plot(x,y ,type = "l", col ="blue")
Thanks for all :)
You need to order your data in terms of increasing values of x before plotting. For example:
set.seed(1)
x <- runif(50)
y <- 1.2 + (1.4 * x) + (-2.5 * x^2)
plot(x, y)
lines(x, y)
The order() function can be used to generate an index that when applied to a variable/object places the values of that object in the required order (increasing by default):
ord <- order(x)
plot(x[ord], [ord], type = "o")
But you'd be better off have x and y in the same object, a data frame, and just sort the rows of that:
dat <- data.frame(x = x, y = y)
ord <- with(dat, order(x))
plot(y ~ x, data = dat[ord, ], type = "o") ## or
## lines(y ~ x, data = dat[ord, ])
Note that order() is used to index the data hence we don't change the original ordering, we just permute the rows as we supply the object to the plot() function.

Plot the observed and fitted values from a linear regression using xyplot() from the lattice package

I can create simple graphs. I would like to have observed and predicted values (from a linear regression) on the same graph. I am plotting say Yvariable vs Xvariable. There is only 1 predictor and only 1 response. How could I also add linear regression curve to the same graph?
So to conclude need help with:
plotting actuals and predicted both
plotting regression line
Here is one option for the observed and predicted values in a single plot as points. It is easier to get the regression line on the observed points, which I illustrate second
First some dummy data
set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)
Fit our model
mod <- lm(y ~ x, data = dat)
Combine the model output and observed x into a single object for plott
res <- stack(data.frame(Observed = dat$y, Predicted = fitted(mod)))
res <- cbind(res, x = rep(dat$x, 2))
head(res)
Load lattice and plot
require("lattice")
xyplot(values ~ x, data = res, group = ind, auto.key = TRUE)
The resulting plot should look similar to this
To get just the regression line on the observed data, and the regression model is a simple straight line model as per the one I show then you can circumvent most of this and just plot using
xyplot(y ~ x, data = dat, type = c("p","r"), col.line = "red")
(i.e. you don't even need to fit the model or make new data for plotting)
The resulting plot should look like this
An alternative to the first example which can be used with anything that will give coefficients for the regression line is to write your own panel functions - not as scary as it seems
xyplot(y ~ x, data = dat, col.line = "red",
panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.abline(coef = coef(mod), ...) ## using mod from earlier
}
)
That gives a plot from Figure 2 above, but by hand.
Assuming you've done this with caret then
mod <- train(y ~ x, data = dat, method = "lm",
trControl = trainControl(method = "cv"))
xyplot(y ~ x, data = dat, col.line = "red",
panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.abline(coef = coef(mod$finalModel), ...) ## using mod from caret
}
)
Will produce a plot the same as Figure 2 above.
Another option is to use panel.lmlineq from latticeExtra.
library(latticeExtra)
set.seed(0)
xsim <- rnorm(50, mean = 3)
ysim <- (0 + 2 * xsim) * (1 + rnorm(50, sd = 0.3))
## basic use as a panel function
xyplot(ysim ~ xsim, panel = function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.lmlineq(x, y, adj = c(1,0), lty = 1,xol.text='red',
col.line = "blue", digits = 1,r.squared =TRUE)
})

Interpolating a path/curve within R

Within R, I want to interpolate an arbitrary path with constant distance
between interpolated points.
The test-data looks like that:
require("rgdal", quietly = TRUE)
require("ggplot2", quietly = TRUE)
r <- readOGR(".", "line", verbose = FALSE)
coords <- as.data.frame(r#lines[[1]]#Lines[[1]]#coords)
names(coords) <- c("x", "y")
print(coords)
x y
-0.44409 0.551159
-1.06217 0.563326
-1.09867 0.310255
-1.09623 -0.273754
-0.67283 -0.392990
-0.03772 -0.273754
0.63633 -0.015817
0.86506 0.473291
1.31037 0.998899
1.43934 0.933198
1.46854 0.461124
1.39311 0.006083
1.40284 -0.278621
1.54397 -0.271321
p.orig <- ggplot(coords, aes(x = x, y = y)) + geom_path(colour = "red") +
geom_point(colour = "yellow")
print(p.orig)
I tried different methods, none of them were really satisfying:
aspline (akima-package)
approx
bezierCurve
with the tourr-package I couldn't get started
aspline
aspline from the akima-package does some weird stuff when dealing with arbitrary paths:
plotInt <- function(coords) print(p.orig + geom_path(aes(x = x, y = y),
data = coords) + geom_point(aes(x = x, y = y), data = coords))
N <- 50 # 50 points to interpolate
require("akima", quietly = TRUE)
xy.int.ak <- as.data.frame(with(coords, aspline(x = x, y = y, n = N)))
plotInt(xy.int.ak)
approx
xy.int.ax <- as.data.frame(with(coords, list(x = approx(x, n = N)$y,
y = approx(y, n = N)$y)))
plotInt(xy.int.ax)
At first sight, approx looks pretty fine; however, testing it with real data gives me
problems with the distances between the interpolated points. Also a smooth, cubic interpolation would be a nice thing.
bezier
Another approach is to use bezier-curves; I used the following
implementation
source("bez.R")
xy.int.bz <- as.data.frame(with(coords, bezierCurve(x, y, N)))
plotInt(xy.int.bz)
How about regular splines using the same method you used for approx? Will that work on the larger data?
xy.int.sp <- as.data.frame(with(coords, list(x = spline(x)$y,
y = spline(y)$y)))
Consider using xspline or grid.xspline (the first is for base graphics, the second for grid):
plot(x,y, type='b', col='red')
xspline(x,y, shape=1)
You can adjust the shape parameter to change the curve, this example just plots the x spline, but you can also have the function return a set of xy coordinates that you would plot yourself.

Resources