Is there a way to plot a random walk process using ggplot? - r

My aim is to compare the movement of a random walk process with the movement of stock prices.
I created a random walk process and plotted it as follows:
P1 <- RW(100, 10, 0, 0.0004)
plot(P1, main = "Random Walk without Drift", xlab = "Index", ylab = "Price", ylim = c(9.7, 10.3), type = "l", col = "blue")
and it worked.
But is it possible to use ggplot instead of plot?

In base graphics, when you call plot(x) with no y argument, several things happen under the hood. Notably, it calls xy.coords(x, y), which eventually does ...
else {
if (is.factor(x))
x <- as.numeric(x)
if (setLab)
xlab <- "Index"
y <- x
x <- seq_along(x)
}
This is the clue to getting ggplot2 to do effectively the same thing: assign the values to y and create an index sequence for x.
library(ggplot2)
set.seed(42)
P1 <- cumsum(rnorm(1000))
plot(P1, type = "l")
ggplot(mapping = aes(x = seq_along(P1), y = P1)) + geom_line()
or in a "formalized" data.frame:
dat <- data.frame(x = seq_along(P1), y = P1)
ggplot(dat, aes(x = x, y = y)) + geom_line()
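The question's RW() function is not shown, so here is a minimal sketch of the whole round trip under stated assumptions: rw() is a hypothetical helper, and its argument order (length, start value, drift, variance) is only a guess at what RW(100, 10, 0, 0.0004) means. The plotting part is skipped if ggplot2 is not installed.

```r
# Hypothetical random-walk helper; the argument meaning (n, x0, drift, variance)
# is an assumption and is not taken from the question's RW().
rw <- function(n, x0, drift, variance) {
  steps <- rnorm(n - 1, mean = drift, sd = sqrt(variance))
  x0 + c(0, cumsum(steps))
}

set.seed(42)
P1 <- rw(100, 10, 0, 0.0004)

# Mirror what plot(P1) does internally: the index becomes x, the values become y.
dat <- data.frame(index = seq_along(P1), price = P1)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(x = index, y = price)) +
    geom_line(colour = "blue") +
    labs(title = "Random Walk without Drift", x = "Index", y = "Price")
  print(p)
}
```

Keeping the index as an explicit column makes it straightforward to add a second series later, for example a stock price column, and compare the two lines in one plot.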

Related

Adding loess regression line on a hexbin plot

I have been trying to find a method to add a loess regression line to a hexbin plot. So far I have not had any success. Any suggestions?
My code is as follows:
bin<-hexbin(Dataset$a, Dataset$b, xbins=40)
plot(bin, main="Hexagonal Binning",
xlab = "a", ylab = "b",
type="l")
I would suggest using ggplot2 to build the plot.
Since you didn't include any example data, I've used the palmerpenguins package dataset for the example below.
library(palmerpenguins) # For the data
library(ggplot2) # ggplot2 for plotting
ggplot(penguins, aes(x = body_mass_g,
y = bill_length_mm)) +
geom_hex(bins = 40) +
geom_smooth(method = 'loess', se = FALSE, color = 'red')
Created on 2021-01-05 by the reprex package (v0.3.0)
I don't have a solution for base, but it's possible to do this with ggplot. It should be possible with base too, but if you look at the documentation for ?hexbin, you can see the quote:
Note that when plotting a hexbin object, the grid package is used. You must use its graphics (or those from package lattice if you know how) to add to such plots.
I'm not familiar with how to modify these. I did try ggplotify to convert the base to ggplot and edit that way, but couldn't get the loess line added to the plot window properly.
So here is a solution with ggplot with some fake data that you can try on your Datasets:
library(hexbin)
library(ggplot2)
# fake data with a random walk, replace with your data
set.seed(100)
N <- 1000
x <- rnorm(N)
x <- sort(x)
y <- vector("numeric", length=N)
for(i in 2:N){
y[i] <- y[i-1] + rnorm(1, sd=0.1)
}
# current method
# In documentation for ?hexbin it says:
# "You must use its graphics (or those from package lattice if you know how) to add to such plots."
(bin <- hexbin(x, y, xbins=40))
plot(bin)
# ggplot option. Can play around with scale_fill_gradient to
# get the colour scale similar or use other ggplot options
df <- data.frame(x=x, y=y)
d <- ggplot(df, aes(x, y)) +
geom_hex(bins=40) +
scale_fill_gradient(low = "grey90", high = "black") +
theme_bw()
d
# easy to add a loess fit to the data
# span controls the degree of smoothing, decrease to make the line
# more "wiggly"
model <- loess(y~x, span=0.2)
fit <- predict(model)
loess_data <- data.frame(x=x, y=fit)
d + geom_line(data=loess_data, aes(x=x, y=y), col="darkorange",
size=1.5)
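The effect of span mentioned in the comments above can be checked numerically. This is a small sketch on made-up data (not the asker's), using only base R; the residual sum of squares is just one convenient way to show that a smaller span follows the data more closely.

```r
# Made-up data: a noisy random walk in y against sorted x (not the asker's data)
set.seed(100)
n <- 500
x <- sort(rnorm(n))
y <- cumsum(rnorm(n, sd = 0.1))

# Fit the same loess model with a small and a large span
fit_small <- predict(loess(y ~ x, span = 0.2))
fit_large <- predict(loess(y ~ x, span = 0.8))

# Residual sum of squares: lower means the curve tracks the data more closely
rss <- c(span_0.2 = sum((y - fit_small)^2),
         span_0.8 = sum((y - fit_large)^2))
rss
```

A very small span can start to chase noise, so the value is usually chosen by eye against the plot rather than by minimising this number.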
Here are two options; you will need to decide if you want to smooth over the raw data or the binned data.
library(hexbin)
library(grid)
# Some data
set.seed(101)
d <- data.frame(x=rnorm(1000))
d$y <- with(d, 2*x^3 + rnorm(1000))
Method A - binned data
# plot hexbin & smoother : need to grab plot viewport
# From ?hexVP.loess : "Fit a loess line using the hexagon centers of mass
# as the x and y coordinates and the cell counts as weights."
bin <- hexbin(d$x, d$y)
p <- plot(bin)
hexVP.loess(bin, hvp = p$plot.vp, span = 0.4, col = "red", n = 200)
Method B - raw data
# calculate loess predictions outside plot on raw data
l = loess(y ~ x, data=d, span=0.4)
xp = with(d, seq(min(x), max(x), length=200))
yp = predict(l, xp)
# plot hexbin
bin <- hexbin(d$x, d$y)
p <- plot(bin)
# add loess line
pushHexport(p$plot.vp)
grid.lines(xp, yp, gp=gpar(col="red"), default.units = "native")
upViewport()

geom_smooth() with median instead of mean

I am building a plot with ggplot. I have data where y is mostly independent of X, but I randomly have a few extreme values of Y at low values of X. Like this:
set.seed(1)
X <- rnorm(500, mean=5)
y <- rnorm(500)
y[X < 3] <- sample(c(0, 1000), size=length(y[X < 3]),prob=c(0.9, 0.1),
replace=TRUE)
I want to make the point that the MEDIAN y-value is still constant over X values. I can see that this is basically true here:
mean(y[X < 3])
median(y[X < 3])
If I make a geom_smooth() plot, it does mean, and is very affected by outliers:
ggplot(data=NULL, aes(x=X, y=y)) + geom_smooth()
I have a few potential fixes. For example, I could first use group_by/summarize to make a dataset of binned medians and then plot that. I would rather NOT do this because in my real data I have a lot of faceting and grouping variables, and it would be a lot to keep track of (non-ideal). A log plot definitely looks better, but log does not have a nice interpretation in my application (the median does):
ggplot(data=NULL, aes(x=X, y=y)) + geom_smooth() +
scale_y_log10()
Finally, I know about geom_quantile, but I think I'm using it wrong. Is there a way to add an error bar? Also, this geom_quantile plot looks way too smooth, and I don't understand why it is sloping down.
ggplot(data=NULL, aes(x=X, y=y)) +
geom_quantile(quantiles=c(0.5))
I realize that this problem probably has a LOT of workarounds, but if possible I would love to use geom_smooth and just provide an argument that tells it to use a median. I want geom_smooth for a side-by-side comparison with consistency. I want to put the mean and median geom_smooths side-by-side to show "hey look, super strong pattern between Y and X is driven by a few large outliers, if we look only at median the pattern disappears".
Thanks!!
You can create your own method to use in geom_smooth. All you need is a function that produces an object on which the predict generic works, that is, one whose predict method takes a data frame with a column called x and translates it into appropriate values of y.
As an example, let's create a simple model that interpolates along a running median. We wrap it in its own class and give it its own predict method:
rolling_median <- function(formula, data, n_roll = 11, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
y <- zoo::rollmedian(y, n_roll, na.pad = TRUE)
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed")
}
predict.rollmed <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
Now we can use our method in geom_smooth:
ggplot(data = NULL, aes(x = X, y = y)) +
geom_smooth(formula = y ~ x, method = "rolling_median", se = FALSE)
Now of course, this doesn't look very "flat", but it is way flatter than the line calculated by the loess method of the standard geom_smooth() :
ggplot(data = NULL, aes(x = X, y = y)) +
geom_smooth(formula = y ~ x, color = "red", se = FALSE) +
geom_smooth(formula = y ~ x, method = "rolling_median", se = FALSE)
Now, I understand that this is not the same thing as "regressing on the median", so you may wish to explore different methods, but if you want to get geom_smooth to plot them, this is how you can go about it. Note that if you want standard errors, you will need to have your predict function return a list with members called fit and se.fit.
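To see the fit/predict contract on its own, without ggplot2 or zoo, here is a minimal sketch that swaps in stats::runmed for zoo::rollmedian. The names running_median, runmed_fit and the argument k are hypothetical and chosen for illustration; the data are the ones from the question.

```r
# Fitting function: takes a formula and a data frame with columns x and y,
# returns an object of a custom class (the formula itself is not used here).
running_median <- function(formula, data, k = 11, ...) {
  ord <- order(data$x)
  x <- data$x[ord]
  # runmed() needs an odd window size k; endrule = "constant" pads the ends
  y <- stats::runmed(data$y[ord], k = k, endrule = "constant")
  structure(list(f = approxfun(x, y)), class = "runmed_fit")
}

# predict method: maps the x column of newdata to interpolated medians
predict.runmed_fit <- function(object, newdata, ...) {
  object$f(newdata$x)
}

# Data from the question
set.seed(1)
X <- rnorm(500, mean = 5)
y <- rnorm(500)
y[X < 3] <- sample(c(0, 1000), size = sum(X < 3), prob = c(0.9, 0.1),
                   replace = TRUE)

fit  <- running_median(y ~ x, data.frame(x = X, y = y))
grid <- data.frame(x = seq(min(X), max(X), length.out = 80))
pred <- predict(fit, grid)
range(pred, na.rm = TRUE)
```

Testing the pair this way before handing the function name to geom_smooth makes it easier to tell whether a problem lies in the smoother or in the plotting call.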
Here's a modification of @Allan's answer that uses a fixed x window rather than a fixed number of points. This is useful for irregular time series and for series with multiple observations at the same time (x value). It uses a loop, so it is not very efficient and will be slow for larger data sets.
# running median with time window
library(dplyr)
library(ggplot2)
library(zoo)
# some irregular and skewed data
set.seed(1)
x <- seq(2000, 2020, length.out = 400) # normal time series, gives same result for both methods
x <- sort(rep(runif(40, min = 2000, max = 2020), 10)) # irregular and repeated time series
y <- exp(runif(length(x), min = -1, max = 3))
data <- data.frame(x = x, y = y)
# ggplot(data) + geom_point(aes(x = x, y = y))
# 2 year window
xwindow <- 2
nwindow <- xwindow * length(x) / 20 - 1
# rolling median
rolling_median <- function(formula, data, n_roll = 11, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
y <- zoo::rollmedian(y, n_roll, na.pad = TRUE)
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed")
}
predict.rollmed <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
# rolling time window median
rolling_median2 <- function(formula, data, xwindow = 2, ...) {
x <- data$x[order(data$x)]
y <- data$y[order(data$x)]
ys <- rep(NA, length(x)) # for the smoothed y values
xs <- setdiff(unique(x), NA) # the unique x values
i <- 1 # for testing
for (i in seq_along(xs)){
j <- xs[i] - xwindow/2 < x & x < xs[i] + xwindow/2 # x points in this window
ys[x == xs[i]] <- median(y[j], na.rm = TRUE) # y median over this window
}
y <- ys
structure(list(x = x, y = y, f = approxfun(x, y)), class = "rollmed2")
}
predict.rollmed2 <- function(mod, newdata, ...) {
setNames(mod$f(newdata$x), newdata$x)
}
# plot smooth
ggplot(data) +
geom_point(aes(x = x, y = y)) +
geom_smooth(aes(x = x, y = y, colour = "nwindow"), formula = y ~ x, method = "rolling_median", se = FALSE, method.args = list(n_roll = nwindow)) +
geom_smooth(aes(x = x, y = y, colour = "xwindow"), formula = y ~ x, method = "rolling_median2", se = FALSE, method.args = list(xwindow = xwindow))
Created on 2022-01-05 by the reprex package (v2.0.1)
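The windowing rule used in rolling_median2 can be tried in isolation on a handful of points. The sketch below uses a hypothetical helper, window_median, and made-up values; it applies the same strict "within half a window on either side" test, written with vapply instead of an explicit loop. It is still quadratic in the number of points, so it is not claimed to be faster.

```r
# Median of y over all points whose x lies strictly within width/2 of each x
window_median <- function(x, y, width) {
  vapply(x, function(centre) {
    in_window <- abs(x - centre) < width / 2
    median(y[in_window], na.rm = TRUE)
  }, numeric(1))
}

# Made-up irregular series with repeated x values
x <- c(2000, 2000, 2000.5, 2003, 2003.4, 2010)
y <- c(1, 9, 5, 2, 4, 7)

smoothed <- window_median(x, y, width = 2)
smoothed
# → 5 5 5 3 3 7
```

The isolated point at x = 2010 keeps its own value because no other observation falls inside its window.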

Make ggplot with regression line and normal distribution overlay

I am trying to make a plot to show the intuition behind logistic (or probit) regression. How would I make a plot that looks something like this in ggplot?
(Wolf & Best, The Sage Handbook of Regression Analysis and Causal Inference, 2015, p. 155)
Actually, what I would rather even do is have one single normal distribution displayed along the y axis with mean = 0, and a specific variance, so that I can draw horizontal lines going from the linear predictor to the y axis and sideways normal distribution. Something like this:
What this is supposed to show (assuming I haven't misunderstood something) is . I haven't had much success so far...
library(ggplot2)
x <- seq(1, 11, 1)
y <- x*0.5
x <- x - mean(x)
y <- y - mean(y)
df <- data.frame(x, y)
# Probability density function of the standard logistic distribution
pdfDeltaFun <- function(x) {
prob = (exp(x)/(1 + exp(x))^2)
return(prob)
}
# Tried switching the x and y to be able to turn the
# distribution overlay 90 degrees with coord_flip()
ggplot(df, aes(x = y, y = x)) +
geom_point() +
geom_line() +
stat_function(fun = pdfDeltaFun)+
coord_flip()
I think this comes pretty close to the first illustration you give. If this is a thing you don't need to repeat many times, it is probably best to compute the density curves prior to plotting and use a separate data frame to plot these.
library(ggplot2)
x <- seq(1, 11, 1)
y <- x*0.5
x <- x - mean(x)
y <- y - mean(y)
df <- data.frame(x, y)
# For every row in `df`, compute a rotated normal density centered at `y` and shifted by `x`
curves <- lapply(seq_len(NROW(df)), function(i) {
mu <- df$y[i]
range <- mu + c(-3, 3)
seq <- seq(range[1], range[2], length.out = 100)
data.frame(
x = -1 * dnorm(seq, mean = mu) + df$x[i],
y = seq,
grp = i
)
})
# Combine above densities in one data.frame
curves <- do.call(rbind, curves)
ggplot(df, aes(x, y)) +
geom_point() +
geom_line() +
# The path draws the curve
geom_path(data = curves, aes(group = grp)) +
# The polygon does the shading. We can use `oob_squish()` to set a range.
geom_polygon(data = curves, aes(y = scales::oob_squish(y, c(0, Inf)),group = grp))
The second illustration is pretty close to your code. I replaced your density function with the standard normal density function and added some extra parameters to stat_function:
library(ggplot2)
x <- seq(1, 11, 1)
y <- x*0.5
x <- x - mean(x)
y <- y - mean(y)
df <- data.frame(x, y)
ggplot(df, aes(x, y)) +
geom_point() +
geom_line() +
stat_function(fun = dnorm,
aes(x = after_stat(-y * 4 - 5), y = after_stat(x)),
xlim = range(df$y)) +
# We fill with a polygon, squishing the y-range
stat_function(fun = dnorm, geom = "polygon",
aes(x = after_stat(-y * 4 - 5),
y = after_stat(scales::oob_squish(x, c(-Inf, -1)))),
xlim = range(df$y))
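If the sideways curve should be logistic rather than normal, note that the question's pdfDeltaFun is the standard logistic density, which base R provides as dlogis. The sketch below checks that numerically and builds the rotated coordinates by hand in the same way as the first approach; the grid z and the data frame name are made up for illustration.

```r
# The density function from the question
pdfDeltaFun <- function(x) exp(x) / (1 + exp(x))^2

# Compare it with the built-in standard logistic density on a grid
z <- seq(-6, 6, by = 0.5)
max_abs_diff <- max(abs(pdfDeltaFun(z) - dlogis(z)))
max_abs_diff

# Rotated curve: density values (negated) go on the x axis, the grid on the y axis
rotated <- data.frame(x = -dlogis(z), y = z)
head(rotated, 3)
```

Since the two functions agree, dlogis could presumably be used wherever dnorm appears above, although the scaling constants in after_stat() may need adjusting.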

How to plot loess surface with ggplot

I have this code, which creates a loess surface from my data frame.
library(gstat)
library(sp)
x<-c(0,55,105,165,270,65,130,155,155,225,250,295,
30,100,110,135,160,190,230,300,30,70,105,170,
210,245,300,0,85,175,300,15,60,90,90,140,210,
260,270,295,5,55,55,90,100,140,190,255,285,270)
y<-c(305,310,305,310,310,260,255,265,285,280,250,
260,210,240,225,225,225,230,210,215,160,190,
190,175,160,160,170,120,135,115,110,85,90,90,
55,55,90,85,50,50,25,30,5,35,15,0,40,20,5,150)
z<-c(870,793,755,690,800,800,730,728,710,780,804,
855,813,762,765,740,765,760,790,820,855,812,
773,812,827,805,840,890,820,873,875,873,865,
841,862,908,855,850,882,910,940,915,890,880,
870,880,960,890,860,830)
dati<-data.frame(x,y,z)
x.range <- as.numeric(c(min(x), max(x)))
y.range <- as.numeric(c(min(y), max(y)))
meuse.loess <- loess(z ~ x * y, dati, degree=2, span = 0.25,
normalize=F)
meuse.mar <- list(x = seq(from = x.range[1], to = x.range[2], by = 1), y = seq(from = y.range[1],
to = y.range[2], by = 1))
meuse.lo <- predict(meuse.loess, newdata=expand.grid(meuse.mar), se=TRUE)
Now I want to plot meuse.lo[[1]] with ggplot2, but I don't know how to convert meuse.lo[[1]] into a data frame with x, y (the grid's coordinates) and z (interpolated value) columns. Thanks.
Your problem here is that predict() on a loess fit returns a matrix if you use expand.grid() to generate the new data.
This is mentioned in ?predict.loess:
If newdata was the result of a call to expand.grid, the predictions (and s.e.'s if requested) will be an array of the appropriate dimensions.
Now, you can still use expand.grid() to build the new data, but force it to return a plain data frame by dropping the "out.attrs" attribute.
From ?expand.grid:
KEEP.OUT.ATTRS: a logical indicating the "out.attrs" attribute (see below) should be computed and returned.
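The difference between the two return shapes can be checked on a small made-up data set. This sketch uses only base R; the data, the grid and the names d, axes and pred_array are invented for illustration and are not the asker's.

```r
# Made-up surface data
set.seed(1)
n <- 200
d <- data.frame(x = runif(n), y = runif(n))
d$z <- with(d, sin(2 * x) + y^2 + rnorm(n, sd = 0.05))

fit <- loess(z ~ x * y, data = d, degree = 2, span = 0.5)

# A 7 x 4 prediction grid well inside the data range
axes <- list(x = seq(0.2, 0.8, length.out = 7),
             y = seq(0.2, 0.8, length.out = 4))

# Default expand.grid() keeps "out.attrs", so predict() returns a 7 x 4 array
pred_array <- predict(fit, newdata = expand.grid(axes))
dim(pred_array)

# Without "out.attrs" the predictions come back as a plain vector,
# one value per row of the grid, ready to be stored as a column
nd <- expand.grid(axes, KEEP.OUT.ATTRS = FALSE)
pred_vec <- predict(fit, newdata = nd)
nd$z <- as.vector(pred_vec)
head(nd, 3)
```

The values are the same in both cases; only the container differs, which is why flattening the array with as.vector() would be an alternative to changing the expand.grid() call.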
So, try this:
nd <- expand.grid(meuse.mar, KEEP.OUT.ATTRS = FALSE)
meuse.lo <- predict(meuse.loess, newdata=nd, se=TRUE)
# Add the fitted data to the `nd` object
nd$z <- meuse.lo$fit
library(ggplot2)
ggplot(nd, aes(x, y, fill = z)) +
geom_tile() +
coord_fixed()
The result:
ggplot2 is probably not the best choice for 3D graphs. However, here is an easy solution with rgl:
library(rgl)
plot3d(x, y, z, type="s", size=0.75, lit=FALSE,col="red")
surface3d(meuse.mar[[1]], meuse.mar[[2]], meuse.lo[[1]],
alpha=0.4, front="lines", back="lines")

Interpolating a path/curve within R

Within R, I want to interpolate an arbitrary path with constant distance
between interpolated points.
The test data looks like this:
require("rgdal", quietly = TRUE)
require("ggplot2", quietly = TRUE)
r <- readOGR(".", "line", verbose = FALSE)
coords <- as.data.frame(r@lines[[1]]@Lines[[1]]@coords)
names(coords) <- c("x", "y")
print(coords)
x y
-0.44409 0.551159
-1.06217 0.563326
-1.09867 0.310255
-1.09623 -0.273754
-0.67283 -0.392990
-0.03772 -0.273754
0.63633 -0.015817
0.86506 0.473291
1.31037 0.998899
1.43934 0.933198
1.46854 0.461124
1.39311 0.006083
1.40284 -0.278621
1.54397 -0.271321
p.orig <- ggplot(coords, aes(x = x, y = y)) + geom_path(colour = "red") +
geom_point(colour = "yellow")
print(p.orig)
I tried different methods, none of them were really satisfying:
aspline (akima-package)
approx
bezierCurve
with the tourr-package I couldn't get started
aspline
aspline from the akima-package does some weird stuff when dealing with arbitrary paths:
plotInt <- function(coords) print(p.orig + geom_path(aes(x = x, y = y),
data = coords) + geom_point(aes(x = x, y = y), data = coords))
N <- 50 # 50 points to interpolate
require("akima", quietly = TRUE)
xy.int.ak <- as.data.frame(with(coords, aspline(x = x, y = y, n = N)))
plotInt(xy.int.ak)
approx
xy.int.ax <- as.data.frame(with(coords, list(x = approx(x, n = N)$y,
y = approx(y, n = N)$y)))
plotInt(xy.int.ax)
At first sight, approx looks pretty fine; however, testing it with real data gives me
problems with the distances between the interpolated points. Also a smooth, cubic interpolation would be a nice thing.
bezier
Another approach is to use bezier-curves; I used the following
implementation
source("bez.R")
xy.int.bz <- as.data.frame(with(coords, bezierCurve(x, y, N)))
plotInt(xy.int.bz)
How about regular splines using the same method you used for approx? Will that work on the larger data?
xy.int.sp <- as.data.frame(with(coords, list(x = spline(x)$y,
y = spline(y)$y)))
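Interpolating x and y against the point index, as approx and spline do above, spaces the new points evenly per segment rather than per unit of distance. One way to get constant spacing along the path, which is not taken from any of the answers here, is to interpolate against cumulative path length instead. A minimal sketch with a hypothetical helper, resample_path, on a made-up L-shaped path:

```r
# Resample a polyline at n points equally spaced along its length
resample_path <- function(x, y, n) {
  seg <- sqrt(diff(x)^2 + diff(y)^2)      # length of each segment
  s <- c(0, cumsum(seg))                  # cumulative distance at each vertex
  s_out <- seq(0, s[length(s)], length.out = n)
  data.frame(x = approx(s, x, xout = s_out, rule = 2)$y,
             y = approx(s, y, xout = s_out, rule = 2)$y)
}

# Made-up path: 3 units right, then 4 units up (total length 7)
path <- resample_path(x = c(0, 3, 3), y = c(0, 0, 4), n = 8)
path

# Straight-line distance between consecutive resampled points
step <- sqrt(diff(path$x)^2 + diff(path$y)^2)
step
```

The spacing is constant along the polyline; straight-line distances can come out slightly shorter where two consecutive points sit on either side of a corner. This version is piecewise linear, so a smooth variant would replace approx with spline over the same distance parameter.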
Consider using xspline or grid.xspline (the first is for base graphics, the second for grid):
plot(x,y, type='b', col='red')
xspline(x,y, shape=1)
You can adjust the shape parameter to change the curve. This example just plots the X-spline, but you can also have the function return a set of xy coordinates (with draw = FALSE) that you would plot yourself.
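As a sketch of the coordinate-returning variant: xspline() needs an active plot to work out the curve, so the example opens a null PDF device first so that it also runs without a display. The x and y values are the first few test points from the question, rounded; curve_xy is just an illustrative name.

```r
# First few points of the question's path, rounded for brevity
x <- c(-0.44, -1.06, -1.10, -1.10, -0.67, -0.04, 0.64)
y <- c(0.55, 0.56, 0.31, -0.27, -0.39, -0.27, -0.02)

pdf(NULL)                      # null device so the sketch runs without a display
plot(x, y, type = "b", col = "red")

# draw = FALSE returns the curve as a list with components x and y
curve_xy <- xspline(x, y, shape = 1, draw = FALSE)
invisible(dev.off())

str(curve_xy)
```

The returned points trace the drawn curve but are not guaranteed to be equally spaced, so a constant-distance path would still need a resampling step afterwards.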
