Plotting Smoothed Time Series - r

I am trying to follow the tutorial at https://rc2e.com/timeseriesanalysis (bottom of the page) and plot a smoothed time series and the original time series on the same plot. I have simulated some data below, smoothed it, and then tried to plot it.
library(dplyr)
library(KernSmooth)
library(ggplot2)
a = rnorm(2000,10,10)
y = ts(a, frequency = 12)
gridsize <- length(y)
bw <- dpill(t, y, gridsize = gridsize)
lp <- locpoly(x = t, y = y, bandwidth = bw, gridsize = gridsize)
smooth <- lp$y
ggplot() +
geom_line(aes(x = t, y = y)) +
geom_line(aes(x = t, y = smooth), linetype = 2)
However, this does not work. The first error that appears is: 'x' must be atomic for 'sort.list', method "shell" and "quick"
Could someone please tell me what I am doing wrong?
Thanks

You can fit a smoothed curve to a time series directly in ggplot. Here's an example using gam inside geom_smooth:
library(ggplot2)
set.seed(1)
a <- cumsum(rnorm(2000, 0.1, 10))
t <- seq(as.Date("1854-06-01"), by = "1 month", length.out = 2000)
ggplot(data.frame(t, a), aes(t, a)) +
geom_point(size = 0.1, color = "orange2", alpha = 0.5) +
geom_smooth(method = 'gam', formula = y ~ s(x, k = 30, bs = "cs"),
fill = "orange", color = "orange4", linetype = 2) +
theme_bw()
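The error in the question appears to come from t never being defined: R then resolves t to the base transpose function, which dpill() cannot sort. Below is a minimal sketch of the original KernSmooth approach, assuming a plain integer index is an acceptable time axis. The random-walk data and the names yy and plot_df are illustrative and not taken from the tutorial.

```r
library(KernSmooth)
library(ggplot2)

set.seed(42)
a <- cumsum(rnorm(2000, 0.1, 10))   # random walk, so there is a trend to smooth
y <- ts(a, frequency = 12)

# t must be a numeric vector; without this line, t refers to base::t()
t <- seq_along(y)
yy <- as.numeric(y)                 # drop the ts attributes before smoothing

gridsize <- length(yy)
bw <- dpill(t, yy, gridsize = gridsize)
lp <- locpoly(x = t, y = yy, bandwidth = bw, gridsize = gridsize)

# Put everything in one data frame so ggplot does not rely on global vectors
plot_df <- data.frame(t = lp$x, y = yy, smooth = lp$y)

p <- ggplot(plot_df, aes(x = t)) +
  geom_line(aes(y = y)) +
  geom_line(aes(y = smooth), linetype = 2, color = "red")
p
```

Because gridsize equals the number of observations and t is equally spaced, the grid returned in lp$x coincides with t, so the smoothed values can be plotted against the same axis.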

Related

ggplot2 stat_density_2d: how to fix polygon errors at the dataset bounding box edges?

I am analysing my sfc_POINT dataset using ggplot2::stat_density_2d. It seems, though, that the resulting polygons get all wonky at the edge of the dataset bounding box. For whatever reason, ggplot2 can't seem to draw polygons further than the source dataframe edges.
How do I fix the density polygons? Is there a way to extend the bounding box for stat_density_2d somehow?
For this question I am able to reproduce my problem with random sample point data. Please see the code here:
library(dplyr)
library(sf)
library(geofi)
library(ggplot2)
# Finland municipalities
muns <- geofi::get_municipalities(year = 2022)
# Create sample points
points <- sf::st_sample(muns, 50) %>% as.data.frame()
points[c("x", "y")] <- sf::st_coordinates(points$geometry)
p4 <- ggplot() +
geom_sf(data = muns) +
coord_sf(default_crs = sf::st_crs(3067)) +
stat_density_2d(geom = "polygon",
data = points,
aes(x = x, y = y, fill = ..level..),
alpha = 0.75) +
scale_fill_viridis_c() +
geom_point(data = points, aes(x = x, y = y), alpha = 0.5)
p4
The "crazy polygons" phenomenon was an old problem that was often encountered when this method (calculating polygons via stat_density2d) was the only option. It was remediated by the addition of geom_density2d_filled in ggplot2 v3.3.2.
ggplot() +
geom_sf(data = muns) +
coord_sf(default_crs = sf::st_crs(3067)) +
geom_density2d_filled(
data = points,
aes(x = x, y = y),
alpha = 0.75) +
geom_point(data = points, aes(x = x, y = y), alpha = 0.5)
If you want to cover the whole range of muns rather than just the range of points, you can pre-calculate the density yourself over the bounding box of muns and pass that to geom_contour_filled:
d <- MASS::kde2d(points$x, points$y, lims = st_bbox(muns)[c(1, 3, 2, 4)])
dens <- data.frame(expand.grid(x = d$x, y = d$y), z = as.vector(d$z))
ggplot() +
geom_contour_filled(data = dens, aes(x = x, y = y, z = z,
alpha = after_stat(level))) +
geom_sf(data = muns, fill = NA, color = "black") +
coord_sf(default_crs = sf::st_crs(3067), expand = FALSE) +
geom_point(data = points, aes(x = x, y = y), alpha = 0.5) +
scale_alpha_manual(values = c(0, rep(0.75, 7)), guide = "none")
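The index vector c(1, 3, 2, 4) is needed because st_bbox() returns its values in the order xmin, ymin, xmax, ymax, whereas kde2d() expects lims as xmin, xmax, ymin, ymax. The small sketch below uses a made-up bounding box instead of the municipalities (the numbers and the names bbox, px and py are illustrative) and also shows why expand.grid() and as.vector() line up.

```r
library(MASS)

set.seed(1)
px <- runif(50, 2, 8)     # illustrative point coordinates
py <- runif(50, 0, 15)

# Same ordering as sf::st_bbox(): xmin, ymin, xmax, ymax
bbox <- c(xmin = 0, ymin = -5, xmax = 10, ymax = 20)

# Reorder to what kde2d() expects: xmin, xmax, ymin, ymax
lims <- unname(bbox[c(1, 3, 2, 4)])

d <- kde2d(px, py, n = 25, lims = lims)

# d$z is a matrix with rows indexed by d$x and columns by d$y.
# as.vector() unrolls it column by column (x varies fastest), which is
# the same order in which expand.grid() varies its first argument.
dens <- data.frame(expand.grid(x = d$x, y = d$y), z = as.vector(d$z))

range(dens$x)   # spans the bounding box, not just the points
range(dens$y)
```

The density grid now covers the whole bounding box, so the contours are no longer clipped at the extent of the points.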

How to plot stat_mean for scatterplot in R ggplot2?

For each treatment tmt, I want to plot the means using stat_summary in ggplot2 with a different colour and size. I find that there are multiple means being plotted over the current points. Not sure how to rectify it.
df <- data.frame(x = rnorm(12, 4,1), y = rnorm(12, 6,4), tmt = rep(c("A","B","C"), each = 4))
ggplot(aes(x = x, y = y, fill = tmt), data = df) +
geom_point(shape=21, size=5, alpha = 0.6) +
scale_fill_manual(values=c("pink","blue", "purple")) +
stat_summary(aes(fill = tmt), fun = 'mean', geom = 'point', size = 5) +
scale_fill_manual(values=c("pink","blue", "purple"))
Plot without the last two lines of code
Plot with the entire code
With stat_summary you compute the mean of y for each pair of x and tmt. Because every point has its own x value, each "mean" is just the point itself. If you want the mean of x and the mean of y per tmt, I suggest computing the means manually outside of ggplot and using a second geom_point to plot them. In my code below I increased the size and used squares for the means:
df <- data.frame(x = rnorm(12, 4,1), y = rnorm(12, 6,4), tmt = rep(c("A","B","C"), each = 4))
library(ggplot2)
library(dplyr)
df_mean <- df |>
group_by(tmt) |>
summarise(across(c(x, y), mean))
ggplot(aes(x = x, y = y, fill = tmt), data = df) +
geom_point(shape=21, size=5, alpha = 0.6) +
geom_point(data = df_mean, shape=22, size=8, alpha = 0.6) +
scale_fill_manual(values=c("pink","blue", "purple"))
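To see why stat_summary draws one "mean" per point, it can help to count the distinct x values, since stat_summary summarises y separately at each x. The sketch below does that and also computes the per-treatment centroids with base R's aggregate() as an alternative to the dplyr pipeline. The seed is an addition for reproducibility.

```r
set.seed(1)
df <- data.frame(x = rnorm(12, 4, 1),
                 y = rnorm(12, 6, 4),
                 tmt = rep(c("A", "B", "C"), each = 4))

# Every x value is distinct, so summarising y "at each x" returns
# twelve single-point means, one on top of each original point
n_positions <- length(unique(df$x))
n_positions

# Per-treatment centroid: mean of x and mean of y within each tmt
df_mean <- aggregate(cbind(x, y) ~ tmt, data = df, FUN = mean)
df_mean
```

The resulting df_mean has one row per treatment and can be passed to the second geom_point in the same way as the dplyr version.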

Plotting power vs. effect size using R pwr package

I can successfully create plots of power vs. sample size in R using the pwr package. Example code below.
library(pwr)
library(tidyverse)
plot.out <- pwr.t2n.test(n1=30, n2=30, d=0.5, alternative="two.sided")
#See output in link below
plot(plot.out)
plot() output
I would like to create a similar plot -- a two-sample t-test in which effect size is on the y-axis and power is on the x-axis, with fixed sample sizes.
Is there a way to do this using pwr and/or the plot function? Or would I have to unlist the plot.out object and use it somehow?
I'm still new to power curves in R. Thanks in advance for any advice.
In the code below, the power is computed in a loop over the effect sizes in d_seq. The power values are then extracted from the results list, and a data.frame is created and plotted.
library(pwr)
library(ggplot2)
d_seq <- seq(0, 2, by = 0.1)
pwr_list <- lapply(d_seq, function(d){
pwr.t2n.test(n1 = 30, n2 = 30,
d = d,
power = NULL,
sig.level = 0.05,
alternative = "two.sided")
})
pwr <- sapply(pwr_list, '[[', 'power')
dfpwr <- data.frame(power = pwr, effect.size = d_seq)
ggplot(dfpwr, aes(effect.size, power)) +
geom_point(size = 2, colour = "black") +
geom_line(size = 0.5, colour = "red") +
scale_y_continuous(labels = scales::percent) +
xlab("effect size") +
ylab(expression("test power =" ~ 1 - beta))
To draw a line where power is 80% and get the effect size, first compute the effect size from the pwr vector by linear interpolation.
pwr80 <- approx(x = pwr, y = d_seq, xout = 0.8)
Now create a label for geom_text and plot it.
lbl80 <- paste("Power = 80%\n")
lbl80 <- paste(lbl80, "Effect size =", round(pwr80$y, 2))
ggplot(dfpwr, aes(effect.size, power)) +
geom_point(size = 2, colour = "black") +
geom_line(size = 0.5, colour = "red") +
geom_hline(yintercept = 0.8, linetype = "dotted") +
geom_text(x = pwr80$y, y = pwr80$x,
label = lbl80,
hjust = 1, vjust = -1) +
scale_y_continuous(labels = scales::percent) +
xlab("effect size") +
ylab(expression("test power =" ~ 1 - beta))
To also draw a vertical line, add
geom_vline(xintercept = pwr80$y, linetype = "dotted")
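As a cross-check on the interpolated value, the effect size for 80% power can also be solved for directly. The sketch below uses base R's power.t.test rather than pwr, assuming equal group sizes and sd = 1 so that delta plays the role of Cohen's d. The names pwr_curve, d80_interp and d80_exact are illustrative.

```r
# Power curve over a grid of effect sizes (base R, no extra packages)
d_seq <- seq(0.1, 1.5, by = 0.1)
pwr_curve <- sapply(d_seq, function(d) {
  power.t.test(n = 30, delta = d, sd = 1, sig.level = 0.05,
               type = "two.sample", alternative = "two.sided")$power
})

# Effect size at 80% power by linear interpolation, as in the answer
d80_interp <- approx(x = pwr_curve, y = d_seq, xout = 0.8)$y

# Effect size at 80% power solved numerically: leave delta as NULL
d80_exact <- power.t.test(n = 30, delta = NULL, sd = 1, sig.level = 0.05,
                          power = 0.8, type = "two.sample",
                          alternative = "two.sided")$delta

c(interpolated = d80_interp, solved = d80_exact)
```

The two values should agree closely. A finer grid in d_seq brings the interpolated value nearer to the solved one.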

Fitting Rayleigh in R

This code
library(ggplot2)
library(MASS)
# Generate gamma rvs
x <- rgamma(100000, shape = 2, rate = 0.2)
den <- density(x)
dat <- data.frame(x = den$x, y = den$y)
ggplot(data = dat, aes(x = x, y = y)) +
geom_point(size = 3) +
theme_classic()
# Fit parameters (to avoid errors, set lower bounds to zero)
fit.params <- fitdistr(x, "gamma", lower = c(0, 0))
# Plot using density points
ggplot(data = dat, aes(x = x,y = y)) +
geom_point(size = 3) +
geom_line(aes(x=dat$x, y=dgamma(dat$x,fit.params$estimate["shape"], fit.params$estimate["rate"])),
color="red", size = 1) +
theme_classic()
fits and plots the distribution of series x. The resulting plot is:
Packages stats and MASS seem not to support the Rayleigh distribution. How can I extend the previous code to the Rayleigh distribution?
In the code below I start by recreating the vector x, this time setting the RNG seed, in order to make the results reproducible. Then a data.frame dat with only that vector is also recreated.
The parameters of the Gamma and Rayleigh distributions are estimated first, and the fitted density functions are then drawn over the histogram of x with stat_function.
library(ggplot2)
library(MASS)
library(extraDistr) # for the Rayleigh distribution functions
# Generate gamma rvs
set.seed(2020)
x <- rgamma(100000, shape = 2, rate = 0.2)
dat <- data.frame(x)
# Fit parameters (to avoid errors, set lower bounds to zero)
fit.params <- fitdistr(dat$x, "gamma", lower = c(0, 0))
ggplot(data = dat, aes(x = x)) +
geom_histogram(aes(y = ..density..), bins = nclass.Sturges(x)) +
stat_function(fun = dgamma,
args = list(shape = fit.params$estimate["shape"],
rate = fit.params$estimate["rate"]),
color = "red", size = 1) +
ggtitle("Gamma density") +
theme_classic()
fit.params.2 <- fitdistrplus::fitdist(dat$x, "rayleigh", start = list(sigma = 1))
fit.params.2$estimate
ggplot(data = dat, aes(x = x)) +
geom_histogram(aes(y = ..density..), bins = nclass.Sturges(x)) +
stat_function(fun = drayleigh,
args = list(sigma = fit.params.2$estimate),
color = "blue", size = 1) +
ggtitle("Rayleigh density") +
theme_classic()
To plot points and lines like in the question, not histograms, use the code below.
den <- density(x)
orig <- data.frame(x = den$x, y = den$y)
ggplot(data = orig, aes(x = x)) +
geom_point(aes(y = y), size = 3) +
geom_line(aes(y = dgamma(x, fit.params$estimate["shape"], fit.params$estimate["rate"])),
color="red", size = 1) +
geom_line(aes(y = drayleigh(x, fit.params.2$estimate)),
color="blue", size = 1) +
theme_classic()
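For the Rayleigh distribution, the maximum-likelihood estimate of sigma has a closed form, sqrt(sum(x^2) / (2 * n)), so a numerical fitter is not strictly required. The base-R sketch below checks this on simulated Rayleigh data (not the gamma sample above). The names sigma_true, r, sigma_hat and sigma_num are illustrative.

```r
set.seed(2020)
sigma_true <- 3
n <- 10000

# Inverse-CDF sampling: if U ~ Uniform(0, 1), then
# sigma * sqrt(-2 * log(U)) follows a Rayleigh(sigma) distribution
r <- sigma_true * sqrt(-2 * log(runif(n)))

# Closed-form maximum-likelihood estimate of sigma
sigma_hat <- sqrt(sum(r^2) / (2 * length(r)))

# Numerical check: minimise the negative log-likelihood of
# f(x; s) = x / s^2 * exp(-x^2 / (2 * s^2))
negloglik <- function(s) -sum(log(r) - 2 * log(s) - r^2 / (2 * s^2))
sigma_num <- optimize(negloglik, interval = c(0.01, 100))$minimum

c(closed_form = sigma_hat, numerical = sigma_num)
```

Applied to the vector x from the answer, sqrt(sum(x^2) / (2 * length(x))) should give essentially the same value as fit.params.2$estimate.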

Exclude a particular area from geom_smooth fit automatically

I am plotting different plots in my shiny app.
By using geom_smooth(), I am fitting a smoothing curve on a scatterplot.
I am plotting these plots with ggplot() and rendering with ggplotly().
Is there any way I can exclude a particular data profile from geom_smooth()?
For example:
As the fit shows, that region disturbs the smoothing curve, which is not desirable. I have tried plotly_click(), plotly_brush(), and plotly_select(), but I don't want to rely on user interaction when plotting this fit, because that makes the process much slower and less accurate.
Here is my code to plot this:
#plot
g <- ggplot(data = d_f4, aes_string(x = d_f4$x, y = d_f4$y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1)+
geom_smooth(formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
Unfortunately, I cannot include my dataset in the question because it is quite large.
You can make an extra data.frame without the "outliers" and use this as the input for geom_smooth:
set.seed(8)
test_data <- data.frame(x = 1:100)
test_data$y <- sin(test_data$x / 10) + rnorm(100, sd = 0.1)
test_data[60:65, "y"] <- test_data[60:65, "y"] + 1
data_plot <- test_data[-c(60:65), ]
library(ggplot2)
ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1) +
geom_smooth(formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1) +
geom_smooth(data = data_plot, formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
Created on 2020-11-27 by the reprex package (v0.3.0)
BTW: you don't need aes_string (which is deprecated) or d_f4$x; you can just use aes(x = x, y = y).
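If the disturbed region is not known in advance, one possible alternative to removing rows by hand is a robust fit, which downweights points with large residuals instead of excluding them. This is a different technique from the one above: the sketch swaps lm for MASS::rlm (Huber M-estimation by default) on the same simulated data. Whether it suppresses the bump enough will depend on the data, so treat it as a starting point rather than a guaranteed fix.

```r
library(ggplot2)
library(MASS)

set.seed(8)
test_data <- data.frame(x = 1:100)
test_data$y <- sin(test_data$x / 10) + rnorm(100, sd = 0.1)
test_data[60:65, "y"] <- test_data[60:65, "y"] + 1   # the disturbed region

# Fit once outside ggplot to inspect the robustness weights:
# values well below 1 mark points the fit has downweighted
robust_fit <- rlm(y ~ splines::bs(x, df = 10), data = test_data)
round(robust_fit$w[58:67], 2)

# Same model inside geom_smooth; se = FALSE keeps the example simple
p <- ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
  geom_point(colour = "blue", size = 0.1) +
  geom_smooth(method = MASS::rlm, formula = y ~ splines::bs(x, df = 10),
              se = FALSE, color = "green3")
p
```

The weights in robust_fit$w can also be thresholded to build a filtered data frame for geom_smooth, which combines this idea with the approach shown above.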
