R forecasting advice - r

I have created the attached plot using ggplot and data I currently hold. I want to add a simple linear forecast for future years, ideally with some sort of confidence interval, but I can't seem to find any way to do it without calculating the forecasted values in a separate data frame.

In addition to the solutions by phivers and danloos: if you only need a quick linear projection, you can extend the axis range and tell the geom_smooth layer to span the whole plot, not only the data range:
library(ggplot2)
library(dplyr)  ## provides the %>% pipe

data.frame(x = 1:10, y = 1:10 + runif(10)) %>%
  ggplot(aes(x, y)) +
  geom_point() +
  geom_smooth(method = 'lm', ## lm for linear model
              ## extend smoother to plot range:
              fullrange = TRUE
  ) +
  ## extend axis beyond data range:
  scale_x_continuous(limits = c(1, 20))
Side note: if you want to emphasise the trend within each plot facet rather than the difference between facets, you can set the scales argument of facet_wrap to 'free', 'free_x' or 'free_y', e.g. facet_wrap(..., scales = 'free').
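To make the side note concrete, here is a minimal sketch that combines fullrange = TRUE with free facet scales. The data frame, the group column and the factor of 100 between groups are made up for illustration; they are not from the question.

```r
library(ggplot2)

# Hypothetical data: two groups whose y values differ by a factor of ~100
set.seed(1)
df <- data.frame(
  x = rep(1:10, times = 2),
  group = rep(c("a", "b"), each = 10)
)
df$y <- ifelse(df$group == "a", 1, 100) * df$x + runif(20)

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  # linear fit, extended over the full x range of the plot
  geom_smooth(method = "lm", formula = y ~ x, fullrange = TRUE) +
  # extend the x axis beyond the data range
  scale_x_continuous(limits = c(1, 20)) +
  # each facet gets its own y axis, so both trends stay readable
  facet_wrap(~group, scales = "free_y")
p
```

With a shared y axis the trend in group "a" would be squashed flat against the bottom of its panel; with scales = "free_y" each panel is scaled to its own data.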


Is there a way to flip y-axes ticks on ggsurvplot without changing the results?

I am using survival analysis to show the proportion of individuals/duration to reach a developmental milestone, and I would like to flip the y-axis ticks so it has 0 at the top and 1.00 at the bottom. I tried using scale_y_reverse, but this flipped the results too. I just want the axis ticks to go from 0-1, while maintaining the visuals of the first graph. Thanks for your help!!
ggsurvplot(spin_fit, data = spin.data, pval = TRUE, conf.int = TRUE)
ggplot2 <- ggsurvplot(spin_fit, data = spin.data, pval = TRUE)$plot
df1 <- data.frame(time=spin_fit$time, nRisk=spin_fit$n.risk, nRiskRel=spin_fit$n.risk/max(spin_fit$n.risk))
ggplot2 + geom_point(aes(x=time, y=nRiskRel), data = df1, alpha=0.5, size=3)
ggplot2 + ylab("Proportion of Larvae Spinning Cocoon")
Here's what happened when I added scale_y_reverse() to the last line:
ggplot2 + ylab("Proportion of Larvae Spinning Cocoon") + scale_y_reverse()
You could try to use scale_y_continuous to set your own breaks and labels:
break_values <- c(0, 0.25, 0.5, 0.75, 1)
ggplot2 + scale_y_continuous(breaks = break_values,
labels = as.character(rev(break_values)))
ggsurvplot also supports this natively through its fun argument, without manually adjusting the scales.
From the help section:
fun: an arbitrary function defining a transformation of the survival curve. Often used transformations can be specified with a character argument: "event" plots cumulative events (f(y) = 1-y), "cumhaz" plots the cumulative hazard function (f(y) = -log(y)), and "pct" for survival probability in percentage.
So all you need to do is:
ggplot2 <- ggsurvplot(spin_fit, data = spin.data, pval = TRUE, fun = "event")$plot
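To see what the "event" transformation does to the numbers, here is a small sketch using only the survival package. The lung data set stands in for spin.data, which is not shown in the question, so the fit below is an illustration and not the asker's model.

```r
library(survival)

# Illustrative Kaplan-Meier fit on the lung data shipped with survival
# (a stand-in for spin_fit, which is not available here)
fit <- survfit(Surv(time, status) ~ 1, data = lung)

surv_prob  <- fit$surv       # proportion that has not yet had the event
event_prop <- 1 - surv_prob  # f(y) = 1 - y, the "event" transformation

head(data.frame(time = fit$time, surv_prob, event_prop))
```

The survival curve starts near 1 and falls, while the transformed curve starts near 0 and rises. So fun = "event" changes the shape of the plotted curve as well as the axis values, which is worth checking against what you want the figure to show.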

X-axis in ggplot2 that transitions from linear to log scale

I am trying to create a histogram in ggplot2 where the x-axis transitions from linear scaling to log2 scaling after a pre-defined point. In other words, I want the x-axis to be of a linear scale up to some threshold, and then after that threshold, use the log2 scale.
So, before the threshold, the x-axis should look like what you would get from simply doing:
ggplot(data,aes(x=value)) + geom_histogram()
and after the threshold, the x-axis should look like what you would get from doing:
ggplot(data,aes(x=value)) + geom_histogram() + scale_x_continuous(trans='log2')
The problem is that while I can make those histograms individually (one where everything is on a linear scale, and one where everything is on a log2 scale), I don't know how to get it to transition and have both in one histogram.
I agree with the commenters that this would be problematic as a single figure. However, it could be informative, if you have one figure showing all data, and then an inset/subplot to show a subset. Here I used cowplot::plot_grid to combine two figures, but there are other packages out there for arranging (like gridExtra). Do be extremely cautious about how you label the figures.
library(ggplot2)
x <- rexp(1000, .05) + rep(c(0, 5), each = 500)
cowplot::plot_grid(
ggplot(data.frame(x = x[x<5]), aes(x)) +
geom_histogram() +
labs(title = "Subset, x<5, linear-scale"),
ggplot(data.frame(x), aes(x)) +
geom_vline(xintercept = 5, color = "red", size = 2) +
geom_histogram() +
scale_x_log10() +
labs(title = "All data, log-scale")
)
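If you still want to experiment with a single axis, the piece you would need is a transformation that is linear up to the threshold and log2 beyond it, together with its inverse. This is a different technique from the two-panel approach above. The sketch below only defines and checks the two functions in base R; the threshold of 5 and the function names are assumptions for illustration. In principle such a pair could be wrapped with scales::trans_new() and passed to scale_x_continuous(), but that step is not shown here, and the binning and labelling caveats raised above still apply.

```r
threshold <- 5  # hypothetical cut-over point

# Linear up to the threshold, log2 beyond it; continuous at the threshold
lin_log2 <- function(x) {
  ifelse(x <= threshold,
         x,
         threshold + log2(pmax(x, threshold) / threshold))
}

# Inverse of lin_log2
lin_log2_inv <- function(y) {
  ifelse(y <= threshold,
         y,
         threshold * 2^(pmax(y, threshold) - threshold))
}

x <- c(0, 2.5, 5, 10, 20, 40)
lin_log2(x)                # values: 0, 2.5, 5, 6, 7, 8
lin_log2_inv(lin_log2(x))  # recovers x
```

Above the threshold every doubling of x adds one unit on the transformed axis, so equal distances mean different things on the two sides of the cut. That is the main reason the labelling needs care.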

Adding predict line from glm to ggplot2, larger than original data set

I have included a sample data set just to demonstrate what I am trying to do.
Speed <- c(400,220,490,210,500,270,200,470,480,310,240,490,420,330,280,210,300,470,230,430,460,220,250,200,390)
Hit <- c(0,1,0,1,0,0,1,0,0,1,1,0,0,1,1,1,1,1,0,0,0,1,1,1,0)
obs <- c(1:25)
msl2.data <- as.data.frame(cbind(obs,Hit,Speed))
msl2.glm <- glm(Hit ~ Speed, data = msl2.data, family = binomial)
Doing what I want in base graphics:
plot(Hit~ Speed, data = msl2.data, xlim = c(0,700), xlab = "Speed", ylab = "Hit", main = "Plot of hit vs Speed")
pi.hat<-(predict( msl2.glm, data.frame(Speed=c(0:700)), type="response" ))
lines( 0:700, pi.hat, col="blue" )
I am trying to recreate the above plot in ggplot. The error I have been unable to work around is that the x and y in aes(x, y) have different lengths, which is true, but I want them to have different lengths.
Any ideas for this in gg?
You have a couple of approaches: the first does all the modelling inside ggplot, the second does it outside and passes the relevant data in to be plotted.
First
ggplot(data = msl2.data, aes(Speed, Hit)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial"),
              fullrange = TRUE, se = FALSE) +
  xlim(0, 700)
fullrange is specified so that the prediction line covers the whole x-range; xlim extends the x-axis.
Second
# Create prediction data frame (pi.hat comes from the predict() call in the question)
pred <- data.frame(Speed = 0:700, pi.hat)
ggplot() +
  # prediction line
  geom_line(data = pred, aes(Speed, pi.hat)) +
  # points - note a different data frame is used
  geom_point(data = msl2.data, aes(Speed, Hit))
I generally prefer to do the modelling outside (second approach), and use ggplot purely as a plotting mechanism.
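As a sketch of how the second approach can be extended, the model-outside route also makes it easy to add an approximate confidence band: predict on the link (logit) scale with standard errors, then back-transform with plogis(). The lower and upper columns and the 1.96 multiplier (a normal-approximation 95% band) are my additions, not part of the original answer.

```r
library(ggplot2)

Speed <- c(400,220,490,210,500,270,200,470,480,310,240,490,420,330,280,210,300,470,230,430,460,220,250,200,390)
Hit <- c(0,1,0,1,0,0,1,0,0,1,1,0,0,1,1,1,1,1,0,0,0,1,1,1,0)
msl2.data <- data.frame(Hit, Speed)
msl2.glm <- glm(Hit ~ Speed, data = msl2.data, family = binomial)

# Predict on the link scale over the wider range, then back-transform
pred <- data.frame(Speed = 0:700)
link <- predict(msl2.glm, newdata = pred, type = "link", se.fit = TRUE)
pred$pi.hat <- plogis(link$fit)
pred$lower  <- plogis(link$fit - 1.96 * link$se.fit)  # approximate 95% band
pred$upper  <- plogis(link$fit + 1.96 * link$se.fit)

p <- ggplot() +
  # confidence band and prediction line come from the prediction data frame
  geom_ribbon(data = pred, aes(Speed, ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line(data = pred, aes(Speed, pi.hat), colour = "blue") +
  # points come from the original data
  geom_point(data = msl2.data, aes(Speed, Hit))
p
```

Building the band on the link scale keeps it inside [0, 1] after back-transformation, which a band computed directly on the response scale would not guarantee.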

Adjusting a confidence interval in ggplot while maintaining the default plot margin buffers

For the sake of data access, let us use the in-built JohnsonJohnson dataset:
dat <- JohnsonJohnson
df <- data.frame(date = time(dat), Y = as.matrix(dat))
Now to plot the time series with a 99 percent confidence interval:
p1 <- ggplot(df, aes(date, Y)) +
geom_point() +
geom_smooth(level = 0.99) +
theme_bw()
This is close to what I want, except that it is nonsensical to have the confidence interval go below zero.
The recommended remedy is to use coord_cartesian() to set the limits of the plotting area like so:
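## ggplot2 >= 3.0 stores the panel ranges under $layout$panel_params
## (older versions used $panel$ranges)
y_max <- ggplot_build(p1)$layout$panel_params[[1]]$y.range[2]
p2 <- p1 + coord_cartesian(ylim = c(0, y_max))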
However, I do not want the minimum value of the plot margins to be where y = 0. I like having the default buffer of space separating the most extreme values from the edges of the plot margins on all four sides. You can see this in p1 (the plot before adding the coord_cartesian() argument), but only on three sides in p2.
So in short, I'd like to keep the confidence interval flattened where y = 0 (as coord_cartesian() does) without removing any of the underlying data (as scale_y_continuous() would do), but while maintaining the default plot margin buffers that p1 has.
If it is helpful to know, the default plotting range is 10% greater than the range of plotted objects (i.e. the maximum range of all points and confidence intervals) for each dimension.
Here is a solution by editing the data used to plot the confidence intervals (the method to extract the data has been borrowed from Drawing only boundaries of stat_smooth in ggplot2)
First, we create our normal plot:
p1 <- ggplot(df, aes(date, Y)) +
geom_point() +
geom_smooth(level = 0.99) +
theme_bw()
Then, we extract the data for the smoother and edit the ymin-variable
smooth_data <- ggplot_build(p1)$data[[2]]
smooth_data$ymin[smooth_data$ymin<0] <- 0
Then, we create a new plot using these data:
p2 <- ggplot(df, aes(date, Y)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  geom_ribbon(data = smooth_data, aes(x = x, ymin = ymin, ymax = ymax),
              col = "grey60", alpha = 0.4, inherit.aes = FALSE) +
  theme_bw()
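If you would rather not pull the band out of a built plot, a rough alternative is to fit the smoother yourself and clamp the lower bound with pmax(). The sketch below assumes geom_smooth() is using its default loess fit for a data set of this size, and computes the 99% band from the t distribution; it should be close to the plotted band, but I have not checked that it matches exactly. The band data frame and its column names are made up for illustration.

```r
dat <- JohnsonJohnson
df <- data.frame(date = as.numeric(time(dat)), Y = as.numeric(dat))

# Fit the smoother outside ggplot and ask for standard errors
fit <- loess(Y ~ date, data = df)
pr <- predict(fit, se = TRUE)

# 99% interval from the t distribution, lower bound clamped at zero
crit <- qt(0.995, pr$df)
band <- data.frame(
  date = df$date,
  fit  = pr$fit,
  ymin = pmax(pr$fit - crit * pr$se.fit, 0),
  ymax = pr$fit + crit * pr$se.fit
)
head(band)
```

The resulting data frame can be passed to geom_ribbon(data = band, aes(x = date, ymin = ymin, ymax = ymax), inherit.aes = FALSE) in the same way as smooth_data above.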

R - ggplot2 change x-axis values to non-log values

I am plotting some payment distribution information, and I aggregated the data after log-transforming it (natural log, base e). The histograms turn out great, but I want the x-axis to display the non-log equivalents.
My current axis shows values from 0 to 10 in steps of 2.5.
Instead, I would like to see the values exp(2.5), exp(5), etc.
Any suggestions on how to accomplish this? Anything I can add to my plotting statement to scale the x-axis values? Maybe there's a better approach - thoughts?
Current code:
ggplot(plotData, aes_string(pay, fill = pt)) + geom_histogram(bins = 50) + facet_wrap(~M_P)
Answered. Final plot: [image not included]
Not sure if this is exactly what you are after but you can change the text of the x axis labels to whatever you want using scale_x_continuous.
Here's without:
ggplot(data = cars) + geom_histogram(aes(x = speed), binwidth = 1)
Here's with:
ggplot(data = cars) + geom_histogram(aes(x = speed), binwidth = 1) +
scale_x_continuous(breaks=c(5,10,15,20,25), labels=c(exp(5), exp(10), exp(15), exp(20), exp(25)))
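Instead of typing out each label, you can also pass a function to the labels argument, which ggplot2 then applies to the breaks. This is a minimal sketch: plot_data, log_pay and the exp_label helper are made-up names, and the simulated values stand in for the asker's log-transformed payments.

```r
library(ggplot2)

# Hypothetical log-scale payment data (stand-in for the asker's plotData)
set.seed(42)
plot_data <- data.frame(log_pay = rnorm(500, mean = 5, sd = 1.5))

# Turn a break on the log scale into a rounded label in original units
exp_label <- function(x) format(round(exp(x)), big.mark = ",", trim = TRUE)

p <- ggplot(plot_data, aes(log_pay)) +
  geom_histogram(bins = 50) +
  scale_x_continuous(breaks = seq(0, 10, by = 2.5), labels = exp_label) +
  labs(x = "pay (original units)")
p
```

The bars are still binned on the log scale; only the tick labels change, so the axis should be described as log-spaced in the caption.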
