How to hide the undrlying data in a ggplot scatterplot - r

I am trying to make a scatter plot in ggplot and would like to loose the points and show only the smooth line and the confidence interval. I checked geom_point() function and there is no option for turning it off such that the points/underlying data is hidden. Any suggestions? much appreciated.
Joseph

To plot smooth line with confidence interval around the line you should use geom_smooth(). This will smoothed line using loess if there are less than 1000 observations and gam if more. But you can change smoothing method with argument method=.
ggplot(mtcars,aes(wt,mpg))+geom_smooth()

Related

how to create a wormplot (detrended Q-Q plot) in ggplot?

Q-Q plot is a useful graphical device used to check for example normality of residuals. Q-Q plot is constructed by putting theoretical quantiles on x-axis and observed quantiles on the y-axis. In ggplot, this can be easily done using geom_qq and stat_qq. I would like to produce a wormplot, which is like a Q-Q plot, but on the y-axis, it has a difference between theoretical and observed quantiles (see the figure).
Is there a way to do this in ggplot? For example, is there a simple way to change the y-axis of the geom_qq to show the difference between theoretical and observed quantiles? I know it should be possible to calculate observed quantiles manually, but this would not work well if I would like to create plots of multiple groups or using facets, since then I would also need to calculate the observed quantiles manually for each group separately.
blogpost mentioned in comments contains a guide to code your own statfunctions to create such plots yourself
Otherwise, library qqplotr https://aloy.github.io/qqplotr/index.html contains an option detrend=True which basically produce wormplots with accompanying confidence bands.
If you want lines, and not a band, just do fill=NA, color='black', size=0.5

Remove grey background confidence interval from forecasting plot

I created the forecasting plot with the point forecast and confidence interval. However, i only want the point forecast(blue line) without the confidence interval(the grey background). How do i do that? Below is my current code and the screenshot of my plot.
plot(snv.data$mean,main="Forecast for monthly Turnover in Food
Retailing",xlab="Years",ylab="$ million",+ geom_smooth(se=FALSE))
Currently it seems to me that you try a mixture between the base function plot and the ggplot2 function geom_smooth. I don't think it is a very good idea in this case.
Since you want to use geom_smooth why not try to do it all with `ggplot2'?
Here is how you would do it with ggplot2 ( I used R-included airmiles data as example data)
library(ggplot2)
data = data.frame("Years"=seq(1937,1960,1),"Miles"=airmiles) #Creating a sample dataset
ggplot(data,aes(x=Years,y=Miles))+
geom_point()+
geom_smooth(se=F)
With ggplot you can set options like your x and y variables once and for all in the aes() of your ggplot() call, which is the reason why I didnt need any aes() call for geom_point().
Then I add the smoother function geom_smooth(), with option se=F to remove the confidence interval

In R and ggplot2 package, How to Add Lines?

I am doing a survival analysis and have produced the survival graph using
plot function to plot the Kaplan-Meier(KA variable) estimate as y value against time.
lines function to plot the step lines between estimate i and i+1, for each i=1,2,....
The code is as follows:
plot(KA)
for( i in 1:(length(KA)-1)){
lines(c(i,i+1),c(KA[i],KA[i])) # The horizontal step lines
lines(c(i+1,i+1),c(KA[i],KA[i+1])) # The vertical step lines
}
Now I want to make a more beautiful survival graph using ggplot2 package.
The question is: how to add the step lines into the graph?
I'm sorry, I can not put graphs as my reputation is less than 10.
Have a look at either geom_step, geom_path or geom_segment.
They might be helpful in what you are trying to achieve.
http://had.co.nz/ggplot2/geom_step.html
http://had.co.nz/ggplot2/geom_path.html
http://had.co.nz/ggplot2/geom_segment.html

Plot histogram and density function curve on one chart

I have a density function f, and I do MCMC sampling for it. To evaluate the goodness of the sampling, I need to plot the hist and curve within the same chart. The problem of
hist(samples);
curve(dfun,add=TRUE);
is that they are on the different scale: the frequency of a certain bin is usually hundreds, while the maximum of a density function is about 1 or so. What I want to do is to configure two plots at the same height, with one y-axis on the left and the other on the right. Can anyone help? Thank you.
Use the prob=TRUE argument to hist:
hist(samples, prob=TRUE)
curve(dfun,add=TRUE)
Also see this SO question

Problem with axis limits when plotting curve over histogram [duplicate]

This question already has an answer here:
How To Avoid Density Curve Getting Cut Off In Plot
(1 answer)
Closed 6 years ago.
newbie here. I have a script to create graphs that has a bit that goes something like this:
png(Test.png)
ht=hist(step[i],20)
curve(insert_function_here,add=TRUE)
I essentially want to plot a curve of a distribution over an histogram. My problem is that the axes limits are apparently set by the histogram instead of the curve, so that the curve sometimes gets out of the Y axis limits. I have played with par("usr"), to no avail. Is there any way to set the axis limits based on the maximum values of either the histogram or the curve (or, in the alternative, of the curve only)?? In case this changes anything, this needs to be done within a for loop where multiple such graphs are plotted and within a series of subplots (par("mfrow")).
Inspired by other answers, this is what i ended up doing:
curve(insert_function_here)
boundsc=par("usr")
ht=hist(A[,1],20,plot=FALSE)
par(usr=c(boundsc[1:2],0,max(boundsc[4],max(ht$counts))))
plot(ht,add=TRUE)
It fixes the bounds based on the highest of either the curve or the histogram.
You could determine the mx <- max(curve_vector, ht$counts) and set ylim=(0, mx), but I rather doubt the code looks like that since [] is not a proper parameter passing idiom and step is not an R plotting function, but rather a model selection function. So I am guessing this is code in Matlab or some other idiom. In R, try this:
set.seed(123)
png("Test.png")
ht=hist(rpois(20,1), plot=FALSE, breaks=0:10-0.1)
# better to offset to include discrete counts that would otherwise be at boundaries
plot(round(ht$breaks), dpois( round(ht$breaks), # plot a Poisson density
mean(ht$counts*round(ht$breaks[-length(ht$breaks)]))),
ylim=c(0, max(ht$density)+.1) , type="l")
plot(ht, freq=FALSE, add=TRUE) # plot the histogram
dev.off()
You could plot the curve first, then compute the histogram with plot=FALSE, and use the plot function on the histogram object with add=TRUE to add it to the plot.
Even better would be to calculate the the highest y-value of the curve (there may be shortcuts to do this depending on the nature of the curve) and the highest bar in the histogram and give this value to the ylim argument when plotting the histogram.

Resources