I have created a series of marginal plots looking at two variables using ggMarginal and ggplot2. However, I want to include an axis on the histogram that has the scale so I know roughly how many values fall into each bin.
I have already checked the documentation for ggMarginal and am at a loss.
A Q-Q plot is a useful graphical device for checking, for example, the normality of residuals. It is constructed by putting the theoretical quantiles on the x-axis and the observed quantiles on the y-axis. In ggplot2, this is easily done with geom_qq and stat_qq. I would like to produce a worm plot, which is like a Q-Q plot but shows the difference between the theoretical and observed quantiles on the y-axis (see the figure).
Is there a way to do this in ggplot? For example, is there a simple way to change the y-axis of the geom_qq to show the difference between theoretical and observed quantiles? I know it should be possible to calculate observed quantiles manually, but this would not work well if I would like to create plots of multiple groups or using facets, since then I would also need to calculate the observed quantiles manually for each group separately.
The blog post mentioned in the comments contains a guide to coding your own stat functions so that you can create such plots yourself.
Otherwise, the qqplotr library (https://aloy.github.io/qqplotr/index.html) has an option detrend=TRUE, which produces worm plots with accompanying confidence bands.
If you want lines rather than a band, just set fill=NA, color='black', size=0.5
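For the manual route mentioned in the question, the per-group calculation can be sketched in base R as below. This is only an illustration under stated assumptions: the helper name worm_data, the made-up data, the standardisation with scale(), and the sign convention (observed minus theoretical) are all choices made here, not something taken from qqplotr or the blog post.

```r
# Hypothetical helper: detrended normal Q-Q coordinates computed per group.
worm_data <- function(df, value, group) {
  pieces <- lapply(split(df, df[[group]]), function(d) {
    z    <- sort(as.numeric(scale(d[[value]])))  # standardised, sorted sample
    theo <- qnorm(ppoints(length(z)))            # theoretical normal quantiles
    data.frame(group = d[[group]][1],
               theoretical = theo,
               deviation = z - theo)             # assumed sign: observed - theoretical
  })
  do.call(rbind, pieces)
}

set.seed(1)
df <- data.frame(g = rep(c("a", "b"), each = 50),   # made-up example data
                 y = c(rnorm(50), rexp(50)))
wd <- worm_data(df, "y", "g")

# Plot only if ggplot2 is installed; one panel per group.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(wd, aes(theoretical, deviation)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed") +
    facet_wrap(~group)
  print(p)
}
```

Because the quantiles are computed inside split(), the same call works for any number of groups, and the result can be faceted like any other data frame.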
I am creating several boxplots in ggplot2 with a log10 scale using
coord_trans(y="log10")
It is important that only the scale, and not the data itself, is log-transformed. One data set includes zero values, which become -Inf under the log transformation, so the boxplot cannot be drawn on a log-transformed scale.
I have tried to use
scale_y_continuous(trans=pseudo_log_trans(base=10))
However, this makes changes to the data instead of the scale. Outliers of the boxplot change and the boxplot stats extracted through ggplot_build(examplefig)$data are different from the original data.
Is there any way to create a boxplot in ggplot2 with a log10 scale and data including zero values? There should be no transformation of the data itself and outliers should be displayed like in the boxplot with the original data.
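One possible workaround, sketched here under assumptions rather than offered as a definitive answer, is to compute the boxplot statistics on the raw data first and then draw them with stat = "identity", so the scale transformation only moves positions that are already fixed. The helper box_stats and the sample data are hypothetical; the whisker rule used is the usual 1.5 * IQR rule with quantile()'s default quartiles.

```r
# Hypothetical helper: boxplot statistics computed on the untransformed data.
box_stats <- function(y) {
  q      <- quantile(y, c(0.25, 0.5, 0.75), names = FALSE)
  fence  <- 1.5 * (q[3] - q[1])                     # 1.5 * IQR
  inside <- y >= q[1] - fence & y <= q[3] + fence
  list(
    summary  = data.frame(ymin = min(y[inside]), lower = q[1], middle = q[2],
                          upper = q[3], ymax = max(y[inside])),
    outliers = y[!inside]
  )
}

y  <- c(0, 1, 2, 3, 4, 5, 1000)   # made-up data containing a zero
bs <- box_stats(y)

# Draw the precomputed box on a pseudo-log scale if the packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE)) {
  library(ggplot2)
  out <- data.frame(group = rep("a", length(bs$outliers)), y = bs$outliers)
  p <- ggplot(data.frame(group = "a", bs$summary), aes(x = group)) +
    geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                     upper = upper, ymax = ymax), stat = "identity") +
    geom_point(data = out, aes(y = y)) +
    scale_y_continuous(trans = scales::pseudo_log_trans(base = 10))
  print(p)
}
```

Since the quartiles, whiskers and outliers are fixed before plotting, they should match those of the boxplot on the original scale; only the axis spacing changes.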
This is the very first question I ask here and I am new to R, so I hope the question is clear.
data <- data.frame(x=c(1,2,3,4,5,6,7,8,9,10),y1=c(-2,-7,-12,-17,-22,-27,-32,-37,-42,-47),y2=c(1003,2003,3003,4003,5003,6003,7003,8003,9003,10003),y3=c(-3,-6,-9,-12,-15,-18,-21,-24,-27,-30))
I want to draw three scatter plots in ggplot2 with the same independent variable (x) and three different dependent variables (y1, y2, y3), and then add a single continuous vertical line (geom_vline(xintercept=3.2)) running through all of them, as in the following picture, but I don't know how to do it. Can anyone help me?
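A minimal sketch of one way to approach this, assuming stacked facets are acceptable: reshape the data to long form and let facet_wrap draw one panel per dependent variable, with the same geom_vline in each. The names df and long are illustrative, and the line is drawn per panel, so the small gaps between panels remain.

```r
# Same values as the data frame in the question, built with seq() for brevity.
df <- data.frame(
  x  = 1:10,
  y1 = seq(-2, -47, by = -5),
  y2 = seq(1003, 10003, by = 1000),
  y3 = seq(-3, -30, by = -3)
)

# Long form: one row per (x, variable) pair.
long <- data.frame(
  x        = rep(df$x, times = 3),
  variable = rep(c("y1", "y2", "y3"), each = nrow(df)),
  value    = c(df$y1, df$y2, df$y3)
)

# Plot only if ggplot2 is installed; panels share x and have free y scales.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x, value)) +
    geom_point() +
    geom_vline(xintercept = 3.2, colour = "red") +
    facet_wrap(~variable, ncol = 1, scales = "free_y")
  print(p)
}
```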
I have been trying to produce the following plot in R for ages, but nothing seems to work.
What I want is a quantitative variable on the Y axis and a categorical variable on the X axis, with a horizontal histogram (of the Y variable) for each category.
I couldn't find a package that does this. Any suggestions?
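As a rough sketch, assuming ggplot2 3.3.0 or later (where mapping the variable to y gives horizontal bars), one panel per category comes close to this layout. The data below are made up, and the base R part simply tabulates the counts per category on shared breaks.

```r
set.seed(42)
df <- data.frame(                                   # made-up example data
  category = rep(c("A", "B", "C"), each = 100),
  value    = c(rnorm(100, 0), rnorm(100, 2), rnorm(100, 4))
)

# Shared breaks so the bins are comparable across categories.
breaks <- pretty(range(df$value), n = 20)
counts <- sapply(split(df$value, df$category),
                 function(v) hist(v, breaks = breaks, plot = FALSE)$counts)

# Horizontal histograms, one panel per category, if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(y = value)) +
    geom_histogram(breaks = breaks) +
    facet_wrap(~category, nrow = 1) +
    labs(x = "count", y = "value")
  print(p)
}
```

Placing the panels in a single row keeps the categories side by side along the horizontal direction, with the quantitative variable on the shared vertical axis.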
Newbie here. I have a script to create graphs that has a bit that goes something like this:
png("Test.png")
ht=hist(step[i],20)
curve(insert_function_here,add=TRUE)
I essentially want to plot the curve of a distribution over a histogram. My problem is that the axis limits are apparently set by the histogram rather than the curve, so the curve sometimes runs past the Y axis limits. I have played with par("usr"), to no avail. Is there any way to set the axis limits based on the maximum of either the histogram or the curve (or, alternatively, of the curve only)? In case this changes anything, this needs to be done within a for loop where multiple such graphs are plotted, and within a series of subplots (par("mfrow")).
Inspired by other answers, this is what I ended up doing:
curve(insert_function_here)
boundsc=par("usr")
ht=hist(A[,1],20,plot=FALSE)
par(usr=c(boundsc[1:2],0,max(boundsc[4],max(ht$counts))))
plot(ht,add=TRUE)
It fixes the bounds based on the highest of either the curve or the histogram.
You could determine mx <- max(curve_vector, ht$counts) and set ylim=c(0, mx), but I rather doubt the code looks like that, since [] is not a proper parameter-passing idiom and step is not an R plotting function but a model selection function. So I am guessing this is code in Matlab or some other language. In R, try this:
set.seed(123)
png("Test.png")
# offset the breaks so that integer counts fall inside a bin, not on a boundary
ht <- hist(rpois(20, 1), plot = FALSE, breaks = 0:10 - 0.1)
x <- round(ht$breaks)  # integer values 0..10
# estimate lambda as the sample mean, recovered from the bin counts
lambda <- sum(ht$counts * x[-length(x)]) / sum(ht$counts)
# plot the Poisson density first, with ylim wide enough for the histogram
plot(x, dpois(x, lambda), ylim = c(0, max(ht$density) + 0.1), type = "l")
plot(ht, freq = FALSE, add = TRUE)  # add the histogram on the density scale
dev.off()
You could plot the curve first, then compute the histogram with plot=FALSE, and use the plot function on the histogram object with add=TRUE to add it to the plot.
Even better would be to calculate the highest y-value of the curve (there may be shortcuts for this, depending on the nature of the curve) and the highest bar in the histogram, and to pass the larger of the two to the ylim argument when plotting the histogram.
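That last suggestion can be sketched as follows. The normal density dens_fun and the simulated data are stand-ins for the asker's own curve and data, and evaluating the curve on a grid is just one assumed way of finding its maximum.

```r
set.seed(1)
x  <- rnorm(200)                                  # made-up data
ht <- hist(x, breaks = 20, plot = FALSE)          # compute the histogram without drawing it

# Hypothetical curve: a normal density fitted to the data.
dens_fun <- function(v) dnorm(v, mean(x), sd(x))

# Evaluate the curve over the histogram's x range and take the larger maximum.
grid  <- seq(min(ht$breaks), max(ht$breaks), length.out = 200)
y_max <- max(ht$density, dens_fun(grid))

plot(ht, freq = FALSE, ylim = c(0, y_max))        # histogram on the density scale
curve(dens_fun, add = TRUE)                       # the curve now fits inside the limits
```

Because nothing here depends on global state other than the current device, the same few lines can be repeated inside a for loop with par(mfrow = ...) set beforehand.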