I have a question that is probably similar to "Fitting a density curve to a histogram in R". Using qplot, I have created 7 histograms with this command:
qplot(V1, data=data, binwidth=10, facets=V2~.)
For each slice, I would like to add a fitted Gaussian curve. When I try to use the lines() function, I get this error:
Error in plot.xy(xy.coords(x, y), type = type, ...) :
plot.new has not been called yet
What is the command to do it correctly?
Have you tried stat_function?
+ stat_function(fun = dnorm)
You'll probably want to plot the histograms using aes(y = ..density..) in order to plot the density values rather than the counts.
A lot of useful information can be found in this question, including some advice on plotting different normal curves on different facets.
Here are some examples:
library(ggplot2)
dat <- data.frame(x = c(rnorm(100),rnorm(100,2,0.5)),
a = rep(letters[1:2],each = 100))
Overlay a single normal density on each facet:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
stat_function(fun = dnorm, colour = "red")
From the question I linked to, create a separate data frame with the different normal curves:
library(plyr)  # provides ddply
grid <- with(dat, seq(min(x), max(x), length = 100))
normaldens <- ddply(dat, "a", function(df) {
data.frame(
predicted = grid,
density = dnorm(grid, mean(df$x), sd(df$x))
)
})
And plot them separately using geom_line:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
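If you would rather not depend on plyr for the ddply step, the same per-facet data frame can probably be built with base R. This is only a sketch: the seed is my own addition so the toy data are reproducible, and split/lapply/rbind stands in for ddply.

```r
# Same toy data as above; set.seed is an assumption added for reproducibility
set.seed(1)
dat <- data.frame(x = c(rnorm(100), rnorm(100, 2, 0.5)),
                  a = rep(letters[1:2], each = 100))

# Common grid of x values spanning the whole data range
grid <- seq(min(dat$x), max(dat$x), length.out = 100)

# One normal curve per facet level, using that level's own mean and sd
normaldens <- do.call(rbind, lapply(split(dat, dat$a), function(df) {
  data.frame(a = df$a[1],
             predicted = grid,
             density = dnorm(grid, mean(df$x), sd(df$x)))
}))
rownames(normaldens) <- NULL
```

The result has the same columns (a, predicted, density) as the ddply version, so the geom_line call above should work unchanged.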
ggplot2 uses a different graphics paradigm than base graphics, so base functions such as lines() cannot draw on a ggplot. (Although you can use grid graphics with it, the best way is to add a new stat_function layer to the plot.) The ggplot2 code is the following.
Note that I couldn't get this to work using qplot, but the transition to ggplot is reasonably straightforward. The most important difference is that your data must be in data.frame format.
Also note the explicit mapping of the y aesthetic, aes(y=..density..). This is slightly unusual, but it takes the density computed by stat_bin and maps it to the y axis, so the histogram is on the same scale as the curve:
library(ggplot2)
data <- data.frame(V1 = rnorm(700), V2 = sample(LETTERS[1:7], 700, replace = TRUE))
ggplot(data, aes(x=V1)) +
stat_bin(aes(y=..density..)) +
stat_function(fun=dnorm) +
facet_grid(V2~.)
I have the following code:
mapping <- aes(
x = values
, color = factor(par_a)
)
plot <- (ggplot(data=data, mapping=mapping)
+ geom_histogram(binwidth = 5, na.rm = TRUE)
+ facet_grid(par_b ~ par_c ~ par_d, scales = "free")
)
Since I have been asked to use hist() instead, because it supports plot=FALSE, I now want to adapt the code:
mapping <- aes(
x = values
, color = factor(par_a)
)
plot2 <- hist(values, breaks = seq(min(values), max(values)+5, by = 5))
+ facet_grid(par_b ~ par_c ~ par_d, scales = "free")
However, I have no idea how to implement the 'color = factor(par_a)' or the whole line 'facet_grid(par_b ~ par_c ~ par_d, scales = "free")'. I guess these functions are not explicitly supported for 'hist()', but I would really appreciate it if someone could tell me what the alternatives for them would be?
Base plotting can use par(mfrow=c(num_rows, num_cols)) to build subplots. The subsequent calls to plot, hist, etc. will fill the subplots in order. To color the bars within the plots, you can build a vector of colors as described here and pass it to the col parameter of hist.
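A minimal sketch of that approach, under some assumptions: the data frame below is a made-up stand-in that reuses the question's column names (values, par_b, par_c), the breaks are shared across panels, and each panel simply gets its own fill color rather than a color per par_a level.

```r
# Hypothetical stand-in data; only the column names come from the question
set.seed(1)
data <- data.frame(values = rnorm(400, 50, 15),
                   par_b  = rep(c("b1", "b2"), each = 200),
                   par_c  = rep(c("c1", "c2"), times = 200))

# Shared breaks of width 5 that cover the full range, so panels are comparable
breaks <- seq(floor(min(data$values) / 5) * 5,
              ceiling(max(data$values) / 5) * 5, by = 5)

# One group per par_b/par_c combination (the base counterpart of facets)
groups <- split(data$values, list(data$par_b, data$par_c))

# plot = FALSE computes the histogram objects without drawing anything
hists <- lapply(groups, hist, breaks = breaks, plot = FALSE)

# Draw the stored histograms into a 2 x 2 grid of subplots
old_par <- par(mfrow = c(2, 2))
cols <- c("tomato", "steelblue", "seagreen", "orange")
for (i in seq_along(hists)) {
  plot(hists[[i]], col = cols[i], main = names(hists)[i], xlab = "values")
}
par(old_par)
```

Because the histogram objects are kept in a list, their counts and breaks can be inspected or modified before anything is drawn, which is presumably the point of asking for plot=FALSE.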
I need to create "two plots" in "one plot" with ggplot. I managed to do it with base R as follows:
x=rnorm(10)
y=rnorm(10)*20+100
plot(1:10,rev(sort(x)),cex=2,col='red',ylim=c(0,2.2))
segments(x0=1:10, x1=1:10, y0=1.8,y1=1.8+y/max(y)*.2,lwd=3,col='dodgerblue')
However, I am struggling with ggplot. How can it be done?
Here's one possible translation of that code.
ggplot(data.frame(idx=seq_along(x), x,y)) +
geom_point(aes(idx, rev(sort(x))), col="red") +
geom_segment(aes(x=idx, xend=idx, y=1.8, yend=1.8+y/max(y)*.2), color="dodgerblue")
In general with ggplot2, you can add multiple views of data to a plot by adding additional layers (geoms).
My solution is similar to @MrFlick's.
I would always recommend having a plot data frame and referring to the variables from there as you can more easily relate variables to plot aesthetics.
library(tidyverse)
plot_df <- data.frame(x, y) %>%
arrange(-x) %>%
mutate(id = 1:10)
ggplot(plot_df) +
geom_point(aes(id, x), color = "red", pch = 1, size = 5) +
geom_segment(aes(x = id, xend = id, y = 1.8, yend = 1.8+y/max(y)*.2),
lwd = 2, color = 'dodgerblue') +
scale_y_continuous(limits = c(0,2.2)) +
theme_light()
Ultimately, the idea of ggplot is to add layers (in this case, the points and the segments) on top of each other to form the final plot.
If you'd like to learn more, check out the ggplot cheat sheet and read more on the ideas behind ggplot: https://ggplot2.tidyverse.org/
I have a vector of sample means and I've been trying to plot a probability histogram using hist(x) and ggplot, but the bar heights exceed 1 (which is very unusual for a probability distribution). I then used PlotRelativeFrequency(hist(x)) to force R to plot a histogram of probabilities, and it worked! My problem is that I cannot plot a density function over the histogram. When I use lines(density(x)), it plots a density curve that goes way off the graph.
Since your question is tagged with ggplot, I'll give a ggplot answer.
To put the histogram on a relative scale, set aes(y = stat(density)) so that the bars integrate to 1 (in ggplot2 3.3.0 and later, after_stat(density) is the preferred spelling). Then you can give stat_function() the density function of any theoretical distribution. The downside is that you'll have to pre-compute the parameters.
df <- data.frame(x = rnorm(500, 10, 2))
pars <- list(mean = mean(df$x), sd = sd(df$x))
library(ggplot2)
ggplot(df, aes(x)) +
geom_histogram(binwidth = 1, aes(y = stat(density))) +
stat_function(fun = function(x) {dnorm(x, mean = pars$mean, sd = pars$sd)})
Next up, we can plot the empirical density using kernel density estimates, which does everything pretty much automatically:
ggplot(df, aes(x)) +
geom_histogram(binwidth = 1, aes(y = stat(density))) +
geom_density()
Lastly, you can have a look at this stat function, which essentially automates the first version. Full disclosure: I'm the author of that GitHub repo.
library(ggnomics)
ggplot(df, aes(x)) +
geom_histogram(binwidth = 1, aes(y = stat(density))) +
stat_theodensity()
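For the base-graphics attempt in the question, the same principle applies: the curve from density(x) is on the density scale, so it only lines up with a histogram drawn on that scale. A rough sketch, where the simulated vector `means` is my own stand-in for the sample means:

```r
# Assumed stand-in for the vector of sample means from the question
set.seed(1)
means <- rnorm(500, 10, 2)

# freq = FALSE draws densities instead of counts, so bar areas sum to 1
h <- hist(means, freq = FALSE, main = "Histogram with density overlay")

# Kernel density estimate, on the same scale as the bars
lines(density(means), col = "red", lwd = 2)

# Fitted normal curve using the sample mean and sd
curve(dnorm(x, mean = mean(means), sd = sd(means)),
      add = TRUE, col = "blue", lty = 2)

# It is the total area of the bars, not their height, that equals 1
total_area <- sum(h$density * diff(h$breaks))
```

On the density scale, bar heights above 1 are not an error: with bins narrower than 1 the heights have to exceed 1 somewhere for the area to come out as 1.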
I'm quite new to ggplot, but I like the systematic way you build your plots. Still, I'm struggling to achieve the desired results. I can replicate plots where you have categorical data. However, for my use I often need to fit a model to certain observations and then highlight them in a combined plot. With the usual plot function I would do:
library(splines)
set.seed(10)
x <- seq(-1,1,0.01)
y <- x^2
s <- interpSpline(x,y)
y <- y+rnorm(length(y),mean=0,sd=0.1)
plot(x,predict(s,x)$y,type="l",col="black",xlab="x",ylab="y")
points(x,y,col="red",pch=4)
points(0,0,col="blue",pch=1)
legend("top",legend=c("True Values","Model values","Special Value"),text.col=c("red","black","blue"),lty=c(NA,1,NA),pch=c(4,NA,1),col=c("red","black","blue"),cex = 0.7)
My biggest problem is how to build the data frame so that ggplot then draws the legend automatically. In this example, how would I translate this into ggplot to get a similar plot? Or is ggplot not made for this kind of plot?
Note this is just a toy example. Usually the model values are derived from a more complex model, just in case you wanted to use a stat in ggplot.
The key part here is that you can map colors in aes by giving a string, which will produce a legend. In this case, there is no need to include the special value in the data.frame.
df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
ggplot(df, aes(x, y)) +
geom_line(aes(y = fit, col = 'Model values')) +
geom_point(aes(col = 'True values')) +
geom_point(aes(col = 'Special value'), x = 0, y = 0) +
scale_color_manual(values = c('True values' = "red",
'Special value' = "blue",
'Model values' = "black"))
Is there a way in ggplot2 to get the plot type "b"? See example:
x <- c(1:5)
y <- x
plot(x,y,type="b")
Ideally, I want to replace the points by their values to have something similar to this famous example:
EDIT:
Here is some sample data (I want to plot each "cat" in a facet with plot type "b"):
df <- data.frame(x=rep(1:5,9),y=c(0.02,0.04,0.07,0.09,0.11,0.13,0.16,0.18,0.2,0.22,0.24,0.27,0.29,0.31,0.33,0.36,0.38,0.4,0.42,0.44,0.47,0.49,0.51,0.53,0.56,0.58,0.6,0.62,0.64,0.67,0.69,0.71,0.73,0.76,0.78,0.8,0.82,0.84,0.87,0.89,0.91,0.93,0.96,0.98,1),cat=rep(paste("a",1:9,sep=""),each=5))
Set up the axes by drawing the plot without any content.
plot(x, y, type = "n")
Then use text to make your data points.
text(x, y, labels = y)
You can add line segments with lines.
lines(x, y, col = "grey80")
EDIT: Totally failed to clock the mention of ggplot in the question. Try this.
dfr <- data.frame(x = 1:5, y = 1:5)
p <- ggplot(dfr, aes(x, y)) +
geom_text(aes(x, y, label = y)) +
geom_line(col = "grey80")
p
ANOTHER EDIT: Given your new dataset and request, this is what you need.
ggplot(df, aes(x, y)) + geom_point() + geom_line() + facet_wrap(~cat)
YET ANOTHER EDIT: We're starting to approach a real question. As in 'how do you make the lines not quite reach the points'.
The short answer is that there isn't a standard way to do this in ggplot2. The proper way to do this would be to use geom_segment and interpolate between your data points. This is quite a lot of effort, however, so I suggest an easier fudge: draw big white circles around your points. The downside to this is that it makes the gridlines look silly, so you'll have to get rid of those.
ggplot(df, aes(x, y)) +
facet_wrap(~cat) +
geom_line() +
geom_point(size = 5, colour = "white") +
geom_point() +
theme(panel.background = element_blank())  # opts()/theme_blank() were removed from current ggplot2
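For completeness, here is a rough sketch of the geom_segment route mentioned above. It makes some simplifying assumptions: the toy data and the `gap` value are made up, and the gap is a fraction of each segment's length rather than a fixed visual distance, so long segments get larger gaps than short ones.

```r
# Toy data in the same shape as the question's df (x, y, cat)
df <- data.frame(x = rep(1:5, 3),
                 y = c(1, 3, 2, 5, 4, 2, 2, 4, 3, 6, 0, 1, 1, 2, 3),
                 cat = rep(c("a1", "a2", "a3"), each = 5))

gap <- 0.15  # assumed fraction trimmed from each end of every segment

# For each facet, turn consecutive points into one shortened segment
segs <- do.call(rbind, lapply(split(df, df$cat), function(d) {
  d <- d[order(d$x), ]
  n <- nrow(d)
  dx <- diff(d$x)
  dy <- diff(d$y)
  data.frame(cat = d$cat[-n],
             x0 = d$x[-n] + gap * dx,   # start a little after point i
             y0 = d$y[-n] + gap * dy,
             x1 = d$x[-1] - gap * dx,   # stop a little before point i + 1
             y1 = d$y[-1] - gap * dy)
}))
rownames(segs) <- NULL
```

The segments can then be drawn with geom_segment(data = segs, aes(x = x0, y = y0, xend = x1, yend = y1)) alongside geom_point() and facet_wrap(~cat), which leaves the gridlines untouched.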
There's an experimental grob in gridExtra to implement this in Grid graphics:
library(gridExtra)
grid.newpage() ; grid.barbed(pch=5)
This is now easy with ggh4x::geom_pointpath. Set shape = NA and add a geom_text layer.
library(ggh4x)
#> Loading required package: ggplot2
df <- data.frame(x = rep(1:5, each = 5),
y = c(outer(seq(0, .8, .2), seq(0.02, 0.1, 0.02), `+`)),
cat = rep(paste0("a", 1:5)))
ggplot(df, aes(x, y)) +
geom_text(aes(label = cat)) +
geom_pointpath(aes(group = cat, shape = NA))
Created on 2021-11-13 by the reprex package (v2.0.1)
Another way to make great slope graphs is using the package CGPfunctions.
library(CGPfunctions)
newggslopegraph(newcancer, Year, Survival, Type)
There are also many options to choose from. You can find a good tutorial here:
https://www.r-bloggers.com/2018/06/creating-slopegraphs-with-r/