I'm currently trying to plot a histogram with an overlay (given by my_fun) using the following code.
dfr = data.frame(x)
ggplot(dfr,aes(x)) + geom_histogram(colour="darkblue", binwidth = 0.1,
aes(y =..density..), size=1, fill="blue", freq = TRUE)+
stat_function(fun = my_fun, colour = "red")
The x-axis in ggplot is from 1 to 2 (which is the range of my data). However, I would like my plot to have an x-axis from 0 to 3, so that the overlay can be drawn over the range (0, 3).
I've tried adding coord_cartesian(xlim=c(0, 3)) but this does not work. Could you please provide me with some suggestions on changing the range? Thank You.
Just guessing here since you provided only a little useful information in your question, but this works for me:
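library(ggplot2)
dat <- data.frame(x=rnorm(100))
ggplot(dat,aes(x=x)) +
  geom_histogram(aes(y=..density..)) +  # freq is not a geom_histogram argument, so it is dropped
  stat_function(fun = dnorm, colour="red") +
  xlim(c(-4,4))  # sets the x scale limits, so the curve is drawn over the whole range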
using xlim rather than coord_cartesian. But since you haven't provided any details on your data or function, I can't assure you that this will work in your case.
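For data like the question's (values between 1 and 2), a minimal sketch of the same idea might look like the following. The data and my_fun are made-up stand-ins, since the originals were not posted. As far as I can tell, the reason xlim() helps is that it changes the x scale, which is the range stat_function() evaluates the function over, whereas coord_cartesian() only zooms the view.

```r
library(ggplot2)

set.seed(1)
# Stand-in data in the range [1, 2] (assumption: the real x was not posted)
dfr <- data.frame(x = runif(200, min = 1, max = 2))

# Hypothetical overlay function, used here only for illustration
my_fun <- function(x) dnorm(x, mean = 1.5, sd = 0.3)

p <- ggplot(dfr, aes(x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                 colour = "darkblue", fill = "blue") +
  stat_function(fun = my_fun, colour = "red") +
  xlim(0, 3)  # widen the x scale so the curve is evaluated over 0 to 3

p
```

Note that after_stat(density) needs ggplot2 3.3.0 or later; on older versions ..density.. does the same job.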
I'm using Visual Studio with R version 3.5.1, and I'm trying to add a legend to a plot.
library(pracma)  # assumed source of cumtrapz(); the question does not say
f1 = function(x) {
  return(x + 1)
}
x1 = seq(0, 1, by = 0.01)
data1 = data.frame(x1 = x1, f1 = f1(x1), F1 = cumtrapz(x1, f1(x1)))
However, when I try to plot it, it never gives me a legend!
For example, I used the same approach as in this question (Missing legend with ggplot2 and geom_line):
ggplot(data = data1, aes(x1)) +
geom_line(aes(y = f1), color = "1") +
geom_line(aes(y = F1), color = "2") +
scale_color_manual(values = c("red", "blue"))
I also looked into (How to add legend to ggplot manually? - R) and many other pages on Stack Overflow, and I have tried every single function in https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf
i.e.
theme(legend.position = "bottom")
scale_fill_discrete(...)
group
guides()
show.legend=TRUE
I even tried to use the original plot() and legend() function. Neither worked.
I thought there might be something wrong with the data frame, but when I split the columns (x2, f1, F1) apart, it still didn't work.
I thought there might be something wrong with the IDE, but the code given by kohske actually plotted a legend!
d<-data.frame(x=1:5, y1=1:5, y2=2:6)
ggplot(d, aes(x)) +
geom_line(aes(y=y1, colour="1")) +
geom_line(aes(y=y2, colour="2")) +
scale_colour_manual(values=c("red", "blue"))
What's wrong with the code?
As far as I know, you only map x and y inside aes(); the colours are set as constants outside it, so ggplot2 has nothing to build a legend from. You can use xlab and ylab to describe your two lines. If you want a legend, you should put the grouping in the aesthetics, which might require recoding your dataset:
d<- data.frame(x=c(1:5, 1:5), y=c(1:5, 2:6), colorGroup = c(rep("redGroup", 5),
rep("blueGroup", 5)))
ggplot(d, aes(x, y, color = colorGroup )) + geom_line()
This should give you two lines and a legend.
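If you'd rather keep the original wide data frame, a sketch of the other common route is to move the colour inside aes(), as kohske's code in the question does. The label strings "f1" and "F1" and the legend title are my own choices, and I've used the closed-form integral x^2/2 + x in place of cumtrapz() so the example has no extra dependencies.

```r
library(ggplot2)

x1 <- seq(0, 1, by = 0.01)
# F1 is the integral of f1(x) = x + 1 from 0 to x, written in closed form
data1 <- data.frame(x1 = x1, f1 = x1 + 1, F1 = x1^2 / 2 + x1)

p <- ggplot(data1, aes(x1)) +
  # Mapping colour inside aes() is what makes ggplot2 draw a legend
  geom_line(aes(y = f1, colour = "f1")) +
  geom_line(aes(y = F1, colour = "F1")) +
  # Named values tie each label to a colour regardless of ordering
  scale_colour_manual(name = "curve", values = c(f1 = "red", F1 = "blue"))

p
```

The strings inside aes() are only legend labels; the actual colours come from scale_colour_manual().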
I would like to create a graph with the normal function from x=-2 to x=2 filled under the curve from -2 to 0.
I've tried with ggplot2
qplot(c(-2, 2), stat="function", fun=dnorm, geom="line") +
  geom_area(aes(xlim=c(-2,0)),stat="function", fun=dnorm)
But I get this graph completely filled instead (the black colour)
How can I get a plot filled only from -2 to 0?
Other options or packages are welcome.
I've also tried with only one command with ggplot and filled option but I can't get it either.
I know some people do it using polygons, but the result is not as smooth and nice.
P.S.: To repeat, the solution I'm looking for does not involve generating x,y coordinates beforehand, but uses the function directly with stat="function", fun=dnorm or similar. Thus, my question is not a duplicate.
I've also tried
ggplot(NULL,aes(x=c(-2,2))) + geom_area(aes(x=c(-2,0)),stat="function", fun=dnorm, fill="red") +
geom_area(aes(x=c(0,2)),stat="function", fun=dnorm, fill="blue")
But again it fills all the curve with a single color, blue. The red half seems to be overwritten. The same with geom_ribbon and other options.
Try this:
ggplot(data.frame(x = c(-2, 2)), aes(x)) +
stat_function(fun = dnorm) +
stat_function(fun = dnorm,
xlim = c(-2,0),
geom = "area")
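The same xlim argument can be used once per layer to get the two-colour version the question attempted. This is only a sketch along the lines of the answer above; the red and blue fills are taken from the question's second attempt.

```r
library(ggplot2)

p <- ggplot(data.frame(x = c(-2, 2)), aes(x)) +
  # Each area layer evaluates dnorm only over its own xlim
  stat_function(fun = dnorm, xlim = c(-2, 0), geom = "area", fill = "red") +
  stat_function(fun = dnorm, xlim = c(0, 2), geom = "area", fill = "blue") +
  # Outline of the full curve drawn on top
  stat_function(fun = dnorm)

p
```

Because each layer carries its own range, neither fill overwrites the other.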
Can't you generate your distribution data with dnorm instead?
library(ggplot2)
x<-seq(-2,2, 0.01)
y<-dnorm(x,0,1)
xddf <- data.frame(x=x,y=y)
qplot(x,y,data=xddf,geom="line")+
geom_ribbon(data=subset(xddf ,x>-2 & x<0),aes(ymax=y),ymin=0,
fill="red",colour=NA,alpha=0.5)+
scale_y_continuous(limits=c(0, .4))
These days, with after_stat() and after_scale(), you could also use
a more flexible approach that lets you explicitly map ranges of x values
to filled sections.
For example, filling some normal distribution quantiles:
library(ggplot2)
breaks <- qnorm(c(0, .05, .2, .5, .8, .95, 1))
ggplot(data.frame(x = c(-2, 2)), aes(x)) +
scale_fill_brewer("x") +
stat_function(
n = 512,
fun = dnorm,
geom = "area",
colour = "gray30",
aes(
fill = after_stat(x) |> cut(!!breaks),
group = after_scale(fill)
)
)
This approach also works with other statistics, e.g. stat_density() for kernel density estimates:
set.seed(42)
ggplot(data.frame(x = rnorm(1000)), aes(x)) +
scale_fill_brewer("x") +
stat_density(
n = 512,
geom = "area",
colour = "gray30",
aes(
fill = after_stat(x) |> cut(!!breaks),
group = after_scale(fill)
)
)
I was looking to plot a normal distribution in ggplot, and at the suggestion of @nrussell I have used
ggplot(data.frame(x = c(-5, 5)), aes(x)) + stat_function(fun = dnorm)
I am wondering if there is any way to, within the context of stat_function, layer a single colored point directly onto the curve. For example, if I wanted to put a dot where the x axis is marked 2.
I have experimented with geom_point but this appears to be better at creating scatterplots: I can't seem to pipe in the aesthetics from the stat_function for the layer to be created.
Any advice would be greatly appreciated.
There may be a way to do this with another stat_function layer, but after playing around with it for a few minutes it seemed easier to just use geom_point to add a single point:
library(ggplot2)
##
ggplot(
data.frame(x = c(-5, 5)),
aes(x))+
stat_function(fun = dnorm)+
geom_point(
data=data.frame(x=2,y=dnorm(2)),
aes(x,y),
color="red",
size=4)
##
Use annotate() and just specify x = 2, y = dnorm(2), rather than trying to pull info out of stat_function():
ggplot(data.frame(x = c(-5, 5)), aes(x)) +
stat_function(fun = dnorm) +
annotate(geom = "point", x = 2, y = dnorm(2), color = "red")
Annotate is best for small additions. To use geom_point() you'd want to define a new data.frame, good if you wanted to plot more than one point.
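For the several-points case, a rough sketch of the geom_point() route could look like this. The x positions and the name pts are arbitrary choices for illustration.

```r
library(ggplot2)

# Points to mark on the curve; y is computed from the same function
pts <- data.frame(x = c(-1, 0, 2))
pts$y <- dnorm(pts$x)

p <- ggplot(data.frame(x = c(-5, 5)), aes(x)) +
  stat_function(fun = dnorm) +
  # A separate data frame for this layer keeps the points on the curve
  geom_point(data = pts, aes(x, y), colour = "red", size = 3)

p
```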
I am trying to improve the clarity and aspect of a histogram of discrete values which I need to represent with a log scale.
Please consider the following MWE
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram()
which produces
and then
ggplot(data, aes(x=dist)) + geom_histogram() + scale_x_log10(breaks=c(1,2,3,4,5,10,100))
which is probably even worse,
since it now gives the impression that something is missing between "1" and "2", and it is also not clear which bar has value "1" (the bar is to the right of the tick) and which bar has value "2" (the bar is to the left of the tick).
I understand that technically ggplot provides the "right" visual answer for a log scale. Yet as an observer I have some trouble reading it.
Is it possible to improve something?
EDIT:
This is what happens when I apply Jaap's solution to my real data:
Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why is the plot also mapping x=1.5 and x=2.5?
The first thing that comes to mind, is playing with the binwidth. But that doesn't give a great solution either:
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth=10) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0.015,0)) +
theme_bw()
gives:
In this case it is probably better to use a density plot. However, when you use scale_x_log10 you will get a warning message (Removed 524 rows containing non-finite values (stat_density)). This can be resolved by using a log plus one transformation.
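A quick base-R check, using the question's simulated data, shows why the zeros are the problem. The variable names here are my own.

```r
set.seed(99)
dist <- as.integer(rlnorm(1000, sdlog = 2))

# Truncating to integers turns every value below 1 into 0
n_zero <- sum(dist == 0)

# log10(0) is -Inf, so those rows are non-finite on a log10 axis
all_log10_finite <- all(is.finite(log10(dist)))

# log1p(x) = log(1 + x) maps 0 to 0, so every row stays finite
all_log1p_finite <- all(is.finite(log1p(dist)))

n_zero
all_log10_finite
all_log1p_finite
```

So the rows dropped by scale_x_log10 should be exactly the zeros, and the log-plus-one transformation keeps them.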
The following code:
library(ggplot2)
library(scales)
ggplot(data, aes(x=dist)) +
stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
theme_bw()
will give this result:
I am wondering: what if the y-axis is scaled instead of the x-axis? It will result in a few warnings wherever the values are 0, but may serve your purpose.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()
Also you may want to display frequencies as data labels, since people might ignore the y-scale and it takes some time to realize that y scale is logarithmic.
ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
stat_bin(geom="text", size=3.5, aes(label=..count.., y=0.8*(..count..)))
A solution could be to convert your data to a factor:
library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
ggplot(data, aes(x=factor(dist))) +
geom_histogram(stat = "count") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Resulting in:
I had the same issue and, inspired by @Jaap's answer, I fiddled with the histogram binwidth using the x-axis in log scale.
If you use binwidth = 0.201, the bars will be juxtaposed as expected. However, this means you can only have up to five bars between two x coordinates.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth = 0.201, color = 'red') +
scale_x_log10()
Result:
I've become quite fond of boxplots in which jittered points are overlain over the boxplots to represent the actual data, as below:
set.seed(7)
l1 <- gl(3, 1, length=102, labels=letters[1:3])
l2 <- gl(2, 51, length=102, labels=LETTERS[1:2]) # Will use this later
y <- runif(102)
d <- data.frame(l1, l2, y)
ggplot(d, aes(x=l1, y=y)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA)
(These are particularly helpful when there are very different numbers of data points in each box.)
I'd like to use this technique when I am also (implicitly) using position_dodge to separate boxplots by a second variable, e.g.
ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA)
However, I can't figure out how to dodge the points by the colour variable (here, l2) and also jitter them.
Here is an approach that manually performs the jittering and dodging.
# a plot with no dodging or jittering of the points
dp <- ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(alpha=0.5) +
geom_boxplot(fill=NA)
# build the plot for rendering
foo <- ggplot_build(dp)
# now replace the 'x' values in the data for layer 1 (unjittered and un-dodged points)
# with the appropriately dodged and jittered points
foo$data[[1]][['x']] <- jitter(foo$data[[2]][['x']][foo$data[[1]][['group']]],amount = 0.2)
# now draw the plot (need to explicitly load grid package)
library(grid)
grid.draw(ggplot_gtable(foo))
# note the following works without explicitly loading grid
plot(ggplot_gtable(foo))
I don't think you'll like it, but I've never found a way around this except to produce your own x values for the points. In this case:
d$l1.num <- as.numeric(d$l1)
d$l2.num <- (as.numeric(d$l2)/3)-(1/3 + 1/6)
d$x <- d$l1.num + d$l2.num
ggplot(d, aes(l1, y, colour = l2)) + geom_boxplot(fill = NA) +
geom_point(aes(x = x), position = position_jitter(width = 0.15), alpha = 0.5) + theme_bw()
It's certainly a long way from ideal, but becomes routine pretty quickly. If anyone has an alternative solution, I'd be very happy!
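To see what the manual offsets work out to, here is a small check on the same factors. It only restates the arithmetic from the code above; offset is my own name for what that code calls l2.num.

```r
l1 <- gl(3, 1, length = 102, labels = letters[1:3])
l2 <- gl(2, 51, length = 102, labels = LETTERS[1:2])

# Level 1 gives 1/3 - 1/2 = -1/6 and level 2 gives 2/3 - 1/2 = +1/6
offset <- as.numeric(l2) / 3 - (1/3 + 1/6)

# Final x positions: the integer code of l1 shifted left or right
x <- as.numeric(l1) + offset

unique(offset)
range(x)
```

The two groups therefore sit one sixth of a unit either side of each l1 position, which roughly matches where the dodged boxes are drawn.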
The new position_jitterdodge() works for this. However, it requires the fill aesthetic to tell it how to group points, so you have to specify a manual fill to get uncolored boxes:
ggplot(d, aes(x=l1, y=y, colour=l2, fill=l2)) +
geom_point(position=position_jitterdodge(width=0.2), alpha=0.5) +
geom_boxplot() + scale_fill_manual(values=rep('white', length(unique(l2))))
I'm using a newer version of ggplot2 (ggplot2_2.2.1.9000) and I was struggling to find an answer that worked for a similar plot of my own. @John Didon's answer produced an error for me: Error in position_jitterdodge(width = 0.2) : unused argument (width = 0.2). I had previous code that worked with geom_jitter that stopped working after downloading the newer version of ggplot2. This is how I solved it, with minimal-fuss code:
ggplot(d, aes(x=l1, y=y, colour=l2, fill=l2)) +
geom_point(position = position_jitterdodge(dodge.width = 1,
jitter.width = 0.5), alpha=0.5) +
geom_boxplot(position = position_dodge(width = 1), fill = NA)
Another option would be to use facets:
set.seed(7)
l1 <- gl(3, 1, length=102, labels=letters[1:3])
l2 <- gl(2, 51, length=102, labels=LETTERS[1:2]) # Will use this later
y <- runif(102)
d <- data.frame(l1, l2, y)
ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA) +
facet_grid(.~l2) +
theme_bw()
Sorry, I don't have enough points to post the resulting graph.