library("ggplot2")
eq = function(x){x^-1}
ggplot(data.frame(x=c(-6,6)), aes(x = x, y=eq(x)))+
geom_line(data=as.data.frame(curve(from=-6, to=-.01, eq)))+
geom_line(data=as.data.frame(curve(from=.01, to=6, eq)))
I am trying to produce a single plot, and this code gives me the plot I want, but with two additional plots, one with each geom_line. I don't understand why those additional two plots are being created.
In addition to my comment above, you don't need two separate calls to geom_line to produce this plot. You can use stat_function if you redefine your function as follows.
eq <- function(x) ifelse(x==0, NA,x^-1)
Then you can plot it as follows
df <- data.frame(x=seq(-6,6,.01))
ggplot(df) + stat_function(aes(x), fun = eq)
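The exact test x == 0 only helps if the evaluation grid lands on zero exactly. A slightly more defensive sketch masks a small window around the pole instead. The name eq_safe and the tolerance tol are assumptions for illustration, not part of the answer above.

```r
library(ggplot2)

# Hypothetical variant of eq(): return NA anywhere within `tol` of the pole,
# so the line is broken even if the grid misses 0 by a rounding error.
eq_safe <- function(x, tol = 1e-8) ifelse(abs(x) < tol, NA, 1 / x)

df <- data.frame(x = seq(-6, 6, 0.01))
p_safe <- ggplot(df) + stat_function(aes(x), fun = eq_safe)
```

Printing p_safe draws the plot. The tolerance should be smaller than the grid spacing so that only the point at the pole is dropped.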
As @shayaa noted in the comments, curve itself generates plots, which is why you are getting the extra plots. To avoid this, you can just create a dataframe before you plot, and subset it in geom_line:
library("ggplot2")
eq = function(x){x^-1}
df <- data.frame(x =seq(-6, 6, 0.01), y = eq(seq(-6, 6, 0.01)))
ggplot(df) +
geom_line(data=subset(df, x<=-.01), aes(x = x, y = y)) +
geom_line(data=subset(df, x>=.01), aes(x = x, y = y))
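The same idea can be sketched with a single geom_line by tagging each branch and mapping that tag to the group aesthetic. The column name branch and the 0.01 gap around zero are assumptions chosen for illustration.

```r
library(ggplot2)

eq <- function(x) 1 / x

# Sample both branches, leaving a small (assumed) gap around the pole at x = 0
x <- c(seq(-6, -0.01, by = 0.01), seq(0.01, 6, by = 0.01))
branches <- data.frame(
  x = x,
  y = eq(x),
  branch = ifelse(x < 0, "negative", "positive")
)

# group = branch stops geom_line from joining the two branches across the pole
p_branches <- ggplot(branches, aes(x = x, y = y, group = branch)) +
  geom_line()
```

Printing p_branches produces one plot only, because nothing here calls curve().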
The grouping variable for creating a geom_violin() plot in ggplot2 is expected to be discrete for obvious reasons. However my discrete values are numbers, and I would like to show them on a continuous scale so that I can overlay a continuous function of those numbers on top of the violins. Toy example:
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df) + geom_violin(aes(x=factor(x), y=y))
This works as you'd imagine: violins with their x axis values (equally spaced) labelled 1, 2, and 5, with their means at y=1,2,5 respectively. I want to overlay a continuous function such as y=x, passing through the means. Is that possible? Adding + scale_x_continuous() predictably gives Error: Discrete value supplied to continuous scale. A solution would presumably spread the violins horizontally by the numeric x values, i.e. three times the spacing between 2 and 5 as between 1 and 2, but that is not the only thing I'm trying to achieve - overlaying a continuous function is the key issue.
If this isn't possible, alternative visualisation suggestions are welcome. I know I could replace violins with a simple scatter plot to give a rough sense of density as a function of y for a given x.
The functionality to plot violin plots on a continuous scale is directly built into ggplot.
The key is to keep the original continuous variable (instead of transforming it into a factor variable) and specify how to group it within the aesthetic mapping of the geom_violin() layer. The width of the groups can be modified with the width argument of cut_width(), depending on the data at hand.
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'lm')
By using this approach, all geoms for continuous data and their varying functionalities can be combined with the violin plots, e.g. we could easily replace the line with a loess curve and add a scatter plot of the points.
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'loess') +
geom_point()
More examples can be found in the ggplot helpfile for violin plots.
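For the specific case in the question, overlaying y = x, a minimal sketch might look like the following. Since x takes only three distinct values, it groups by x directly rather than through cut_width(). The names violin_df and curve_df and the seed are assumptions.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
x_vals <- sample(c(1, 2, 5), size = 1000, replace = TRUE)
violin_df <- data.frame(x = x_vals, y = rnorm(1000, mean = x_vals))

# The continuous function to overlay (here y = x), evaluated on a fine grid
curve_df <- data.frame(x = seq(1, 5, by = 0.05))
curve_df$y <- curve_df$x

p_overlay <- ggplot(violin_df, aes(x = x, y = y)) +
  geom_violin(aes(group = x)) +               # one violin per distinct x value
  geom_line(data = curve_df, colour = "red")  # inherits the x and y mappings
```

Because the x axis stays continuous, the violins sit at 1, 2 and 5 with their true spacing, and any other function can be swapped in by changing how curve_df$y is computed.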
Try this. As you already guessed, spreading the violins by numeric values is the key to the solution. To this end I expand the df to include all x values in the interval min(x) to max(x) and use scale_x_discrete(drop = FALSE) so that all values are displayed.
Note: Thanks @ChrisW for the more general example of my approach.
library(tidyverse)
set.seed(42)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T), y = rnorm(1000, mean = x^2))
# y = x^2
# add missing x values
x.range <- seq(from=min(df$x), to=max(df$x))
df <- df %>% right_join(tibble(x = x.range))
#> Joining, by = "x"
# Whatever the desired continuous function is:
df.fit <- tibble(x = x.range, y=x^2) %>%
mutate(x = factor(x))
ggplot() +
geom_violin(data=df, aes(x = factor(x, levels = 1:5), y=y)) +
geom_line(data=df.fit, aes(x, y, group=1), color = "red") +
scale_x_discrete(drop = FALSE)
#> Warning: Removed 2 rows containing non-finite values (stat_ydensity).
Created on 2020-06-11 by the reprex package (v0.3.0)
I have two probability distribution curves, a Gamma and a standardized Normal, that I need to compare:
library(ggplot2)
pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)
f <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun=pgammaX)
f + stat_function(fun = pnorm)
The output is like this
However I need to have the two curves separated by means of the faceting mechanism provided by ggplot2, sharing the Y axis, in a way like shown below:
I know how to do the faceting if the plotted curves come from data (i.e., from a data.frame), but I don't understand how to do it in a case like this, where the curves are generated on the fly by functions. Do you have any idea how to do this?
You can generate the data ahead of time, similar to what stat_function does internally, something like:
x <- seq(-4,9,0.1)
dat <- data.frame(p = c(pnorm(x), pgammaX(x)), g = rep(c(0,1), each = length(x)), x = rep(x, 2) )
ggplot(dat)+geom_line(aes(x,p, group = g)) + facet_grid(~g)
The issue with facet_wrap is that a single stat_function layer is evaluated in every panel, and here you have no faceting variable that distinguishes the two functions.
I would instead plot them separately and use grid.arrange to combine them.
f1 <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun = pgammaX) + ggtitle("Gamma") + theme(plot.title = element_text(hjust = 0.5))
f2 <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun = pnorm) + ggtitle("Norm") + theme(plot.title = element_text(hjust = 0.5))
library(gridExtra)
grid.arrange(f1, f2, ncol=2)
Otherwise create the data frame with y values from both pgammaX and pnorm and categorize them under a faceting variable.
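A minimal sketch of that data-frame route, with labelled facets, could look like this. The names cdf_long and distribution are assumptions. pgammaX is the function from the question.

```r
library(ggplot2)

pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)

x <- seq(-4, 9, by = 0.1)

# One block of rows per function, tagged with the name used for faceting
cdf_long <- rbind(
  data.frame(x = x, p = pgammaX(x), distribution = "Gamma"),
  data.frame(x = x, p = pnorm(x),   distribution = "Normal")
)

# Both panels share the y axis by default in facet_wrap
p_cdf <- ggplot(cdf_long, aes(x = x, y = p)) +
  geom_line() +
  facet_wrap(~ distribution)
```

Building each block with its own label avoids hard-coding the number of grid points, and the facet strips show the distribution names rather than 0 and 1.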
Finally I got the answer. First, I need to have two data sets and attach each function to each data set, as follows:
library(ggplot2)
pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)
a <- data.frame(x=c(3,9), category="Gamma")
b <- data.frame(x=c(-4,4), category="Normal")
f <- ggplot(a, aes(x)) + stat_function(fun=pgammaX) + stat_function(data = b, mapping = aes(x), fun = pnorm)
Then, using facet_wrap(), I separate into two graphics according to the category assigned to each data set, and establishing a free_x scale.
f + facet_wrap("category", scales = "free_x")
The result is shown below:
How can I fill a geom_violin plot in ggplot2 with different colors based on a fixed cutoff?
For instance, given the setup:
library(ggplot2)
set.seed(123)
dat <- data.frame(x = rep(1:3,each = 100),
y = c(rnorm(100,-1),rnorm(100,0),rnorm(100,1)))
dat$f <- with(dat,ifelse(y >= 0,'Above','Below'))
I'd like to take this basic plot:
ggplot() +
geom_violin(data = dat,aes(x = factor(x),y = y))
and simply have each violin colored differently above and below zero. The naive thing to try, mapping the fill aesthetic, splits and dodges the violin plots:
ggplot() +
geom_violin(data = dat,aes(x = factor(x),y = y, fill = f))
which is not what I want. I'd like a single violin plot at each x value, but with the interior filled with different colors above and below zero.
Here's one way to do this.
library(ggplot2)
library(plyr)
#Data setup
set.seed(123)
dat <- data.frame(x = rep(1:3,each = 100),
y = c(rnorm(100,-1),rnorm(100,0),rnorm(100,1)))
First we'll use ggplot2::ggplot_build to capture all the calculated variables that go into plotting the violin plot:
p <- ggplot() +
geom_violin(data = dat,aes(x = factor(x),y = y))
p_build <- ggplot2::ggplot_build(p)$data[[1]]
Next, if we take a look at the source code for geom_violin we see that it does some specific transformations of this computed data frame before handing it off to geom_polygon to draw the actual outlines of the violin regions.
So we'll mimic that process and simply draw the filled polygons manually:
#This comes directly from the source of geom_violin
p_build <- transform(p_build,
xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
p_build <- rbind(plyr::arrange(transform(p_build, x = xminv), y),
plyr::arrange(transform(p_build, x = xmaxv), -y))
I'm omitting a small detail from the source code about duplicating the first row in order to ensure that the polygon is closed.
Now we do two final modifications:
#Add our fill variable
p_build$fill_group <- ifelse(p_build$y >= 0,'Above','Below')
#This is necessary to ensure that instead of trying to draw
# 3 polygons, we're telling ggplot to draw six polygons
p_build$group1 <- with(p_build,interaction(factor(group),factor(fill_group)))
And finally plot:
#Note the use of the group aesthetic here with our computed version,
# group1
p_fill <- ggplot() +
geom_polygon(data = p_build,
aes(x = x,y = y,group = group1,fill = fill_group))
p_fill
Note that in general, this will clobber nice handling of any categorical x axis labels. So you will often need to do the plot using a continuous x axis and then if you need categorical labels, add them manually.
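As a rough sketch of adding the labels back, the whole approach can be condensed and finished with scale_x_continuous(). This version uses base order() instead of plyr and converts the positions with as.numeric(). Both are choices made for this sketch, as are the names vb, outline, side and piece.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rep(1:3, each = 100),
                  y = c(rnorm(100, -1), rnorm(100, 0), rnorm(100, 1)))

# Let ggplot2 compute the violin densities, then pull out the layer data
vb <- ggplot_build(ggplot(dat, aes(factor(x), y)) + geom_violin())$data[[1]]

# Left and right outline positions as plain numbers
ctr <- as.numeric(vb$x)
left  <- transform(vb, px = ctr - violinwidth * (ctr - as.numeric(xmin)))
right <- transform(vb, px = ctr + violinwidth * (as.numeric(xmax) - ctr))

# Walk up the left side, then down the right side
outline <- rbind(left[order(left$y), ], right[order(-right$y), ])

# Split every violin into an above-zero and a below-zero polygon
outline$side  <- ifelse(outline$y >= 0, "Above", "Below")
outline$piece <- interaction(outline$group, outline$side)

p_labelled <- ggplot(outline, aes(x = px, y = y, group = piece, fill = side)) +
  geom_polygon() +
  # Restore the categorical labels by hand on the continuous axis
  scale_x_continuous(name = "x", breaks = 1:3, labels = c("1", "2", "3"))
```

The breaks are the integer positions ggplot2 assigned to the factor levels, so the labels vector should list the levels in the same order.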
This is a personal project to learn the syntax of the data.table package. I am trying to use the data values to create multiple graphs and label each based on the by group value. For example, given the following data:
# Generate dummy data
require(data.table)
set.seed(222)
DT = data.table(grp=rep(c("a","b","c"),each=10),
x = rnorm(30, mean=5, sd=1),
y = rnorm(30, mean=8, sd=1))
setkey(DT, grp)
The data consists of random x and y values for 3 groups (a, b, and c). I can create a formatted plot of all values with the following code:
# Example of plotting all groups in one plot
require(ggplot2)
p <- ggplot(data=DT, aes(x = x, y = y)) +
aes(shape = factor(grp))+
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
labs(title = "Group: ALL")
p
This creates the following plot:
Instead I would like to create a separate plot for each by group, and change the plot title from “Group: ALL” to “Group: a”, “Group: b”, “Group: c”, etc. The documentation for data.table says:
.BY is a list containing a length 1 vector for each item in by. This can be useful when by is not known in advance. The by variables are also available to j directly by name; useful for example for titles of graphs if j is a plot command, or to branch with if()
That being said, I do not understand how to use .BY or .SD to create separate plots for each group. Your help is appreciated.
Here is the data.table solution, though again, not what I would recommend:
make_plot <- function(dat, grp.name) {
  print(
    ggplot(dat, aes(x=x, y=y)) +
      geom_point() + labs(title=paste0("Group: ", grp.name$grp))
  )
  NULL
}
DT[, make_plot(.SD, .BY), by=grp]
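If the plots are needed as objects rather than printed as a side effect, one possible sketch splits the table first and builds a named list of plots. This swaps in split() with its by argument instead of .SD and .BY. The names pieces and plots are assumptions.

```r
library(data.table)
library(ggplot2)

set.seed(222)
DT <- data.table(grp = rep(c("a", "b", "c"), each = 10),
                 x = rnorm(30, mean = 5, sd = 1),
                 y = rnorm(30, mean = 8, sd = 1))

# split() with `by` returns a named list holding one data.table per group
pieces <- split(DT, by = "grp")

# Build one titled plot per group; nothing is drawn until a plot is printed
plots <- lapply(names(pieces), function(g) {
  ggplot(pieces[[g]], aes(x = x, y = y)) +
    geom_point() +
    labs(title = paste0("Group: ", g))
})
names(plots) <- names(pieces)
```

Printing plots[["a"]] then draws the plot titled "Group: a", and the list can be passed on to a saving or arranging function.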
What you really should do for this particular application is what @dmartin recommends. At least, that's what I would do.
Instead of using data.table, you could use facet_grid in ggplot with the labeller argument:
p <- ggplot(data=DT, aes(x = x, y = y)) + aes(shape = factor(grp)) +
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
facet_grid(. ~ grp, labeller = label_both)
See the ggplot documentation for more information.
I see you already have a "facetting" option. I had done this
p+facet_wrap('grp')
But this gives the same result:
p+facet_wrap(~grp)
I have a large matrix mdat (1000 rows and 16 columns) whose first column is the x variable and whose other columns are y variables. I want to make scatter plots in R with all 15 figures in the same window. For example:
mdat <- matrix(c(1:50), nrow = 10, ncol=5)
In the above matrix, I have 10 rows and 5 columns. Is it possible to use the first column as the variable on the x axis and the other columns as variables on the y axis, so that I have four different scatterplots in the same window? Keep in mind that I would prefer not to use par(mfrow=, because in that case I have to run each graph separately and then arrange them in the same window. What I need is a package to which I just give the data and the x and y variables, and which produces the graphs in the same window.
Is there some package available that can do this? I cannot find one.
Perhaps the simplest base R way is mfrow (or mfcol)
par(mfrow = c(2, 2)) ## the window will have 2 rows and 2 columns of plots
for (i in 2:ncol(mdat)) plot(mdat[, 1], mdat[, i])
See ?par for everything you might want to know about further adjustments.
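For the full 16-column matrix the grid size need not be worked out by hand. As a small sketch, grDevices::n2mfrow() suggests a layout for a given number of panels. The names n_panels, grid_dims and old_par are assumptions.

```r
mdat <- matrix(1:50, nrow = 10, ncol = 5)

# One panel per y column; n2mfrow() proposes rows and columns for that count
n_panels  <- ncol(mdat) - 1
grid_dims <- grDevices::n2mfrow(n_panels)

old_par <- par(mfrow = grid_dims)  # remember the previous layout
for (i in 2:ncol(mdat)) {
  plot(mdat[, 1], mdat[, i], xlab = "column 1", ylab = paste("column", i))
}
par(old_par)                       # restore the previous layout
```

Saving and restoring the graphical parameters keeps later plots from inheriting the multi-panel layout.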
Another good option in base R is layout (the help has some nice examples). To be fancy and pretty, you could use the ggplot2 package, but you'll need to reshape your data into a long format.
require(ggplot2)
require(reshape2)
molten <- melt(as.data.frame(mdat), id = "V1")
ggplot(molten, aes(x = V1, y = value)) +
facet_wrap(~ variable, nrow = 2) +
geom_point()
Alternatively with colors instead of facets:
ggplot(molten, aes(x = V1, y = value, color = variable)) +
geom_point()
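If reshape2 is not available, the same long format can be built in base R. The sketch below assumes the column names V1, variable and value that melt() produces above. The name long is hypothetical.

```r
library(ggplot2)

mdat <- matrix(1:50, nrow = 10, ncol = 5)

# Stack the y columns: repeat the x column once per y column and
# label each block with the name of the column it came from
long <- data.frame(
  V1       = rep(mdat[, 1], times = ncol(mdat) - 1),
  variable = factor(rep(paste0("V", 2:ncol(mdat)), each = nrow(mdat))),
  value    = as.vector(mdat[, -1])
)

p_long <- ggplot(long, aes(x = V1, y = value)) +
  facet_wrap(~ variable, nrow = 2) +
  geom_point()
```

as.vector() unrolls the matrix column by column, which is why the labels are repeated with each = nrow(mdat).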
@user4299 You can rewrite shujaa's ggplot commands using qplot ("quick plot"), which can be easier when starting out. The first command below produces a faceted plot like the one in shujaa's answer. The second uses variable to drive the color instead of faceting, which puts all the points on one plot with different colors and a legend.
qplot(data = molten, x = V1, y = value, facets = . ~ variable, geom = "point")
qplot(data = molten, x = V1, y = value, color = variable, geom = "point")
Maybe
library(lattice)
x = mdat[,1]; y = mdat[,-1]
df = data.frame(X = x, Y = as.vector(y),
Grp = factor(rep(seq_len(ncol(y)), each=length(x))))
xyplot(Y ~ X | Grp, df)