I have a data frame with several thousand rows of the form
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
and would like to make a kind of heatmap in which one axis has a continuous scale (position). The color column is categorical. However, because of the large number of data points, I want to bin it, i.e. treat it as a continuous variable.
This is more or less how the plot should look:
I can't think of a way to create such a plot using ggplot2/R. I have tried several geometries, e.g. geom_point()
ggplot(data = df, aes(x = group, y = pos, color = color)) +
  geom_point() +
  scale_colour_gradientn(colors = c("yellow", "black", "orange"))
Thanks for your help in advance.
Does this help you?
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
ggplot(data = df, aes(x = group, y = pos)) + geom_tile(aes(fill = color))
Looks like this
Improved version with 3 color gradient if you like
# rescale() comes from the scales package
ggplot(data = df, aes(x = group, y = pos)) +
  geom_tile(aes(fill = color)) +
  scale_fill_gradientn(colours = c("orange", "black", "yellow"),
                       values = scales::rescale(c(1, 2, 3)),
                       guide = "colorbar")
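With several thousand rows, one tile per position may be too fine to read. Here is a minimal sketch of the binning the question asks about, assuming a bin width of 5 positions. The names `bin_width`, `pos_bin`, and `df_binned` are illustrative and not from the original post. It averages the color code within each bin per group and draws one tile per bin.

```r
library(ggplot2)

# Same toy data as in the question
df <- data.frame(
  group = rep(c("gr1", "gr2", "gr3"), each = 10),
  pos   = rep(1:10, times = 3),
  color = c(2,2,2,2,3,3,2,2,3,2, 1,2,2,2,1,1,1,1,1,1, 2,2,2,2,2,2,1,1,2,2)
)

bin_width <- 5  # assumed bin width; choose one that suits the real data

# Label each position with the lower edge of its bin: 1-5 -> 1, 6-10 -> 6
df$pos_bin <- (df$pos - 1) %/% bin_width * bin_width + 1

# Mean color code per group and bin
df_binned <- aggregate(color ~ group + pos_bin, data = df, FUN = mean)

p_binned <- ggplot(df_binned, aes(x = group, y = pos_bin, fill = color)) +
  geom_tile(height = bin_width) +
  scale_fill_gradientn(colours = c("orange", "black", "yellow"),
                       limits = c(1, 3))
p_binned
```

Averaging the codes only makes sense if the three categories can be read as ordered values, which is what treating the column as continuous implies.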
I am trying to identify why I have a purple line appearing along the x axis that is the same color as "Prypchan, Lida" from my legend. I took a look at the data and do not see any issues there.
ggplot(LosDoc_Ex, aes(x = LOS)) +
geom_density(aes(colour = AttMD)) +
theme(legend.position = "bottom") +
xlab("Length of Stay") +
ylab("Distribution") +
labs(title = "LOS Analysis * ",
caption = "*excluding Residential and WSH",
color = "Attending MD: ")
Usually I'd wait for a reproducible example, but in this case, I'd say the underlying explanation is really quite straightforward:
geom_density() creates a polygon, not a line.
Using a sample dataset from ggplot2's own package, we can observe the same straight line below the density plots, covering the x-axis & y-axis. The colour of the line simply depends on which plot is on top of the rest:
p <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density()
p
Workaround 1: You can manually calculate the density values yourself for each colour group in a new data frame, & plot the results using geom_line() instead of geom_density():
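library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
# one density estimate per cut, unpacked into x/y columns
diamonds2 <- diamonds %>%
  nest(data = -cut) %>%
  mutate(density = map(data, ~density(.x$carat))) %>%
  mutate(density.x = map(density, ~.x[["x"]]),
         density.y = map(density, ~.x[["y"]])) %>%
  select(cut, density.x, density.y) %>%
  unnest(cols = c(density.x, density.y))
ggplot(diamonds2, aes(x = density.x, y = density.y, colour = cut)) +
  geom_line()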
Workaround 2: Or you can take the data generated by the original plot, & plot that using geom_line(). The colours would need to be remapped to the legend values though:
lp <- layer_data(p)
# recover the legend labels in the order ggplot2 assigned the groups
if (is.factor(diamonds$cut)) {
  col.lev <- levels(diamonds$cut)
} else {
  col.lev <- sort(unique(diamonds$cut))
}
lp$cut <- factor(lp$group, labels = col.lev)
ggplot(lp, aes(x = x, y = ymax, colour = cut)) +
  geom_line()
There are two simple workarounds. First, if you only want lines and no filled areas, you can simply use geom_line() with the density stat:
ggplot(diamonds, aes(x = carat, y = stat(density), colour = cut)) +
geom_line(stat = "density")
Note that for this to work, we need to set the y aesthetic to stat(density).
Second, if you want the area under the lines to be filled, you can use geom_density_line() from the ggridges package. It works exactly like geom_density() but draws a line (with filled area underneath) rather than a polygon.
ggplot(diamonds, aes(x = carat, colour = cut, fill = cut)) +
geom_density_line(alpha = 0.2)
Created on 2018-12-14 by the reprex package (v0.2.1)
I am currently reading R for Data Science and trying to create some graphs. I understand that to get proportions in a bar chart, you need to use group = 1. For example, the code below works:
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = color))
But I don't get the same plot for proportions.
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group = 1))
I do get proportion but not by color.
Here's one way to do it using ..count..
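The answer's code did not survive here. One possible sketch, not necessarily what the answerer wrote, divides each bar segment's count by the total count, so the y axis shows the proportion of all diamonds while the fill by color is kept. It uses `after_stat()` (ggplot2 3.3.0 or later), the current spelling of the `..count..` notation.

```r
library(ggplot2)

# Each segment's height is its count divided by the total number of rows
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = color,
                         y = after_stat(count / sum(count)))) +
  ylab("proportion")
p
```

If the goal is instead the share of each color within each cut, `geom_bar(aes(x = cut, fill = color), position = "fill")` is the usual alternative.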
Let's say I have the following data frame:
# n (the number of replicates) must be defined beforehand
df <- data.frame(delta = rep(rep(c(0.1, 0.2, 0.3), each = 3), n),
                 metric = rep(rep(c('P', 'R', 'C'), 3), n),
                 value = rnorm(9 * n, 0.0, 1.0))
My goal is to do a boxplot by multiple factors:
p <- ggplot(data = df, aes(x = factor(delta), y = value)) +
  geom_boxplot(aes(fill = factor(metric)))
The output is:
So far so good, but if I do:
p+ geom_point(aes(color = factor(metric)))
I get:
I do not know what it is doing. My goal is to color the outliers as it is done here. Note that this solution changes the inside color of the boxes to white and sets the borders to different colors. I want to keep the same fill color for the boxes and have the outliers inherit the color of their respective boxplots.
Do you just want to change the outliers' colour? If so, you can do it easily by drawing the boxplot twice.
p <- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(colour=factor(metric))) +
geom_boxplot(aes(fill=factor(metric)), outlier.colour = NA)
# outlier.shape = 21 # if you want a border
colss <- c(P="firebrick3",R="skyblue", C="mediumseagreen")
p + scale_colour_manual(values = colss) + # outliers colours
scale_fill_manual(values = colss) # boxes colours
# The development version's geom_boxplot() has an argument outlier.fill,
# so I guess the code below will return similar output in the near future.
p2 <- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(fill=factor(metric)), outlier.shape = 21, outlier.colour = NA)
Maybe this:
ggplot(data = df, aes(x = as.factor(delta), y = value,fill=as.factor(metric))) +
geom_boxplot(outlier.size = 1)+ geom_point(pch = 21,position=position_jitterdodge(jitter.width=0))
I have the following example.
# Example Data
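# var2 and var3 are assumed to be generated like var1 (the code below uses var1:var3)
x <- data.frame(var1=rnorm(800,0,1),
                var2=rnorm(800,0,1),
                var3=rnorm(800,0,1),
                type=factor(rep(c("x", "y"), length.out=800)),
                set=factor(rep(c("A","B","C","D"), each=200)))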
Now, I would like to plot (thin) parallel coordinate plots of these lines, with points for each of the variable values. I would like to overlay a boxplot (each of a different color for each method) on these parallel coordinate plots at the variables values. On top of this, I would like to facet for the groups and types, say using set~type. Is this possible to do using ggplot2?
Any suggestions? Thanks!
You need to put data in long format first. I didn't put in points, since the graph is already cluttered enough, but you can do so by adding a geom_point.
library(tidyr)
library(ggplot2)
x$id <- 1:nrow(x)  # one id per row, so each row becomes one line
x2 <- gather(x, var, value, var1:var3)
ggplot(x2, aes(var, value)) +
  geom_line(aes(group = id), size = 0.05, alpha = 0.3) +
  geom_boxplot(aes(fill = var), alpha = 0.5) +
  facet_grid(set ~ type)
Or perhaps violins
Replacing the boxplots with violins looks pretty cool as well.
ggplot(x2, aes(var, value)) +
  geom_line(aes(group = id), size = 0.05, alpha = 0.3) +
  geom_violin(aes(fill = var), col = NA, alpha = 0.6) +
  facet_grid(set ~ type)
ggplot(data = sortmax, aes(x = Date, y = price, colour = Grade)) +
  geom_line(aes(group = Grade)) +
  geom_point()
I have five different lines for five different grades. They all intersect and overwrite each other because they share price values on the y axis. How can I increase the distance between them?
It would be useful if you could post the output of the command: dput(sortmax)
You can try separating the graphs completely by using facet_grid:
ggplot(data = sortmax, aes(x = Date, y = price, color=Grade)) +
geom_line() +
facet_grid(Grade ~ .)
If you group your data by only one variable, you can also use facet_wrap. If five different grades make the plot too wide, you can add the nrow or ncol argument (number of rows/columns) to adjust the final layout.
A variant of rnso's answer:
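As a sketch of the facet_wrap option just described: `sortmax` was never posted, so the data frame below is a made-up stand-in that only reuses the column names (`Date`, `price`, `Grade`). Setting `ncol = 1` stacks the five panels vertically.

```r
library(ggplot2)

# Hypothetical stand-in for sortmax: five grades, ten monthly dates each
set.seed(1)
sortmax <- data.frame(
  Date  = rep(seq(as.Date("2015-01-01"), by = "month", length.out = 10), times = 5),
  Grade = rep(paste0("G", 1:5), each = 10),
  price = round(runif(50, min = 100, max = 120))
)

p_wrap <- ggplot(sortmax, aes(x = Date, y = price, colour = Grade)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ Grade, ncol = 1)  # one panel per grade, in a single column
p_wrap
```

Adding `scales = "free_y"` to `facet_wrap()` gives each panel its own y range, which can help when the grades have very different price levels.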
ggplot(data = sortmax, aes(x = Date, y = price, color=Grade)) +
geom_line() +