Convert a geom_tile in dotplot in ggplot2 - r

I am making several heatmaps in ggplot2 using geom_tile. They work great, but what if I want dots instead of tiles (little rectangles)? My input is a binary matrix (converted to a long table using the melt function).
My x and y are discrete factors. How do I produce circles or dots instead of tiles? Any ideas?
Thanks!
example:
dat=data.frame(sample = c("a","a","a","b","b","b","c","c","c"), cond=c("x","y","z","x","y","z","x","y","z"),value=c("1","4","6","2","3","7","4","6","7"),score=c(0,1,1,0,0,0,1,1,1))
if I use the following plot:
ggplot(dat, aes(x = sample, y = cond, color = value)) +
geom_point()
I get the wrong plot. Instead, I would like a dot to appear only where score is 1 (and no dot where score is 0), colored by the value factor.

I assume you mean to map score to your color aesthetic and not value, as written in your shared code.
Simply convert score to a factor in your initial aesthetics call:
ggplot(dat, aes(x = sample, y = cond, color = as.factor(score))) +
geom_point()
EDIT:
The user indicated that they would like to keep only the observations where score equals 1, and then color the points by value. You can do so by adding the following pipe operation:
library(dplyr) # provides %>% and filter()
dat %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point()
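Note that only 3 levels of value remain (4, 6 and 7), and that level b of sample is missing from the x-axis because none of its rows has score equal to 1. Keep all levels by specifying drop = FALSE in scale_x_discrete(). This only works if sample is a factor (the default for data.frame() before R 4.0.0; otherwise convert it with factor() before filtering), since a character column has no unused levels to keep: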
dat %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point() +
scale_x_discrete(drop = FALSE)
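As a minimal sketch of the point about factor levels, assuming R 4.0.0 or later (where data.frame() keeps strings as character), the columns can be converted to factors before filtering so that the empty level b survives. The names dat_f and kept are hypothetical, introduced only for this illustration.

```r
library(dplyr)
library(ggplot2)

dat <- data.frame(
  sample = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  cond   = c("x", "y", "z", "x", "y", "z", "x", "y", "z"),
  value  = c("1", "4", "6", "2", "3", "7", "4", "6", "7"),
  score  = c(0, 1, 1, 0, 0, 0, 1, 1, 1)
)

# Convert to factors BEFORE filtering so the full set of levels is remembered
# (dat_f and kept are illustrative names, not from the original answer)
dat_f <- dat %>%
  mutate(sample = factor(sample), cond = factor(cond))

# Keep only rows where a dot should be drawn
kept <- dat_f %>% filter(score == 1)

# drop = FALSE keeps the unused level "b" on the axis
p <- ggplot(kept, aes(x = sample, y = cond, color = value)) +
  geom_point(size = 4) +
  scale_x_discrete(drop = FALSE) +
  scale_y_discrete(drop = FALSE)
```

Printing p should show an empty column for b, since filter() keeps the factor levels even though no row with sample b remains.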

Related

Where does ggplot set the order of the color scheme?

I have a data set that I'm showing in a series of violin plots with one categorical variable and one continuous numeric variable. When R generated the original series of violins, the categorical variable was plotted alphabetically (I rotated the plot, so it appears alphabetically from bottom to top). I thought it would look better if I sorted them using the numeric variable.
When I do this, the color scheme doesn't turn out as I wanted it to. It's like R assigned the colors to the violins before it sorted them; after the sorting, they kept their original colors - which is the opposite of what I wanted. I wanted R to sort them first and then apply the color scheme.
I'm using the viridis color scheme here, but I've run into the same thing when I used RColorBrewer.
Here is my code:
# Start plotting
g <- ggplot(NULL)
# Violin plot
g <- g + geom_violin(data = df, aes(x = reorder(catval, -numval,
na.rm = TRUE), y = numval, fill = catval), trim = TRUE,
scale = "width", adjust = 0.5)
(snip)
# Specify colors
g <- g + scale_colour_viridis_d()
# Remove legend
g <- g + theme(legend.position = "none")
# Flip for readability
g <- g + coord_flip()
# Produce plot
g
Here is the resulting plot.
If I leave out the reorder() argument when I call geom_violin(), the color order is what I would like, but then my categorical variable is sorted alphabetically and not by the numeric variable.
Is there a way to get what I'm after?
I think this is a reproducible example of what you're seeing. In the diamonds dataset, the mean price does not follow the quality order of cut: for example, "Fair" diamonds have a higher mean price than both "Good" and "Very Good" diamonds.
library(dplyr)
library(ggplot2) # provides the diamonds dataset
diamonds %>%
group_by(cut) %>%
summarize(mean_price = mean(price))
# A tibble: 5 x 2
  cut       mean_price
  <ord>          <dbl>
1 Fair           4359.
2 Good           3929.
3 Very Good      3982.
4 Premium        4584.
5 Ideal          3458.
By default, reorder uses the mean of the sorting variable, so Good is plotted above Very Good. But the fill is still based on the un-reordered variable cut, which is a factor in order of quality.
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price, fill = cut)) +
geom_violin() +
coord_flip()
If you want the color to follow the ordering, then you could reorder upstream of ggplot2, or reorder in both aesthetics:
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price,
fill = reorder(cut, -price))) +
geom_violin() +
coord_flip()
Or
diamonds %>%
mutate(cut = reorder(cut, -price)) %>%
ggplot(aes(x = cut, y = price, fill = cut)) +
geom_violin() +
coord_flip()
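To see why the fill colors follow a different order than the axis, it can help to compare the factor levels directly. This is a small sketch using only base R's reorder() on the diamonds data; the variable names original_levels and reordered_levels are hypothetical.

```r
library(ggplot2)  # only needed here for the diamonds dataset

# Levels of cut as shipped: ordered by quality
original_levels <- levels(diamonds$cut)

# Levels after reordering by the mean of -price (highest mean price first)
reordered_levels <- levels(reorder(diamonds$cut, -diamonds$price))

print(original_levels)
print(reordered_levels)
```

A discrete fill scale assigns colors in level order, so mapping fill to the original cut and x to the reordered version gives two different orders in one plot.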

Method of ordering groups in ggplot line plot

I have created a plot with the following code:
df %>%
mutate(vars = factor(vars, levels = reord)) %>%
ggplot(aes(x = EI1, y = vars, group = groups)) +
geom_line(aes(color=groups)) +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
The result is:
While the ei1_other group is in descending order on x, the ei1_gun points are ordered by variables. I would like both groups to follow the same order, such that ei1_gun and ei1_other both start at Drugs and then descend in order of the variables, rather than descending by order of x values.
The issue is that the order by which geom_line connects the points is determined by the value on the x-axis. To solve this issue simply swap x and y and make use of coord_flip.
As no sample dataset was provided I use an example dataset based on mtcars to illustrate the issue and the solution. In my example data make is your vars, value your EI1 and name your groups:
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
example_data <- mtcars %>%
mutate(make = row.names(.)) %>%
select(make, hp, mpg) %>%
mutate(make = fct_reorder(make, hp)) %>%
pivot_longer(-make)
Mapping value on x and make on y results in an unordered line plot as in your example. The reason is that the order in which the points get connected is determined by value:
example_data %>%
ggplot(aes(x = value, y = make, color = name, group = name)) +
geom_line() +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
In contrast, swapping x and y, i.e. mapping make on x and value on y, and making use of coord_flip gives a nicely ordered line plot, as the order in which the points get connected is now determined by make (of course we also have to swap xlab and ylab):
example_data %>%
ggplot(aes(x = make, y = value, color = name, group = name)) +
geom_line() +
geom_point() +
coord_flip() +
ylab("EI1 (Expected Influence with Neighbor)") +
xlab("Variables")
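An alternative to swapping the axes, sketched here under the same mtcars-based assumptions, is geom_path(), which connects points in the order they appear in the data rather than in order of x. The name path_data is hypothetical, and the factor is built with base factor() instead of fct_reorder() only to keep the sketch free of forcats.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

path_data <- mtcars %>%
  mutate(make = rownames(mtcars)) %>%
  select(make, hp, mpg) %>%
  # order the factor levels of make by hp
  mutate(make = factor(make, levels = make[order(hp)])) %>%
  pivot_longer(-make) %>%
  # sort rows by factor level so row order matches the y-axis order
  arrange(make)

p <- ggplot(path_data, aes(x = value, y = make, color = name, group = name)) +
  geom_path() +   # connects points in row order within each group
  geom_point()
```

Recent ggplot2 versions (3.3.0 and later, to my knowledge) also accept geom_line(orientation = "y"), which sorts along y without any reshaping or coord_flip.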

Boxplots grouped by group value

I have the following example data frame:
Parameter<-c("As","Hg","Pb")
Loc1<-c("1","10","12")
Loc2<-c("3","14","9")
Loc3<-c("5","12","8")
Loc4<-c("9","20","6")
x<-data.frame(Parameter,Loc1,Loc2,Loc3,Loc4)
x$Loc1<-as.numeric(x$Loc1)
x$Loc2<-as.numeric(x$Loc2)
x$Loc3<-as.numeric(x$Loc3)
x$Loc4<-as.numeric(x$Loc4)
The Parameter column holds the names of the heavy metal and Loc1 to Loc4 columns hold the measured value of the heavy metal at the individual location.
I need a plot with one boxplot for each heavy metal at each location. The location is the grouping value. I tried the following:
library(reshape2)
library(ggplot2)
x_long <- melt(x, id = c("Parameter"))
ggplot(x_long) +
geom_boxplot(aes(x = Parameter, y = value, colour = variable))
However, the resulting plot somehow did not group the boxplots by location.
A boxplot with one observation per Parameter per Location makes little sense (see my example at the end of my post). I assume you are in fact after a barplot.
You can do something like this
library(tidyverse)
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity")
Or with dodged bars
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity", position = "dodge")
To demonstrate why a boxplot makes little sense, let's show the plot
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, colour = Location)) +
geom_boxplot()
Note that a single observation per group results in each box being reduced to a single horizontal line. This is probably not what you want to show.
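The reshaping step above uses gather(), which tidyr has since superseded with pivot_longer(). A minimal sketch of the equivalent call, rebuilding the example data with numeric columns directly; x_long is a hypothetical name.

```r
library(tidyr)

x <- data.frame(
  Parameter = c("As", "Hg", "Pb"),
  Loc1 = c(1, 10, 12),
  Loc2 = c(3, 14, 9),
  Loc3 = c(5, 12, 8),
  Loc4 = c(9, 20, 6)
)

# One row per Parameter/Location pair, same shape gather() would produce
x_long <- pivot_longer(x, cols = -Parameter,
                       names_to = "Location", values_to = "Value")
```

x_long can then be passed to the same ggplot() calls in place of the gather() step.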

gradient fill violin plots using ggplot2

I want to gradient fill a violin plot based on the density of points in the bins (blue for highest density and red for lowest).
I have generated a plot using the following commands but failed to color it based on density (in this case, the width of the violin). I would also like to generate box plots with similar coloring.
library("ggplot2")
data(diamonds)
ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin()
To change the colour of the violin plot, use fill = variable, like this:
ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin(aes(fill=cut))
The same goes for a boxplot:
ggplot(diamonds, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=cut))
But whatever variable you map to fill has to take a single value within each cut. If you wanted to use, for example, the mean price per cut as the color variable, you would have to compute it first.
With dplyr, group diamonds by cut and use summarize to get the mean price (or any other summary):
library(dplyr)
diamonds_group <- group_by(diamonds, cut)
diamonds_group <- summarize(diamonds_group, Mean_Price = mean(price))
Then I used diamonds2 as a copy of diamonds to then manipulate the dataset
diamonds2 <- diamonds
I merge both data frames to get Mean_Price as a variable in diamonds2
diamonds2 <- merge(diamonds2, diamonds_group)
And now I can plot it with mean price as the color variable
ggplot(diamonds2, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=Mean_Price)) + scale_fill_gradient2(midpoint = mean(diamonds2$price))
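The summarize-and-merge steps can also be collapsed into a grouped mutate(). This is only a sketch of that variant, not the approach used above; diamonds3 is a hypothetical name chosen to avoid clashing with diamonds2.

```r
library(dplyr)
library(ggplot2)

# Attach the per-cut mean price to every row without a separate merge
diamonds3 <- diamonds %>%
  group_by(cut) %>%
  mutate(Mean_Price = mean(price)) %>%
  ungroup()

p <- ggplot(diamonds3, aes(x = cut, y = carat)) +
  geom_boxplot(aes(fill = Mean_Price)) +
  scale_fill_gradient2(midpoint = mean(diamonds3$price))
```

Because Mean_Price is constant within each cut, every box still gets a single fill value.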
I just answered this in another thread, but I believe it is more appropriate here. You can create a pseudo-fill by drawing many segments. You can get those directly from the underlying data returned by ggplot_build().
If you want an additional polygon outline ("border"), you need to create it from the x/y coordinates. Below is one option.
library(tidyverse)
p <- ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin()
mywidth <- .35 # bit of trial and error
# all you need for the gradient fill
vl_fill <- data.frame(ggplot_build(p)$data) %>%
mutate(xnew = x- mywidth*violinwidth, xend = x+ mywidth*violinwidth)
# the outline is a bit more convoluted, as the order matters
vl_poly <- vl_fill %>%
select(xnew, xend, y, group) %>%
pivot_longer(-c(y, group), names_to = "oldx", values_to = "x") %>%
arrange(y) %>%
split(., .$oldx) %>%
map(., function(x) {
if(all(x$oldx == "xnew")) x <- arrange(x, desc(y))
x
}) %>%
bind_rows()
ggplot() +
geom_polygon(data = vl_poly, aes(x, y, group = group),
color= "black", size = 1, fill = NA) +
geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
color = violinwidth))
Created on 2021-04-14 by the reprex package (v1.0.0)

plot factor frequency by group (a yield plot)

I have a data frame containing the test_outcome (PASS/FAIL) for each test_type performed on each test_subject. For example:
test_subject, test_type, test_outcome
person_a, height, PASS
person_b, height, PASS
person_c, height, FAIL
person_d, height, PASS
person_a, weight, FAIL
person_b, weight, FAIL
person_c, weight, PASS
person_d, weight, PASS
I would like to prepare a yield plot by test_type and test_subject.
Y-axis = yield i.e. num pass/(num pass + num fail)
X-axis = test_subject
Fill/color = one line for each test_type.
I would prefer to use ggplot2, can you please recommend the best approach here? e.g. how to reshape the data before plotting?
A quick dplyr answer; you will want to tidy up the graph (colours etc.) to your liking.
library(dplyr)
library(ggplot2)
dat <- dat %>% group_by(test_subject, test_type) %>%
summarise(passrate = sum(test_outcome=="PASS") / n())
ggplot(dat, aes(x = test_subject, y = passrate, fill = test_type)) +
geom_bar(stat = "identity", position = "dodge")
Edit: a line graph was requested. Normally, categories shouldn't be connected by lines, as there is no inherent reason to order them in a particular way.
ggplot(dat, aes(x = test_subject, y = passrate, col = test_type)) +
geom_line(aes(group = test_type)) +
geom_point()
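The code above assumes a data frame named dat already exists. As a minimal sketch, the table from the question can be built by hand; with one row per subject and test type, the per-subject pass rate is always 0 or 1, so aggregating by test_type alone may be closer to a yield figure. The names dat_raw and yield_by_type are hypothetical.

```r
library(dplyr)

# Example data transcribed from the question
dat_raw <- data.frame(
  test_subject = rep(c("person_a", "person_b", "person_c", "person_d"), times = 2),
  test_type    = rep(c("height", "weight"), each = 4),
  test_outcome = c("PASS", "PASS", "FAIL", "PASS",
                   "FAIL", "FAIL", "PASS", "PASS")
)

# Yield per test type: share of PASS outcomes across all subjects
yield_by_type <- dat_raw %>%
  group_by(test_type) %>%
  summarise(passrate = mean(test_outcome == "PASS"))
```

For this example data, 3 of 4 height tests and 2 of 4 weight tests pass.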
