Boxplots grouped by group value - r

I have the following example data frame:
Parameter<-c("As","Hg","Pb")
Loc1<-c("1","10","12")
Loc2<-c("3","14","9")
Loc3<-c("5","12","8")
Loc4<-c("9","20","6")
x<-data.frame(Parameter,Loc1,Loc2,Loc3,Loc4)
x$Loc1<-as.numeric(x$Loc1)
x$Loc2<-as.numeric(x$Loc2)
x$Loc3<-as.numeric(x$Loc3)
x$Loc4<-as.numeric(x$Loc4)
The Parameter column holds the names of the heavy metals, and the Loc1 to Loc4 columns hold the measured value of each heavy metal at the individual locations.
I need a plot with one boxplot for each heavy metal at each location. The location is the grouping value. I tried the following:
library(reshape2)
library(ggplot2)
melt<-melt(x, id=c("Parameter"))
ggplot(melt)+
geom_boxplot (aes(x=Parameter, y=value, colour=variable))
However, the resulting plot did not group the boxplots by location.

A boxplot with one observation per Parameter per Location makes little sense (see my example at the end of my post). I assume you are in fact after a barplot.
You can do something like this
library(tidyverse)
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity")
Or with dodged bars
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity", position = "dodge")
To demonstrate why a boxplot makes little sense, let's show the plot
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, colour = Location)) +
geom_boxplot()
Note that a single observation per group results in each box being reduced to a single horizontal line. This is probably not what you want to show.
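As a side note, gather() has been superseded by pivot_longer() in more recent versions of tidyr. Here is a minimal sketch of the same reshaping step, assuming tidyr >= 1.0.0. The data frame from the question is rebuilt so the snippet runs on its own, and the name x_long is only illustrative:

```r
library(tidyr)
library(ggplot2)

# Rebuild the example data from the question (values entered as numbers directly)
x <- data.frame(
  Parameter = c("As", "Hg", "Pb"),
  Loc1 = c(1, 10, 12),
  Loc2 = c(3, 14, 9),
  Loc3 = c(5, 12, 8),
  Loc4 = c(9, 20, 6)
)

# Reshape to long format: one row per Parameter/Location pair
x_long <- pivot_longer(x, -Parameter, names_to = "Location", values_to = "Value")

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(x_long, aes(x = Parameter, y = Value, fill = Location)) +
  geom_col(position = "dodge")
p
```

This should give the same dodged bar chart as above, with one bar per metal and location.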

Add a gradient fill to geom_col

Here is some basic code for a column plot:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarise(
count = n()
) %>%
ggplot(
aes(
x = cut,
y = count,
fill = count
)
) +
geom_col() +
scale_fill_viridis_c(
option = "plasma"
)
I could not find any examples of what I would like to do, so I will try to explain it as best I can. I have applied a colour gradient to the fill aesthetic, which colours each whole column a single colour. Is it possible to have each column of the plot contain the full colour spectrum up to its respective value?
By which I mean the "Ideal" column of my plot would look exactly like the key in the legend. Then the "Premium" column would look like the key in the legend but cut off ~2/3 of the way up.
Thanks
You can do this fairly easily with a bit of data manipulation. Within each group of your original data frame, give every row a sequential number that you can map to the fill scale, and add another column with the value 1. Then you just plot using position_stack, so that each row becomes a segment of height 1 coloured by its position in the stack:
library(ggplot2)
library(dplyr)
diamonds %>%
group_by(cut) %>%
mutate(fill_col = seq_along(cut), height = 1) %>%
ggplot(aes(x = cut, y = height, fill = fill_col)) +
geom_col(position = position_stack()) +
scale_fill_viridis_c(option = "plasma")
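To see what the mutate() step produces, here is a small sketch on a hypothetical toy data frame (the names toy and toy_steps and the groups "a" and "b" are made up for illustration, not part of the original data):

```r
library(dplyr)

# Hypothetical toy data: group "a" has 3 rows, group "b" has 2 rows
toy <- tibble(cut = c("a", "a", "a", "b", "b"))

# Within each group, number the rows and give every row a height of 1
toy_steps <- toy %>%
  group_by(cut) %>%
  mutate(fill_col = seq_along(cut), height = 1) %>%
  ungroup()

toy_steps
```

Here fill_col runs 1, 2, 3 for group "a" and 1, 2 for group "b". Stacking the unit-height segments therefore gives a column whose total height equals the group count.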

Method of ordering groups in ggplot line plot

I have created a plot with the following code:
df %>%
mutate(vars = factor(vars, levels = reord)) %>%
ggplot(aes(x = EI1, y = vars, group = groups)) +
geom_line(aes(color=groups)) +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
The result is:
While the ei1_other group is in descending order on x, the ei1_gun points are ordered by variables. I would like both groups to follow the same order, such that ei1_gun and ei1_other both start at Drugs and then descend in order of the variables, rather than descending by order of x values.
The issue is that the order by which geom_line connects the points is determined by the value on the x-axis. To solve this issue simply swap x and y and make use of coord_flip.
As no sample dataset was provided I use an example dataset based on mtcars to illustrate the issue and the solution. In my example data make is your vars, value your EI1 and name your groups:
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
example_data <- mtcars %>%
mutate(make = row.names(.)) %>%
select(make, hp, mpg) %>%
mutate(make = fct_reorder(make, hp)) %>%
pivot_longer(-make)
Mapping value on x and make on y results in an unordered line plot as in your example. The reason is that the order by which the points get connected is determined by value:
example_data %>%
ggplot(aes(x = value, y = make, color = name, group = name)) +
geom_line() +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
In contrast, swapping x and y, i.e. mapping make on x and value on y, and making use of coord_flip gives a nicely ordered line plot, as the order by which the points get connected is now determined by make (of course we also have to swap xlab and ylab):
example_data %>%
ggplot(aes(x = make, y = value, color = name, group = name)) +
geom_line() +
geom_point() +
coord_flip() +
ylab("EI1 (Expected Influence with Neighbor)") +
xlab("Variables")
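As a possible alternative to coord_flip, and assuming ggplot2 >= 3.3.0, geom_line accepts an orientation argument. Here is a sketch on the same mtcars-based example data, rebuilt so the snippet is self-contained:

```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)

# Same example data as above: make plays the role of vars, name of groups
example_data <- mtcars %>%
  mutate(make = rownames(mtcars)) %>%
  select(make, hp, mpg) %>%
  mutate(make = fct_reorder(make, hp)) %>%
  pivot_longer(-make)

# orientation = "y" asks geom_line to connect the points in order of y (make)
# instead of in order of x (value), so no coord_flip is needed
p <- ggplot(example_data, aes(x = value, y = make, color = name, group = name)) +
  geom_line(orientation = "y") +
  geom_point() +
  xlab("EI1 (Expected Influence with Neighbor)") +
  ylab("Variables")
p
```

With this variant the aesthetics and axis labels stay as in the original plot; only the connection order of the line changes.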

ggplot geom_bar leave blank spaces for 0 values by group

Below is a simple ggplot bar plot:
x<-c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3)
y<-c(1,2,3,4,5,3,3,3,3,4,5,5,6,7,6,5,4,3,2,3,4,5,3,2,1,1,1,1,1)
# ggplot() needs a data frame; cbind() would return a matrix
d<-data.frame(x,y)
ggplot(data=d,aes(x=x,fill=as.factor(y)))+
geom_bar(position = position_dodge())
The issue I'm having is that not every value of y is present in each grouping x. For example, group 1 along the x-axis only contains values 1-5 of the y variable, and doesn't have any values for 6 or 7. What I would like is for the plot to leave blank spaces when there are no values for a y in the given x-grouping, so that it is easier to compare the x-groups.
A solution is to compute the frequencies manually and plot the graph based on that frequency table.
library(ggplot2)
d1 <- data.frame(table(d))
d1$x <- factor(d1$x)
ggplot(d1, aes(x, Freq, fill = factor(y))) +
geom_bar(stat = "identity", position = position_dodge())
Another way to do this is to use dplyr's tally() to count the frequencies, but you need to make sure that your variables are set as factors first.
library(tidyverse)
# set factor levels
d2 <- d %>% data.frame() %>% mutate(x=factor(x, levels=c(1:3)),
y=factor(y, levels=c(1:7)))
# count frequencies (keeping empty groups) and send to ggplot2
d2 %>% group_by(x, y, .drop=FALSE) %>% tally() %>%
ggplot(aes(x=x, y=n, fill=y, color=y)) +
geom_bar(position = position_dodge2(),
stat="identity")
Using color=y & fill=y in the aes statement helps to show exactly where on the plot the zero values are. So now you can see that y=6 & y=7 are missing from x=1 & x=3, and y=1 is missing from x=2.
I chose position_dodge2 purely out of personal preference.
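A further option, sketched here under the assumption that dplyr and tidyr are available, is to count only the observed combinations and then fill in the missing ones with tidyr::complete(). The name counts is illustrative, and the data from the question is rebuilt so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

x <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3)
y <- c(1,2,3,4,5,3,3,3,3,4,5,5,6,7,6,5,4,3,2,3,4,5,3,2,1,1,1,1,1)
d <- data.frame(x, y)

# Count observed x/y pairs, then add every unobserved pair with a count of 0
counts <- d %>%
  count(x, y) %>%
  complete(x, y, fill = list(n = 0L))

# Zero-height bars keep their slot, so the dodged groups stay aligned
p <- ggplot(counts, aes(x = factor(x), y = n, fill = factor(y))) +
  geom_col(position = position_dodge())
p
```

Unlike the factor-based approach, this only fills in values of x and y that occur somewhere in the data, which is enough for this example.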

Convert a geom_tile in dotplot in ggplot2

I am making several heatmaps in ggplot2 using geom_tile. They work great, but what if I want dots instead of tiles (little rectangles)? My input is a binary matrix (converted to a long table using the melt function).
My x and y are discrete factors. How do I produce circles or dots instead of tiles? Any ideas?
Thanks!
example:
dat=data.frame(sample = c("a","a","a","b","b","b","c","c","c"), cond=c("x","y","z","x","y","z","x","y","z"),value=c("1","4","6","2","3","7","4","6","7"),score=c(0,1,1,0,0,0,1,1,1))
if I use the following plot:
ggplot(dat, aes(x = sample, y = cond, color = value)) +
geom_point()
I get the wrong plot. Instead, I would like a dot only where the score is 1 (and no dot where it is 0), with the dots colored by the value factor.
I assume you mean to map score to your color aesthetic and not value, as written in your shared code.
Simply convert score to a factor in your initial aesthetics call:
ggplot(dat, aes(x = sample, y = cond, color = as.factor(score))) +
geom_point()
EDIT:
The user indicated that he would like to filter observations where score is not equal to 1, and then color the points by value. You can do so by adding the following pipe operation:
dat %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point()
Note that only 3 levels of value remain after filtering, and that level b of sample is missing from the x-axis. Keep all levels by specifying drop = FALSE in scale_x_discrete() (this only has an effect if sample is a factor):
dat %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point() +
scale_x_discrete(drop = FALSE)
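One caveat: from R 4.0.0 onward, data.frame() no longer converts strings to factors by default, so sample is a plain character column and the filtered data carries no record of level b. A minimal sketch of a workaround, assuming that setup, is to convert sample to a factor before filtering:

```r
library(dplyr)
library(ggplot2)

dat <- data.frame(
  sample = c("a","a","a","b","b","b","c","c","c"),
  cond   = c("x","y","z","x","y","z","x","y","z"),
  value  = c("1","4","6","2","3","7","4","6","7"),
  score  = c(0,1,1,0,0,0,1,1,1)
)

p <- dat %>%
  mutate(sample = factor(sample)) %>%  # fix the levels a, b, c before filtering
  filter(score == 1) %>%               # filter() keeps unused factor levels
  ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
  geom_point() +
  scale_x_discrete(drop = FALSE)       # show the now-empty level b on the axis
p
```

Because the factor levels are set before the rows are dropped, scale_x_discrete(drop = FALSE) has an unused level to keep.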

using facets on every column with color grouping

I've seen a lot of people use facets to visualize data. I want to be able to run this on every column in my dataset and then have it grouped by some categorical value within each individual plot.
I've seen others use gather() to plot histogram or densities. I can do that ok, but I guess I fundamentally misunderstand how to use this technique.
I want to be able to do just what I have below - but when I have it grouped by a category. For example, histogram of every column but stacked by the value color. Or dual density plots of every column with these two lines of different colors.
I'd like this - but instead of clarity it is every single column like this...
library(tidyverse)
# what I want but clarity should be replaced with every column except FILL
ggplot(diamonds, aes(x = price, fill = color)) +
geom_histogram(position = 'stack') +
facet_wrap(clarity~.)
# it would look exactly like this, except it would have the fill value by a group.
gathered_data = gather(diamonds %>% select_if(is.numeric))
ggplot(gathered_data , aes(value)) +
geom_histogram() +
theme_classic() +
facet_wrap(~key, scales='free')
tidyr::gather needs four pieces:
1) data (in this case diamonds, passed through the pipe into the first parameter of gather below)
2) key
3) value
4) names of the columns that will be converted to key / value pairs.
gathered_data <- diamonds %>%
gather(key, value,
select_if(diamonds, is.numeric) %>% names())
It's not entirely clear what you are looking for. A picture of your expected output would have been much more illuminating than a description (not all of us are native English speakers...), but perhaps something like this?
diamonds %>%
rename(group = color) %>% # change this line to use another categorical
# column as the grouping variable
group_by(group) %>% # select grouping variable + all numeric variables
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>% # gather all numeric variables
ggplot(aes(x = value, fill = group)) +
geom_histogram(position = "stack") +
theme_classic() +
facet_wrap(~ key, scales = 'free')
# alternate example using geom density
diamonds %>%
rename(group = cut) %>%
group_by(group) %>%
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>%
ggplot(aes(x = value, color = group)) +
geom_density() +
theme_classic() +
facet_wrap(~ key, scales = 'free')
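For what it's worth, the same idea can be written with the newer select(where(...)) and pivot_longer() verbs. This is only a sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; long_diamonds is an illustrative name and bins = 30 is an arbitrary choice:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Keep the grouping column (renamed to group) plus all numeric columns,
# then reshape to one row per observation and numeric variable
long_diamonds <- diamonds %>%
  select(group = color, where(is.numeric)) %>%  # swap color for another categorical column
  pivot_longer(-group, names_to = "key", values_to = "value")

p <- ggplot(long_diamonds, aes(x = value, fill = group)) +
  geom_histogram(position = "stack", bins = 30) +
  theme_classic() +
  facet_wrap(~ key, scales = "free")
p
```

Replacing geom_histogram() with geom_density() and mapping color instead of fill should give the density variant in the same way.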
