How to create a bar plot and show average Y values in R

I want to create a bar plot based on the following data:
Station Delay
A 5
B 6
A 4
A 3
B 8
The X axis should contain stations "A" and "B", while the bars (Y axis) should show the average delay per station.
I tried this, but it does not give a correct result:
barplot(c(data$Station, data$Delay),
main="BARPLOT", xlab="Stations", ylab="Delays",
names.arg=data$Station)

df <- data.frame(Station = c("A", "B", "A", "A", "B"), Delay= c(5, 6, 4, 3, 8))
library(dplyr)
df_mean <- df %>% group_by(Station) %>% summarise(me = mean(Delay))
library(ggplot2)
ggplot(aes(x = Station, y = me), data = df_mean) + geom_bar(stat = "identity")
Or directly with stat_summary on the original data (the argument is fun since ggplot2 3.3.0; older versions call it fun.y):
ggplot(aes(x = Station, y = Delay), data = df) + stat_summary(fun = "mean", geom = "bar")

In base R, you can do:
m_data <- data.frame(Station=data$Station, m_del=ave(data$Delay, data$Station), stringsAsFactors=FALSE)
barplot(unique(m_data)$m_del, names.arg=unique(m_data)$Station, main="BARPLOT", xlab="Stations", ylab="Delays")
Or with the package data.table, you can do:
library(data.table)
m_data <- setDT(data)[, mean(Delay), by=Station]
m_data[, barplot(V1, names=Station, main="BARPLOT", xlab="Stations", ylab="Delays")]
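The attempt in the question concatenates both columns into a single vector with c(), so barplot never sees one value per station. A minimal base R sketch of the averaging step, assuming the data frame from the question (named df here), could use tapply:

```r
# Sample data from the question
df <- data.frame(Station = c("A", "B", "A", "A", "B"),
                 Delay   = c(5, 6, 4, 3, 8))

# tapply() applies mean() to Delay within each Station and
# returns a named vector: one mean per station
means <- tapply(df$Delay, df$Station, mean)

# The names of the vector become the bar labels
barplot(means, main = "BARPLOT", xlab = "Stations", ylab = "Delays")
```

Because the result carries the station names, no separate names.arg is needed.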

Related

How to change the default color of group scatter ggplot while some groups have no values

Consider the data below, and assume one of the levels of group (B here) has no values.
I use scale_color_discrete(drop = F) to force ggplot to show B in the legend.
But I have not managed to change the group colors on top of this, for example with scale_color_manual.
How do these two work together?
library(dplyr)
library(ggplot2)
set.seed(1)
# Data simulation
x <- runif(20)
y <- 5 * x ^ 2 + rnorm(length(x), sd = 2)
group <- ifelse(x < 0.4, "A",
ifelse(x > 0.8, "C", "B"))
x <- x + runif(length(x), -0.2, 0.2)
# Data frame
df <- data.frame(x = x, y = y, group = group) %>%
mutate(group= factor(group, levels = c("A", "B", "C"))) %>%
filter(group != "B")
df %>%
ggplot(aes(x=x,y=y, color = group)) +
geom_point() +
scale_color_discrete(drop= F)
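A plot holds only one scale per aesthetic, so adding a manual colour scale after scale_color_discrete replaces it, including its drop setting. One possible approach, sketched below, is to pass drop = FALSE to scale_color_manual itself. The colour names and the group_cols vector are illustrative choices, not part of the question, and show.legend = TRUE is included on the assumption that recent ggplot2 releases (3.5.0 onward, as far as I know) otherwise omit legend keys for levels that have no data.

```r
library(ggplot2)

set.seed(1)
x <- runif(20)
df <- data.frame(
  x = x,
  y = 5 * x^2 + rnorm(20, sd = 2),
  group = factor(ifelse(x < 0.4, "A", ifelse(x > 0.8, "C", "B")),
                 levels = c("A", "B", "C"))
)
df <- df[df$group != "B", ]  # level B stays in the factor but has no rows

# Hypothetical colour choices, named by factor level
group_cols <- c(A = "firebrick", B = "darkgreen", C = "steelblue")

p <- ggplot(df, aes(x = x, y = y, color = group)) +
  geom_point(show.legend = TRUE) +
  # drop = FALSE keeps the unused level B in the scale
  scale_color_manual(values = group_cols, drop = FALSE)
print(p)
```

Naming the values by level keeps each colour attached to its group even when a level has no rows.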

How to specify multiple xlims for facetted data in ggplot2 R?

The data is facetted by two variables (see graph). Each variable has a different range. I want to specify the range so that all plots in var1 and var2 are bound by the min and max values of those variables. See sample code attached. I don't want to use scales = "free" in the facetting call.
library(tidyr)
library(ggplot2)
var1 <- rnorm(100, 6, 2)
var2 <- rnorm(100,15,2)
spp.val <- rnorm(100,10,2)
spp <- rep(c("A","B","C","D"), 25)
df <- data.frame(var1, var2,spp, spp.val)
df <- gather(df,
key = "var",
value = "var.val",
var1,var2)
df$var <- as.factor(as.character(df$var))
df$spp <- as.factor(as.character(df$spp))
ggplot(aes(x = var.val, y = spp.val), data = df) +
geom_point() +
facet_grid(spp~var)
# I want the x limits for each column of facets to be set as follows:
# var1: xlim(min(df$var.val[df$var == "var1"]), max(df$var.val[df$var == "var1"]))
# var2: xlim(min(df$var.val[df$var == "var2"]), max(df$var.val[df$var == "var2"]))
Is this what you want?
library(tidyverse)
tibble(
var1 = rnorm(100, 6, 2),
var2 = rnorm(100, 15, 2),
spp.val = rnorm(100, 10, 2),
spp = rep(c("A", "B", "C", "D"), 25)
) |>
pivot_longer(starts_with("var"), names_to = "var", values_to = "var.val") |>
mutate(across(c(spp, var), factor)) |>
ggplot(aes(var.val, spp.val)) +
geom_point() +
facet_grid(spp ~var, scales = "free_x")
Created on 2022-04-23 by the reprex package (v2.0.1)
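With scales = "free_x", every facet in a column shares one x axis that covers that variable's data (plus ggplot2's default padding). To check which limits that implies, a small sketch like the following computes the per-variable range directly. The data frame here is a simplified stand-in for the long-format data above, and the seed is arbitrary.

```r
set.seed(42)

# Simplified long-format stand-in: one row per observation of var1 or var2
df <- data.frame(
  var     = rep(c("var1", "var2"), each = 100),
  var.val = c(rnorm(100, 6, 2), rnorm(100, 15, 2))
)

# split() groups the values by variable; range() returns c(min, max) for each
ranges <- sapply(split(df$var.val, df$var), range)
print(ranges)  # one column per variable: row 1 = min, row 2 = max
```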

Plot a barplot with repeated labels

I would like to plot data with repeated x-axis labels in the form of bar-plot without merging the values with repeated labels.
In the example I have a table de:
de <- data.frame(mean=c(10, 2, 3, 1, 4, 5, 3, 9),
base=c('A','A','C','G','T','T','T','A'))
And I would like to have a plot like this:
But when I run this in R:
ggplot(de, aes( y = mean, x =base))+
geom_bar(stat = 'identity')
This is what I get:
It merges the identical bases into one column, whereas I want a separate column for each value of base, even the repeated ones, as shown in the table above.
The easy way is to give the repeated As and Ts unique labels in your "base" column, for example Ax, Ay, Tx, Ty, etc.:
de <- data.frame(mean=c(10, 2, 3, 1, 4, 5, 3, 9),
base=c("Ax", "Ay", "C", "G", "Tx","Ty", "Tz", "A"))
And then change the x-axis labels:
ggplot(de, aes( y = mean, x =base))+
geom_bar(stat = 'identity') +
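# limits keeps the bars in data order; otherwise the axis is sorted alphabetically and the labels no longer line up
scale_x_discrete(limits=de$base, labels=c("A", "A", "C", "G", "T","T", "T", "A"))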
Building off @sargg's excellent answer, we can prevent the possibility of human error by generating the unique base names and the ggplot labels automatically with dplyr:
library(dplyr)
de2 <- de %>%
group_by(base) %>%
mutate(unique_base = paste0(base, row_number()))
# A tibble: 8 x 3
# Groups:   base [4]
#    mean base  unique_base
#   <dbl> <fct> <chr>
# 1    10 A     A1
# 2     2 A     A2
# 3     3 C     C1
# 4     1 G     G1
# 5     4 T     T1
# 6     5 T     T2
# 7     3 T     T3
# 8     9 A     A3
ggplot(de2, aes(y = mean, x =unique_base))+
geom_bar(stat = 'identity') +
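# limits keeps the bars in data order so that each label matches its bar
scale_x_discrete(limits=de2$unique_base, labels=as.character(de2$base))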
For an even DRY-er answer, we can pass the data in like this (note the curly braces: see this answer for more information):
de2 %>% {
ggplot(., aes( y = mean, x =unique_base))+
geom_bar(stat = 'identity') +
scale_x_discrete(limits=.$unique_base, labels=as.character(.$base))
}
This lets us access the de2 data frame from within the ggplot call with ., thus letting us specify the labels with labels=.$base, rather than having to specify the dataframe de2 twice.
Though there already is an accepted solution, I will post another one, creating the desired labels from the original dataset.
First, an example dataset creation code.
set.seed(1234)
values <- sample(20, 8)
base <- c('A', 'A', 'C', 'G', 'T', 'T', 'T', 'A')
de <- data.frame(base, values)
Now the code to plot the graph.
library(tidyverse)
de %>%
mutate(base1 = paste0(seq_along(base), base)) %>%
ggplot(aes(x = base1, y = values)) +
geom_bar(stat = 'identity') +
geom_text(aes(x = base1, y = -1,
label = base)) +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
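Another way to sidestep the merging, sketched here as an alternative rather than as any of the answers above, is to plot against the row position and relabel the axis ticks with the base letters. The pos column is a helper introduced for this example.

```r
library(ggplot2)

de <- data.frame(mean = c(10, 2, 3, 1, 4, 5, 3, 9),
                 base = c("A", "A", "C", "G", "T", "T", "T", "A"))

# Helper column (not in the original data): the row position of each bar
de$pos <- seq_len(nrow(de))

p <- ggplot(de, aes(x = pos, y = mean)) +
  geom_col() +  # geom_col() is shorthand for geom_bar(stat = "identity")
  # One tick per row, labelled with that row's base letter
  scale_x_continuous(breaks = de$pos, labels = de$base) +
  labs(x = "base")
print(p)
```

Since x is numeric, the bars stay in data order and repeated letters never collapse into one column.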

Filter in ggplot2's geoms using common aesthetics and data frames across geoms

Say I have the following data frame:
# Dummy data frame
df <- data.frame(x = rep(1:5, 2), y = runif(10), z = rep(c("A", "B"), each = 5))
# x y z
# 1 1 0.92024937 A
# 2 2 0.37246007 A
# 3 3 0.76632809 A
# 4 4 0.03418754 A
# 5 5 0.33770400 A
# 6 1 0.15367174 B
# 7 2 0.78498276 B
# 8 3 0.03341913 B
# 9 4 0.77484244 B
# 10 5 0.13309999 B
I'd like to plot cases where z == "A" as points and cases where z == "B" as lines. Simple enough.
library(ggplot2)
library(dplyr)
# Plot data
g <- ggplot()
g <- g + geom_point(data = df %>% filter(z == "A"), aes(x = x, y = y))
g <- g + geom_line(data = df %>% filter(z == "B"), aes(x = x, y = y))
g
My data frame and aesthetic for the points and lines are identical, so this seems a bit verbose – especially if I want to do this lots of times (e.g., z == "A" through z == "Z"). Is there a way that I could state ggplot(df, aes(x = x, y = y)) and then subsequently state my filtering or subsetting criteria within the appropriate geoms?
I find the example in the question itself the most readable, although verbose. The second part of the question about dealing with more cases just requires a more sophisticated test in filter using for example %in% (or grep, grepl, etc.) when dealing with multiple cases. Taking advantage of the possibility of accessing default plot data within a layer, and as mentioned by @MrFlick moving the mapping of aesthetics out of the individual layers results in more concise code. All earlier answers get the plot done, so in this respect my answer is not better than any of them...
library(ggplot2)
library(dplyr)
df <- data.frame(x = rep(1:5, 4),
y = runif(20),
z = rep(c("A", "B", "C", "Z"), each = 5))
g <- ggplot(data = df, aes(x = x, y = y)) +
geom_point(data = . %>% filter(z %in% c("A", "B", "C"))) +
geom_line(data = . %>% filter(z == "Z"))
g
Another option would be to spread the data and then just supply the y aesthetic.
library(tidyverse)
df %>% spread(z,y) %>%
ggplot(aes(x = x))+
geom_point(aes(y = A))+
geom_line(aes(y = B))
You can plot lines and points for all z records, then hide the unwanted ones by passing NA to scale_linetype_manual and scale_shape_manual (here A is drawn as points and B as a line, as the question asks):
library(ggplot2)
ggplot(df, aes(x, y, linetype = z, shape = z)) +
geom_line() +
geom_point() +
scale_linetype_manual(values = c(A = NA, B = 1)) +
scale_shape_manual(values = c(A = 16, B = NA))
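The `. %>% filter(...)` form used in one of the answers works because a layer's data argument may be a function, which ggplot2 calls with the plot's default data. A dplyr-free sketch of the same idea follows. keep_z is a hypothetical helper written for this example, not a ggplot2 function.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rep(1:5, 2), y = runif(10),
                 z = rep(c("A", "B"), each = 5))

# Hypothetical helper: returns a function that keeps only rows whose z
# is in `values`; ggplot2 calls it with the plot's default data
keep_z <- function(values) {
  function(d) d[d$z %in% values, , drop = FALSE]
}

g <- ggplot(df, aes(x = x, y = y)) +
  geom_point(data = keep_z("A")) +  # points for z == "A"
  geom_line(data = keep_z("B"))     # line for z == "B"
print(g)
```

The data frame and the aesthetics are stated once, and each geom only names the subset it needs.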

ordering and plotting by one variable conditional on a second

Task: I would like to reorder a factor variable by the difference between the factor variable when a second variable equals 1 and the factor variable when the second variable equals 0. Here is a reproducible example to clarify:
# Package
library(tidyverse)
# Create fake data
df1 <- data.frame(place = c("A", "B", "C"),
avg = c(3.4, 4.5, 1.8))
# Plot, but it's not in order of value
ggplot(df1, aes(x = place, y = avg)) +
geom_point(size = 4)
# Now put it in order
df1$place <- factor(df1$place, levels = df1$place[order(df1$avg)])
# Plots in order now
ggplot(df1, aes(x = place, y = avg)) +
geom_point(size = 4)
# Adding second, conditional variable (called: new)
df2 <- data.frame(place = c("A", "A", "B", "B", "C", "C"),
new = rep(0:1, 3),
avg = c(3.4, 2.3, 4.5, 4.2, 2.1, 1.8))
ggplot(df2, aes(x = place, y = avg, col = factor(new))) +
geom_point(size = 3)
Goal: I would like to order and plot the factor variable place by the difference of avg between place when new is 1 and place when new is 0
You can create the levels for the place column by:
library(tidyr)
df2$place <- factor(df2$place, levels=with(spread(df2, new, avg), place[order(`1` - `0`)]))
ggplot(df2, aes(x = place, y = avg, col = factor(new))) +
geom_point(size = 3) + labs(color = 'new')
gives:
If I understand the goal correctly, then factor A has the biggest difference:
avg(new = 0) - avg(new = 1) = 1.1
So you can spread the data frame to calculate the difference, then gather, then plot avg versus place, reordered by diff. Or if you want A first, by -diff.
But let me know if I didn't understand correctly :)
df2 %>%
spread(new, avg) %>%
mutate(diff = `0` - `1`) %>%
gather(new, avg, -diff, -place) %>%
ggplot(aes(reorder(place, diff), avg)) +
geom_point(aes(color =factor(new)), size = 3)
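Calculate the column first using dplyr (diff(avg) is avg at new = 1 minus avg at new = 0, assuming the rows within each place are ordered by new):
df2 <- df2 %>% group_by(place) %>% mutate(diff = diff(avg))
ggplot(df2, aes(x = place, y = diff, color = diff)) +
geom_point(size = 3)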
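For comparison, the reordering can also be sketched in base R without reshaping. This assumes each place has exactly one row for new = 0 and one for new = 1, as in the sample data. The names d and ones are introduced here for illustration.

```r
library(ggplot2)

df2 <- data.frame(place = c("A", "A", "B", "B", "C", "C"),
                  new   = rep(0:1, 3),
                  avg   = c(3.4, 2.3, 4.5, 4.2, 2.1, 1.8))

# Per-place difference: avg at new == 1 minus avg at new == 0.
# sum() just extracts the single value each place has per level of new.
ones <- df2$new == 1
d <- tapply(df2$avg[ones], df2$place[ones], sum) -
  tapply(df2$avg[!ones], df2$place[!ones], sum)

# Order the factor levels by that difference, smallest first
df2$place <- factor(df2$place, levels = names(sort(d)))

p <- ggplot(df2, aes(x = place, y = avg, col = factor(new))) +
  geom_point(size = 3) +
  labs(color = "new")
print(p)
```

Place A has the most negative difference, so it comes first. Use sort(d, decreasing = TRUE) to reverse the order.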
