Line graph of four variables in R in just one plot [duplicate]

This question already has answers here:
Plotting two variables as lines using ggplot2 on the same graph
Closed 7 months ago.
I would like to create a line graph that shows the trends of four air pollutants from 2009 to 2019.
Year CO2 NO2 O3 PM2.5
2009 30 18 20 30
2010 32 16 22 20
2011 33 16 24 20
2012 32 15 25 22
2013 34 14 27 24
2014 36 14 28 22
2015 38 13 29 20
2016 39 13 30 18
2017 40 12 32 16
2018 44 13 34 15
2019 45 11 38 14
I tried the code below, but it produces a bar chart. I would like a line graph where all four pollutants are in the same plot.
df %>%
ggplot(aes(x = Year, y = n, fill = airpollutants)) +
geom_col() +
facet_wrap(~Year) + ggtitle("trend of airpollutants")
I want this output:
https://cdn.ablebits.com/_img-blog/line-graph/line-graph-excel.png

You could reshape your data from wide to long format and map each air pollutant to a colour, like this:
df <- read.table(text = "Year CO2 NO2 O3 PM2.5
2009 30 18 20 30
2010 32 16 22 20
2011 33 16 24 20
2012 32 15 25 22
2013 34 14 27 24
2014 36 14 28 22
2015 38 13 29 20
2016 39 13 30 18
2017 40 12 32 16
2018 44 13 34 15
2019 45 11 38 14
", header = TRUE)
library(ggplot2)
library(dplyr)
library(reshape)
df %>%
melt(id = "Year") %>%
mutate(variable = as.factor(variable)) %>%
ggplot(aes(x = Year, y = value, colour = variable)) +
geom_line() +
labs(colour = "airpollutants") +
ggtitle("trend of airpollutants")
Created on 2022-07-26 by the reprex package (v2.0.1)
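For comparison, the same wide-to-long step can be done in base R with stack(), without loading reshape. A minimal sketch using the question's data; the column names value and pollutant are my own choice:

```r
# Question's data in wide format: one column per pollutant
df <- data.frame(
  Year  = 2009:2019,
  CO2   = c(30, 32, 33, 32, 34, 36, 38, 39, 40, 44, 45),
  NO2   = c(18, 16, 16, 15, 14, 14, 13, 13, 12, 13, 11),
  O3    = c(20, 22, 24, 25, 27, 28, 29, 30, 32, 34, 38),
  PM2.5 = c(30, 20, 20, 22, 24, 22, 20, 18, 16, 15, 14)
)

# stack() puts the pollutant columns under each other: 'values' holds the
# numbers and 'ind' is a factor naming the column each value came from
long <- stack(df[-1])

# Columns are stacked one after another, so Year repeats once per pollutant
long$Year <- rep(df$Year, times = ncol(df) - 1)
names(long) <- c("value", "pollutant", "Year")
head(long)
```

The long data can then be plotted with ggplot(long, aes(x = Year, y = value, colour = pollutant)) + geom_line().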

Usually you'll want to be in long format when plotting in ggplot2.
One way to draw multiple lines without going long is to map over the columns (the \(x, y) lambda syntax needs R >= 4.1):
ggplot(data = df) +
  purrr::map2(df[-1], names(df[-1]), \(x, y) geom_line(aes(x = df$Year, y = x, col = y))) +
  labs(x = "Year",
       y = "Concentration",
       col = "Pollutant")
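If ggplot2 is not a requirement, base R's matplot() also draws every column of the wide data as its own line, with no reshaping. A minimal sketch using the question's data; the colours and legend position are arbitrary choices:

```r
# Question's data in wide format: one column per pollutant
df <- data.frame(
  Year  = 2009:2019,
  CO2   = c(30, 32, 33, 32, 34, 36, 38, 39, 40, 44, 45),
  NO2   = c(18, 16, 16, 15, 14, 14, 13, 13, 12, 13, 11),
  O3    = c(20, 22, 24, 25, 27, 28, 29, 30, 32, 34, 38),
  PM2.5 = c(30, 20, 20, 22, 24, 22, 20, 18, 16, 15, 14)
)

# matplot() draws one line per matrix column against the shared x values
pollutants <- as.matrix(df[-1])
matplot(df$Year, pollutants, type = "l", lty = 1, col = 1:4,
        xlab = "Year", ylab = "Concentration",
        main = "Trend of air pollutants")
legend("topleft", legend = colnames(pollutants), col = 1:4, lty = 1)
```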

set.seed(123)
library(ggplot2)
library(tidyr)
# Example data
df <- data.frame(year = 2009:2019,
CO2 = sample(30:40, 11),
NO2 = sample(10:20, 11),
O3 = sample(20:30, 11),
PM2.5 = sample(15:25, 11))
# Convert to long format
df_long <- pivot_longer(df,
cols = c(CO2, NO2, O3, PM2.5),
values_to = "Concentration",
names_to = "Pollutant")
# Plot
ggplot(df_long,
aes(
x = year,
y = Concentration,
color = Pollutant,
linetype = Pollutant
)) +
geom_line(linewidth = 0.7) + # 'linewidth' replaced 'size' for lines in ggplot2 3.4.0
ggtitle("Trend of Airpollutants") +
xlab("Year") +
ylab("Concentration") +
scale_x_continuous(breaks = seq(2009, 2019, by = 1), limits = c(2009,2019)) +
theme_minimal()

Related

Same y-axis scale with ggarrange

I would like to use ggarrange to combine two plots into a single one, and use the same scaling for the y-axis to make it easier to compare the values, in my case, temperature calculations.
This is my sample data:
Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39
Using the following code, I get the following plot:
library(ggplot2)
library(ggpubr) # provides ggarrange()
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot()
plate <- ggarrange(p1, p2, align = "hv")
plate
What I would like to have is for them to share their y-axis so that I can easily compare the two temperature calculation methods. I can do this manually with + scale_y_continuous(limits = c(a,b)) where I set the appropriate values for a and b, however, I have a lot of different dataframes with different temperature ranges. Ideally, I would use the lowest value (+ some space) from both plots, and the highest value (+ some space) from both plots for the limits of both plots. Is there a way to achieve this?
My manual approach and the desired output:
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot() +
scale_y_continuous(limits = c(10, 40))
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot() +
scale_y_continuous(limits = c(10, 40))
plate <- ggarrange(p1, p2, align = "hv")
plate
This is just to illustrate how you could use facets. Because you have two temperature columns, your data are in 'wide format'. If you reshape the data to long format, you can easily use facets, which share the same y axis by default.
library(ggplot2)
df <- read.table(text = " Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39", header = TRUE)
# Reshape
long <- tidyr::pivot_longer(
df, c("Temperature_A", "Temperature_B"),
values_to = "Temperature", names_to = "Temp_name"
)
# Example of facets
ggplot(long) +
geom_boxplot(aes(Group, Temperature)) +
facet_wrap(~ Temp_name)
Created on 2021-07-27 by the reprex package (v1.0.0)
I agree with teunbrand. Here is a slightly modified approach:
1. bring the data into long format,
2. hide outliers with outlier.shape = NA,
3. set the y range with coord_cartesian(ylim = ...), and
4. use facet_wrap.
library(tidyverse)
df1 <- df %>%
pivot_longer(
cols = starts_with("Temperature"),
names_to = "Temperature",
values_to = "values"
)
ggplot(df1, aes(Group, values)) +
geom_boxplot(outlier.shape = NA) +
coord_cartesian(ylim = c(10, 40)) +
facet_wrap(~Temperature) +
theme_bw()
Since J.Doe appeared unsatisfied with the currently accepted answer, I'll post another answer which again ignores the ggarrange part but generalises better, using the {patchwork} package instead.
Notably, you can use & scale_y_continuous(...) to apply a scale to all previous plots. You can use the plot + plot syntax to combine plots.
library(ggplot2)
library(patchwork)
df <- read.table(text = " Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39", header = TRUE)
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot()
p1 + p2 & scale_y_continuous(limits = c(10, 40))
The same works if you have a list of plots and use wrap_plots().
plot_list <- list(p1, p2)
wrap_plots(plot_list) & scale_y_continuous(limits = c(10, 40))
Created on 2022-12-08 by the reprex package (v2.0.0)
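The question asks for limits taken from the data rather than hard-coded as c(10, 40). A minimal base R sketch of that step, assuming "some space" means padding both ends by a fraction of the combined range (shared_limits() is a hypothetical helper, not part of any package):

```r
# Question's data (row numbers dropped)
df <- data.frame(
  Sample        = sprintf("Sample%02d", 1:10),
  Group         = c("A", "B", "A", "A", "B", "A", "B", "B", "A", "B"),
  Temperature_A = c(20, 21, 25, 18, 19, 22, 23, 17, 15, 28),
  Temperature_B = c(34, 31, 35, 27, 29, 32, 33, 28, 26, 39)
)

# Hypothetical helper: common limits for any number of numeric vectors,
# widened on both sides by `pad` times the combined range
shared_limits <- function(..., pad = 0.05) {
  lims <- range(..., na.rm = TRUE)
  lims + c(-1, 1) * pad * diff(lims)
}

lims <- shared_limits(df$Temperature_A, df$Temperature_B)
lims
```

The result can replace the hard-coded values, e.g. p1 + p2 & scale_y_continuous(limits = lims). ggplot2 already expands continuous scales by 5% on each side by default, so pad = 0 may leave enough space on its own.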

How to change behaviour of `check_overlap = TRUE`?

My data is structured as follows:
> Comparison
# A tibble: 12 x 3
round TotalShots Year
<int> <dbl> <dbl>
1 1 70 2021
2 2 68 2021
3 3 76 2021
4 4 73 2021
5 5 66 2021
6 6 70 2021
7 1 115 2020
8 2 106 2020
9 3 75 2020
10 4 73 2020
11 5 82 2020
12 6 84 2020
I can plot this in ggplot2 via:
ggplot(Comparison, aes(x = round, y = TotalShots,
colour = factor(Year), label = TotalShots)) +
geom_line() +
geom_point(size = 14) +
geom_text(colour = "black", size = 5, check_overlap = TRUE)
However, in the plot, the label at round 3 prints as 76 and not 75. I assume this is because of check_overlap = TRUE, but the plot is wrong: for Year = 2020, round 3 should be labelled 75, not 76.
Is there any way to please fix this?
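check_overlap = TRUE does not change any label; it draws labels in the order of the data and skips any label that would overlap one already drawn. Row 3 (76, 2021) comes before row 9 (75, 2020), so 76 is drawn and 75 is dropped. You can try the ggrepel package instead, which moves labels apart so they do not overlap: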
library(ggrepel)
library(ggplot2)
ggplot(Comparison, aes(x = round, y = TotalShots,
colour = factor(Year), label = TotalShots)) +
geom_line() +
geom_point(size = 14) +
geom_label_repel(colour = "black", size = 5, nudge_y = 0.8)
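If you prefer to keep plain geom_text() without check_overlap, another option is to push each year's labels off the points in opposite directions so they no longer collide. A sketch, assuming a 4-unit offset is enough at your plot size; label_y is a hypothetical helper column, and the points are made smaller so the labels sit beside them:

```r
Comparison <- data.frame(
  round      = rep(1:6, times = 2),
  TotalShots = c(70, 68, 76, 73, 66, 70, 115, 106, 75, 73, 82, 84),
  Year       = rep(c(2021, 2020), each = 6)
)

# Hypothetical offset: 2021 labels above the point, 2020 labels below
Comparison$label_y <- Comparison$TotalShots + ifelse(Comparison$Year == 2021, 4, -4)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Comparison, aes(x = round, y = TotalShots, colour = factor(Year))) +
    geom_line() +
    geom_point(size = 3) +
    # labels are drawn at the offset height; check_overlap stays at its default (FALSE)
    geom_text(aes(y = label_y, label = TotalShots), colour = "black", size = 4)
  print(p)
}
```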

Putting multiple plots on each page of a large pdf file [duplicate]

This question already has answers here:
Printing multiple ggplots into a single pdf, multiple plots per page
Closed 4 years ago.
I am trying to print dozens of ggplot boxplots to a pdf file. I want four plots per page of the pdf. How can I use a for loop to create the plots and end up with my desired format?
names_vec <- colnames(raw_data)
pdf(file = 'test1.pdf')
for(i in names_vec) {
print(ggplot(raw_data, aes(x = Group, y = raw_data[[i]])) +
geom_boxplot(na.rm = TRUE) +
labs(title = i, y = 'Relative Intensity') +
theme(axis.text.x = element_text(size = 8, angle = 45)))
}
dev.off()
This is what I've done so far.
par(mfrow = c(2,2)) is not working for me. Similarly, grid.arrange doesn't seem to be compatible with the loop strategy.
Sample data below:
Group glycine serine alanine threonine
1 Gatorade NA NA NA NA
2 Gatorade NA NA NA NA
3 Gatorade NA NA NA NA
4 Lime 17950 203400 2512000 864500
5 Lime 17950 193400 2621000 828500
6 Lime 18270 203200 2381000 885200
7 Lime 19370 214400 2623000 869000
8 Lime 17860 221200 2629000 786600
9 Lime 17570 196000 2667000 868900
10 Michelob 11820 388900 1563000 339100
11 Michelob 10670 419300 1460000 351100
12 Michelob 10240 363800 1601000 333800
13 Michelob 10550 390000 1498000 358000
14 Michelob 9073 391700 1575000 368500
15 Michelob 9507 363700 1358000 358200
16 Porch 15840 303200 3604000 229700
17 Porch 16390 290800 3769000 253900
18 Porch 15340 271900 3476000 222900
19 Porch 17590 284800 3707000 232200
20 Porch 17080 340200 3925000 262200
21 Porch 13380 265900 3595000 223000
22 26-2 Beer 17620 117100 3732000 159900
23 26-2 Beer 16350 136500 3509000 148500
24 26-2 Beer 16460 116100 3364000 143100
25 26-2 Beer 17510 131500 3440000 147500
26 26-2 Beer 15360 116700 3442000 134900
27 26-2 Beer 15770 117400 3539000 144100
28 Marathon 17150 215300 2848000 190200
29 Marathon 17480 146400 3018000 176600
30 Marathon 15450 160200 3003000 205500
31 Marathon 15070 154200 2808000 185300
32 Marathon 15610 158200 2790000 199800
33 Marathon 16610 157700 2788000 205500
names_vec <- c('glycine', 'serine', 'alanine', 'threonine')
p <- list()
for(i in names_vec) {
p[[i]] <- ggplot(raw_data, aes(x = Group, y = raw_data[[i]])) +
geom_boxplot(na.rm = TRUE) +
labs(title = i, y = 'Relative Intensity') +
theme(axis.text.x = element_text(size = 8, angle = 45))
}
pdf(file = 'test1.pdf')
multiplot(p[[1]], p[[2]], p[[3]], p[[4]], cols = 2)
dev.off()
Unfortunately, this yields a page with four plots that are identical save the titles, which are correct.
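You can use the marrangeGrob() function from the gridExtra package. Note that the plots in your second attempt are identical because aes() does not evaluate raw_data[[i]] until the plots are drawn, and by then i holds the last name in names_vec. Building the list with lapply() and the .data pronoun gives each plot its own column:
library(ggplot2)
names_vec <- c('glycine', 'serine', 'alanine', 'threonine')
# one plot per compound; each call of the anonymous function keeps its own nm
plot_lst <- lapply(names_vec, function(nm) {
  ggplot(raw_data, aes(x = Group, y = .data[[nm]])) +
    geom_boxplot(na.rm = TRUE) +
    labs(title = nm, y = 'Relative Intensity') +
    theme(axis.text.x = element_text(size = 8, angle = 45))
})
Put the plot list onto multiple pages, each of which has 4 plots: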
library(gridExtra)
ml <- marrangeGrob(plot_lst, nrow = 2, ncol = 2)
## interactive use
ml
## non-interactive use, multipage pdf
## ggsave("multipage.pdf", ml)
Data:
library(readr)
raw_data <- read_table("ID Group glycine serine alanine threonine
1 Gatorade NA NA NA NA
2 Gatorade NA NA NA NA
3 Gatorade NA NA NA NA
4 Lime 17950 203400 2512000 864500
5 Lime 17950 193400 2621000 828500
6 Lime 18270 203200 2381000 885200
7 Lime 19370 214400 2623000 869000
8 Lime 17860 221200 2629000 786600
9 Lime 17570 196000 2667000 868900
10 Michelob 11820 388900 1563000 339100
11 Michelob 10670 419300 1460000 351100
12 Michelob 10240 363800 1601000 333800
13 Michelob 10550 390000 1498000 358000
14 Michelob 9073 391700 1575000 368500
15 Michelob 9507 363700 1358000 358200
16 Porch 15840 303200 3604000 229700
17 Porch 16390 290800 3769000 253900
18 Porch 15340 271900 3476000 222900
19 Porch 17590 284800 3707000 232200
20 Porch 17080 340200 3925000 262200
21 Porch 13380 265900 3595000 223000
22 26-2 Beer 17620 117100 3732000 159900
23 26-2 Beer 16350 136500 3509000 148500
24 26-2 Beer 16460 116100 3364000 143100
25 26-2 Beer 17510 131500 3440000 147500
26 26-2 Beer 15360 116700 3442000 134900
27 26-2 Beer 15770 117400 3539000 144100
28 Marathon 17150 215300 2848000 190200
29 Marathon 17480 146400 3018000 176600
30 Marathon 15450 160200 3003000 205500
31 Marathon 15070 154200 2808000 185300
32 Marathon 15610 158200 2790000 199800
33 Marathon 16610 157700 2788000 205500")
Created on 2018-10-29 by the reprex package (v0.2.1.9000)
Edited response. Is this what you were after? If not please explain further.
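The identical plots in the question's second attempt come from how R closures see loop variables, the same mechanism aes() relies on when it stores raw_data[[i]] unevaluated. A base R sketch, with plain functions standing in for the plots, shows the difference between a for loop and lapply():

```r
names_vec <- c("glycine", "serine", "alanine", "threonine")

# for loop: every function refers to the same variable i, which is only
# looked up when the function is called, i.e. after the loop has finished
fs_loop <- list()
for (i in names_vec) {
  fs_loop[[i]] <- function() i
}

# lapply: each call gets its own environment, so nm is fixed per function
fs_lapply <- lapply(names_vec, function(nm) {
  force(nm)
  function() nm
})

loop_result   <- vapply(fs_loop, function(f) f(), character(1))
lapply_result <- vapply(fs_lapply, function(f) f(), character(1))
loop_result    # every element is "threonine"
lapply_result  # "glycine" "serine" "alanine" "threonine"
```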
This seems like a case where bringing the data into the proper format is much easier than fiddling with ggplot. Once the data are in long format instead of wide format, the for loop is no longer necessary.
This code produces the following graph:
library(tidyverse)
raw_data = read_delim("stackoverflowdata.csv", col_names = TRUE, delim = ";") %>%
gather(compound, value, -Group)
ggplot(raw_data,aes(x=Group, y =value)) +
geom_boxplot(na.rm = TRUE) +
facet_wrap(vars(compound), scales="free_y") +
labs(y = 'Relative Intensity') +
theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1 ))
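With dozens of compounds, a single facet_wrap() plot puts every panel on one page. One way to keep four panels per page is to split the compound names into groups of four and draw one faceted plot per group. A base R sketch of the grouping step; the last two compound names are hypothetical:

```r
# Hypothetical compound names; in practice use unique(raw_data$compound)
compounds <- c("glycine", "serine", "alanine", "threonine", "valine", "leucine")

# Assign positions 1-4 to page 1, 5-8 to page 2, and so on
pages <- split(compounds, ceiling(seq_along(compounds) / 4))
pages
```

For each page, the long data could be subset with raw_data[raw_data$compound %in% pages[[k]], ] and the facet_wrap() plot above printed for that subset inside a pdf() ... dev.off() block; adding ncol = 2 to facet_wrap() should give a 2 x 2 layout.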

ggplot color bar for less than and greater than Average

What I am trying to do is colour each team's bar according to whether its value is below or above the Average bar on the x axis (17.62 in this case). The Average bar should be black, and the bars above and below it should each get a different colour.
Road.Team Goalie.Pts.at.Home.Vs.Road.TEAM
1 ANA 16.67692
2 ARI 23.20000
3 BOS 18.86667
4 BUF 23.14667
5 CAR 23.44615
6 CBJ 16.62857
7 CGY 17.33333
8 CHI 21.77143
9 COL 18.70769
10 DAL 18.70000
11 DET 20.32000
12 EDM 16.94118
13 FLA 23.68000
14 LAK 12.14118
15 MIN 17.49333
16 MTL 21.71429
17 NJD 11.64444
18 NSH 17.12000
19 NYI 15.55556
20 NYR 18.04444
21 OTT 13.31429
22 PHI 21.25714
23 PIT 20.28235
24 SJS 14.45714
25 STL 15.80000
26 TBL 12.21333
27 TOR 10.44444
28 VAN 14.40000
29 VGK 16.35000
30 WPG 19.41333
31 WSH 15.26154
32 Average 17.62340
rm(stats)
stats <- read.csv("HomeGoaliesPtsforVisitingTeam.csv")
stats
library(tidyverse)
p <- stats %>%
gather(key, value, -Road.Team) %>%
ggplot(aes(x=Road.Team, y=value)) +
geom_col(position = "dodge")
p + labs(y = "FanDuel Goalie Pts",x = "Road Team",title = "Points for Home Goalies Vs Road Team")
Create a logical variable in your stats data frame and pass it to the fill argument in aes(). Something like:
stats$compare <- stats$Goalie.Pts.at.Home.Vs.Road.TEAM <= mean(stats$Goalie.Pts.at.Home.Vs.Road.TEAM)
p <- stats %>%
  gather(key, value, -Road.Team, -compare) %>%
  ggplot(aes(x = Road.Team, y = value, fill = compare)) +
  geom_col(position = "dodge")
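The answer above treats Average like any other bar. To get the three colours the question asks for (black for Average, one colour below it, another above), a small base R sketch can build a three-level factor first. Only six rows of the question's table are used here for brevity, and the colour names are placeholders:

```r
stats <- data.frame(
  Road.Team = c("ANA", "ARI", "BOS", "CGY", "NJD", "Average"),
  Goalie.Pts.at.Home.Vs.Road.TEAM = c(16.67692, 23.20000, 18.86667,
                                      17.33333, 11.64444, 17.62340)
)

pts <- stats$Goalie.Pts.at.Home.Vs.Road.TEAM
avg <- pts[stats$Road.Team == "Average"]  # compare against the Average row itself

# Average gets its own level; every other team is below or above it
stats$compare <- factor(
  ifelse(stats$Road.Team == "Average", "Average",
         ifelse(pts < avg, "Below", "Above")),
  levels = c("Below", "Average", "Above")
)

# Placeholder colours, named by factor level
fill_colours <- c(Below = "firebrick", Average = "black", Above = "steelblue")
stats
```

The factor can then be mapped with aes(fill = compare) and coloured with scale_fill_manual(values = fill_colours); with a single value column there is no need to gather() first.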

Ordering a 2 bar plot in R

I have a data set as below and I have created a graph with below code as suggested in a previous question. What I want to do is order the bars by rankings rather than team names. Is that possible to do in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
In this case you need to capture the order before melting. You can then use it to set the limits of scale_x_discrete(), or to set the factor levels of Team.Names inside aes(). (read.csv() turns the Team Names header into Team.Names, as in your code.)
Using factor:
ordr <- as.character(alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)])
alldata <- melt(alldata, id.vars = "Team.Names")
ggplot(alldata, aes(x = factor(Team.Names, levels = ordr), y = value, fill = variable)) +
  labs(x = "Team Name") +
  geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
ordr <- as.character(alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)])
alldata <- melt(alldata, id.vars = "Team.Names")
ggplot(alldata, aes(x = Team.Names, y = value, fill = variable)) +
  labs(x = "Team Name") +
  geom_bar(stat = "identity", width = .5, position = "dodge") +
  scale_x_discrete(limits = ordr)
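Base R's reorder() is another way to get the same order: it sets the factor levels of Team.Names by a score, here the negated ranking so the highest rank comes first. A sketch on the first five teams of the question's table, applied before melting:

```r
alldata <- data.frame(
  Team.Names        = c("Liverpool", "Chelsea", "Manchester City", "Arsenal", "Tottenham"),
  PLRankingsReverse = c(20, 19, 15, 16, 18),
  Grreserve         = c(20, 19, 18, 17, 16)
)

# reorder() sorts the levels by ascending score; negating the ranking
# puts the highest PLRankingsReverse first
alldata$Team.Names <- reorder(alldata$Team.Names, -alldata$PLRankingsReverse)
levels(alldata$Team.Names)
```

Because ggplot2 draws a discrete axis in factor-level order, x = Team.Names should then follow the ranking, as long as melt() keeps Team.Names as a factor (worth checking with levels() after melting).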
