Same y-axis scale with ggarrange - r

I would like to use ggarrange to combine two plots into a single one, and use the same scaling for the y-axis to make it easier to compare the values, in my case, temperature calculations.
This is my sample data:
Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39
Using the following code, I get the following plot:
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot()
plate <- ggarrange(p1, p2, align = "hv")
plate
What I would like to have is for them to share their y-axis so that I can easily compare the two temperature calculation methods. I can do this manually with + scale_y_continuous(limits = c(a,b)) where I set the appropriate values for a and b, however, I have a lot of different dataframes with different temperature ranges. Ideally, I would use the lowest value (+ some space) from both plots, and the highest value (+ some space) from both plots for the limits of both plots. Is there a way to achieve this?
My manual approach and the desired output:
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot() +
scale_y_continuous(limits = c(10, 40))
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot() +
scale_y_continuous(limits = c(10, 40))
plate <- ggarrange(p1, p2, align = "hv")
plate

This is just to illustrate how you could use facets. Because you have two columns for temperature, this is called the 'wide format'. If you reshape the data to the long format, you can easily use facets.
library(ggplot2)
df <- read.table(text = " Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39", header = TRUE)
# Reshape
long <- tidyr::pivot_longer(
df, c("Temperature_A", "Temperature_B"),
values_to = "Temperature", names_to = "Temp_name"
)
# Example of facets
ggplot(long) +
geom_boxplot(aes(Group, Temperature)) +
facet_wrap(~ Temp_name)
Created on 2021-07-27 by the reprex package (v1.0.0)

I agree with teunbrand. Here is a slightly modified approach:
bring the data into long format
hide outliers with outlier.shape = NA
adapt ylim and
use facet_wrap
library(tidyverse)
df1 <- df %>%
pivot_longer(
cols = starts_with("Temperature"),
names_to = "Temperature",
values_to = "values"
)
ggplot(df1, aes(Group, values)) +
geom_boxplot(outlier.shape = NA) +
coord_cartesian(ylim = c(10, 40)) +
facet_wrap(~Temperature) +
theme_bw()

Specifically for J.Doe who appeared unsatisfied with the current accepted answer, I'll post another answer which again ignores the 'ggarrange' part, but generalises better. Instead, the {patchwork} package is used.
Notably, you can use & scale_y_continuous(...) to apply a scale to all previous plots. You can use the plot + plot syntax to combine plots.
library(ggplot2)
library(patchwork)
df <- read.table(text = " Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39", header = TRUE)
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot()
p1 + p2 & scale_y_continuous(limits = c(10, 40))
The same works if you have a list of plots and use wrap_plots().
plot_list <- list(p1, p2)
wrap_plots(plot_list) & scale_y_continuous(limits = c(10, 40))
Created on 2022-12-08 by the reprex package (v2.0.0)
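To avoid hard-coding the limits for every data frame, they can be computed from the data first. This is a minimal sketch using the question's data; `shared_limits()` and its `pad` argument are hypothetical names introduced here and are not part of ggplot2 or patchwork.

```r
library(ggplot2)
library(patchwork)

df <- data.frame(
  Group = c("A", "B", "A", "A", "B", "A", "B", "B", "A", "B"),
  Temperature_A = c(20, 21, 25, 18, 19, 22, 23, 17, 15, 28),
  Temperature_B = c(34, 31, 35, 27, 29, 32, 33, 28, 26, 39)
)

# Hypothetical helper: overall range of all inputs, widened on each
# side by a fraction `pad` of that range
shared_limits <- function(..., pad = 0.05) {
  r <- range(..., na.rm = TRUE)
  r + c(-1, 1) * pad * diff(r)
}

lims <- shared_limits(df$Temperature_A, df$Temperature_B)

p1 <- ggplot(df, aes(Group, Temperature_A)) + geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) + geom_boxplot()

# `&` applies the scale to both plots
plate <- p1 + p2 & scale_y_continuous(limits = lims)
plate
```

The same `lims` vector could also be passed to `scale_y_continuous()` on each plot before calling `ggarrange()`, if you prefer to keep that function.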

Related

line graph of four variables in r in just one plot [duplicate]

I would like to create a line graph that shows the trend of four air pollutants over the years 2009 to 2019.
Year CO2 NO2 O3 PM2.5
2009 30 18 20 30
2010 32 16 22 20
2011 33 16 24 20
2012 32 15 25 22
2013 34 14 27 24
2014 36 14 28 22
2015 38 13 29 20
2016 39 13 30 18
2017 40 12 32 16
2018 44 13 34 15
2019 45 11 38 14
I tried this code, but it gives a histogram; I would like a line graph where all four are in the same plot.
df %>%
ggplot(aes(x = Year, y = n, fill = airpollutants)) +
geom_col() +
facet_wrap(~Year) + ggtitle("trend of airpollutants")
I want this output:
https://cdn.ablebits.com/_img-blog/line-graph/line-graph-excel.png
You could reshape your data from wide to long and colour each air pollutant like this:
df <- read.table(text = "Year CO2 NO2 O3 PM2.5
2009 30 18 20 30
2010 32 16 22 20
2011 33 16 24 20
2012 32 15 25 22
2013 34 14 27 24
2014 36 14 28 22
2015 38 13 29 20
2016 39 13 30 18
2017 40 12 32 16
2018 44 13 34 15
2019 45 11 38 14
", header = TRUE)
library(ggplot2)
library(dplyr)
library(reshape)
df %>%
melt(id = "Year") %>%
mutate(variable = as.factor(variable)) %>%
ggplot(aes(x = Year, y = value, colour = variable)) +
geom_line() +
labs(colour = "airpollutants") +
ggtitle("trend of airpollutants")
Created on 2022-07-26 by the reprex package (v2.0.1)
Usually you'll want to be in long format when plotting in ggplot2.
One way to draw multiple lines without going long is to map over the columns
ggplot(data = df) + purrr::map2(df[-1], names(df[-1]), \(x,y) geom_line(aes(x = df$Year, y = x, col = y))) +
labs(x = "Year",
y = "Concentration",
col = "Pollutant")
set.seed(123)
library(ggplot2)
library(tidyr)
# Example data
df <- data.frame(year = 2009:2019,
CO2 = sample(30:40, 11),
NO2 = sample(10:20, 11),
O3 = sample(20:30, 11),
PM2.5 = sample(15:25, 11))
# Convert to long format
df_long <- pivot_longer(df,
cols = c(CO2, NO2, O3, PM2.5),
values_to = "Concentration",
names_to = "Pollutant")
# Plot
ggplot(df_long,
aes(
x = year,
y = Concentration,
color = Pollutant,
linetype = Pollutant
)) +
geom_line(size = 0.7) +
ggtitle("Trend of Airpollutants") +
xlab("Year") +
ylab("Concentration") +
scale_x_continuous(breaks = seq(2009, 2019, by = 1), limits = c(2009,2019)) +
theme_minimal()

ggplot color bar for less than and great than Average

What I am trying to do: if a team's value is less than the Average bar (17.62 in this case), color it one color, and if it is greater than the Average, color it another. So the Average bar is black, and the bars above and below it get different colors.
Road.Team Goalie.Pts.at.Home.Vs.Road.TEAM
1 ANA 16.67692
2 ARI 23.20000
3 BOS 18.86667
4 BUF 23.14667
5 CAR 23.44615
6 CBJ 16.62857
7 CGY 17.33333
8 CHI 21.77143
9 COL 18.70769
10 DAL 18.70000
11 DET 20.32000
12 EDM 16.94118
13 FLA 23.68000
14 LAK 12.14118
15 MIN 17.49333
16 MTL 21.71429
17 NJD 11.64444
18 NSH 17.12000
19 NYI 15.55556
20 NYR 18.04444
21 OTT 13.31429
22 PHI 21.25714
23 PIT 20.28235
24 SJS 14.45714
25 STL 15.80000
26 TBL 12.21333
27 TOR 10.44444
28 VAN 14.40000
29 VGK 16.35000
30 WPG 19.41333
31 WSH 15.26154
32 Average 17.62340
rm(stats)
stats <- read.csv("HomeGoaliesPtsforVisitingTeam.csv")
stats
library(tidyverse)
p <- stats %>%
gather(key, value, -Road.Team) %>%
ggplot(aes(x=Road.Team, y=value)) +
geom_col(position = "dodge")
p + labs(y = "FanDuel Goalie Pts",x = "Road Team",title = "Points for Home Goalies Vs Road Team")
Create a logical variable in your stats data frame and pass that to the fill argument in aes(). Something like:
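# TRUE where a team's points are at or below the mean
pts <- stats$Goalie.Pts.at.Home.Vs.Road.TEAM
stats$compare <- pts <= mean(pts)
p <- stats %>%
  gather(key, value, -Road.Team, -compare) %>%  # keep compare out of the gathered columns
  ggplot(aes(x=Road.Team, y=value, fill=compare)) +
  geom_col(position = "dodge")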
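If the Average bar itself should be black, a three-level variable can be used in place of a logical one. This is a minimal sketch on a few rows from the question; the `compare` labels and the two colours other than black are illustrative choices, not something the question specifies.

```r
library(ggplot2)

# A few rows from the question, for illustration only
stats <- data.frame(
  Road.Team = c("ANA", "ARI", "BOS", "TOR", "Average"),
  Goalie.Pts.at.Home.Vs.Road.TEAM = c(16.67692, 23.20000, 18.86667, 10.44444, 17.62340)
)

pts <- stats$Goalie.Pts.at.Home.Vs.Road.TEAM
avg <- pts[stats$Road.Team == "Average"]

# Three categories: the Average row itself, then below or above it
stats$compare <- ifelse(stats$Road.Team == "Average", "Average",
                 ifelse(pts < avg, "Below average", "Above average"))

p <- ggplot(stats, aes(x = Road.Team, y = Goalie.Pts.at.Home.Vs.Road.TEAM, fill = compare)) +
  geom_col() +
  scale_fill_manual(values = c("Average" = "black",
                               "Below average" = "steelblue",
                               "Above average" = "firebrick")) +
  labs(y = "FanDuel Goalie Pts", x = "Road Team",
       title = "Points for Home Goalies Vs Road Team")
p
```

Naming the colours in `scale_fill_manual()` ties each colour to its category regardless of the order in which the levels appear.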

How to draw barcharts in 1 graph for totals of 4 variables?

I am trying to draw a barcharts for 4 variables in one graph.
YearMonth Var1 Var2 Var3 Var4
201501 11 21 31 41
201501 12 22 32 42
201502 13 23 33 43
201502 14 24 34 44
201503 15 25 35 45
201503 16 26 36 46
I want to:
Sum the Variables respectively by month
then draw bar charts for the 4 variables by month totals in one graph
x-axis is %Y-%m, y-axis is count
Different fill colors for diff variables.
I got lost in the ggplot, stat_summary and other functions....
I appreciate any help with my question.
To get the wanted result you'll need to munge your data and then plot it. The tidyverse packages will take care of that.
First you gather (reshape) your data and group it by variable and YearMonth, then you sum per group. Finally you plot using ggplot2.
library(tidyverse)
d %>%
gather(variable, value, -YearMonth) %>%
group_by(variable, YearMonth) %>%
summarise(valueSum = sum(value)) %>%
ggplot(aes(YearMonth, valueSum, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "YearMonth",
y = "Sum",
title = "Sum per Year Month")  # labs() takes title, not main
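To get `%Y-%m` labels on the x-axis, the `YearMonth` values can be converted to a formatted string before summing. This is a sketch on the question's sample data, using `pivot_longer()` where the answer uses `gather()`; the data frame name `d` follows the answer, and the `Month` column is an assumed name.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

d <- data.frame(
  YearMonth = c(201501, 201501, 201502, 201502, 201503, 201503),
  Var1 = 11:16, Var2 = 21:26, Var3 = 31:36, Var4 = 41:46
)

totals <- d %>%
  # 201501 -> "2015-01" (a day is appended so as.Date() can parse it)
  mutate(Month = format(as.Date(paste0(YearMonth, "01"), format = "%Y%m%d"), "%Y-%m")) %>%
  pivot_longer(Var1:Var4, names_to = "variable", values_to = "value") %>%
  group_by(Month, variable) %>%
  summarise(valueSum = sum(value), .groups = "drop")

ggplot(totals, aes(Month, valueSum, fill = variable)) +
  geom_col(position = "dodge") +
  labs(x = "Month", y = "Sum", title = "Sum per month")
```

Because `Month` is a character column, ggplot2 treats the x-axis as discrete, so the dodged bars are grouped per month.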

Ordering a 2 bar plot in R

I have a data set as below and I have created a graph with below code as suggested in a previous question. What I want to do is order the bars by rankings rather than team names. Is that possible to do in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
In this case you need to work out the team order from the rankings before melting, while the ranking column is still there. You can then use this order to set the limits on scale_x_discrete, or you can convert the team name to a factor with those levels inside aes().
Using factor:
# Team names sorted by ranking (read.csv() names the column Team.Names)
ordr <- as.character(alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)])
alldata <- melt(alldata)
ggplot(alldata, aes(x = factor(Team.Names, levels = ordr), y = value, fill = variable)) +
  labs(x = "Team Name") +
  geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
# Team names sorted by ranking (read.csv() names the column Team.Names)
ordr <- as.character(alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)])
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y = value, fill = variable)) +
  labs(x = "Team Name") +
  geom_bar(stat = "identity", width = .5, position = "dodge") +
  scale_x_discrete(limits = ordr)
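A variant of the same idea fixes the order once on the data, so the plot call needs no extra ordering step. This is a sketch on the first five rows of the question's table, assuming the column is called `Team.Names` as `read.csv()` would name it.

```r
library(ggplot2)
library(reshape2)

# First five rows of the question's data, for illustration only
alldata <- data.frame(
  Team.Names = c("Liverpool", "Chelsea", "Manchester City", "Arsenal", "Tottenham"),
  PLRankingsReverse = c(20, 19, 15, 16, 18),
  Grreserve = c(20, 19, 18, 17, 16)
)

# Make the team names a factor whose levels follow the ranking
alldata$Team.Names <- factor(
  alldata$Team.Names,
  levels = alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)]
)

# The factor levels survive the reshape because Team.Names is the id column
long <- melt(alldata, id.vars = "Team.Names")

ggplot(long, aes(x = Team.Names, y = value, fill = variable)) +
  geom_bar(stat = "identity", width = .5, position = "dodge") +
  labs(x = "Team Name")
```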

Drawing colored US State map with cut_number() in R

I have a dataframe called "drawdata":
GeoName Ranking
1 Alabama 15
2 Alaska 2
3 Arizona 28
4 Arkansas 12
5 California 19
6 Colorado 7
7 Connecticut 42
8 Delaware 37
9 District of Columbia 9
10 Florida 38
11 Georgia 11
12 Hawaii 48
13 Idaho 10
14 Illinois 16
15 Indiana 26
16 Iowa 34
17 Kansas 27
18 Kentucky 20
19 Louisiana 4
20 Maine 51
21 Maryland 30
22 Massachusetts 39
23 Michigan 14
24 Minnesota 23
25 Mississippi 41
26 Missouri 32
27 Montana 25
28 Nebraska 21
29 Nevada 45
30 New Hampshire 47
31 New Jersey 33
32 New Mexico 5
33 New York 44
34 North Carolina 13
35 North Dakota 31
36 Ohio 35
37 Oklahoma 6
38 Oregon 18
39 Pennsylvania 40
40 Rhode Island 49
41 South Carolina 29
42 South Dakota 46
43 Tennessee 43
44 Texas 3
45 Utah 17
46 Vermont 50
47 Virginia 8
48 Washington 24
49 West Virginia 22
50 Wisconsin 36
51 Wyoming 1
And I want to draw a US State map with different colors for each ranking. The code I have is:
names(drawdata) = c('region','value')
drawdata[,1] = tolower(drawdata[,1])
states = data.frame(state.center, state.abb)
states_map = map_data("state")
df = merge(drawdata, states_map, by = "region")
df$num = 49
p1 = ggplot(data = df, aes(x = long, y = lat, group = group))
p1 = p1 + geom_polygon(aes(fill = cut_number(value, num[1])))
p1 = p1 + geom_path(colour = 'gray', linestyle = 2)
p1 = p1 + scale_fill_brewer('', palette = 'PuRd')
p1 = p1 + coord_map()
p1 = p1 + scale_x_continuous(breaks=NULL) + scale_y_continuous(breaks=NULL)
p1 = p1 + theme(legend.position="none")
p1 = p1 + geom_text(data = states, aes(x = x, y = y, label = state.abb, group = NULL), size = 2)
p1
This works perfectly if 'num', the number of colors to fill, is small. However, when I set 'num=49', it produces an error:
Error in cut.default(x, breaks(x, "n", n), include.lowest = TRUE, ...) :
'breaks' are not unique
When I alter the code from
p1 = p1 + geom_polygon(aes(fill = cut_number(value, num[1])))
to
p1 = p1 + geom_polygon(aes(fill = cut_number(unique(value), num[1])))
then it gives me a different error:
Error: Aesthetics must either be length one, or the same length as the dataProblems:cut_number(unique(value), num[1])
I want a map where each of the 49 states has a different color reflecting its 'Ranking'. Any help is much appreciated!
Brewer palettes deliberately have small maximums (generally < 12) since it's pretty much impossible for humans to map the subtle differences to the discrete values you have. You can achieve what you're looking for by "faking" it with scale_fill_gradient2 (NOTE: I deliberately left the legend in as you should too):
library(ggplot2)
names(drawdata) <- c('region','value')
drawdata[,1] <- tolower(drawdata[,1])
states <- data.frame(state.center, state.abb)
states <- states[!(states$state.abb %in% c("AK", "HI")),] # they aren't part of states_map
states_map <- map_data("state")
p1 <- ggplot()
# borders
p1 <- p1 + geom_map(data=states_map, map=states_map,
aes(x=long, y=lat, map_id=region),
color="white", size=0.15)
# fills
p1 <- p1 + geom_map(data=drawdata, map=states_map,
aes(fill=value, map_id=region),
color="white", size=0.15)
# labels
p1 <- p1 + geom_text(data=states,
aes(x=x, y=y, label=state.abb, group=NULL), size=2)
# decent projection
p1 <- p1 + coord_map("albers", lat0=39, lat1=45)
p1 <- p1 + scale_fill_gradient2(low="#f7f4f9", mid="#df65b0", high="#67001f")
# better theme
p1 <- p1 + labs(x=NULL, y=NULL)
p1 <- p1 + theme_bw()
p1 <- p1 + theme(panel.grid=element_blank())
p1 <- p1 + theme(panel.border=element_blank())
p1 <- p1 + theme(axis.ticks=element_blank())
p1 <- p1 + theme(axis.text=element_blank())
p1
You can get an even better result with scale_fill_distiller, which does a lot behind the scenes to let you use a Color Brewer palette with continuous data (I'd argue you do not have continuous data, though):
p1 <- p1 + scale_fill_distiller(palette="PuRd")
I'd strongly suggest continuing to use cut like you had originally and having a max of 9 breaks to fit into the Color Brewer palette you're trying to work with. In reality, folks are still going to need a table to really grok the rankings (never assume Americans know either state shapes, locations or even the two-letter abbreviations for them), so I'd also pretty much just suggest using an actual table with full names at least with this choropleth if not in place of it.
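The nine-break suggestion could look like the sketch below. The rankings in the question are the numbers 1 to 51, one per state, so `1:51` is used as a stand-in with the state names left out; the `bin` column is an assumed name. The bins are computed on the one-row-per-state table before any merge with the map data, since the merged table repeats each state's value for every polygon point, which is a likely cause of the non-unique breaks.

```r
library(ggplot2)

# Stand-in for the question's table: one ranking per state, 1 to 51
drawdata <- data.frame(value = 1:51)

# Nine quantile bins with roughly equal numbers of states in each
drawdata$bin <- cut_number(drawdata$value, n = 9)

table(drawdata$bin)
```

After merging with the map data, mapping `fill = bin` together with `scale_fill_brewer(palette = "PuRd")` stays within the nine colours that palette offers.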
Note also that the way you're trying to build the map deliberately excluded Alaska, Hawaii and the District of Columbia. You'll need to use a real shapefile and something like I cover here to get them to show up nicely.
If you want different colors for each state, using a gradient, you can work with scale_fill_gradient. Here is one version, using green and red at the ends of the gradient, so that each state is on that scale.
ggplot(data = df, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = value)) +
geom_path(colour = 'gray', linetype = 2) +
scale_fill_gradient(low = "green", high = "red") +
coord_map() +
scale_x_continuous(breaks=NULL) + scale_y_continuous(breaks=NULL) +
theme(legend.position="none") +
geom_text(data = states, aes(x = x, y = y, label = state.abb, group = NULL), size = 2)
