Addition of exponential regression models to scatterplot - r

I have a data set consisting of wet weights and size measurements for 7 species. The number of observations ranges between 3 and 18 per species.
Species ID Size(mm) Wet weight(g)
These measurements are visualised in a ggplot scatter plot. I used the following code.
library("ggplot2")
library("reshape2")
library("tidyverse")
p<-ggplot(Wet,aes(x=Size,y=WW,colour=Species))+
geom_point(size=3)+
labs(x="\nDiameter or Length (mm)",y="Wet weight (g)\n")+
theme(axis.title.x=element_text(size=18),
axis.text.x=element_text(size=14,colour="black"),
axis.title.y=element_text(size=18),
axis.text.y=element_text(size=14,colour="black"),
axis.ticks=element_blank(),
legend.position="right",
legend.text=element_text(colour="black",size=14),
legend.title=element_blank())
p
I would like to add exponential regression models for every species and am interested in their equations, correlation coefficients and p-values.
I would appreciate any suggestions on how to add these to my scatter plot.
Thanks so much!!!
Species ID Size (mm) Wet weight (g)
Aequorea 1 195 390
Aequorea 2 225 579
Aequorea 3 224 303
Aurelia 4 235 647
Aurelia 5 170 335
Aurelia 6 155 269
Cyanea 7 370 1499
Cyanea 8 460 5000
Cyanea 9 430 2011
...

Your example data frame contains only three data points for each species, which is a very small sample size to fit an exponential regression. As a result, I decided to add more data points to your example as follows.
Wet <- read.table(text = "Species Size WW
Aequorea 195 390
Aequorea 225 579
Aequorea 224 303
Aequorea 280 1600
Aequorea 320 4000
Aurelia 235 647
Aurelia 170 335
Aurelia 155 269
Aurelia 300 2000
Aurelia 350 4500
Cyanea 370 1499
Cyanea 460 5000
Cyanea 430 2011
Cyanea 100 500
Cyanea 120 550
Cyanea 200 1000",
header = TRUE, stringsAsFactors = FALSE)
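We can use geom_smooth to fit a non-linear regression to each species as follows. Note that the formula used here, y ~ a + x^b, describes a power-type curve rather than a strict exponential; change the formula and the starting values if you need a different model. The geom_smooth code is only for plotting. You will need to use the nls function to find the coefficients for each species.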
library(ggplot2)
p <- ggplot(Wet, aes(x = Size, y = WW, colour = Species))+
geom_point(size=3)+
labs(x = "\nDiameter or Length (mm)", y="Wet weight (g)\n")+
geom_smooth(method = "nls",
formula = y ~ a + x^b,
method.args = list(start = c(a = 1, b = 1)),
se = FALSE) +
theme(axis.title.x = element_text(size = 18),
axis.text.x = element_text(size = 14, colour = "black"),
axis.title.y = element_text(size = 18),
axis.text.y = element_text(size = 14, colour = "black"),
axis.ticks = element_blank(),
legend.position = "right",
legend.text = element_text(colour = "black", size = 14),
legend.title = element_blank())
p
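The question also asks for equations, fit statistics and p-values, which geom_smooth does not report. The following is a minimal sketch of one way to get them, under the assumption that a log-linear fit is acceptable: it fits log(WW) = log(a) + b * Size with lm for each species, which corresponds to WW = a * exp(b * Size). This is a swapped-in technique, not the nls fit drawn above, and the names fits and results are only illustrative.

```r
# Example data copied from the answer above
Wet <- read.table(text = "Species Size WW
Aequorea 195 390
Aequorea 225 579
Aequorea 224 303
Aequorea 280 1600
Aequorea 320 4000
Aurelia 235 647
Aurelia 170 335
Aurelia 155 269
Aurelia 300 2000
Aurelia 350 4500
Cyanea 370 1499
Cyanea 460 5000
Cyanea 430 2011
Cyanea 100 500
Cyanea 120 550
Cyanea 200 1000",
header = TRUE, stringsAsFactors = FALSE)

# Assumption: an exponential model WW = a * exp(b * Size), fitted on the
# log scale with ordinary least squares, one model per species
fits <- lapply(split(Wet, Wet$Species), function(d) {
  m <- lm(log(WW) ~ Size, data = d)
  s <- summary(m)
  data.frame(
    Species   = d$Species[1],
    a         = exp(coef(m)[["(Intercept)"]]),      # back-transformed intercept
    b         = coef(m)[["Size"]],                  # exponential rate per mm
    r_squared = s$r.squared,                        # R^2 on the log scale
    p_value   = s$coefficients["Size", "Pr(>|t|)"]  # p-value of the slope
  )
})

# One row per species
results <- do.call(rbind, fits)
print(results)
```

The R-squared and p-value refer to the model on the log scale, so they are not directly comparable with statistics from a fit on the original scale.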

Related

ggplot2 Adding data labels to grouped histograms chart

I have a table called "year" looking like this:
# Year Stars n pct
1 2015 1 778 26.5
2 2015 2 247 8.4
3 2015 3 286 9.7
4 2015 4 439 15
5 2015 5 1186 40.4
6 2016 1 678 22.7
7 2016 2 233 7.8
8 2016 3 256 8.6
9 2016 4 451 15.1
10 2016 5 1372 45.9
11 2017 1 501 24.3
12 2017 2 180 8.7
13 2017 3 215 10.4
14 2017 4 274 13.3
15 2017 5 894 43.3
16 2018 1 391 25.1
17 2018 2 125 8
18 2018 3 144 9.3
19 2018 4 196 12.6
20 2018 5 699 45
21 2019 1 474 22.4
22 2019 2 124 5.9
23 2019 3 168 8
24 2019 4 277 13.1
25 2019 5 1070 50.6
26 2020 1 148 25.3
27 2020 2 50 8.5
28 2020 3 64 10.9
29 2020 4 77 13.1
30 2020 5 247 42.2
The data show how users have rated an app on the Google Play store over the years. They rate it by giving 1 (bad) to 5 (great) stars.
I'm trying to make a chart which shows share of stars by star level and year using this code:
ggplot(year, aes(as.factor(Stars), pct)) +
geom_bar(aes(fill = as.factor(Year)), position = "dodge", stat="identity") +
scale_fill_manual(values=c("#05668D", "#028090", "#00A896", "#02C39A", "#4ecdc4", "#F0F3BD")) +
ylab("Share of stars (in %)") +
xlab("Stars") +
geom_text(label=round(year$pct, digits = 1),
position=position_dodge(0.9),
size = 4, fontface = "bold") +
ylim(0,50) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
plot.title = element_text(hjust = 0.5)) +
ggtitle("Share of stars")
Unfortunately, I have two problems:
1) For some reason I get the following warning message:
Warning messages:
1: Removed 1 rows containing missing values (geom_bar).
2: Removed 1 rows containing missing values (geom_text).
Meaning that the data for 2020, 4 stars is missing, though it should be 13.1
2) My labels are all over the place while I would like them to be positioned above the corresponding column on the chart.
Could someone please help me with these problems?
You need to set the grouping correctly for the dodging to work. Instead of using ylim, which cuts off one of your bars, we can turn off axis expansion which looks better for bars going down to 0. (You may need to use ylim with a higher value to make sure all labels are printed.)
ggplot(year, aes(as.factor(Stars), pct)) +
geom_col(aes(fill = as.factor(Year)), position = "dodge") +
geom_text(
aes(label = round(pct, digits = 1), group = interaction(Stars, Year)),
position = position_dodge(0.9), size = 3, fontface = "bold", vjust = 0
) +
scale_fill_manual(values=c("#05668D", "#028090", "#00A896", "#02C39A", "#4ecdc4", "#F0F3BD")) +
scale_y_continuous(expand = c(0, 0)) +
theme(
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
plot.title = element_text(hjust = 0.5)
) +
labs(title = "Share of stars", x = "Stars", y = "Share of stars (in %)", fill = "Year")
Problem #1: You're getting that message because row #25 has a pct value of 50.6, and you're setting the maximum limit on the y-axis to 50.
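To check which row falls outside the axis limit, the pct column can be compared with the upper limit directly. This is a small sketch that rebuilds the Year, Stars and pct columns from the table in the question; the name dropped is only illustrative.

```r
# Year, Stars and pct columns rebuilt from the table in the question
year <- data.frame(
  Year  = rep(2015:2020, each = 5),
  Stars = rep(1:5, times = 6),
  pct   = c(26.5, 8.4, 9.7, 15, 40.4,
            22.7, 7.8, 8.6, 15.1, 45.9,
            24.3, 8.7, 10.4, 13.3, 43.3,
            25.1, 8, 9.3, 12.6, 45,
            22.4, 5.9, 8, 13.1, 50.6,
            25.3, 8.5, 10.9, 13.1, 42.2)
)

# Rows whose value lies above the upper limit passed to ylim(0, 50)
dropped <- year[year$pct > 50, ]
print(dropped)
```

Only the 2019, 5 stars row exceeds 50, so it is the one removed from both geom_bar and geom_text.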
Problem #2: Using position_dodge2 in your geom_text will make the labels line up, and I think adding a little space above the bar makes it more readable so I also added vjust = -1
ggplot(year, aes(as.factor(Stars), pct)) +
geom_bar(aes(fill = as.factor(Year)), position = "dodge", stat="identity") +
scale_fill_manual(values=c("#05668D", "#028090", "#00A896", "#02C39A", "#4ecdc4", "#F0F3BD")) +
ylab("Share of stars (in %)") +
xlab("Stars") +
geom_text(label=round(year$pct, digits = 1),
position=position_dodge2(width = .9),
vjust = -1,
fontface = "bold",
size = 4) +
ylim(0,55) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
plot.title = element_text(hjust = 0.5)) +
ggtitle("Share of stars")
Simply use position_dodge2 to fix the labels and get rid of the ylim to show all data:
library(ggplot2)
ggplot(year, aes(as.factor(Stars), pct)) +
geom_bar(aes(fill = as.factor(Year)), position = "dodge", stat="identity") +
scale_fill_manual(values=c("#05668D", "#028090", "#00A896", "#02C39A", "#4ecdc4", "#F0F3BD")) +
ylab("Share of stars (in %)") +
xlab("Stars") +
geom_text(aes(label=round(pct, digits = 1)),
position=position_dodge2(0.9),
size = 4, fontface = "bold") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
plot.title = element_text(hjust = 0.5)) +
ggtitle("Share of stars")

Grouped boxplot combined with dotplot in ggplot2

I am new to stackoverflow so please be forgiving, if the way of asking my question can be improved (in that case I'm happy if you let me know how).
I am using the following code in ggplot2 to produce a grouped boxplot combined with a dotplot (unfortunately I cannot post an image yet (no reputation)):
ggplot(data24, aes(x=intensity, y=percacc, fill=group)) +
geom_boxplot(position=position_dodge(1), notch=T,
outlier.colour = NA, width = .7, alpha=0.2) +
geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 3,
position=position_dodge(1), dotsize=0.5, alpha=1) +
stat_summary(fun.y=mean, geom="point", shape=23, size=5,
position=position_dodge(0.3)) +
stat_summary(fun.data=mean_cl_boot, geom="errorbar", width=0.1,
position=position_dodge(0.3), size=1.2)+
scale_y_continuous(limits=c(0,100)) +
scale_fill_discrete(name="Group")
My questions are:
1) How can I use different colors for different elements? I tried to add color/fill commands within geom_boxplot() and geom_dotplot(), but this doesn't work: e.g., if I add fill="green" to geom_dotplot(), all points become green and centered between the boxplots. How can I rewrite the code to get
- white fill for all boxplots
- blue fill and black line for dots of group 1
- green fill and black line for dots of group 2
- black fill for all mean diamonds
2) How can I pull the categories of "intensity" (i.e. the three pairs of boxplots) further apart from each other?
3) How can I display the dotplots next to the boxplots and the mean+CI in the boxplot?
4) Why does my coordinate system still go from <0 to >100, even if I defined the y-axis to go from 0 to 100?
Thank you!
Edit 180722
Thank you for your comments. The data head is:
id intensity AQ_sum group acc percacc
1 54 40 11 COMP 5 20.83333
2 54 60 11 COMP 18 75.00000
3 54 80 11 COMP 24 100.00000
4 55 40 12 COMP 9 37.50000
5 55 60 12 COMP 22 91.66667
6 55 80 12 COMP 24 100.00000
7 58 40 10 COMP 8 33.33333
8 58 60 10 COMP 22 91.66667
9 58 80 10 COMP 23 95.83333
10 59 40 6 COMP 19 79.16667
11 59 60 6 COMP 24 100.00000
12 59 80 6 COMP 24 100.00000
13 60 40 9 COMP 10 41.66667
14 60 60 9 COMP 23 95.83333
15 60 80 9 COMP 22 91.66667
16 61 40 13 COMP 4 16.66667
17 61 60 13 COMP 19 79.16667
18 61 80 13 COMP 24 100.00000
19 62 40 12 COMP 16 66.66667
20 62 60 12 COMP 23 95.83333
My updated code is
ggplot(data24, aes(x=intensity, y=percacc, fill=group)) +
geom_boxplot(position=position_dodge(0.8), notch=T,
outlier.colour = NA, width = .4, alpha=0.3) +
geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 3,
position=position_dodge(0.8), dotsize=0.4, alpha=1)+
scale_fill_manual(values=c("#999999", "#E69F00"), name="Group") +
stat_summary(fun.y=mean, geom="point", shape=23, size=3,
position=position_dodge(0.2)) +
stat_summary(fun.data=mean_cl_boot, geom="errorbar", width=0.1,
position=position_dodge(0.2), size=0.5)+
scale_y_continuous(limits=c(0,103), expand = c(0, 0),
breaks=c(0,20,40,60,80,100), name="Percentage accuracy") +
scale_x_discrete(expand = c(0, 0.6), name="Degree of intensity (in percent)") +
labs(title="Accuracy by intensity and group\n") +
theme_light()+
theme(plot.title = element_text(face='bold', size=12, hjust = 0.5),
axis.title.x = element_text(size=10,hjust=0.5),
axis.title.y = element_text(size=10,vjust=1),
axis.text.x = element_text(size=10,color='black'),
axis.text.y = element_text(size=10, color='black'),
panel.grid.major.y = element_line(size = 0.3, linetype = "dotted", color="darkgrey"),
panel.grid.minor.y = element_line(size = 0.3, linetype = "dotted", color="darkgrey"),
panel.grid.major.x = element_blank(),
panel.border = element_blank(),
axis.line = element_line(size = 0.5, linetype = "solid", colour = "black"))
ggsave("plotintensity.png", width = 10, height = 5)

Add a percent to y axis labels [duplicate]

This question already has answers here:
How can I change the Y-axis figures into percentages in a barplot?
(4 answers)
Closed 4 years ago.
I'm sure I missed an obvious solution to this problem, but I can't figure out how to add a percent sign to the y axis labels.
Data Sample:
Provider Month Total_Count Total_Visits Procedures RX State
Roberts 2 19 19 0 0 IL
Allen 2 85 81 4 4 IL
Dawson 2 34 34 0 0 CA
Engle 2 104 100 4 4 CA
Goldbloom 2 7 6 1 1 NM
Nathan 2 221 192 29 20 NM
Castro 2 6 6 0 0 AK
Sherwin 2 24 24 0 0 AK
Brown 2 282 270 12 12 UT
Jackson 2 114 96 18 16 UT
Corwin 2 22 22 0 0 CO
Dorris 2 124 102 22 22 CO
Ferris 2 427 318 109 108 OH
Jeffries 2 319 237 82 67 OH
The following code gives graphs with inaccurate values because R seems to be multiplying by 100.
procs <- read.csv(paste0(dirdata, "Procedure percents Feb.csv"))
procs$Percentage <- round(procs$Procedures/procs$Total.Visits*100, 2)
procs$Percentage[is.na(procs$Percentage)] <- 0
procsplit <- split(procs, procs$State)
plots <- function(procs) {
ggplot(data = procs, aes(x= Provider, y= Percentage, fill= Percentage)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(x = Provider, y = Percentage, label = sprintf("%.1f%%", Percentage)), position = position_dodge(width = 0.9), hjust = .5, vjust = 0 , angle = 0) +
theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
ggtitle("Procedure Percentages- February 2018", procs$State) +
theme(plot.title = element_text(size = 22, hjust = .5, family = "serif")) +
theme(plot.subtitle = element_text(size = 18, hjust = .5, family = "serif")) +
scale_y_continuous(name = "Percentage", labels = percent)
}
lapply(procsplit, plots)
I'm not sure if there's a way to use sprintf to add it or if there's a way to paste it onto the labels.
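Adding + scale_y_continuous(labels = function(x) paste0(x, "%")) to the ggplot statement fixes this issue. The percent labeller from the scales package expects proportions between 0 and 1 and multiplies them by 100, whereas Percentage is already on a 0 to 100 scale, so the labels only need a "%" suffix.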
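As a quick check of what such a labelling function does to the axis breaks, it can be called on a few values outside of ggplot. The name add_percent is only illustrative.

```r
# Labelling function: appends "%" without rescaling the values
add_percent <- function(x) paste0(x, "%")

# Applied to some example axis breaks
labels_out <- add_percent(c(0, 25, 50))
print(labels_out)
```

If you prefer to stay within the scales package, scales::percent_format(scale = 1) should give a similar result, since scale = 1 turns off the multiplication by 100.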

Issue with a drawing a vertical line in ggplot for categorical variable x-axis in R

I have the following table. I want to plot a vertical line using the "st_date_wk" column for each county. Please see my code below but it DOES NOT draw the vertical line using the "st_date_wk" column. Cannot figure out what I am doing wrong here.
Any help is appreciated.
Thanks.
dfx1:
YEAR Week Area acc_sum percentage COUNTY st_date_wk
1998 10-1 250 250 12.4 133 10-4
1998 10-2 300 550 29.0 133 10-4
1998 10-3 50 600 58.0 133 10-4
1998 10-4 100 700 75.9 133 10-4
1998 10-5 100 800 100.0 133 10-4
1999 9-3 75 75 22.0 205 10-2
1999 10-1 250 250 12.4 205 10-2
1999 10-2 300 550 29.0 205 10-2
1999 10-3 50 600 58.0 205 10-2
1999 10-4 100 700 75.9 205 10-2
1999 10-5 100 800 100.0 205 10-2
.
.
dfx1$YEAR <- as.factor(dfx1$YEAR)
dfx1$COUNTY <- as.factor(dfx1$COUNTY)
dfx1$percentage <- as.numeric(dfx1$percentage)
dfx1$acc_sum <- as.numeric(dfx1$acc_sum)
dfx1$Week <- factor(dfx1$Week, ordered = T)
dfx1$st_date_wk <- factor(dfx1$st_date_wk,ordered = T)
dfx1$Week <- factor(dfx1$Week, levels=c("6-1","6-2","6-3","6-4","6-5","7-1","7-2","7-3","7-4","7-5","8-1","8-2","8-3","8-4","8-5","9-1","9-2","9-3","9-4","9-5","10-1","10-2","10-3","10-4","10-5","11-1","11-2","11-3","11-4","11-5","12-1","12-2","12-3","12-4","12-5"))
gg <- ggplot(dfx1, aes(Week,percentage, col=YEAR, group = YEAR))
gg <- gg + geom_line()
gg <- gg + facet_wrap(~COUNTY, 2, scales = "fixed")
gg <- gg + theme(text = element_text(size=15), axis.text.x = element_text(angle=90, hjust=1))
gg <- gg + geom_vline(data=dfx1, aes(xintercept = dfx1$st_date_wk), color = "blue", linetype = "dashed", size = 1.0)+ facet_wrap(~COUNTY)
plot(gg)
1: In Ops.ordered(x, from[1]) : '-' is not meaningful for ordered factors
It is a very interesting issue, and I haven't quite figured out why it does not work. However, there is a fix for it.
First, this is the data that is used in the answer:
dfx1 <- read.table(text =
"YEAR Week Area acc_sum percentage COUNTY st_date_wk
1998 10-1 250 250 12.4 133 10-4
1998 10-2 300 550 29.0 133 10-4
1998 10-3 50 600 58.0 133 10-4
1998 10-4 100 700 75.9 133 10-4
1998 10-5 100 800 100.0 133 10-4
1999 9-3 75 75 22.0 133 10-1",
header = TRUE)
Convert the types of YEAR, COUNTY, percentage, and acc_sum:
dfx1$YEAR <- as.factor(dfx1$YEAR)
dfx1$COUNTY <- as.factor(dfx1$COUNTY)
dfx1$percentage <- as.numeric(dfx1$percentage)
dfx1$acc_sum <- as.numeric(dfx1$acc_sum)
Create a vector with the week_levels (more reader-friendly):
week_levels <- c("6-1","6-2","6-3","6-4","6-5",
"7-1","7-2","7-3","7-4","7-5",
"8-1","8-2","8-3","8-4","8-5",
"9-1","9-2","9-3","9-4","9-5",
"10-1","10-2","10-3","10-4","10-5",
"11-1","11-2","11-3","11-4","11-5",
"12-1","12-2","12-3","12-4","12-5")
Transform Week and st_date_wk to an ordered factor with the same levels:
dfx1$Week <- factor(dfx1$Week, levels = week_levels, ordered = TRUE)
dfx1$st_date_wk <- factor(dfx1$st_date_wk, levels = week_levels, ordered = TRUE)
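Giving both columns the same levels matters because the integer code of a factor value is its position in the level vector. A small sketch with a shortened, purely illustrative level vector (short_levels, week and start are not part of the original data):

```r
# Shortened level vector, for illustration only
short_levels <- c("9-3", "10-1", "10-2", "10-3", "10-4", "10-5")

week  <- factor(c("10-1", "10-4", "9-3"),  levels = short_levels, ordered = TRUE)
start <- factor(c("10-4", "10-4", "10-1"), levels = short_levels, ordered = TRUE)

# Integer positions on the shared scale
week_pos  <- as.numeric(week)
start_pos <- as.numeric(start)

# Positions of all levels that occur in start
start_idx <- which(levels(week) %in% start)

print(week_pos)
print(start_pos)
print(start_idx)
```

Because both factors share one level vector, "10-4" maps to the same position in each, which is what allows a vertical line to be placed on the numeric x-axis.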
Create labels for scale_x_continuous (a named vector where the names correspond to the breaks of the x-axis):
labels <- week_levels
names(labels) <- seq_along(week_levels)
Create the visualisation, but instead of using the factors on the x-axis, use their numeric positions. In geom_vline(), use which() to get the number that corresponds to a Week on the x-axis. Then use scale_x_continuous() to label the breaks with the weeks.
library(ggplot2)
ggplot(dfx1, aes(x = as.numeric(Week), y = percentage, col=YEAR, group = YEAR)) +
geom_line() +
geom_vline(xintercept = which(levels(dfx1$Week) %in% dfx1$st_date_wk), color = "blue", linetype = "dashed") +
scale_x_continuous(breaks = seq_along(labels), labels = labels) +
theme(text = element_text(size=15), axis.text.x = element_text(angle=90, hjust=1)) +
facet_wrap(~COUNTY, 2, scales = "fixed")
This will give you:
EDIT AFTER COMMENT:
library(dplyr)
dfx1 <- merge(dfx1,
(dfx1 %>%
group_by(COUNTY, st_date_wk) %>%
summarise(x = which(levels(st_date_wk) %in% st_date_wk[COUNTY == COUNTY]))),
by = c("COUNTY", "st_date_wk"), all.x = TRUE
)
ggplot(dfx1, aes(x = as.numeric(Week), y = percentage, col=YEAR, group = YEAR)) +
geom_line() +
geom_vline(data = dfx1, aes(xintercept = x), color = "blue", linetype = "dashed") +
scale_x_continuous(breaks = seq_along(labels), labels = labels) +
theme(text = element_text(size=15), axis.text.x = element_text(angle=90, hjust=1)) +
facet_wrap(~COUNTY, 2, scales = "fixed")
You just have to change the aes in geom_vline so that xintercept is numeric (this assumes st_date_wk has the same factor levels as Week):
aes(xintercept = as.numeric(st_date_wk))

ggplot changing colors of bar plot

I came across this R script that uses ggplot2:
dat <- read.table(text = "A B C D E F G
1 480 780 431 295 670 360 190
2 720 350 377 255 340 615 345
3 460 480 179 560 60 735 1260
4 220 240 876 789 820 100 75", header = TRUE)
library(reshape2)
dat$row <- seq_len(nrow(dat))
dat2 <- melt(dat, id.vars = "row")
library(ggplot2)
ggplot(dat2, aes(x=variable, y=value, fill=row)) +
geom_bar(stat="identity") +
xlab("\nType") +
ylab("Time\n") +
guides(fill=FALSE) +
theme_bw()
That was what I've been looking for. However, I could not:
1) change the default colours (for example, I tried to use the "RdYlGn" palette)
2) convert the raw values to frequencies.
Any suggestions?
You could try this:
library(reshape2)
library(dplyr)
library(ggplot2)
dat%>%
melt(id.vars = "row",variable.name = "grp")%>%
group_by(grp)%>%
mutate(tot=sum(value), fq=value/tot)%>%
ggplot(aes(x=grp,y=fq,fill=row,label = sprintf("%.2f%%", fq*100)))+
geom_bar(stat = "identity")+
geom_text(size = 3, position = position_stack(vjust = 0.5))+
xlab("\nType") +
ylab("Time\n") +
guides(fill=FALSE) +
scale_fill_distiller(palette = "RdYlGn")+
theme_bw()
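The frequency step in the pipeline above (value divided by the total of its group) can be checked separately in base R. This is a sketch using prop.table on the original wide table; the name freq is only illustrative.

```r
# Data copied from the question; the first column becomes the row names
dat <- read.table(text = "A B C D E F G
1 480 780 431 295 670 360 190
2 720 350 377 255 340 615 345
3 460 480 179 560 60 735 1260
4 220 240 876 789 820 100 75", header = TRUE)

# Share of each value within its column (margin = 2 means column-wise)
freq <- prop.table(as.matrix(dat), margin = 2)
print(round(freq, 3))

# Each column of shares sums to 1
print(colSums(freq))
```

These shares correspond to the fq column computed with group_by(grp) and mutate() in the answer.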
