Add symbol on top of ggplot2 boxplots to indicate value of variable


Working with the following subset of a much larger dataset,
ex <- structure(list(transect_id = c(1L, 1L, 1L, 1L, 1L, 15L, 15L,
15L, 15L, 15L, 15L), number_f = c(2L, 2L, 2L, 2L, 2L, 0L, 0L,
0L, 0L, 0L, 0L), years_f = c(1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L,
6L, 6L, 6L), b = c(5.036625862, 6.468666553, 8.028989792, 4.168409348,
5.790089607, 10.67796993, 9.371051788, 10.54364777, 6.904324532,
7.203606129, 9.1611166)), .Names = c("transect_id", "number_f",
"years_f", "b"), class = "data.frame", row.names = c(1L, 2L,
3L, 4L, 5L, 2045L, 2046L, 2047L, 2048L, 2049L, 2050L))
I've plotted the distributions of "b" for each of the groups indicated by "transect_id" and have colored them by "number_f", which I do here:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) + geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')
For each "transect_id" group, I need to stack symbols (asterisks or some other symbol) on top of its boxplot to indicate the value of "years_f" that corresponds to that "transect_id". In the data subset above, "years_f" is 1 and 6 for transect_ids 1 and 15, respectively. I'd like to see something like this, which I manually mocked up.
Also keep in mind that the dataset I'm working with is very large, so I'll need a loop or some other way of doing this automatically. I absolutely welcome other ideas for indicating the value of "years_f" that would not overburden the figure as much as stacked symbols, which will be a particular issue for larger values of "years_f".

Try adding
annotate('text', x = c(1, 2), y = 3, label = paste0('Year_F =', unique(ex$years_f)))
to the end of your plot like so:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) +
geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')+
annotate('text', x = c(1, 2), y = 3, label = paste0('Year_F =', unique(ex$years_f)))
To use it on a bigger dataset you would have to edit the x and y arguments, but this might be a decent alternative. A possibility for the y coordinate could be something like 0.9 * min(ex$b).
Edit, in response to your comment:
You could first count how many levels of transect_id there are, to specify x:
len.levels <- length(levels(as.factor(ex$transect_id)))
Then you could create a summary table of the unique years_f value for each transect_id:
sum.table <- aggregate(years_f~reorder(ex$transect_id, ex$b, median),
data = ex, FUN = unique)
reorder(ex$transect_id, ex$b, median) years_f
1 1 1
2 15 6
and then plot as follows:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) +
geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')+
annotate('text', x = 1:len.levels, y = .9 * min(ex$b),
label = paste0('Year_F =', sum.table[,2]))
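As a possible variation on the annotate() approach, the labels can be kept in their own small data frame and drawn with geom_text(), so each label sits just above its own box and follows the box order automatically. This is only a sketch under stated assumptions: the data frame below is a rounded stand-in for the question's data, and the names `transect` and `labs` are made up for illustration.

```r
library(ggplot2)

# Rounded stand-in for the question's data (assumption, for illustration only)
ex <- data.frame(
  transect_id = rep(c(1L, 15L), times = c(5, 6)),
  number_f    = rep(c(2L, 0L), times = c(5, 6)),
  years_f     = rep(c(1L, 6L), times = c(5, 6)),
  b = c(5.04, 6.47, 8.03, 4.17, 5.79, 10.68, 9.37, 10.54, 6.90, 7.20, 9.16)
)

# Fix the box order once so that every layer shares the same x factor
ex$transect <- reorder(factor(ex$transect_id), ex$b, FUN = median)

# One row per transect: the largest b (top of the box) and its years_f value
labs <- aggregate(cbind(b, years_f) ~ transect, data = ex, FUN = max)

p <- ggplot(ex, aes(x = transect, y = b)) +
  geom_boxplot(aes(fill = factor(number_f))) +
  geom_text(data = labs, aes(label = years_f), vjust = -0.8) +
  expand_limits(y = 1.1 * max(ex$b)) +  # leave headroom for the labels
  xlab("Transect ID")
print(p)
```

A single number above each box scales to large values of years_f better than a stack of symbols would, and nothing has to be positioned by hand.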

Related

Have two colour scales ggplot [duplicate]

I am trying to have separate colors for my lines and points. My data is split by Arm, so at each time-point there should be two dots, with two lines connecting them to the previous and next time-points.
I can get both the line and dot colors to change together, but I would like the line to be a different colour, still based on Arm. That is, I want the lines to be light blue for Arm=1 and yellow for Arm=2, but the dots to keep the color shown below. Is this possible with ggplot?
Any help would be much appreciated.
What I have:
Code:
ggplot(head(TOT, 12), aes(x=VisitNo, y=Mean)) +
geom_line(size=1.5, aes(color=as.factor(Arm))) +
geom_point(size=3, aes(color=as.factor(Arm))) +
scale_colour_manual(values = c("blue", "orange")) +
theme_bw()
Data:
TOT <- structure(list(Arm = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
VisitNo = structure(c(0L, 6L, 12L, 16L, 24L, 36L, 0L, 6L, 12L, 16L, 24L, 36L),
label = "VisitNo", class = c("labelled", "integer")),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("PWB", "SWB", "EWB", "FWB", "AC"), class = "factor"),
Mean = c(25.3025326086957, 25.4365119047619, 25.8333333333333, 21.3452380952381,
26, 26.8235294117647, 25.2272727272727, 25.6172839506173,
25.6805555555556, 21.625976744186, 26.24, 26)),
row.names = c(NA, 12L), class = "data.frame")

If you just want the lines to be a bit lighter than the points, you can use alpha to make the lines a bit transparent:
ggplot(head(TOT, 12), aes(x=VisitNo, y=Mean)) +
geom_line(size=1.5, aes(color=as.factor(Arm)), alpha = 0.4) +
geom_point(size=3, aes(color=as.factor(Arm))) +
scale_colour_manual(values = c("blue", "orange")) +
theme_bw()
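If the lines need genuinely different hues rather than a lighter tint, one possible approach is to keep a single colour scale but give each layer its own keys. The sketch below assumes a rounded stand-in for the question's data; the key names ("line 1", "point 1", and so on) are invented for illustration, and line thickness is left at its default.

```r
library(ggplot2)

# Rounded stand-in for the question's data (assumption, for illustration only)
TOT <- data.frame(
  Arm     = rep(1:2, each = 6),
  VisitNo = rep(c(0, 6, 12, 16, 24, 36), times = 2),
  Mean    = c(25.3, 25.4, 25.8, 21.3, 26.0, 26.8,
              25.2, 25.6, 25.7, 21.6, 26.2, 26.0)
)

# One named colour per layer/arm combination
cols <- c("line 1" = "lightblue", "line 2" = "yellow",
          "point 1" = "blue",     "point 2" = "orange")

p <- ggplot(TOT, aes(x = VisitNo, y = Mean, group = Arm)) +
  geom_line(aes(colour = paste("line", Arm))) +
  geom_point(aes(colour = paste("point", Arm)), size = 3) +
  scale_colour_manual(values = cols, name = "Arm") +
  theme_bw()
print(p)
```

The legend then lists four keys, one per layer and arm, which may or may not be what you want.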

Plotting multiple effect plots from logistic regression

I have a number of logistic regression models with different response variables but the same predictor variables. I want to use grid.arrange (or anything else) to make a single figure with all these effect plots that were made with the effects package. I followed the advice here to make such a graph: grid.arrange with John Fox's effects plots
library(effects)
library(gridExtra)
data <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L,1L, 1L, 2L, 2L, 2L), .Label = c("group1", "group2"), class = "factor"),obs = c(1L, 1L, 4L, 4L, 6L, 12L, 26L, 1L, 10L, 6L),responseA = c(1L, 1L, 2L, 0L, 1L, 10L, 20L, 0L, 3L, 2L), responseB = c(0L, 0L, 2L, 4L, 6L, 4L, 8L, 1L, 8L, 5L)), .Names = c("group", "obs", "responseA","responseB"), row.names = c(53L, 54L, 55L, 56L, 57L, 58L,59L, 115L, 116L, 117L), class = "data.frame")
model1<-glm(cbind(responseA,(obs-responseA))~group,family=binomial, data=data)
model2<-glm(cbind(responseB,(obs-responseB))~group,family=binomial, data=data)
ef1 <-allEffects(model1)[[1]]
ef2 <- allEffects(model2)[[1]]
elist <- list( ef1,ef2)
class(elist) <- "efflist"
plot(elist, col=2)
The problem is that in the models the response variable has the form cbind(response A, no response A), but for the figure I would like to change the label to something cleaner (like Response A). I tried changing the y labels by passing a vector of labels, but got a warning, and both labels turned into "Response A".
plot(elist, ylab=c("response A","response B"),col=2)
I then tried the second suggested method, changing the class to trellis, but got an error, so grid.arrange didn't work either.
p1<-plot(allEffects(model1),ylab="Response A")
p2<-plot(allEffects(model2),ylab="Response B")
class(p1) <- class(p2) <- "trellis"
grid.arrange(p1, p2, ncol=2)
Can anyone provide a method to change each y-axis label separately?

With the ef1 and ef2 variables you created, you can try the following
plot1 <- plot(ef1, ylab = "Response A")
plot2 <- plot(ef2, ylab = "Response B")
grid.arrange(plot1, plot2, ncol=2)

Reordering factor for plotting using forcats and ggplot2 packages from tidyverse

First of all, thanks^13 to tidyverse. I want the bars in the chart below to follow the factor levels as reordered by forcats::fct_reorder(). Surprisingly, the order of levels I see when the data set is View()ed differs from the order displayed in the chart (see below). The chart should illustrate the number of failed students before and after the bonus marks (I want to sort the bars by the number of failed students before the bonus).
MWE
ggplot (df) +
geom_bar (aes (forcats::fct_reorder (subject, FailNo, .desc= TRUE), FailNo, fill = forcats::fct_rev (Bonus)), position = 'dodge', stat = 'identity') +
theme (axis.text.x=element_text(angle=45, vjust=1.5, hjust=1.5, size = rel (1.2)))
Data output of dput (df)
structure(list(subject = structure(c(1L, 2L, 5L, 6L, 3L, 7L,
4L, 9L, 10L, 8L, 12L, 11L, 1L, 2L, 5L, 6L, 3L, 7L, 4L, 9L, 10L,
8L, 12L, 11L), .Label = c("CAB_1", "DEM_1", "SSR_2", "RRG_1",
"TTP_1", "TTP_2", "IMM_1", "RRG_2", "DEM_2", "VRR_2", "PRS_2",
"COM_2", "MEB_2", "PHH_1", "PHH_2"), class = "factor"), Bonus = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("After", "Before"), class = "factor"),
FailNo = c(29, 28, 20, 18, 15, 13, 12, 8, 5, 4, 4, 2, 21,
16, 16, 14, 7, 10, 10, 5, 3, 4, 4, 1)), .Names = c("subject",
"Bonus", "FailNo"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-24L))
Bar chart
The issue
According to the table above, SSR_2 should rank fifth and IMM_1 sixth; however, in the chart these two subjects swap positions. How can I sort them correctly with tidyverse in this case?

Use a factor with unique levels for your x-axis.
ggplot (df) +
geom_bar (aes(factor(forcats::fct_reorder
(subject, FailNo, .desc= TRUE),
levels=unique(subject)),
FailNo,
fill = forcats::fct_rev (Bonus)),
position = 'dodge', stat = 'identity') +
theme(axis.text.x=element_text(angle=45, vjust=1.5, hjust=1.5, size = rel (1.2)))
Edited in response to @dotorate's comment:

Sort FailNo before the bonus:
library(dplyr)
df_before_bonus <- df %>% filter(Bonus == "Before") %>% arrange(desc(FailNo))
Use FailNo before the bonus to create the factor
df$subject <- factor(df$subject, levels = df_before_bonus$subject, ordered = TRUE)
Updated plot
ggplot(df) +
geom_bar(aes (x = subject, y = FailNo, fill = as.factor(Bonus)),
position = 'dodge', stat = 'identity') +
theme (axis.text.x=element_text(angle=45, vjust=1.5, hjust=1.5, size = rel (1.2)))
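A likely explanation for the swap is that fct_reorder() summarises each subject over all of its rows (by default with the median), so the "After" values take part in the ordering too. The base R sketch below reproduces that arithmetic for the two affected subjects, using the FailNo values from the question; the data frame `d` and the names `med_all`, `before` and `lev` are made up for illustration.

```r
# The two subjects that swap places, with FailNo before and after the bonus
d <- data.frame(
  subject = c("SSR_2", "IMM_1", "SSR_2", "IMM_1"),
  Bonus   = c("Before", "Before", "After", "After"),
  FailNo  = c(15, 13, 7, 10)
)

# Median over both Bonus rows, which is what a median-based reorder sorts on
med_all <- tapply(d$FailNo, d$subject, median)
# IMM_1 has median 11.5 and SSR_2 has median 11, so IMM_1 sorts first
names(sort(med_all, decreasing = TRUE))

# Ordering on the "Before" rows only gives the intended order
before <- d[d$Bonus == "Before", ]
lev <- before$subject[order(before$FailNo, decreasing = TRUE)]
d$subject <- factor(d$subject, levels = lev)
levels(d$subject)
```

This is why building the levels from the "Before" rows alone puts SSR_2 ahead of IMM_1.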

How to create 'clustered dotplots' for categorical data?

I wish to create a graphic, like this one from the software called Fathom.
I have a two-way table of categorical frequency data that I wish to create something like a fluctuation plot, but the key difference is that you can see the individual data points.
I've tried ggfluctuation(...), levelplots(...) and all manner of packages (like ggplot2), but with no success. I can find nothing on any forums to help either.
I'd be exceptionally grateful if someone could help direct me to, or create some code, that would achieve my objective.

Here is an improved version.
sample_data = structure(list(set = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), class = "factor", .Label = c("09t0101 TJ",
"09t0102 MW", "09t0201 EH", "09t0202 NH")), grade = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1",
"2", "3", "4"), class = "factor"), freq = c(7L, 8L, 2L, 3L, 11L,
4L, 11L, 3L, 3L, 8L, 3L, 8L, 3L, 9L, 3L, 2L)), .Names = c("set",
"grade", "freq"), row.names = c(NA, -16L), class = "data.frame")
group = unique(sample_data$set) #Obtain the unique 'set' values for y-axis
max_x = length(unique(sample_data$grade)) #Obtain the maximum number of 'grades' to plot on x-axis
max_y = length(group) #Obtain the maximum number of 'set' to plot on y-axis
pdf(file="plot.pdf",width=8,height=6)
par(mar = c(5, 10, 4, 2)) #c(bottom, left, top, right)
plot(max_x,max_y,xlim=c(0.5,max_x+0.5),ylim=c(0.5,max_y +0.5),pch=NA,xlab="Grades",ylab=NA,xaxt="n",yaxt="n",asp=1) #asp = 1 IMPORTANT
axis(side = 2, at=c(1:length(group)), labels=c(as.vector(group)),las=2)
axis(side = 1, at=c(1:length(unique(sample_data$grade))), labels=c(as.vector(unique(sample_data$grade))))
r = 0.15 #Spacing between neighbouring circle centres (each circle has radius r/2.25)
for (i in 1:length(group)){
  a = subset(sample_data,sample_data$set==group[i]) #Subset the rows belonging to the i-th 'set'
  for (j in 1:nrow(a)){
    matrix_sz = ceiling(sqrt(a$freq[j])) #Determine the size of square matrix that can accommodate all the frequency
    matrix_x = matrix(nrow = matrix_sz, ncol = matrix_sz) #Initiate matrix
    matrix_y = matrix(nrow = matrix_sz, ncol = matrix_sz) #Initiate matrix
    matrix_x[,1] = -1*((matrix_sz/2) - 0.5) #Find out relative x co-ordinates for first column
    matrix_y[1,] = 1*((matrix_sz/2) - 0.5) #Find out relative y co-ordinates for first row
    # Find out other relative co-ordinates if the size of square matrix is more than 1x1
    if (matrix_sz > 1){
      for (column in 2:matrix_sz){
        matrix_x[,column] = matrix_x[,column - 1] + 1
      }
      for (row in 2:matrix_sz){
        matrix_y[row,] = matrix_y[row-1,] - 1
      }
    }
    #Determine the co-ordinate of the center of the square matrix grid
    xx = as.integer(a$grade[j])
    yy = i
    fq = 1 #To keep track of the corresponding 'freq'
    # Plot circles around the center based on relative co-ordinates
    for (row in 1:matrix_sz){
      for (column in 1:matrix_sz){
        if (fq > a$freq[j]){break} #Stop once the necessary number of points has been plotted
        xx1 = xx + r * matrix_x[row, column]
        yy1 = yy + r * matrix_y[row, column]
        # points (x = xx1, y = yy1, pch=1)
        fq = fq + 1
        symbols (x = xx1, y = yy1, circles=c(r/2.25),add =TRUE,inches=FALSE,bg = "gray")
      }
    }
  }
}
dev.off()

wrong linking point with lines in ggplot

I don't know what I'm missing but I cannot figure out a very simple task. This is a small piece of my dataframe:
dput(df)
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "SOU55", class = "factor"), Depth = c(2L, 4L,
6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L), Value = c(211.8329815,
278.9603866, 255.6111086, 212.6163368, 193.7281895, 200.9584658,
160.9289157, 192.0664419, 174.5951019, 7.162682425)), .Names = c("ID",
"Depth", "Value"), class = "data.frame", row.names = c(NA, -10L
))
What I'm trying to do is simply plotting Depth versus Value with ggplot, this is the simple code:
ggplot(df, aes(Value, Depth))+
geom_point()+
geom_line()
and this the result:
But it is pretty different from what I really want. This is the plot made with Libreoffice:
It seems that ggplot doesn't link the values correctly. What am I doing wrong?
Thanks to all!

You need geom_path() to connect the observations in the original order. geom_line() sorts the data according to the x-aesthetic before plotting:
ggplot(df, aes(Value, Depth))+
geom_point()+
geom_path()
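Because geom_path() follows row order, it only draws the intended profile when the rows are already ordered by Depth, as they are in the question. If that is not guaranteed, sorting first is a cheap safeguard. A minimal sketch, assuming a shuffled and rounded stand-in for the question's data:

```r
library(ggplot2)

# Shuffled, rounded stand-in for the question's data (assumption)
df <- data.frame(
  Depth = c(6, 2, 10, 4, 8),
  Value = c(255.6, 211.8, 193.7, 279.0, 212.6)
)

# geom_path() connects rows in the order they appear, so sort by Depth first
df <- df[order(df$Depth), ]

p <- ggplot(df, aes(Value, Depth)) +
  geom_point() +
  geom_path()
print(p)
```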
