add AUC by group on roc plots in R - r

I have roc plots for 4 groups, I want to add auc values for each group in the legend:
## draw plots
basicplot <- ggplot(roc_long, aes(d = outcome, m = prediction, color = model)) + geom_roc(n.cuts = 0) +
+ style_roc(theme = theme_bw, xlab = "1-Specificity", ylab = "Sensitivity")
## calculate auc
calc_auc(basicplot)
PANEL group AUC
1 1 1 0.7718926
2 1 2 0.9296029
3 1 3 0.7790979
4 1 4 0.8235286
annotate <- basicplot +
ggtitle("ROC plots for 4 outcomes") +
theme(plot.title = element_text(hjust = 0.5)) +
annotate("text", x = .75, y = .25, label = paste("AUC =", round(calc_auc(basicplot)["AUC"], 3)))
annotate
My plots looks like this:
How can I add AUC to each group on the right?
Thanks!

You can extract the specific cell in the calc_auc(basicplot), using round(calc_auc(basicplot)[["AUC"]][1/2/3/4], and wrap them in a new sentence. Also you may need \n to break the long sentence on several new lines.

Related

Lineair regression plot of 1 measured variable in 4 groups with 1 group being the reference group

So I have 1 measured variabel multiple times in 4 groups of which 1 group is the refernce group where I want to make a regression plot to with the other 3 groups. How do generate that? I see lots of examples of 2 variables plotted in groups but not what I mean (like: https://community.rstudio.com/t/multiple-linear-regression-lines-in-a-graph-with-ggplot2/9328)
This my df:
df <- data.frame("Variabel" = c(1,2,3,5,6,6,3,5,8,7,4,1,3,6,8,5),
"group" = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4))
And her my lin regression code
#display regression formula in plot
my.formula <- y ~ x
LM_plot <- ggplot(df, aes(x=...,y=...., shape=group, colour=group, fill=group))+
geom_point(shape=21,size=4,colour = "black" ,alpha=0.5)+
geom_smooth(method="lm",alpha=0.3,color="blue",fill="grey", formula = my.formula)+
geom_abline(intercept=0,slope=1,size=0.5,linetype="dashed",color="black")+
theme_bw(base_rect_size = 0.2) +
labs(x = "Group 1", y = "Group 2, 3 and 4") +
ggtitle("Title") +
stat_poly_eq(aes(label = paste0("atop(", ..eq.label.., ",", ..rr.label.., ")")),
formula = my.formula,
parse = TRUE)
Found it:
I changed the dataframe to 4 columns, so 1 column = 1 method with its variables...
Than for each method I created a line
LM_plot <- ggplot(testfas1, aes(x=ref method))+
geom_point(aes(y=method 2, colour="2"), size=4,alpha=1)+
geom_smooth(aes(y=method 2, colour="2"), method="lm",alpha=0.3,fill="black", formula = my.formula) +
geom_point(aes(y=method 3, colour="4"), size=4 ,alpha=1)+
geom_smooth(aes(y=method 3, colour="4"), method="lm",alpha=0.3,fill="#ff0166", formula = my.formula) +
geom_point(aes(y=method 4, colour="6"),size=4,alpha=1)+
geom_smooth(aes(y=method 4 colour="6"), method="lm",alpha=0.3,fill="#117f80", formula = my.formula) +
scale_y_continuous("methods 2/3/4") +
scale_x_continuous("refernce method 1") +
scale_colour_manual(name="legend", values=c("black", "#ff0166", "#117f80"),
guide = guide_legend(override.aes = list(alpha=0, size=1)))
LM_plot + theme_prism()

ggplot2: plotting line behind boxplot

I want to plot a line using geom_line behind my boxplot, I finally managed to combine line plotting with a boxplot. I have this dataset which I used to create a boxplot:
>head(MdataNa)
1 2 3 4 5 6 7
1 -0.02798634 -0.05740014 -0.02643664 0.02203644 0.02366325 -0.02868668 -0.01278713
2 0.20278229 0.19960302 0.10896017 0.24215229 0.31925211 0.29928739 0.15911725
3 0.06570653 0.08658396 -0.06019098 0.01437147 0.02078022 0.13814853 0.11369999
4 -0.42805441 -0.91945721 -1.05555731 -0.90877542 -0.77493682 -0.90620917 -1.00535742
5 0.39922939 0.12347996 0.06712451 0.07419287 -0.09517628 -0.12056720 -0.40863078
6 0.52821596 0.30827515 0.29733794 0.30555717 0.31636676 0.11592717 0.16957927
I have glucose concentration which should be plotted in a line behind this boxplot:
# glucose curve values
require("scales")
offconc <- c(0,0.4,0.8,1.8,3.5,6.9,7.3)
offtime <- c(9,11.4,12.9,14.9,16.7,18.3,20.5)
# now we have to scale them so they fit in the (boxplot)plot
time <- rescale(offtime, to=c(1,7))
conc <- rescale(offconc, to=c(-1,1))
glucoseConc <- data.frame(time,conc)
glucoseConc2 <- melt(glucoseConc, id = "time")
Then I plotted this data, but I was only able to plot the glucose curve in FRONT of the boxplot instead of behind it, I used this code:
boxNa <- ggplot(stack(MdataNa), aes(x = ind, y = values)) +
geom_boxplot() +
coord_cartesian(y = c(-1.5,1.5)) +
labs(list(title = "After Loess", x = "Timepoint", y = "M")) +
geom_line(data=glucoseConc2,aes(x=time,y=value),group=1)
output of the code above:
EDIT as suggested by the comments(NOT WORKING)
boxNa <- ggplot(stack(MdataNa), aes(x = ind, y = values)) +
geom_line(data=glucoseConc2,aes(x=time,y=value),group=1) +
geom_boxplot(data=stack(MdataNa), aes(x = ind, y = values)) +
coord_cartesian(y = c(-1.5,1.5)) +
labs(list(title = "After Loess", x = "Timepoint", y = "M"))
this will give the following error:
Error: Discrete value supplied to continuous scale
probably I'm doing something wrong then?
Here's a solution.
The idea is to convert the x axis in continous values:
ggplot() +
geom_line(data=glucoseConc2,aes(x=time,y=value),group=1)+
geom_boxplot(data=stack(MdataNA), aes(x = as.numeric(ind), y = values, group=ind)) +
coord_cartesian(y = c(-1.5,1.5)) +
labs(list(title = "After Loess", x = "Timepoint", y = "M"))+
scale_x_continuous(breaks=1:7)

ggplot: rescale axis (log) and cut axis

I want to plot a very simple boxplot like this in R:
desired graph
It is a log-link (Gamma distributed: jh_conc is a hormone concentration variable) Generalized linear model of a continuous dependent variable (jh_conc) for a categorical grouping variable (group: type of bee)
My script that I already have is:
> jh=read.csv("data_jh_titer.csv",header=T)
> jh
group jh_conc
1 Queens 6.38542714
2 Queens 11.22512563
3 Queens 7.74472362
4 Queens 11.56834171
5 Queens 3.74020100
6 Virgin Queens 0.06080402
7 Virgin Queens 0.12663317
8 Virgin Queens 0.08090452
9 Virgin Queens 0.04422111
10 Virgin Queens 0.14673367
11 Workers 0.03417085
12 Workers 0.02449749
13 Workers 0.02927136
14 Workers 0.01648241
15 Workers 0.02150754
fit1=glm(jh_conc~group,family=Gamma(link=log), data=jh)
ggplot(fit, aes(group, jh_conc))+
geom_boxplot(aes(fill=group))+
coord_trans(y="log")
the resulting plot looks like this:
My question is: what (geom) extensions can I use to split the y-axis and rescale them different? Also how do I add the black circles (averages; which are calculated on a log scale and then back-transformed to the original scale) horizontal lines which are significance levels based on posthoc tests performed on log transformed data: ** : p<0.01, *** :p< 0.001?
You can't create a broken numeric axis in ggplot2 by design, mainly because it visually distorts the data/differences being represented and is considered misleading.
You can however use scale_log10() + annotation_logticks() to help condense data across a wide range of values or better show heteroskedastic data. You can also use annotate to build out your p-value representation stars and bars.
Also you can easily grab information from a model using it's named attributes, here we care about fit$coef:
# make a zero intercept version for easy plotting
fit2 <- glm(jh_conc ~ 0 + group, family = Gamma(link = log), data = jh)
# extract relevant group means and use exp() to scale back
means <- data.frame(group = gsub("group", "",names(fit2$coef)), means = exp(fit2$coef))
ggplot(fit, aes(group, jh_conc)) +
geom_boxplot(aes(fill=group)) +
# plot the circles from the model extraction (means)
geom_point(data = means, aes(y = means),size = 4, shape = 21, color = "black", fill = NA) +
# use this instead of coord_trans
scale_y_log10() + annotation_logticks(sides = "l") +
# use annotate "segment" to draw the horizontal lines
annotate("segment", x = 1, xend = 2, y = 15, yend = 15) +
# use annotate "text" to add your pvalue *'s
annotate("text", x = 1.5, y = 15.5, label = "**", size = 4) +
annotate("segment", x = 1, xend = 3, y = 20, yend = 20) +
annotate("text", x = 2, y = 20.5, label = "***", size = 4) +
annotate("segment", x = 2, xend = 3, y = .2, yend = .2) +
annotate("text", x = 2.5, y = .25, label = "**", size = 4)

density shadow around the data with ggplot2 (R)

I am trying to have 2 "shadows" on the background of the below plot. These shadows should represent the density of the orange and blue points separately. Does it make sense?
Here is the ggplot to improve:
Here is the code and the data (matrix df) I used to create this plot:
PC1 PC2 aa
A_akallopisos 0.043272525 0.0151023307 2
A_akindynos -0.020707141 -0.0158198405 1
A_allardi -0.020277664 -0.0221016281 2
A_barberi -0.023165596 0.0389906701 2
A_bicinctus -0.025354572 -0.0059122384 2
A_chrysogaster 0.012608835 -0.0339330213 2
A_chrysopterus -0.022402365 -0.0092476009 1
A_clarkii -0.014474658 -0.0127024469 1
A_ephippium -0.016859412 0.0320034231 2
A_frenatus -0.024190876 0.0238499714 2
A_latezonatus -0.010718845 -0.0289904165 1
A_latifasciatus -0.005645811 -0.0183202248 2
A_mccullochi -0.031664307 -0.0096059126 2
A_melanopus -0.026915545 0.0308399009 2
A_nigripes 0.023420045 0.0293801537 2
A_ocellaris 0.052042539 0.0126144250 2
A_omanensis -0.020387101 0.0010944998 2
A_pacificus 0.042406273 -0.0260308092 2
A_percula 0.034591721 0.0071153133 2
A_perideraion 0.052830132 0.0064495142 2
A_polymnus 0.030902254 -0.0005091421 2
A_rubrocinctus -0.033318659 0.0474995722 2
A_sandaracinos 0.055839755 0.0093724082 2
A_sebae 0.021767793 -0.0218640814 2
A_tricinctus -0.016230301 -0.0018526482 1
P_biaculeatus -0.014466403 0.0024864574 2
ggplot(data=df,aes(x=PC1, y=PC2, color=factor(aa), label=rownames(df))) + ggtitle(paste('Site n° ',Sites_names[j],sep='')) +geom_smooth(se=F, method='lm')+ geom_point() + scale_color_manual(name='mutation', values = c("darkorange2","cornflowerblue"), labels = c("A","S")) + geom_text(hjust=0.5, vjust=-1 ,size=3) + xlim(-0.05,0.07)
Here are some possible approaches using stat_density2d() with geom="polygon" and mapping or setting alpha transparency for the density fill regions. If you are willing to experiment with some the parameters, I think you can get some very useful plots. Specifically, you may want to adjust the following:
n controls the smoothness of the density polygon.
h is the bandwidth of the density estimation.
bins controls the number of density levels.
df = read.table(header=TRUE, text=
" PC1 PC2 aa
A_akallopisos 0.043272525 0.0151023307 2
A_akindynos -0.020707141 -0.0158198405 1
A_allardi -0.020277664 -0.0221016281 2
A_barberi -0.023165596 0.0389906701 2
A_bicinctus -0.025354572 -0.0059122384 2
A_chrysogaster 0.012608835 -0.0339330213 2
A_chrysopterus -0.022402365 -0.0092476009 1
A_clarkii -0.014474658 -0.0127024469 1
A_ephippium -0.016859412 0.0320034231 2
A_frenatus -0.024190876 0.0238499714 2
A_latezonatus -0.010718845 -0.0289904165 1
A_latifasciatus -0.005645811 -0.0183202248 2
A_mccullochi -0.031664307 -0.0096059126 2
A_melanopus -0.026915545 0.0308399009 2
A_nigripes 0.023420045 0.0293801537 2
A_ocellaris 0.052042539 0.0126144250 2
A_omanensis -0.020387101 0.0010944998 2
A_pacificus 0.042406273 -0.0260308092 2
A_percula 0.034591721 0.0071153133 2
A_perideraion 0.052830132 0.0064495142 2
A_polymnus 0.030902254 -0.0005091421 2
A_rubrocinctus -0.033318659 0.0474995722 2
A_sandaracinos 0.055839755 0.0093724082 2
A_sebae 0.021767793 -0.0218640814 2
A_tricinctus -0.016230301 -0.0018526482 1
P_biaculeatus -0.014466403 0.0024864574 2")
library(ggplot2)
p1 = ggplot(data=df, aes(x=PC1, y=PC2, color=factor(aa), label=rownames(df))) +
ggtitle(paste('Site n° ',sep='')) +
stat_density2d(aes(fill=factor(aa), alpha = ..level..),
geom="polygon", color=NA, n=200, h=0.03, bins=4) +
geom_smooth(se=F, method='lm') +
geom_point() +
scale_color_manual(name='mutation',
values = c("darkorange2","cornflowerblue"),
labels = c("A","S")) +
scale_fill_manual( name='mutation',
values = c("darkorange2","cornflowerblue"),
labels = c("A","S")) +
geom_text(hjust=0.5, vjust=-1 ,size=3, color="black") +
scale_x_continuous(expand=c(0.3, 0)) + # Zooms out so that density polygons
scale_y_continuous(expand=c(0.3, 0)) + # don't reach edges of plot.
coord_cartesian(xlim=c(-0.05, 0.07),
ylim=c(-0.04, 0.05)) # Zooms back in for the final plot.
p2 = ggplot(data=df, aes(x=PC1, y=PC2, color=factor(aa), label=rownames(df))) +
ggtitle(paste('Site n° ',sep='')) +
stat_density2d(aes(fill=factor(aa)), alpha=0.2,
geom="polygon", color=NA, n=200, h=0.045, bins=2) +
geom_smooth(se=F, method='lm', size=1) +
geom_point(size=2) +
scale_color_manual(name='mutation',
values = c("darkorange2","cornflowerblue"),
labels = c("A","S")) +
scale_fill_manual( name='mutation',
values = c("darkorange2","cornflowerblue"),
labels = c("A","S")) +
geom_text(hjust=0.5, vjust=-1 ,size=3) +
scale_x_continuous(expand=c(0.3, 0)) + # Zooms out so that density polygons
scale_y_continuous(expand=c(0.3, 0)) + # don't reach edges of plot.
coord_cartesian(xlim=c(-0.05, 0.07),
ylim=c(-0.04, 0.05)) # Zooms back in for the final plot.
library(gridExtra)
ggsave("plots.png", plot=arrangeGrob(p1, p2, ncol=1), width=8, height=11, dpi=120)
Here's my suggestion. Using shadows or polygons is going to get pretty ugly when you overlay two colors and densities. Contour plot could be nicer to look at and is certainly easier to work with.
I've created some random data as a reproducible example and used a simple density function that uses the average distance of the nearest 5 points.
df <- data.frame(PC1 = runif(20),
PC2 = runif(20),
aa = rbinom(20,1,0.5))
point.density <- function(row){
points <- df[df$aa == row[[3]],]
x.dist <- (points$PC1 - row[[1]])^2
y.dist <- (points$PC2 - row[[2]])^2
x <- x.dist[order(x.dist)[1:5]]
y <- y.dist[order(y.dist)[1:5]]
1/mean(sqrt(x + y))
}
# you need to calculate the density for the whole grid.
res <- c(1:100)/100 # this is the resolution, so gives a 100x100 grid
plot.data0 <- data.frame(x.val = rep(res,each = length(res)),
y.val = rep(res, length(res)),
type = rep(0,length(res)^2))
plot.data1 <- data.frame(x.val = rep(res,each = length(res)),
y.val = rep(res, length(res)),
type = rep(1,length(res)^2))
plot.data <- rbind(plot.data0,plot.data1)
# we need a density value for each point type, so 2 grids
densities <- apply(plot.data,1,point.density)
plot.data <- cbind(plot.data, z.val = densities)
library(ggplot2)
# use stat_contour to draw the densities. Be careful to specify which dataset you're using
ggplot() + stat_contour(data = plot.data, aes(x=x.val, y=y.val, z=z.val, colour = factor(type)), bins = 20, alpha = 0.4) + geom_point(data = df, aes(x=PC1,y=PC2,colour = factor(aa)))
contour plot http://img34.imageshack.us/img34/6215/1yvb.png
rcontourggplot2

Plotting baseball pitches as qualitative variable by color

I was thinking of doing this in R but am new to it and would appreciate any help
I have a dataset (pitches) of baseball pitches identified by
'pitchNumber' and 'outcome' e.g S = swinging strike, B = ball, H= hit
etc.
e.g.
1 B ;
2 H ;
3 S ;
4 S ;
5 X ;
6 H; etc.
All I want to do is have a graph that plots them in a line cf BHSSXB
but replacing the letter with a small bar colored to represent the letter, with a legend, and optionally having the pitch number above the color . Somewhat like a sparkline.
Any suggestion on how to implement this much appreciated
And the same graph using ggplot.
Data courtesy of #GavinSimpson.
ggplot(baseball, aes(x=pitchNumber, y=1, ymin=0, ymax=1, colour=outcome)) +
geom_point() +
geom_linerange() +
ylab(NULL) +
xlab(NULL) +
scale_y_continuous(breaks=c(0, 1)) +
opts(
panel.background=theme_blank(),
panel.grid.minor=theme_blank(),
axis.text.y = theme_blank()
)
Here is a base graphics idea from which to work. First some dummy data:
set.seed(1)
baseball <- data.frame(pitchNumber = seq_len(50),
outcome = factor(sample(c("B","H","S","S","X","H"),
50, replace = TRUE)))
> head(baseball)
pitchNumber outcome
1 1 H
2 2 S
3 3 S
4 4 H
5 5 H
6 6 H
Next we define the colours we want:
## better colours - like ggplot for the cool kids
##cols <- c("red","green","blue","yellow")
cols <- head(hcl(seq(from = 0, to = 360,
length.out = nlevels(with(baseball, outcome)) + 1),
l = 65, c = 100), -1)
then plot the pitchNumber as a height 1 histogram-like bar (type = "h"), suppressing the normal axes, and we add on points to the tops of the bars to help visualisation:
with(baseball, plot(pitchNumber, y = rep(1, length(pitchNumber)), type = "h",
ylim = c(0, 1.2), col = cols[outcome],
ylab = "", xlab = "Pitch", axes = FALSE, lwd = 2))
with(baseball, points(pitchNumber, y = rep(1, length(pitchNumber)), pch = 16,
col = cols[outcome]))
Add on the x-axis and the plot frame, plus a legend:
axis(side = 1)
box()
## note: this assumes that the levels are in alphabetical order B,H,S,X...
legend("topleft", legend = c("Ball","Hit","Swinging Strike","X??"), lty = 1,
pch = 16, col = cols, bty = "n", ncol = 2, lwd = 2)
Gives this:
This is in response to your last comment on #Gavin's answer. I'm going to build off of the data provided by #Gavin and the ggplot2 plot by #Andrie. ggplot() supports the concept of faceting by a variable or variables. Here you want to facet by pitcher and at the pitch limit of 50 per row. We'll create a new variable that corresponds to each row we want to plot separately. The equivalent code in base graphics would entail adjusting mfrow or mfcol in par() and calling separate plots for each group of data.
#150 pitches represents a somewhat typical 9 inning game.
#Thanks to Gavin for sample data.
longGame <- rbind(baseball, baseball, baseball)
#Starter goes 95 pitches, middle relief throws 35, closer comes in for 20 and the glory
longGame$pitcher <- c(rep("S", 95), rep("M", 35), rep("C",20))
#Adjust pitchNumber accordingly
longGame$pitchNumber <- c(1:95, 1:35, 1:20)
#We want to show 50 pitches at a time, so will combine the pitcher name
#with which set of pitches this is
longGame$facet <- with(longGame, paste(pitcher, ceiling(pitchNumber / 50), sep = ""))
#Create the x-axis in increments of 1-50, by pitcher
longGame <- ddply(longGame, "facet", transform, pitchFacet = rep(1:50, 5)[1:length(facet)])
#Convert facet to factor in the right order
longGame$facet <- factor(longGame$facet, levels = c("S1", "S2", "M1", "C1"))
#Thanks to Andrie for ggplot2 function. I change the x-axis and add a facet_wrap
ggplot(longGame, aes(x=pitchFacet, y=1, ymin=0, ymax=1, colour=outcome)) +
geom_point() +
geom_linerange() +
facet_wrap(~facet, ncol = 1) +
ylab(NULL) +
xlab(NULL) +
scale_y_continuous(breaks=c(0, 1)) +
opts(
panel.background=theme_blank(),
panel.grid.minor=theme_blank(),
axis.text.y = theme_blank()
)
You can obviously change the labels for the facet variable, but the above code will produce:

Resources