Plotting multiple effect plots from logistic regression - r

I have a number of logistic regression models with different response variables but the same predictor variables. I want to use grid.arrange (or anything else) to make a single figure with all these effect plots, which were made with the effects package. I followed the advice in the question "grid.arrange with John Fox's effects plots" to make such a graph:
library(effects)
library(gridExtra)
data <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L,1L, 1L, 2L, 2L, 2L), .Label = c("group1", "group2"), class = "factor"),obs = c(1L, 1L, 4L, 4L, 6L, 12L, 26L, 1L, 10L, 6L),responseA = c(1L, 1L, 2L, 0L, 1L, 10L, 20L, 0L, 3L, 2L), responseB = c(0L, 0L, 2L, 4L, 6L, 4L, 8L, 1L, 8L, 5L)), .Names = c("group", "obs", "responseA","responseB"), row.names = c(53L, 54L, 55L, 56L, 57L, 58L,59L, 115L, 116L, 117L), class = "data.frame")
model1 <- glm(cbind(responseA, obs - responseA) ~ group, family = binomial, data = data)
model2 <- glm(cbind(responseB, obs - responseB) ~ group, family = binomial, data = data)
ef1 <-allEffects(model1)[[1]]
ef2 <- allEffects(model2)[[1]]
elist <- list( ef1,ef2)
class(elist) <- "efflist"
plot(elist, col=2)
The problem is that in my models the response variable has the form cbind(responseA, no responseA), but for the figure I would like a cleaner label (like "Response A"). I tried changing the y labels by passing a vector of labels, but got a warning, and both labels became "Response A".
plot(elist, ylab=c("response A","response B"),col=2)
I then tried the second suggested method, changing the class to trellis, but got an error, so grid.arrange didn't work either:
p1<-plot(allEffects(model1),ylab="Response A")
p2<-plot(allEffects(model2),ylab="Response B")
class(p1) <- class(p2) <- "trellis"
grid.arrange(p1, p2, ncol=2)
Can anyone provide a method to change each y-axis label separately?

With the ef1 and ef2 variables you created, you can try the following
plot1 <- plot(ef1, ylab = "Response A")
plot2 <- plot(ef2, ylab = "Response B")
grid.arrange(plot1, plot2, ncol=2)
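Since the question mentions a number of models that share the same predictors, the same idea can be wrapped in a loop. This is a sketch of my own, not part of the answer: it reuses the question's data, fits one binomial glm per response column, and stores each effect plot with its own y label in a named list. The vector name `ylabs` and the use of bquote() are illustrative choices, and, like the answer, it assumes that plot() on a single effect returns a lattice (trellis) object that grid.arrange() can place.

```r
library(effects)
library(gridExtra)

# Data from the question's dput() output (row names dropped)
data <- data.frame(
  group = factor(rep(c("group1", "group2"), times = c(7, 3))),
  obs = c(1L, 1L, 4L, 4L, 6L, 12L, 26L, 1L, 10L, 6L),
  responseA = c(1L, 1L, 2L, 0L, 1L, 10L, 20L, 0L, 3L, 2L),
  responseB = c(0L, 0L, 2L, 4L, 6L, 4L, 8L, 1L, 8L, 5L)
)

# Response column -> label to show on that panel's y axis
ylabs <- c(responseA = "Response A", responseB = "Response B")

plots <- lapply(names(ylabs), function(resp) {
  f <- as.formula(sprintf("cbind(%s, obs - %s) ~ group", resp, resp))
  # bquote() splices the formula itself into the call, so the stored
  # model call does not refer to the local variable `f`
  model <- eval(bquote(glm(.(f), family = binomial, data = data)))
  plot(allEffects(model)[[1]], ylab = ylabs[[resp]])
})
names(plots) <- names(ylabs)

# Arrange all effect plots side by side
grid.arrange(grobs = plots, ncol = 2)
```

Adding another response only requires a new entry in `ylabs`; `ncol` can be adjusted to the number of panels.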

Related

Error message: 'x' and 'y' must have the same length

I keep getting the following error message in R whilst trying to run a simple correlation. Can anyone help?
Error message is:
Error in cor.test.default(my_data$Year, my_data$Total, method = "spearman") : 'x' and 'y' must have the same length
this is the code I am using:
library("dplyr")
library ("ggpubr")
library("devtools")
my_data<- read.csv(file.choose())
set.seed(1234)
dplyr::sample_n(my_data, 10)
ggdensity(my_data$Total,
main = "Density plot of barrier closures",
xlab = "Year ending")
ggqqplot(my_data$Total)
shapiro.test(my_data$Total)
cor.test(my_data$Year, my_data$Total, method = "spearman")
The data I am using has two columns in a CSV file, one labelled "year" and the other labelled "total". Both columns have 39 numeric entries, so the lengths of the columns are identical. Every other part of the code works fine. I am using the latest version of R and the latest versions of all the packages.
Edit: Someone asked for my data frame so here it is:
structure(list(ï..Year = 83:121, Total = c(1L, 0L, 0L, 1L, 1L,
0L, 1L, 4L, 2L, 0L, 4L, 7L, 4L, 4L, 1L, 1L, 2L, 6L, 24L, 4L,
20L, 1L, 4L, 3L, 8L, 6L, 5L, 5L, 0L, 0L, 5L, 50L, 1L, 1L, 2L,
3L, 2L, 9L, 6L)), class = "data.frame", row.names = c(NA, -39L
))
As user2554330 rightly stated: You'll get that error if you misspecify one of the column names. As can be seen from the output of dput(my_data), the first column's name is not Year, but ï..Year. The given error does not occur with
cor.test(my_data$ï..Year, my_data$Total, method = "spearman")
(You may be able to remove the merging of this byte order mark with the column name by adding the argument fileEncoding="UTF-8-BOM" in the read.csv() call.)
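A minimal base-R sketch of the diagnosis above, using the data from the question's dput() output. The garbled column name is rebuilt with a \u00ef escape so the script stays ASCII; `exact = FALSE` is my addition, to avoid the warning about ties that Spearman's test gives for these data.

```r
# Rebuild the question's data frame, including the BOM-damaged first name
my_data <- data.frame(
  V1 = 83:121,
  Total = c(1L, 0L, 0L, 1L, 1L, 0L, 1L, 4L, 2L, 0L, 4L, 7L, 4L, 4L, 1L, 1L,
            2L, 6L, 24L, 4L, 20L, 1L, 4L, 3L, 8L, 6L, 5L, 5L, 0L, 0L, 5L,
            50L, 1L, 1L, 2L, 3L, 2L, 9L, 6L)
)
names(my_data)[1] <- "\u00ef..Year"

# Diagnose: `$` returns NULL for a column that does not exist, so
# cor.test() receives a length-0 `x` and reports the length mismatch
print(names(my_data))
print(is.null(my_data$Year))   # TRUE

# Fix: rename the column (or re-read the file with
# read.csv(..., fileEncoding = "UTF-8-BOM") so the name is clean)
names(my_data)[1] <- "Year"
res <- cor.test(my_data$Year, my_data$Total, method = "spearman",
                exact = FALSE)
print(res)
```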

What is the best way to use agricolae to do ANOVAs on a split plot design?

I'm trying to run some ANOVAs on data from a split plot experiment, ideally using the agricolae package. It's been a while since I've taken a stats class and I wanted to be sure I'm analyzing this data correctly, so I did some searching online and couldn't really find consistency in the way people were analyzing their split plot experiments. What is the best way for me to do this?
Here's the head of my data:
dput(head(rawData))
structure(list(ï..Plot = 2111:2116, Variety = structure(c(5L,
4L, 3L, 6L, 1L, 2L), .Label = c("Burbank", "Hodag", "Lamoka",
"Norkotah", "Silverton", "Snowden"), class = "factor"), Rate = c(4L,
4L, 4L, 4L, 4L, 4L), Rep = c(1L, 1L, 1L, 1L, 1L, 1L), totalTubers = c(594L,
605L, 656L, 729L, 694L, 548L), totalOzNoCulls = c(2544.18, 2382.07,
2140.69, 2401.56, 2440.56, 2503.5), totalCWTacNoCulls = c(461.76867,
432.345705, 388.535235, 435.88314, 442.96164, 454.38525), avgLWratio = c(1.260615419,
1.287949374, 1.111981583, 1.08647584, 1.350686661, 1.107173509
), Hollow = c(14L, 15L, 22L, 25L, 14L, 13L), Double = c(10L,
13L, 15L, 22L, 11L, 9L), Knob = c(86L, 80L, 139L, 156L, 77L,
126L), Researcher = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Wang", class = "factor"),
CullsPounds = c(1.75, 1.15, 4.7, 1.85, 0.8, 5.55), CullsOz = c(28,
18.4, 75.2, 29.6, 12.8, 88.8), totalOz = c(2572.18, 2400.47,
2215.89, 2431.16, 2453.36, 2592.3), totalCWTacCulls = c(466.85067,
435.685305, 402.184035, 441.25554, 445.28484, 470.50245)), row.names = c(NA,
6L), class = "data.frame")
For these data, the whole plot is Rate, the split plot is Variety, the block is Rep, and for discussion's sake here, we can look at totalCWTacNoCulls as the response.
Any help would be very much appreciated! I am still getting the hang of Stack Overflow, so if I have made any mistakes or shared my data wrong, please let me know and I'll change it. Thank you!
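You can do this using the agricolae package as follows:
library(agricolae)
# Convert the design variables to factors inside the data frame
# (avoids attach(), which leaves global copies of the columns behind)
rawData$Rate <- factor(rawData$Rate)
rawData$Variety <- factor(rawData$Variety)
rawData$Rep <- factor(rawData$Rep)
with(rawData, sp.plot(Rep, Rate, Variety, totalCWTacNoCulls))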
According to the agricolae documentation, the usage is
sp.plot(block, pplot, splot, Y)
where block is the replication (block) factor, pplot is the main-plot factor, splot is the sub-plot factor and Y is the response variable.
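As a cross-check that does not depend on agricolae, a split-plot ANOVA can also be fitted in base R with aov() and an Error() term: Error(Rep/Rate) creates an error stratum for blocks, one for whole plots (Rep:Rate), and a within-plot residual. This is a swapped-in technique, not the answer's method. Because only the head of rawData is shown, the sketch runs on simulated data with hypothetical rate levels and a random response; on the real data you would use `data = rawData` and totalCWTacNoCulls as the response.

```r
# Simulated split-plot layout (NOT the question's data): 4 blocks,
# 3 hypothetical whole-plot rates, and the 6 varieties from the question
set.seed(42)
sim <- expand.grid(
  Rep = factor(1:4),
  Rate = factor(paste0("rate", 1:3)),
  Variety = factor(c("Burbank", "Hodag", "Lamoka",
                     "Norkotah", "Silverton", "Snowden"))
)
sim$y <- rnorm(nrow(sim), mean = 440, sd = 30)  # random response

# Rate is tested against the whole-plot error (Rep:Rate stratum);
# Variety and Rate:Variety against the within-plot residual
fit <- aov(y ~ Rate * Variety + Error(Rep / Rate), data = sim)
print(summary(fit))
```

The two error strata here play the same roles as the main-plot and sub-plot error terms in a split-plot analysis, so the F tests should be comparable to the ones sp.plot() reports.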

Rotating y axis labels with mosaic plots WITHOUT overlap

This question is extremely similar to an existing one, but approaches it from a point of view that has not been answered.
Following the proposed code, I am able to generate mosaic plots and rotate the labels so that they are legible. The problem is that the mosaic() function from the vcd package apparently does not take the rotation into account, so it does not adapt the layout to fit the labels, and the rotated labels overlap the variable names.
Is there any way to change the margins between the labels and the titles? I would be surprised if I am the first one that has encountered this issue. I am open to using other packages to get mosaic graphs if applicable as well.
Code
aux = structure(c(0L, 0L, 3L, 46L, 107L, 14L, 0L, 0L, 4L, 0L, 0L, 2L,
9L, 0L, 23L, 2L, 1L, 3L, 14L, 1L, 8L, 26L, 6L, 11L, 6L, 1L, 6L,
0L, 1L, 1L, 29L, 10L, 62L, 1L, 3L, 1L, 1L, 3L, 1L), .Dim = c(3L,
13L), .Dimnames = list(abcdefghi = c("Madrid", "Valencia", "Granada"
), jklmnopqr = c("roknbjftxcwl", "mfchldbxuyig", "gtyoxeduijpw",
"akbcefymvsiw", "ucbfxplietqk", "mzeykauprfdh", "piermgawyjht",
"chjvatqbylxo", "merhcogjflbd", "wiyrugvmhjlq", "glszdqmjhkov",
"giowaxrtsknm", "pxucytzvljqw")), class = "table")
library(vcd)
colours = c("brown","darkgreen","darkgrey","orange","darkred","gold","blue","red",
"white","pink","purple","navy","lightblue","green","peachpuff","violet","yellow","yellow4")
aux_names = names(attr(aux,"dimnames"))
mosaic(aux,main=paste(aux_names,collapse=" vs. "),
gp=gpar(fill=matrix(sample(colours,max(nrow(aux),ncol(aux))),1,max(nrow(aux),ncol(aux)))),
pop = FALSE,labeling = labeling_border(rot_labels=c(90,0,0,0),
just_labels=c("left","right")))
This code should do what I think you're after.
mosaic(aux,main=paste(aux_names,collapse=" vs. "),
gp=gpar(fill=matrix(sample(colours,max(nrow(aux),ncol(aux))),1,max(nrow(aux),ncol(aux)))),
pop = FALSE,labeling = labeling_border(rot_labels=c(90,0,0,0),
just_labels=c("left","right"),
offset_varnames = c(8,8,8,8)),
margins = c(10, 10, 10, 10))

Add symbol on top of ggplot2 boxplots to indicate value of variable

Working with the following subset of a much larger dataset,
ex <- structure(list(transect_id = c(1L, 1L, 1L, 1L, 1L, 15L, 15L,
15L, 15L, 15L, 15L), number_f = c(2L, 2L, 2L, 2L, 2L, 0L, 0L,
0L, 0L, 0L, 0L), years_f = c(1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L,
6L, 6L, 6L), b = c(5.036625862, 6.468666553, 8.028989792, 4.168409348,
5.790089607, 10.67796993, 9.371051788, 10.54364777, 6.904324532,
7.203606129, 9.1611166)), .Names = c("transect_id", "number_f",
"years_f", "b"), class = "data.frame", row.names = c(1L, 2L,
3L, 4L, 5L, 2045L, 2046L, 2047L, 2048L, 2049L, 2050L))
I've plotted the distributions of "b" for each of the groups indicated by "transect_id" and have colored them by "number_f", which I do here:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) + geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')
What I need to do for each "transect_id" group is stack symbols (asterisks or some other symbol) on top of each boxplot to indicate the value of "years_f" that corresponds to that "transect_id". In the data subset above, "years_f" is 1 and 6 for transect_ids 1 and 15, respectively. I manually mocked up what I'd like to see.
Also keep in mind that the dataset I'm working with is very large, so I'll need a loop or some other way of doing this automatically. I also welcome other ideas for indicating the value of "years_f" that would not overburden the figure as much as stacked symbols, which will be a particular problem for larger values of "years_f".
Try adding
annotate('text', x = c(1, 2), y = 3, label = paste0('Year_F =', unique(ex$years_f)))
to the end of your plot like so:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) +
geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')+
annotate('text', x = c(1, 2), y = 3, label = paste0('Year_F =', unique(ex$years_f)))
To use it on a bigger dataset you would have to edit the x and y arguments, but this might be a decent alternative. A possibility for the y coordinate could be something like 0.9 * min(ex$b).
Edit: in response to your comment:
You could first count how many levels of transect_id there are, to specify x:
len.levels <- length(levels(as.factor(ex$transect_id)))
Then you could create a summary table of the unique years_f value for each transect_id:
sum.table <- aggregate(years_f~reorder(ex$transect_id, ex$b, median),
data = ex, FUN = unique)
reorder(ex$transect_id, ex$b, median) years_f
1 1 1
2 15 6
and then plot as follows:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) +
geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')+
annotate('text', x = 1:len.levels, y = .9 * min(ex$b),
label = paste0('Year_F =', sum.table[,2]))
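An alternative to hard-coding x positions in annotate() is to build a small label table with one row per transect and draw it with geom_text(), so the labels follow the boxes automatically on a large dataset. In this sketch (my own variant, not the answer's code) the median-ordered factor is computed once as a new column, hypothetically named `transect`, so both layers share the same x levels; each label sits at the largest b of its transect.

```r
library(ggplot2)

# Subset from the question
ex <- data.frame(
  transect_id = c(rep(1L, 5), rep(15L, 6)),
  number_f = c(rep(2L, 5), rep(0L, 6)),
  years_f = c(rep(1L, 5), rep(6L, 6)),
  b = c(5.036625862, 6.468666553, 8.028989792, 4.168409348, 5.790089607,
        10.67796993, 9.371051788, 10.54364777, 6.904324532, 7.203606129,
        9.1611166)
)

# Compute the median-ordered x factor once so both layers share its levels
ex$transect <- reorder(factor(ex$transect_id), ex$b, FUN = median)

# One row per transect: its years_f value and its largest b (label height)
lab <- aggregate(cbind(years_f, b) ~ transect, data = ex, FUN = max)
lab$label <- paste0("years_f = ", lab$years_f)
# For a row of asterisks instead: lab$label <- strrep("*", lab$years_f)

p <- ggplot(ex, aes(x = transect, y = b)) +
  geom_boxplot(aes(fill = as.factor(number_f))) +
  geom_text(data = lab, aes(label = label), vjust = -0.5) +
  expand_limits(y = 1.1 * max(ex$b)) +  # leave room above the labels
  xlab("Transect ID")
print(p)
```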

Why does a PDF plot in ggplot2 not show title nor labels?

I'm creating a simple step plot using ggplot2. If I switch the file type from PNG to PDF, the plot does not show labels, ticks, a title or a legend. What am I doing wrong?
Data:
plotData <- structure(list(iteration = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), time = c(0L, 10L,
20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 100L, 0L, 10L, 20L, 30L,
40L, 50L, 60L, 70L), routes = c(6L, 6L, 5L, 3L, 3L, 3L, 3L, 3L,
2L, 1L, 0L, 5L, 5L, 5L, 5L, 1L, 1L, 1L, 0L)), .Names = c("iteration",
"time", "routes"), class = "data.frame", row.names = c(NA, -19L
))
Code:
library(ggplot2)
x_axis_breaks <- seq(10, 100, by = 10)
png(file="plot.png",width=1280, height=1280)
## pdf(file="plot.pdf",width=6,height=6)
plot <- ggplot(plotData) + geom_step(data=plotData, size = 5,
mapping=aes(x=time,
y=routes, group=iteration, colour=factor(iteration)), direction="vh")
plot <- plot + scale_x_discrete(breaks=x_axis_breaks, name="time") +
scale_y_discrete(name="#routes");
plot <- plot + opts(axis.text.x=theme_text(size=36,face="bold"),
axis.text.y=theme_text(size=36,face="bold")) +
scale_colour_hue(name="iteration")
plot <- plot + opts(legend.title=theme_text(size=36,face="bold"),
legend.text=theme_text(size=36,face="bold"))
plot <- plot + opts(axis.title.x=theme_text(size=36,face="bold"),
axis.title.y=theme_text(size=36,face="bold"))
plot <- plot + opts(title="network lifetime",
plot.title=theme_text(size=36, face="bold"))
print(plot)
dev.off()
The problem occurs when I switch from png(...) to pdf(...). The data itself is plotted fine. Maybe I'm just missing some information on generating PDF plots with ggplot2?
Most likely this is due to font embedding.
R does not embed fonts by default, and this causes the issues you describe in some PDF readers. You will usually have no problems with such figures in Adobe Reader, which ships with a lot of fonts, while other readers might not come with many fonts (commercial ones in particular) and typically try to substitute the missing fonts with the closest ones. Sometimes this fails and you don't see any text at all. I often have this problem with Evince on Ubuntu, not only with R plots but with any PDF whose fonts are not embedded.
On Ubuntu you can check which fonts a PDF file uses, and whether they are embedded, with pdffonts file.pdf.
Some solutions:
- use the cairo_pdf device when producing PDFs in R; usually this does the trick
- use the extrafont package to embed the desired font (the font has to be available on your OS); see the package documentation for details
In combination with ggplot you should use ggsave() for saving images:
ggsave( "plot.png", plot )
ggsave( "plot.pdf", plot )
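The question's code uses opts() and theme_text(), which were removed from later ggplot2 releases in favour of theme() and element_text(). Below is a sketch of the same plot in current syntax, saved with ggsave() and the cairo_pdf device suggested above. Assumptions: ggplot2 3.4.0 or later (for `linewidth`), scale_x_continuous() instead of scale_x_discrete() because time is numeric, and a temporary output path; it falls back to the standard pdf device if cairo is not available.

```r
library(ggplot2)

# Data from the question
plotData <- data.frame(
  iteration = c(rep(1L, 11), rep(2L, 8)),
  time = c(seq(0L, 100L, by = 10L), seq(0L, 70L, by = 10L)),
  routes = c(6L, 6L, 5L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 0L,
             5L, 5L, 5L, 5L, 1L, 1L, 1L, 0L)
)

bold36 <- element_text(size = 36, face = "bold")

p <- ggplot(plotData, aes(x = time, y = routes, group = iteration,
                          colour = factor(iteration))) +
  geom_step(linewidth = 5, direction = "vh") +
  scale_x_continuous(breaks = seq(10, 100, by = 10)) +
  labs(title = "network lifetime", x = "time", y = "#routes",
       colour = "iteration") +
  theme(axis.text = bold36, axis.title = bold36,
        legend.title = bold36, legend.text = bold36,
        plot.title = bold36)

# Prefer cairo_pdf (handles fonts more robustly); fall back to pdf()
out <- file.path(tempdir(), "plot.pdf")
ggsave(out, p, width = 6, height = 6,
       device = if (capabilities("cairo")) cairo_pdf else "pdf")
```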
...
