Related
I am working with the R programming language. I am trying to plot some categorical and continuous data that I am working with, but I am getting an error that tells me that such plots are only possible with "only numeric variables".
library(survival)
library(ggplot2)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
str(data)
#plot
mycolours <- rainbow(length(unique(data$sex)), end = 0.6)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(4, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, 5), ylim = range(data[, 1:6]) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, 1:5, labels = colnames(data)[1:6])
abline(v = 1:5, col = "#00000033", lwd = 2)
abline(h = seq(-2.5, 2.5, 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(data)) lines(as.numeric(data[i, 1:6]), col = mycolours[as.numeric(data$sex[i])])
legend("topright", c("Female", "Male"), lwd = 2, col = mycolours, bty = "n")
# dev.off()
Does anyone know if this is possible to do with both categorical and continuous data?
Thanks
Sources: R: Parallel Coordinates Plot without GGally
Yup. You just have to be careful with the values. Remember how the factors are coded internally: they are just spicy integer variables with value labels (similar to names). You can losslessly cast it to character or to numeric. For the sake of plotting, you need numbers for line coordinates, so the factor-y nature of your variables will come at the end.
Remember that the quality of your visualisation and the information content depends on the order of your variables in you data set. For factors, labels are absolutely necessary. Help the reader by doing some completely custom improvements impossible in ggplot2 in small steps!
I wrote a custom function allowing anyone to add super-legible text on top of the values that are not so obvious to interpret. Give meaningful names, choose appropriate font size, pass all those extra parameters to the custom function as an ellipsis (...)!
Here you can see that most of the dead patients are female and most of the censored ones are males. Maybe adding some points with slight jitter will give the reader idea about the distributions of these variables.
library(survival)
data(lung)
# Data preparation
lung.scaled <- apply(lung, 2, scale)
drop.column.index <- which(colnames(lung) == "sex")
lung.scaled <- lung.scaled[, -drop.column.index] # Dropping the split variable
split.var <- lung[, drop.column.index]
lung <- lung[, -drop.column.index]
mycolours <- rainbow(length(unique(split.var)), end = 0.6, v = 0.9, alpha = 0.4)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(5.5, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, ncol(lung.scaled)), ylim = range(lung.scaled, na.rm = TRUE) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "", ylab = "Standardised value")
axis(1, 1:ncol(lung.scaled), labels = colnames(lung), cex.axis = 0.95, las = 2)
abline(v = 1:ncol(lung), col = "#00000033", lwd = 2)
abline(h = seq(round(min(lung.scaled, na.rm = TRUE)), round(max(lung.scaled, na.rm = TRUE), 0.5)), col = "#00000022", lty = 2)
for (i in 1:nrow(lung.scaled)) lines(as.numeric(lung.scaled[i, ]), col = mycolours[as.numeric(split.var[i])])
legend("topleft", c("Female", "Male"), lwd = 3, col = mycolours, bty = "n")
# Labels for some categorical variables with a white halo for readability
labels.with.halo <- function(varname, data.scaled, labels, nhalo = 32, col.halo = "#FFFFFF44", hscale = 0.04, vscale = 0.04, ...) {
offsets <- cbind(cos(seq(0, 2*pi, length.out = nhalo + 1)) * hscale, sin(seq(0, 2*pi, length.out = nhalo + 1)) * vscale)[-(nhalo + 1), ]
ind <- which(colnames(data.scaled) == varname)
yvals <- sort(unique(data.scaled[, ind]))
for (i in 1:nhalo) text(rep(ind, length(yvals)) + offsets[i, 1], yvals + offsets[i, 2], labels = labels, col = col.halo, ...)
text(rep(ind, length(yvals)), yvals, labels = labels, ...)
}
labels.with.halo("status", lung.scaled, c("Censored", "Dead"), pos = 3)
labels.with.halo("ph.ecog", lung.scaled, c("Asymptomatic", "Symp. but ambul.", "< 50% bed", "> 50% bed"), pos = 3, cex = 0.9)
# dev.off()
I carried out a post-hoc Tukey test on an ANOVA and then I made a plot of the results. I can't seem to change my x axis title or my y axis title. I get this error:
Error in plot.default(c(xi[, "lwr"], xi[, "upr"]), rep.int(yvals, 2L), :
formal argument "xlab" matched by multiple actual arguments
This is my relevant code:
tuk <- TukeyHSD(final)
plot(tuk,xlab="Differences in mean departure times", ylab="Comparisons")
I also need to change the y axis tick mark labels but I don't know how.
Thanks.
So because of how they wrote the plot() method for TukeyHSD class object you can not change the axis labels by default, this detail is buried in the ?TuketHSD man page.
But you can easily hack together a copy that does allow you to do it. First find the code for the existing method with getAnywhere(plot.TukeyHSD). Then adapt it like so:
tuk_plot <- function (x, xlab, ylab, ylabels = NULL, ...) {
for (i in seq_along(x)) {
xi <- x[[i]][, -4L, drop = FALSE]
yvals <- nrow(xi):1L
dev.hold()
on.exit(dev.flush())
plot(c(xi[, "lwr"], xi[, "upr"]), rep.int(yvals, 2L),
type = "n", axes = FALSE, xlab = "", ylab = "", main = NULL,
...)
axis(1, ...)
# change for custom axis labels
if (is.null(ylabels)) ylabels <- dimnames(xi)[[1L]]
axis(2, at = nrow(xi):1, labels = ylabels,
srt = 0, ...)
abline(h = yvals, lty = 1, lwd = 0.5, col = "lightgray")
abline(v = 0, lty = 2, lwd = 0.5, ...)
segments(xi[, "lwr"], yvals, xi[, "upr"], yvals, ...)
segments(as.vector(xi), rep.int(yvals - 0.1, 3L), as.vector(xi),
rep.int(yvals + 0.1, 3L), ...)
title(main = paste0(format(100 * attr(x, "conf.level"),
digits = 2L), "% family-wise confidence level\n"),
# change for custom axis titles
xlab = xlab, ylab = ylab)
box()
dev.flush()
on.exit()
}
}
Now you can adjust the x and y axis along with custom y-labels:
tuk_plot(tuk, "Hello X Axis", "Hello Y Axis", c("One", "Two", "Three"))
If you don't provide the y-labels the default ones from the model will show up.
Reproducible Example:
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
tuk <- TukeyHSD(fm1, "tension")
I'm a beginner in coding. I was trying to create an interaction plot. Here's my code:
data is clinicaltrials from the data of the book "Learning Statistics with R."
library(sciplot)
library(lsr)
library(gplots)
lineplot.CI(x.factor = clin.trial$drug,
response = clin.trial$mood.gain,
group = clin.trial$therapy,
ci.fun = ciMean,
xlab = "Drug",
ylab = "Mood Gain")
and it produces the graph like this:
As can be seen in the graph, the legend box is not within my screen.
Also I tried creating another plot using the following code:
interaction.plot(x.factor = clin.trial$drug,
trace.factor = clin.trial$therapy,
response = clin.trial$mood.gain,
fun = mean,
type = "l",
lty = 1, # line type
lwd = 2, # line width
legend = T,
xlab = "Drug", ylab = "Mood Gain",
col = c("#00AFBB", "#E7B800"),
xpd = F,
trace.label = "Therapy")
For this code, I got the graph like this:
In this graph, the legend does not have labels.
Could anyone help me with these problems regarding legend?
You probably plan to save the plot via RStudio GUI. When you resize the plot window with your mouse, you need to run the code again to refresh the legend dimensions.
However, it's advantageous to use a more sophisticated method, e.g. to save it as a png with fixed dimensions like so:
library("sciplot")
library("lsr")
library("gplots")
png("Plot_1.png", height=400, width=500)
lineplot.CI(x.factor=clin.trial$drug,
response=clin.trial$mood.gain,
group=clin.trial$therapy,
ci.fun=ciMean,
xlab="Drug",
ylab="Mood Gain"
)
dev.off()
png("Plot_2.png", height=400, width=500)
interaction.plot(x.factor=clin.trial$drug,
trace.factor=clin.trial$therapy,
response=clin.trial$mood.gain,
fun=mean,
type="l",
lty=1, # line type
lwd=2, # line width
legend=T,
xlab="Drug", ylab="Mood Gain",
col=c("#00AFBB", "#E7B800"),
xpd=F,
trace.label="Therapy")
dev.off()
The plots are saved into your working directory, check getwd() .
Edit
You could also adjust the legend position.
In lineplot.CI you may use arguments; either by using characters just for x, e.g. x.leg="topleft" or both coordinates as numeric x.leg=.8, y.leg=2.2.
interaction.plot does not provide yet this functionality. I provide a hacked version below. Arguments are called xleg and yleg, functionality as above.
See ?legend for further explanations.
interaction.plot <- function (x.factor, trace.factor, response, fun = mean,
type = c("l", "p", "b", "o", "c"), legend = TRUE,
trace.label = deparse(substitute(trace.factor)),
fixed = FALSE, xlab = deparse(substitute(x.factor)),
ylab = ylabel, ylim = range(cells, na.rm = TRUE),
lty = nc:1, col = 1, pch = c(1L:9, 0, letters),
xpd = NULL, leg.bg = par("bg"), leg.bty = "n",
xtick = FALSE, xaxt = par("xaxt"), axes = TRUE,
xleg=NULL, yleg=NULL, ...) {
ylabel <- paste(deparse(substitute(fun)), "of ", deparse(substitute(response)))
type <- match.arg(type)
cells <- tapply(response, list(x.factor, trace.factor), fun)
nr <- nrow(cells)
nc <- ncol(cells)
xvals <- 1L:nr
if (is.ordered(x.factor)) {
wn <- getOption("warn")
options(warn = -1)
xnm <- as.numeric(levels(x.factor))
options(warn = wn)
if (!anyNA(xnm))
xvals <- xnm
}
xlabs <- rownames(cells)
ylabs <- colnames(cells)
nch <- max(sapply(ylabs, nchar, type = "width"))
if (is.null(xlabs))
xlabs <- as.character(xvals)
if (is.null(ylabs))
ylabs <- as.character(1L:nc)
xlim <- range(xvals)
if (is.null(xleg)) {
xleg <- xlim[2L] + 0.05 * diff(xlim)
xlim <- xlim + c(-0.2/nr, if (legend) 0.2 + 0.02 * nch else 0.2/nr) *
diff(xlim)
}
dev.hold()
on.exit(dev.flush())
matplot(xvals, cells, ..., type = type, xlim = xlim, ylim = ylim,
xlab = xlab, ylab = ylab, axes = axes, xaxt = "n",
col = col, lty = lty, pch = pch)
if (axes && xaxt != "n") {
axisInt <- function(x, main, sub, lwd, bg, log, asp,
...) axis(1, x, ...)
mgp. <- par("mgp")
if (!xtick)
mgp.[2L] <- 0
axisInt(1, at = xvals, labels = xlabs, tick = xtick,
mgp = mgp., xaxt = xaxt, ...)
}
if (legend) {
yrng <- diff(ylim)
if (is.null(yleg))
yleg <- ylim[2L] - 0.1 * yrng
if (!is.null(xpd) || {
xpd. <- par("xpd")
!is.na(xpd.) && !xpd. && (xpd <- TRUE)
}) {
op <- par(xpd = xpd)
on.exit(par(op), add = TRUE)
}
# text(xleg, ylim[2L] - 0.05 * yrng, paste(" ",
# trace.label), adj = 0)
if (!fixed) {
ord <- sort.list(cells[nr, ], decreasing = TRUE)
ylabs <- ylabs[ord]
lty <- lty[1 + (ord - 1)%%length(lty)]
col <- col[1 + (ord - 1)%%length(col)]
pch <- pch[ord]
}
legend(xleg, yleg, legend = ylabs, col = col,
title = if (trace.label == "") NULL else trace.label,
pch = if (type %in% c("p", "b"))
pch, lty = if (type %in% c("l", "b"))
lty, bty = leg.bty, bg = leg.bg)
}
invisible()
}
Data:
lk <- "https://learningstatisticswithr.com/data.zip"
tmp <- tempfile()
tmp.dir <- tempdir()
download.file(lk, tmp)
unzip(tmp, exdir=tmp.dir)
load("data/clinicaltrial.Rdata")
We can get marginal effects of a linear model with margins::margins() and can select variables of interest with option variables.
fit <- lm(mpg ~ factor(vs) + gear:factor(vs) + qsec, mtcars)
library(margins)
marg1 <- margins(fit, variables="vs")
> summary(marg1)
factor AME SE z p lower upper
vs1 4.8023 2.6769 1.7940 0.0728 -0.4443 10.0490
The package has a implemented method plot.margins, so we can plot the marginal effects
plot(marg1)
at allows us to specify the values at which to calculate the marginal effects:
marg2 <- margins(fit, variables="vs", at=list(gear=c(3, 4, 5)))
> summary(marg2)
factor gear AME SE z p lower upper
vs1 3.0000 2.8606 3.3642 0.8503 0.3952 -3.7332 9.4544
vs1 4.0000 5.6849 2.6713 2.1282 0.0333 0.4493 10.9206
vs1 5.0000 8.5093 3.8523 2.2089 0.0272 0.9588 16.0597
However, attempting to plot these specified margins will yield an error:
plot(marg2)
Error in `[.data.frame`(summ, , names(attributes(x)[["at"]]), drop = FALSE) :
undefined columns selected
Since the margins package claims to be "an R-port of Stata's 'margins' command ", I'd expect a plot similar to the one Stata gives:
So, how can we plot the predicted margins when they are specified with at?
edit:
Note that this is not really an ordinary interaction plot, since
with(mtcars[mtcars$gear %in% c(3, 4, 5), ],
interaction.plot(gear, vs, mpg, pch=rep(1, 2), type="b"))
gives a different output:
The error comes from what seems to be a bug in the plot method for objects of class "margins", plot.margins.
This is an attempt to correct it. The changes are in the function body, just execute this or save it in a file "plotmargins.R" and then source("plotmargins.R").
plot.margins <-
function (x, pos = seq_along(marginal_effects(x, with_at = FALSE)),
which = colnames(marginal_effects(x, with_at = FALSE)), labels = gsub("^dydx_",
"", which), horizontal = FALSE, xlab = "", ylab = "Average Marginal Effect",
level = 0.95, pch = 21, points.col = "black", points.bg = "black",
las = 1, cex = 1, lwd = 2, zeroline = TRUE, zero.col = "gray",
...)
{
pars <- list(...)
summ <- summary(x, level = level, by_factor = TRUE)
MEs <- summ[, "AME", drop = TRUE]
lower <- summ[, ncol(summ) - 1L]
upper <- summ[, ncol(summ)]
r <- max(upper) - min(lower)
#--- changes start here
nms <- intersect(names(summ), names(attributes(x)[["at"]]))
at_levels <- unique(summ[, nms, drop = FALSE])
#--- changes end here
n_at_levels <- nrow(at_levels)
if (n_at_levels > 1) {
pos2 <- rep(pos, each = n_at_levels)
pos2 <- pos2 + seq(from = -0.2, to = 0.2, length.out = n_at_levels)
}
else {
pos2 <- pos
}
if (isTRUE(horizontal)) {
xlim <- if ("xlim" %in% names(pars))
xlim
else c(min(lower) - 0.04 * r, max(upper) + 0.04 * r)
ylim <- if ("ylim" %in% names(pars))
xlim
else c(min(pos2) - (0.04 * min(pos2)), max(pos2) + (0.04 *
max(pos2)))
}
else {
xlim <- if ("xlim" %in% names(pars))
xlim
else c(min(pos2) - (0.04 * min(pos2)), max(pos2) + (0.04 *
max(pos2)))
ylim <- if ("ylim" %in% names(pars))
xlim
else c(min(lower) - 0.04 * r, max(upper) + 0.04 * r)
}
if (isTRUE(horizontal)) {
plot(NA, xlim = xlim, ylim = ylim, yaxt = "n", xlab = ylab,
ylab = xlab, las = las, ...)
if (isTRUE(zeroline)) {
abline(v = 0, col = zero.col)
}
points(MEs, pos2, col = points.col, bg = points.bg, pch = pch)
axis(2, at = pos, labels = as.character(labels), las = las)
mapply(function(pos, upper, lower, lwd) {
segments(upper, pos, lower, pos, col = points.col,
lwd = lwd)
}, pos2, upper, lower, seq(max(lwd), 0.25, length.out = length(MEs)))
}
else {
plot(NA, xlim = xlim, ylim = ylim, xaxt = "n", xlab = xlab,
ylab = ylab, las = las, ...)
if (isTRUE(zeroline)) {
abline(h = 0, col = zero.col)
}
points(pos2, MEs, col = points.col, bg = points.bg, pch = pch)
axis(1, at = pos, labels = as.character(labels), las = las)
mapply(function(pos, upper, lower, lwd) {
segments(pos, upper, pos, lower, col = points.col,
lwd = lwd)
}, pos2, upper, lower, seq(max(lwd), 0.25, length.out = length(MEs)))
}
invisible(x)
}
Now your code and the graph.
source("plotmargins.R")
marg2 <- margins(fit, variables = "vs",
at = list(gear = c(3, 4, 5)))
plot(marg2)
I'm using a function that among several things plots the results in two panels using par(mfrow = c(1, 2)).
I would like to run the function with three different datasets and plot the outputs together so that each one is in a row, as if I were using par(mfrow = c(3, 2)).
If there a way of doing this without modifying the function itself?
Probably basic issue, but help very much appreciated.
The function is a little bit long, but the relevant part plots a PCoA:
# perform classical multidimensional sclaing (PCoA) of the dist matrix
acop <- cmdscale(dat.d, k = nrow(as.matrix(dat.d)) - 1, eig = TRUE) # keep n-1 eigenvalues
axes.tot <- acop$eig # eig are the n eigenvalues computed by cmdscale. Axes are ranked by their
eigenvalues, so the first axis has the highest eigenvalue, the second axis has the second highest
eigenvalue, etc.
inertia <- sum(acop$eig[acop$eig > 0])
percents <- round(as.numeric(100 * axes.tot/inertia), digits = 0) # The eigenvalues represent
the variance extracted by each axis, here they are expressed as a percentage of the sum of all
eigenvalues (i.e. total variance).
par()
par(mfrow = c(1, 2), pty = "s")
coord1 <- acop$points[, c(1, 2)] # points is a matrix whose rows give the coordinates of the
points chosen to represent the dissimilarities
col.grps <- data.frame(vect.grps, coord1)
# plot so that the maximum variance is projected along the first axis, then on the second and so
on
plot(coord1, asp = 1, cex = 0.1, xlab = paste("Axis 1 ",
"(", percents[1], " % variance explained)", sep = ""),
ylab = paste("Axis 2 ", "(", percents[2], " % variance explained)",
sep = ""), main = "", type = "n", bty = "n")
abline(h = 0, lty = 2, col = "grey")
abline(v = 0, lty = 2, col = "grey")
if (length(vect.grps) == nrow(as.matrix(dat.d))) {
for (g in 1:length(names.grps)) {
text(x = coord1[col.grps[, 1] == names.grps[g], 1],
y = coord1[col.grps[, 1] == names.grps[g], 2],
labels = names.grps[g], col = topo.col[g], cex = 0.7)
}
}
else {
points(coord1, pch = 19, col = "blue", cex = 0.5)
}
coord1 <- acop$points[, c(3, 4)]
col.grps <- data.frame(vect.grps, coord1)
plot(coord1, asp = 1, cex = 0.1, xlab = paste("Axis 3 ",
"(", percents[3], " % variance explained)", sep = ""),
ylab = paste("Axis 4 ", "(", percents[4], " % variance explained)",
sep = ""), main = "", type = "n", bty = "n")
abline(h = 0, lty = 2, col = "grey")
abline(v = 0, lty = 2, col = "grey")
if (length(vect.grps) == nrow(as.matrix(dat.d))) {
for (g in 1:length(names.grps)) {
text(x = coord1[col.grps[, 1] == names.grps[g], 1],
y = coord1[col.grps[, 1] == names.grps[g], 2],
labels = names.grps[g], col = topo.col[g], cex = 0.7)
}
}
else {
points(coord1, pch = 19, col = "blue", cex = 0.5)
I guess you can overwrite par(),
foo <- function(){
par(mfrow=c(2,1), mar=c(0,0,0,0))
plot(1:10)
plot(rnorm(10))
}
par <- function(mfrow, ...) {graphics::par(mfrow=c(3, 2), ...)}
foo()
rm(par)