I am trying to place two plots side-by-side in R and have the below example.
library(vioplot)
x <- rnorm(100)
y <- rpois(100,1)
plot(x, y, xlim=c(-5,5), ylim=c(-5,5),type='n')
vioplot(x, col="tomato", horizontal=TRUE, at=-4, add=TRUE,lty=2, rectCol="gray")
vioplot(y, col="cyan", horizontal=TRUE, at=-3, add=TRUE,lty=2)
vioplot(y, col="cyan", horizontal=TRUE, at=-2, add=TRUE,lty=2)
With this data, I'm able to make a vioplot of my x and y variables. Now, for example, I want to develop bar plots of separate count data that relates to each vioplot on the left-hand side.
counts <- c(10, 20, 30)
barplot(counts, main="Car Distribution", horiz=TRUE)
I've used the mtcars example but it could be any count data. I'm wondering if it is possible to generate these plots side-by-side so that the count plot lines up with the vioplot correctly. I do not need any y-axis labels for the count plot.
Given your specifications, I recommend ggplot.
library(tidyverse)
p1 <- lst(x, y, y1=y) %>%
bind_cols() %>%
pivot_longer(1:3) %>%
ggplot(aes(name, value)) +
geom_violin(trim = FALSE)+
geom_boxplot(width=0.15) +
coord_flip()
p2 <- mtcars %>%
count(gear) %>%
ggplot(aes(gear, n)) +
geom_col()+
coord_flip()
cowplot::plot_grid(p1, p2)
In base R you can do the following (please note that I used boxplot, but it should work with vioplot too)
par(mfrow=c(1,2))
counts <- table(mtcars$gear)
boxplot(cbind(x,y,y), col="tomato", horizontal=TRUE, lty=2)  # rectCol is a vioplot argument, not a boxplot one
barplot(counts, main="Car Distribution", horiz=TRUE,
names.arg=c("3 Gears", "4 Gears", "5 Gears"))
Another option, if you want to use ggplot, is the function ggarrange() from ggpubr.
library(dplyr)
library(ggplot2)
library(ggpubr)
# Create a sample dataset
dt <- tibble(group = rep(c("x", "y"), each = 100)) %>%
mutate(value = if_else(group == "x", rnorm(200),
as.double(rpois(200, 1))))
# Combined violin/Box plot
violins <- dt %>%
ggplot(aes(value, group)) +
geom_violin(width = 0.5) +
geom_boxplot(width = 0.1)
# Bar chart
bars <- dt %>%
ggplot(aes(group)) +
geom_bar(width = 0.1) +
coord_flip()
# Combine
ggpubr::ggarrange(violins, bars + rremove("ylab") + rremove("y.text"), ncol = 2)
You can use this code:
library(vioplot)
x <- rnorm(100)
y <- rpois(100,1)
par(mfrow=c(1,2))
plot(x, y, xlim=c(-5,5), ylim=c(-5,-1),type='n')
vioplot(x, col="tomato", horizontal=TRUE, at=-4, add=TRUE,lty=2, rectCol="gray")
vioplot(y, col="cyan", horizontal=TRUE, at=-3, add=TRUE,lty=2)
vioplot(y, col="cyan", horizontal=TRUE, at=-2, add=TRUE,lty=2)
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution", horiz=TRUE,
names.arg=c("3 Gears", "4 Gears", "5 Gears"))
Thank you for your interesting question, which motivated me to explore base R graphics features. I tried to find a case where placing the violin plot and the barplot side by side conveys a meaningful relationship. Here, I have a subset of the iris data with different counts per species, and I want to show three statistics:
the counts of each sampled species, by showing barplots;
the spread of sepal lengths in each sampled species, by showing violin plots; and
the median petal length of each sampled species, by positioning the violin plots.
I follow @GW5's idea here to create barplots whose positions on the axes can be controlled. I follow @IRTFM's idea here to adjust the origins of the axes.
Here is the full code:
library(vioplot)
some_iris <- iris[c(1:90, 110:139), ]
ir_counts <- some_iris |> with(Species) |> table()
ir_counts
# setosa versicolor virginica
# 50 40 30
ir_names <- names(ir_counts)
ir_colors <- c("cyan", "green", "pink")
x_vio1 <- some_iris |> subset(Species == ir_names[1]) |> with(Sepal.Length)
x_vio2 <- some_iris |> subset(Species == ir_names[2]) |> with(Sepal.Length)
x_vio3 <- some_iris |> subset(Species == ir_names[3]) |> with(Sepal.Length)
y_vio1 <- some_iris |> subset(Species == ir_names[1]) |> with(Petal.Length) |> median()
y_vio2 <- some_iris |> subset(Species == ir_names[2]) |> with(Petal.Length) |> median()
y_vio3 <- some_iris |> subset(Species == ir_names[3]) |> with(Petal.Length) |> median()
# `xpd = FALSE` to keep the grid inside the plotting boxes.
par(mfrow = c(1, 2), xpd = FALSE)
# The violin plots, put on the left side.
plot(NULL,
xlim = c(0, 10), ylim = c(0, 10), type = "n", las = 1, xaxs = "i", yaxs = "i",
xlab = "Sepal Length (cm)", ylab = "Median Petal Length (cm)")
vioplot(x_vio1, col = ir_colors[1], horizontal = TRUE, at = y_vio1, add = TRUE, lty = 2)
vioplot(x_vio2, col = ir_colors[2], horizontal = TRUE, at = y_vio2, add = TRUE, lty = 2)
vioplot(x_vio3, col = ir_colors[3], horizontal = TRUE, at = y_vio3, add = TRUE, lty = 2)
grid()
# Text labels giving the species names
text(labels = ir_names, y = c(y_vio1, y_vio2, y_vio3),
x = c (min(x_vio1), min(x_vio2), min(x_vio3)) - 1)
# The barplots, put on the right side.
plot(NULL,
xlim = c(0, 60), ylim = c(0, 10), yaxt = "n", type = "n",
las = 1, xlab = "Counts", ylab = "", xaxs = "i", yaxs = "i"
)
rect(xleft = 0, xright = ir_counts[1],
ybottom = y_vio1 - 0.3, ytop = y_vio1 + 0.3, col = ir_colors[1])
rect(xleft = 0, xright = ir_counts[2],
ybottom = y_vio2 - 0.3, ytop = y_vio2 + 0.3, col = ir_colors[2])
rect(xleft = 0, xright = ir_counts[3],
ybottom = y_vio3 - 0.3, ytop = y_vio3 + 0.3, col = ir_colors[3])
grid()
Here is the result:
In case you want to put labels on the barplots (on the right side), you can use mtext as follows:
# ... (The same code above)
mtext(text = ir_names, side = 2, at = c(y_vio1, y_vio2, y_vio3),
line = 0.2, las = 1 )
The resulting labels:
I am working with the R programming language. I am trying to plot some categorical and continuous data that I am working with, but I am getting an error that tells me that such plots are only possible with "only numeric variables".
library(survival)
library(ggplot2)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
str(data)
#plot
mycolours <- rainbow(length(unique(data$sex)), end = 0.6)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(4, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, 5), ylim = range(data[, 1:6]) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, 1:5, labels = colnames(data)[1:6])
abline(v = 1:5, col = "#00000033", lwd = 2)
abline(h = seq(-2.5, 2.5, 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(data)) lines(as.numeric(data[i, 1:6]), col = mycolours[as.numeric(data$sex[i])])
legend("topright", c("Female", "Male"), lwd = 2, col = mycolours, bty = "n")
# dev.off()
Does anyone know if this is possible to do with both categorical and continuous data?
Thanks
Sources: R: Parallel Coordinates Plot without GGally
Yup. You just have to be careful with the values. Remember how factors are coded internally: they are just spicy integer variables with value labels (similar to names). You can cast them to character or to numeric. For the sake of plotting, you need numbers for line coordinates, so the factor-y nature of your variables will come at the end.
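To make the factor coding concrete, here is a minimal sketch. The factors `f` and `g` are made up for illustration and are not part of the lung data. It shows that the integer codes and the labels can be pulled apart, and that converting a factor to numeric returns the codes rather than the original numbers.

```r
# Hypothetical factor, for illustration only
f <- factor(c("low", "high", "mid", "high"), levels = c("low", "mid", "high"))

codes <- as.integer(f)   # internal integer codes: 1 3 2 3
labs  <- levels(f)       # value labels: "low" "mid" "high"

# The codes can serve as line coordinates; the labels are used for annotation.
# Indexing the labels by the codes recovers the original values.
recovered <- labs[codes]

# Caveat: for a factor built from numbers, as.integer() gives the codes,
# not the numbers themselves.
g <- factor(c(0, 3, 1))
g_codes  <- as.integer(g)                 # 1 3 2
g_values <- as.numeric(as.character(g))   # 0 3 1
```

This is why a variable such as ph.ecog, whose values start at 0, would be drawn at positions starting from 1 if it were converted to a factor first and then to numeric.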
Remember that the quality of your visualisation and its information content depend on the order of the variables in your data set. For factors, labels are absolutely necessary. Help the reader by doing some completely custom improvements impossible in ggplot2 in small steps!
I wrote a custom function allowing anyone to add super-legible text on top of the values that are not so obvious to interpret. Give meaningful names, choose appropriate font size, pass all those extra parameters to the custom function as an ellipsis (...)!
Here you can see that most of the dead patients are male and most of the censored ones are female (in the lung data, sex is coded 1 = male, 2 = female). Maybe adding some points with slight jitter will give the reader an idea about the distributions of these variables.
library(survival)
data(lung)
# Data preparation
lung.scaled <- apply(lung, 2, scale)
drop.column.index <- which(colnames(lung) == "sex")
lung.scaled <- lung.scaled[, -drop.column.index] # Dropping the split variable
split.var <- lung[, drop.column.index]
lung <- lung[, -drop.column.index]
mycolours <- rainbow(length(unique(split.var)), end = 0.6, v = 0.9, alpha = 0.4)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(5.5, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, ncol(lung.scaled)), ylim = range(lung.scaled, na.rm = TRUE) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "", ylab = "Standardised value")
axis(1, 1:ncol(lung.scaled), labels = colnames(lung), cex.axis = 0.95, las = 2)
abline(v = 1:ncol(lung), col = "#00000033", lwd = 2)
abline(h = seq(round(min(lung.scaled, na.rm = TRUE)), round(max(lung.scaled, na.rm = TRUE)), 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(lung.scaled)) lines(as.numeric(lung.scaled[i, ]), col = mycolours[as.numeric(split.var[i])])
legend("topleft", c("Male", "Female"), lwd = 3, col = mycolours, bty = "n")  # sex: 1 = male, 2 = female
# Labels for some categorical variables with a white halo for readability
labels.with.halo <- function(varname, data.scaled, labels, nhalo = 32, col.halo = "#FFFFFF44", hscale = 0.04, vscale = 0.04, ...) {
offsets <- cbind(cos(seq(0, 2*pi, length.out = nhalo + 1)) * hscale, sin(seq(0, 2*pi, length.out = nhalo + 1)) * vscale)[-(nhalo + 1), ]
ind <- which(colnames(data.scaled) == varname)
yvals <- sort(unique(data.scaled[, ind]))
for (i in 1:nhalo) text(rep(ind, length(yvals)) + offsets[i, 1], yvals + offsets[i, 2], labels = labels, col = col.halo, ...)
text(rep(ind, length(yvals)), yvals, labels = labels, ...)
}
labels.with.halo("status", lung.scaled, c("Censored", "Dead"), pos = 3)
labels.with.halo("ph.ecog", lung.scaled, c("Asymptomatic", "Symp. but ambul.", "< 50% bed", "> 50% bed"), pos = 3, cex = 0.9)
# dev.off()
I have created the following fanchart using the fanplot package. I'm trying to add axis ticks and labels to the y axis, however it's only giving me the decimals and not the full number. Looking for a solution to display the full number (e.g 4.59 and 4.61) on the y axis
I am also unsure of how to specify the breaks and number of decimal points for the labels on the y-axis using plot(). I know doing all of this in ggplot2 it would look something like this scale_y_continuous(breaks = seq(min(data.ts$Index),max(data.ts$Index),by=0.02)) . Any ideas on how to specify the breaks in the y axis as well as the number of decimal points using the base plot() feature in R?
Here is a reproducible copy of my dataset data.ts
structure(c(4.6049904235401, 4.60711076016453, 4.60980084146652,
4.61025389170935, 4.60544515681515, 4.60889021700954, 4.60983993107244,
4.61091608826696, 4.61138799159174, 4.61294431148318, 4.61167545843765,
4.61208284263432, 4.61421991328081, 4.61530485425155, 4.61471465043043,
4.6155992084451, 4.61195799200607, 4.61178486640435, 4.61037927954796,
4.60744590947049, 4.59979957741728, 4.59948551500254, 4.60078678080182,
4.60556092645471, 4.60934962087565, 4.60981147563749, 4.61060477704678,
4.61158365084251, 4.60963435263623, 4.61018215733317, 4.61209710959768,
4.61231368335184, 4.61071363571141, 4.61019496497916, 4.60948652606191,
4.61068813487859, 4.6084092003352, 4.60972706132393, 4.60866915174087,
4.61192565195909, 4.60878767339377, 4.61341471281265, 4.61015272152397,
4.6093479714315, 4.60750965935653, 4.60768790690338, 4.60676463096309,
4.60746490411374, 4.60885670935448, 4.60686846708382, 4.60688947889575,
4.60867708110485, 4.60448791268212, 4.60387348166032, 4.60569806689426,
4.6069320880709, 4.6087143894128, 4.61059688801283, 4.61065399116698,
4.61071421014339), .Tsp = c(2004, 2018.75, 4), class = "ts")
and here is a reproducible version of the code I'm using
# # Install and Load Packages
## pacman::p_load(forecast,fanplot,tidyverse,tsbox,lubridate,readxl)
# Create an ARIMA Model using the auto.arima function
model <- auto.arima(data.ts)
# Simulate forecasts for 4 quarters (1 year) ahead
forecasts <- simulate(model, n=4)
# Create a data frame with the parameters needed for the uncertainty forecast
table <- ts_df(forecasts) %>%
rename(mode=value) %>%
mutate(time0 = rep(2019,4)) %>%
mutate(uncertainty = sd(mode)) %>%
mutate(skew = rep(0,4))
y0 <- 2019
k <- nrow(table)
# Set Percentiles
p <- seq(0.05, 0.95, 0.05)
p <- c(0.01, p, 0.99)
# Simulate a qsplitnorm distribution
fsval <- matrix(NA, nrow = length(p), ncol = k)
for (i in 1:k)
fsval[, i] <- qsplitnorm(p, mode = table$mode[i],
sd = table$uncertainty[i],
skew = table$skew[i])
# Create Plot
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
title(ylab = 'Log AFSI',main = 'Four-Quarter Ahead Forecast Fan - AFSI',
xlab = 'Date')
rect(y0 - 0.25, par("usr")[3] - 1, y0 + 2, par("usr")[4] + 1,
border = "gray90", col = "gray90")
fan(data = fsval, data.type = "values", probs = p,
start = y0, frequency = 4,
anchor = data.ts[time(data.ts) == y0 - .25],
fan.col = colorRampPalette(c("#75002B", "pink")),
ln = NULL, rlab = NULL)
# Add axis labels and ticks
axis(1, at = (y0 - 2):(y0 + 2), tcl = 0.5)  # parentheses needed: ":" binds tighter than "-" and "+"
axis(1, at = seq(y0-2, y0 + 2, 0.25), labels = FALSE, tcl = 0.25)
abline(v = y0 - 0.25, lty = 1)
abline(v = y0 + 0.75, lty = 2)
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
range(blah) will only return two values (the minimum and maximum). The at parameter of axis() requires a sequence of points at which you require axis labels. Hence, these are the only two y values you have on your plot. Take a look at using pretty(blah) or seq(min(blah), max(blah), length.out = 10).
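As a rough sketch of that suggestion, the snippet below computes tick positions with seq() and pretty() and formats the labels with sprintf(). The vector `vals` is a made-up stand-in for the values passed to range() in the question, and the number of ticks and decimals are arbitrary choices.

```r
# Hypothetical stand-in for the plotted values
vals <- c(4.5995, 4.6049, 4.6107, 4.6156)

# range() gives only two tick positions
two_ticks <- range(vals)

# Evenly spaced ticks between the minimum and maximum
ticks_seq <- seq(min(vals), max(vals), length.out = 5)

# "Round" tick positions that cover the data
ticks_pretty <- pretty(vals)

# Labels with a fixed number of decimals
labels_seq <- sprintf("%.3f", ticks_seq)

# Then, after plot(..., yaxt = "n"):
# axis(2, at = ticks_seq, labels = labels_seq, las = 2)
```

seq() keeps the ticks exactly within the data range, while pretty() favours round numbers and may place the outermost ticks slightly outside it.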
The suggestions of @Feakster are worth looking at, but the problem here is that the y-axis margin isn't wide enough. You could do either of two things. You could round the labels so they fit within the margin; for example, you could replace this
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
with this
axis(2, at = range(fsval, data.ts),
labels = sprintf("%.3f", range(fsval, data.ts)), las = 2, tcl = 0.5)
Or, alternatively you could increase the y-axis margin before you make the plot by specifying:
par(mar=c(5,5,4,2)+.1)
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
Then everything below that should work. The mar element of par sets the number of margin lines on each side of the plot, in the order bottom, left, top, right. The default is c(5, 4, 4, 2) + 0.1.
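A small sketch of that idea, assuming you also want to restore the previous margins afterwards. The null PDF device is used here only so the snippet runs without opening a window; on an interactive device you would leave out the pdf() and dev.off() lines.

```r
grDevices::pdf(NULL)                    # off-screen device, for illustration only

old <- par(mar = c(5, 5, 4, 2) + 0.1)   # par() returns the previous settings
new_mar <- par("mar")                   # margins now in effect (lines of text)

# ... the plot(), fan() and axis() calls would go here ...

par(old)                                # restore the earlier margins
restored_mar <- par("mar")

invisible(grDevices::dev.off())
```

Only the second element (the left margin) changes here, which is the side where the long y-axis labels are drawn.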
I tried to name the x axis correctly.
hist(InsectSprays$count, col='pink', xlab='Sprays', labels=levels(InsectSprays$spray), xaxt='n')
axis(1, at=unique(InsectSprays$spray), labels=levels(InsectSprays$spray))
But this produces
I want the letters below the bars and not on top.
You have to plot the labels at the histogram bin midpoints. If you want to remove the axis and just have lettering, the padj will move the letters closer to the axis which you just removed.
h <- hist(InsectSprays$count, plot = FALSE)
plot(h, xaxt = "n", xlab = "Insect Sprays", ylab = "Counts",
main = "", col = "pink")
axis(1, h$mids, labels = LETTERS[1:6], tick = FALSE, padj= -1.5)
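To see where those midpoints come from, here is a short sketch. hist(..., plot = FALSE) computes the bins without drawing, and each midpoint is the average of two adjacent breaks. The name `mids_by_hand` is introduced here for illustration only.

```r
h <- hist(InsectSprays$count, plot = FALSE)   # compute bins without drawing

# Each midpoint is the average of two adjacent breaks
mids_by_hand <- (head(h$breaks, -1) + tail(h$breaks, -1)) / 2
same_mids <- isTRUE(all.equal(h$mids, mids_by_hand))

# axis(1, at = h$mids, labels = ...) needs exactly one label per bin
n_bins <- length(h$mids)
```

Keep in mind that the bins are ranges of count values rather than sprays, so lettering them only looks right when the number of bins happens to equal the number of sprays.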
I generally think bar plots are better suited to categorical variables. A solution in base R, with some rearrangement of the data, could be:
d <- aggregate(InsectSprays$count, by=list(spray=InsectSprays$spray), FUN=sum)
d <- d[order(d$x, decreasing = TRUE),]
totals <- d$x  # named "totals" to avoid masking base::t()
names(totals) <- d$spray
barplot(totals, las = 1, space = 0, col = "pink", xlab = "Sprays", ylab = "Count")
The output is the following:
Since you mentioned a ggplot solution would be nice:
library(ggplot2)
library(dplyr)
InsectSprays %>%
group_by(spray) %>%
summarise(count = sum(count)) %>%
ggplot(aes(reorder(spray, -count),count)) +
geom_bar(stat = "identity", fill = "pink2") +
xlab("Sprays")
The output being:
I know this question has already been asked, but I couldn't solve my problem.
The graph is unreadable when I use text() to label the points, and it is no better when I use identify().
This is what I get with this script:
VehiculeFunction <- function(data, gamme, absciss, ordinate, label, xlim, ylim){
my.data <- data[data$GAMME == gamme,]
ma.col = rgb(red = 0.1,blue = 1,green = 0.1, alpha = 0.2)
X <- my.data[[absciss]]
Y <- my.data[[ordinate]]
Z <- my.data[[label]]
X11()
plot(X, Y, pch=20, las = 1, col = ma.col, xlab = absciss, ylab = ordinate, xlim = xlim, ylim = ylim)
text(X, Y, labels = Z, pos=3, cex = 0.7, col = ma.col)
#identify(X, Y, labels = Z, cex = 0.7)
}
VehiculeFunction(data.vehicule, "I", "GMF.24", "Cout.24", "NITG", c(0,0.2), c(0,0.2))
I used iplot, but I couldn't add the identify and text arguments...
I have never used ggplot, so I don't know if it could solve my problem.
Thank you for your help.
A tool that might help is facet_zoom from the ggforce package.
I don't have access to the data.vehicule object, so I will use the mtcars data.frame for an example of zooming in on a region of the graphic.
library(ggplot2)
library(ggforce)
library(dplyr)
mtcars2 <- mtcars %>% mutate(nm = rownames(mtcars))
ggplot(mtcars2) +
aes(x = wt, y = mpg, label = nm) +
geom_text()
last_plot() +
theme_bw() +
facet_zoom(x = dplyr::between(wt, 3, 4),
y = dplyr::between(mpg, 12, 17))