How to justify text axis labels in R ggplot - r

I have a bar chart with text labels along the x-axis. Some of the labels are quite lengthy and I would like to make them look neater. Any ideas of how I might achieve that?
library(sjPlot)
require(ggplot2)
require(ggthemes)
WAM_3_plot <- sjp.frq(WAM_Dec13_R2$WAM_3, title= c("WAM Item 3"),
axisLabels.x=c("Disruptive behaviour can be contained and does not spread to other patients. Generally, behaviour on the ward is positive and pro-therapeutic.",
"1", "2","3","4",
"Disruptive behaviour by one patient tends to spread to other patients and is only contained with great difficulty. The general level of behaviour seems to be getting more counter-therapeutic."),
barColor = c("palegreen4", "palegreen3", "palegreen2", "brown1", "brown2", "brown3"),
upperYlim = 25,
valueLabelSize = 5,
axisLabelSize = 1.2,
breakLabelsAt=14, returnPlot=TRUE)
WAM_3_plot + theme(axis.text.x=element_text(hjust=0.5))

Like this?
Since you didn't provide any data, we have no way of knowing what your attempt looks like, but this seems like it might be close. The main feature is the use of strwrap(...) to insert line breaks (\n) into your labels.
set.seed(1)
library(ggplot2)
axisLabels.x <- c("Disruptive behaviour can be contained and does not spread to other patients. Generally, behaviour on the ward is positive and pro-therapeutic.",
"1", "2","3","4",
"Disruptive behaviour by one patient tends to spread to other patients and is only contained with great difficulty. The general level of behaviour seems to be getting more counter-therapeutic.")
labels.wrap <- lapply(strwrap(axisLabels.x, 50, simplify=FALSE), paste, collapse="\n") # word wrap
gg <- data.frame(x=LETTERS[1:6], y=sample(1:10,6))
ggplot(gg) +
geom_bar(aes(x,y, fill=x), stat="identity")+
scale_x_discrete(labels=labels.wrap)+
scale_fill_discrete(guide="none")+
labs(x="",y="Response")+
coord_flip()
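If the wrapping is needed in several plots, the strwrap idea above can be packaged as a small helper. This is only a sketch: wrap_labels and its default width are names made up for illustration, not part of ggplot2 or sjPlot.

```r
# Hypothetical helper (not from any package): wrap each label at `width`
# characters by joining the pieces returned by strwrap() with "\n".
wrap_labels <- function(labels, width = 50) {
  pieces <- strwrap(labels, width = width, simplify = FALSE)
  vapply(pieces, paste, character(1), collapse = "\n")
}

long_label <- "Disruptive behaviour can be contained and does not spread to other patients."
wrapped <- wrap_labels(c(long_label, "1"), width = 30)
cat(wrapped, sep = "\n---\n")
```

Because scale_x_discrete() accepts a function for labels, something like scale_x_discrete(labels = function(x) wrap_labels(x, 30)) should then wrap whatever labels the axis already has.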

You may change the breakLabelsAt parameter, decrease the axisLabelSize and set flipCoordinates to TRUE, then you get similar results. I used the efc-sample data set which is included in the sjPlot-package:
data(efc)
sjp.frq(efc$e42dep, title=c("WAM Item 3"),
axisLabels.x=c("Disruptive behaviour can be contained and does not spread to other patients. Generally, behaviour on the ward is positive and pro-therapeutic.",
"1", "2",
"Disruptive behaviour by one patient tends to spread to other patients and is only contained with great difficulty. The general level of behaviour seems to be getting more counter-therapeutic."),
valueLabelSize=5,
axisLabelSize=.8,
breakLabelsAt=50,
flipCoordinates=T)

Rotating the axis labels can help a lot:
theme(axis.text.x = element_text(angle=90, vjust=0.5))

Related

Issues with axis labeling on boxplots in R

Hopefully this is a quick fix. I am trying to make boxplots of nutrient river concentrations using R code written by the person previously in my position (I am not very experienced with R, and we use it only for this). The issue is that the axes of the output boxplot have multiple overlapping text labels, some of which seem to come from another part of the code that I thought did not dictate axis labels. The original code is shown below (the working directory is already set and the csv files are imported, and I know that part works), and the resulting boxplot is shown in image 1.
Edit: Code below
png(filename="./TP & TN Plotting/Plots/TN Concentration/Historical TN Concentrations Zoomed to Medians.png",
width=10,
height=4,
unit="in",
res=600)
par(mar=c(5,5,3,1),
cex=.75)
tnhistconc<-(boxplot(Conc_ppb[Year!=2009 & Year !=2010]~Year[Year!=2009 & Year !=2010],
data=TNhist))
boxplot(Conc_ppb[Year!=2009 & Year !=2010]~Year[Year!=2009 & Year !=2010],
data=TNhist,
ylim=c(0,3000),
xaxt="n"),
at=c(1:3,5:8,10:(length(tnhistconc$n)+2)))
axis.break(axis=1,breakpos=c(4),style="slash")
axis.break(axis=1,breakpos=c(9),style="slash")
text(c(1:3,5:8,10:(length(tnhistconc$n)+2)),
-50,
paste("n=",
tnhistconc$n),
cex=0.8)
title(ylab="TN Concentration (ppb)",
xlab="Year")
title(main=paste("Historical (1998 - 2000), (2005 - 2008) + UMass (2012 -
",max(TNhist$Year),") TN Concentration"))
dev.off()
I made the edit of adding ,xlab="",ylab="" after xlab="Year" towards the bottom, since this fixed the issue in other similar sections of boxplot code (except it seems I needed to add it to a different part of those sections, see image 2; I also tried it after xaxt="n" as in image 2 and got the same result). It fixes the overlapping text issue, but the axis labels are still not what I want them to be ("Year" and "TN Concentration (ppb)"), as shown in image 3.
So, does anyone potentially know of a simple fix that might get rid of these unwanted labels and replace them with the correct ones? Am I missing something basic? The same original code seemed to work fine in the past before I was doing this (for 2018 data), and the spreadsheets the data is being imported from are the same, same setup and everything. Many thanks in advance!
Edit: I have a sample dataset which is just the last 2 years of data. See here: https://docs.google.com/spreadsheets/d/10oo9w-IzXkLWdY10A9gHYhDH67MeSibBpc2q67L6o88/edit?usp=sharing
Image 1: Original code result
Image 2: How this code fixed other similar issues
Image 3: Partially fixed result based on edit
I don't know if it is a typo, but you have an extra parenthesis after xaxt = "n".
Maybe you can try something like this:
png(filename="./TP & TN Plotting/Plots/TN Concentration/Historical TN Concentrations Zoomed to Medians.png",
width=10, height=4, unit="in", res=600)
par(mar=c(5,5,3,1), cex=.75)
tnhistconc<-(boxplot(Conc_ppb[Year!=2009 & Year !=2010]~Year[Year!=2009 & Year !=2010], data=TNhist))
boxplot(Conc_ppb[Year!=2009 & Year !=2010]~Year[Year!=2009 & Year !=2010],
data=TNhist,
ylim=c(0,3000),
xaxt="n", ylab = "", xlab = "",
at=c(1:3,5:8,10:(length(tnhistconc$n)+2)))
axis.break(axis=1,breakpos=c(4),style="slash")
axis.break(axis=1,breakpos=c(9),style="slash")
text(c(1:3,5:8,10:(length(tnhistconc$n)+2)),
-50,
paste("n=",
tnhistconc$n),
cex=0.8)
title(ylab="TN Concentration (ppb)",
xlab="Year",
main=paste("Historical (1998 - 2000), (2005 - 2008) + UMass (2012 -
",max(TNhist$Year),") TN Concentration"))
dev.off()
xaxt = "n" suppresses the default x axis (the breaks are then drawn by axis.break). Setting xlab = "" and ylab = "" removes the default x and y axis titles, which are set later by title.
Hopefully this works.
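The same pattern can be tried on a small scale before touching the real script. The sketch below uses made-up data (the toy data frame is an assumption, not the asker's dataset) and leaves out the plotrix axis breaks: the default axis and titles are suppressed in boxplot(), then added back explicitly.

```r
# Made-up stand-in for TNhist; values are random and purely illustrative.
set.seed(42)
toy <- data.frame(Year = rep(c(2005, 2006, 2012), each = 10),
                  Conc_ppb = runif(30, 500, 2500))

grDevices::pdf(NULL)  # throwaway device; drop this line to draw on screen

# Suppress the default x axis and both default axis titles ...
bp <- boxplot(Conc_ppb ~ Year, data = toy, ylim = c(0, 3000),
              xaxt = "n", xlab = "", ylab = "")

# ... then add the axis and the titles explicitly, exactly once.
axis(1, at = seq_along(bp$names), labels = bp$names)
title(xlab = "Year", ylab = "TN Concentration (ppb)")

invisible(grDevices::dev.off())
```

If the titles were not blanked in the boxplot() call, the later title() call would draw on top of the defaults, which matches the overlapping text described in the question.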
EDIT: Using ggplot2
Your data frame is already in long format, so it can be plotted with ggplot2 in a few lines. Here your dataset is named df:
library(ggplot2)
ggplot(df, aes(x = as.factor(Year), y = Conc_ppb))+
geom_boxplot()+
labs(x = "Year", y = "TN Concentration (ppb)",
title = paste("Historical (1998 - 2000), (2005 - 2008) + UMass (2012 -
",max(df$Year),") TN Concentration"))

Generate randomized block schematic in R

I frequently design randomized block experiments, and my collaborators often enjoy schematics that visualize these designs. However, generating them is very time-consuming because I do this in powerpoint (I am ashamed). I would like to do this in R, but I don't really know how to get started. Below I have copied what one of these schematics looks like. What I'd really like to do is develop R code where the user can specify:
Some number of treatments within block.
A number of blocks.
A vector of colors linked to the treatments.
I'd like the output to be the first panel of the below figure, without necessarily including the "Block 1", "Block 2", etc. labels. Though, this would be a bonus. Would love if the solution was in base R.
This might be what you want. I don't see why each block cannot simply be a row as shown in the picture below. If you need some fancy shapes, you need to specify the criteria for drawing that shape for the block.
library(tidyverse)
n_blocks <- 12
m_treatments <- 5
plot_sample_treatments <- function(n_blocks,m_treatments){
df <- do.call("rbind",
lapply(1:n_blocks,
function(x) data.frame(
treatment = sample(1:m_treatments,m_treatments),
block = paste0('Block ',x),
sequence = (1:m_treatments))))
df %>% ggplot(aes(x = sequence, y = block, fill = factor(treatment))) +
geom_tile(color = 'gray30') +
scale_x_discrete(expand = c(0,0))
}
plot_sample_treatments(12,5)
which gives the following:
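Since the question asked for base R, here is a rough base-graphics sketch of the same idea. The function names make_block_design and plot_block_design are invented for this example, and the colours are arbitrary; each block is drawn as one row of squares, as in the ggplot2 version.

```r
# Hypothetical helpers for illustration; the names are not from any package.

# Build an n_blocks x n_treatments matrix in which every row (block)
# holds a random permutation of the treatment indices.
make_block_design <- function(n_blocks, n_treatments) {
  design <- matrix(NA_integer_, nrow = n_blocks, ncol = n_treatments,
                   dimnames = list(paste("Block", seq_len(n_blocks)), NULL))
  for (b in seq_len(n_blocks)) {
    design[b, ] <- sample.int(n_treatments)
  }
  design
}

# Draw one coloured square per plot, one row per block, using base graphics.
plot_block_design <- function(design, colors) {
  stopifnot(length(colors) >= max(design))
  n_blocks <- nrow(design)
  n_trt <- ncol(design)
  plot.new()
  plot.window(xlim = c(0, n_trt), ylim = c(0, n_blocks), asp = 1)
  for (b in seq_len(n_blocks)) {
    y0 <- n_blocks - b  # first block is drawn at the top
    rect(seq_len(n_trt) - 1, y0, seq_len(n_trt), y0 + 1,
         col = colors[design[b, ]], border = "gray30")
  }
  # Optional block labels on the left-hand side
  axis(2, at = n_blocks - seq_len(n_blocks) + 0.5,
       labels = rownames(design), las = 1, tick = FALSE)
  invisible(design)
}

set.seed(1)
design <- make_block_design(n_blocks = 4, n_treatments = 3)
grDevices::pdf(NULL)  # throwaway device; drop this line to draw on screen
plot_block_design(design, colors = c("tomato", "steelblue", "gold"))
invisible(grDevices::dev.off())
```

Keeping the design matrix separate from the drawing step means the same randomization can be saved or printed alongside the schematic.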

Changing the Order of Levels through ggplot

I am trying to learn the package cregg through the tutorial here. The tutorial works fine. However, I have an issue when I try to change the default settings of the functions. It looks like, when it plots, the levels and the coefficient dots in the legend are ordered alphabetically or numerically. I have tried two ways to change the order to, say, 3, 1, 5, 2, 4: one through the ggplot function, and the other by changing the order of the levels in advance. Neither method works. The original code is as follows:
data("immigration")
stacked <- cj(immigration, ChosenImmigrant ~ Gender +
Education + LanguageSkills + CountryOfOrigin + Job + JobExperience +
JobPlans + ReasonForApplication + PriorEntry, id = ~ CaseID,
estimate = "mm", by = ~ contest_no)
plot(stacked, group = "contest_no", feature_headers = FALSE)
My question is how I can change the order of the levels of contest_no both on the plot and in the legend. One thing I have found is that the order of the levels of contest_no seems to be determined first by the function cj (you can check it with stacked[["contest_no"]]). Thank you!
Thanks to @Tung! (I know I left a similar comment, but I still want to answer this question and close it.) The answer is simple and straightforward, but I hadn't thought it through. My question more or less contained the answer already; I just didn't see it. Since stacked[["contest_no"]] shows the order of its levels, I just change the order with stacked[["contest_no"]] <- factor(stacked[["contest_no"]], levels=c(3, 1, 5, 2, 4)) and then plot the whole stacked object. It works fine.
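The releveling step can be checked on its own with a plain factor. The vector below is a made-up stand-in for stacked[["contest_no"]], not the real cregg output:

```r
# Made-up stand-in for stacked[["contest_no"]]
contest_no <- factor(c(1, 2, 3, 4, 5, 1, 3))
levels(contest_no)   # default order: sorted

# Re-create the factor with the levels in the desired order.
# The values themselves are unchanged; only the level order differs.
reordered <- factor(contest_no, levels = c("3", "1", "5", "2", "4"))
levels(reordered)
```

Plotting functions that follow the factor's level order will then use the new order for both the plot and the legend.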

Bug in dotchart pch?

I think there may be a bug in the way the pch parameter is read within the dotchart function, but would appreciate peer confirmation before reporting it.
In the following, I would like both colour and symbol to vary with the group. Colour works fine, as expected, but not symbol.
foo <- data.frame(Specimen=paste("Specimen", 1:18),
Group=c(rep("Benign", 4),
rep("In-situ", 6),
rep("Invasive", 8)),
Outcome=rweibull(18, 5) + (1:18 / 18))
with(foo, dotchart(Outcome,
groups = Group,
color = c("green", "orange", "red")[Group],
pch=c(16, 15, 17)[Group],
xlab="Outcome measure /bar",
labels = Specimen))
There is an easy but rather bizarre workaround by reversing the "Group" column encoding pch :
with(foo, dotchart(Outcome,
groups = Group,
color = c("green", "orange", "red")[Group],
pch=c(16, 15, 17)[rev(Group)],
xlab="Outcome measure /bar",
labels = Specimen))
However, I cannot see a single legitimate reason why the vector for pch should have to be reversed, particularly since colour seems to work entirely as expected. Thoughts?
Incidentally, the reason I generally try to vary the symbol as well as the colour for different groups in a chart is for the benefit of colour blind readers. Granted, it is not so important in this case.
I agree this may be a bug (which I am genuinely cautious about in base R functions like this).
Specifically, dotchart reorders the color and lcolor (line color) arguments here:
o <- sort.list(as.numeric(groups), decreasing = TRUE)
x <- x[o]
groups <- groups[o]
color <- rep_len(color, length(groups))[o]
lcolor <- rep_len(lcolor, length(groups))[o]
...and those are used in the subsequent abline and points calls, but pch is passed on unchanged. The fix would likely be to simply add the line,
pch <- rep_len(pch, length(groups))[o]
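To see what that reordering does to a per-point vector, the same index can be computed outside dotchart. This is only an illustration on a made-up grouping, not a patch to dotchart itself:

```r
# Made-up grouping, similar in spirit to the question's Group column
groups <- factor(c(rep("Benign", 2), rep("In-situ", 2), rep("Invasive", 3)))
pch <- c(16, 15, 17)[groups]   # one symbol per point, in the original order

# The same index that dotchart computes internally
o <- sort.list(as.numeric(groups), decreasing = TRUE)

# Applying it keeps each symbol attached to its own point after reordering
pch_reordered <- rep_len(pch, length(groups))[o]

data.frame(group = groups[o], pch = pch_reordered)
```

Each row of the printed data frame pairs a group with its intended symbol, which is the pairing that is lost when pch is passed through without the [o] step.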
If I wanted to put my pedantic hat on (which is a good idea before submitting a bug report), I would note that the documentation for ?dotchart specifies:
color the color(s) to be used for points and labels.
for the color argument, but only:
pch the plotting character or symbol to be used.
for the pch argument. Some may argue that this "clearly" implies that only color is intended to take multiple values, and so in that sense this isn't a "bug".
This definitely looks like a bug. I have a dataset where samples have a fairly complex 4*4 color+pch coding corresponding to things that are also in the sample names, on top of groups, and the pch values just don't seem to be reordered at all during group reordering. I'll try to submit a bug report in the next few weeks. I have R 3.6.1.

Geom_points not dodging when geom_errorbars are

I can't figure out how to get these geom_points to properly dodge! I've searched many, MANY how-to's and questions on different stackexchange pages, but none of them fix the problem.
analyze_weighted <- data.frame(
mus = c(clean_mu,b_mu,d_mu,g_mu,bd_mu,bg_mu,dg_mu,bdg_mu,m_mu),
sds = c(clean_sigma,b_sigma,d_sigma,g_sigma,bd_sigma,bg_sigma,dg_sigma,bdg_sigma,m_sigma),
SNR =c("No shifts","1 shift","1 shift","1 shift","2 shifts","2 shifts","2 shifts","3 shifts","4 shifts")
)
And then I try to plot it:
ggplot(analyze_weighted, aes(x=SNR,y=mus,color=SNR,group=mus)) +
geom_point(position="dodge",na.rm=TRUE) +
geom_errorbar(position="dodge",aes(ymax=mus+sds/2,ymin=mus-sds/2,), width=0.25)
And it manages to dodge the error bars but not the points! I'm going crazy here, what do I do?
Here's what it looks like now--I want the points to be slightly dodged!
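geom_point has no intrinsic width, so position="dodge" does not know how far to shift the points. You need to give the dodge width explicitly with position_dodge(width=...), and use the same width for the error bars so that they stay aligned with the points.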
This should work:
ggplot(analyze_weighted, aes(x=SNR,y=mus,color=SNR,group=mus)) +
geom_point(position=position_dodge(width=0.2),na.rm=TRUE) +
geom_errorbar(position=position_dodge(width=0.2),aes(ymax=mus+sds/2,ymin=mus-sds/2),width=0.25)
Please note that your example wasn't fully reproducible, as the values of the variables used to construct mus and sds are not available.
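For anyone who wants to try the fix without the original inputs, here is a small sketch with made-up numbers (the toy data frame and its values are assumptions). It defines the dodge once and reuses it in both layers, so the points and error bars get the same horizontal offsets.

```r
library(ggplot2)

# Made-up values standing in for the unavailable mus/sds inputs
toy <- data.frame(
  SNR = c("1 shift", "1 shift", "1 shift", "2 shifts"),
  mus = c(1.0, 1.2, 0.9, 1.5),
  sds = c(0.2, 0.3, 0.1, 0.4)
)

# One shared position object keeps both layers aligned
dodge <- position_dodge(width = 0.2)

p <- ggplot(toy, aes(x = SNR, y = mus, color = SNR, group = mus)) +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = mus - sds / 2, ymax = mus + sds / 2),
                position = dodge, width = 0.25)

# Computed x positions of the points layer, to confirm they were shifted
point_x <- as.numeric(layer_data(p, 1)$x)
point_x
```

Note that the width inside position_dodge() controls how far apart the elements are pushed, while the width argument of geom_errorbar() only sets the length of the whisker caps.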
