How to reorder the x axis on a stacked area plot - r

I have the following data frame and want to plot a stacked area plot:
library(ggplot2)
set.seed(11)
df <- data.frame(a = rlnorm(30), b = as.factor(1:10), c = rep(LETTERS[1:3], each = 10))
ggplot(df, aes(x = as.numeric(b), y = a, fill = c)) +
geom_area(position = 'stack') +
theme_grey() +
scale_x_discrete(labels = levels(as.factor(df$b))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
The resulting plot on my system looks like this:
Unfortunately, the x-axis doesn't seem to show up. I want to plot the values of df$b rotated so that they don't overlap, and ultimately I would like to sort them in a specific way (haven't gotten that far yet, but I will take any suggestions).
Also, according to ?factor() using as.numeric() with a factor is not the best way to do it. When I call ggplot but leave out the as.numeric() for aes(x=... the plot comes up empty.
Is there a better way to do this?

Leave b as a factor. You will further need to add a group aesthetic which is the same as the fill aesthetic. (This tells ggplot how to "connect the dots" between separate factor levels.)
ggplot(df, aes(x = b, y = a, fill = c, group = c)) +
geom_area(position = 'stack') +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
As for the order, the x-axis will go in the order of the factor levels. To change the order of the axis, simply change the order of the factor levels. reorder() works well if you are basing it on a numeric column (or a function of a numeric column). For arbitrary orders, just specify the order of the levels directly in a factor call, something like:
df$b = factor(df$b, levels = c("1", "5", "2", ...))
For more examples of this, see the r-faq Order bars in ggplot. Yours isn't a barplot, but the principle is identical.
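The two reordering approaches can be sketched on the question's own data frame. The particular custom order and the column names b_custom and b_by_total are made up for illustration; only factor() and reorder() come from the answer.

```r
# Toy data from the question: b is a factor with levels "1" to "10"
set.seed(11)
df <- data.frame(a = rlnorm(30), b = as.factor(1:10), c = rep(LETTERS[1:3], each = 10))

# Arbitrary order: list every level explicitly (this order is hypothetical)
custom_order <- c("1", "5", "2", "3", "4", "6", "7", "8", "9", "10")
df$b_custom <- factor(df$b, levels = custom_order)
levels(df$b_custom)

# Data-driven order: sort the levels by the total of a within each level
df$b_by_total <- reorder(df$b, df$a, FUN = sum)
levels(df$b_by_total)
```

Mapping x to either new column in aes() should then draw the axis in that level order.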

Related

ggplot2's geom_text refusing to dodge

I have a pretty straightforward dataset consisting of a week of two totals in groups, which I'm displaying in an identity bar plot using ggplot2 (version 3.3.0).
library(ggplot2)
library(lubridate)
weeksummary <- data.frame(
Date = rep(as.POSIXct("2020-01-01") + days(0:6), 2),
Total = rpois(14, 30),
Group = c(rep("group1", 7), rep("group2", 7))
)
ggplot(data = weeksummary, mapping = aes(x = Date, y = Total, fill = Group)) +
geom_col(position = "dodge") +
geom_text(aes(label = Total), position = position_dodge(width = 0.9), size = 3)
I cannot for the life of me get this to put the numbers at the top of their own bars, been hunting around for an answer and trying everything I found with no luck, until I randomly tried this:
weeksummary$Date <- as.factor(weeksummary$Date)
But this seems unnecessary manipulation, and I'd need to make sure the dates appear in the right format and order and rewrite the additional bits that currently rely on dates... I'd rather understand what I'm doing wrong.
What you're looking for is to use as.Date.POSIXct. as.factor() works to force weeksummary$Date into a factor, but it forces the conversion of your POSIXct class into a character first (thus erasing "date"). However, you need to convert to a factor so that dodging works properly - that's the question.
You can either convert before (e.g. weeksummary$Date <- as.Date.POSIXct(weeksummary$Date)), or do it right in your plot call:
ggplot(weeksummary, aes(x = as.Date.POSIXct(Date), y = Total, fill = Group)) +
geom_col(position = 'dodge') +
geom_text(aes(label = Total, y = Total + 1),
position = position_dodge(width = 0.9), size = 3)
Giving you this:
Note: the values are different than your values, since our randomization seeds are likely not the same :)
You'll notice I nudged the labels up a bit. You can normally do this with nudge_y, but you cannot specify nudge_x or nudge_y at the same time as a position= argument. In this case, you can just nudge by overwriting the y aesthetic.
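One way to see why the conversion matters, offered as a sketch rather than a description of ggplot2 internals: a dodge width such as 0.9 is measured in the units of the x scale. A POSIXct value counts seconds, so consecutive days sit 86400 units apart and a 0.9-unit shift is invisible, whereas a Date counts days. The tz = "UTC" argument is an assumption added here so the result does not depend on the local time zone.

```r
# Seven consecutive days as POSIXct (time zone fixed to UTC for reproducibility)
dates_ct <- as.POSIXct("2020-01-01", tz = "UTC") + 86400 * (0:6)

# Spacing between neighbouring bars on a POSIXct axis, in seconds
step_ct <- diff(as.numeric(dates_ct))[1]

# The same days as Date objects: spacing is now measured in days
dates_d <- as.Date(dates_ct, tz = "UTC")
step_d <- diff(as.numeric(dates_d))[1]

c(posixct_step = step_ct, date_step = step_d)
```

With a step of 1, position_dodge(width = 0.9) shifts the labels by roughly the same amount as the bars themselves.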
This happens because geom_text inherits the x aesthetic, which is Date in this case; that part is correct. You don't have to modify your data frame: you can specify the conversion when plotting instead, for example:
aes(x = factor(Date), y = ...),

Add new geom as new row in ggplot2, preventing layering of plots

I am pretty sure that this is easy to do but I can't seem to find a proper way to query this question into google or stack, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), efficiently creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
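The answer above starts from numeric x values. If the data hold a factor instead, as in the question, the positions could be derived as in the sketch below. The vector E is a hypothetical stand-in for the question's column, and the spacing of 2 between levels is an assumption chosen to match the 1 and 3 used above.

```r
# Hypothetical factor standing in for the question's E column
E <- factor(c("A", "B", "A", "C", "B"))

# Centre of each level's pair of rows: A -> 1, B -> 3, C -> 5
centre <- 2 * as.integer(E) - 1

# Offsets place the jitter row and the violin row on either side of the centre
jitter_x <- centre - 0.5
violin_x <- centre + 0.5

# Breaks and labels to hand to scale_x_continuous()
axis_breaks <- 2 * seq_along(levels(E)) - 1
axis_labels <- levels(E)
```

Here jitter_x and violin_x would replace x - 0.5 and x + 0.5 in the aes() calls, with axis_breaks and axis_labels passed as breaks and labels.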

Best way to calculate number of facets in geom_hline/_vline

When I combine geom_vline() with facet_grid() like so:
DATA <- data.frame(x = 1:6,y = 1:6, f = rep(letters[1:2],3))
ggplot(DATA,aes(x = x,y = y)) +
geom_point() +
facet_grid(f~.) +
geom_vline(xintercept = 2:3,
colour =c("goldenrod3","dodgerblue3"))
I get an error message stating Error: Aesthetics must be either length 1 or the same as the data (4): colour because there are two lines in each facet and there are two facets. One way to get around this is to use rep(c("goldenrod3","dodgerblue3"),2), but this requires that every time I change the faceting variables, I also have to calculate the number of facets and replace the magic number (2) in the call to rep(), which makes re-using ggplot code so much less nimble.
Is there a way to get the number of facets directly from ggplot for use in this situation?
You could put the xintercept and colour info into a data.frame to pass to geom_vline and then use scale_color_identity.
ggplot(DATA, aes(x = x, y = y)) +
geom_point() +
facet_grid(f~.) +
geom_vline(data = data.frame(xintercept = 2:3,
colour = c("goldenrod3","dodgerblue3") ),
aes(xintercept = xintercept, color = colour) ) +
scale_color_identity()
This side-steps the issue of figuring out the number of facets, although that could be done by pulling out the number of unique values in the faceting variable with something like length(unique(DATA$f)).
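If you do want the count, a minimal sketch of the length(unique()) idea follows. The second faceting column g is hypothetical and is added only to show how the count might extend to a two-variable grid.

```r
DATA <- data.frame(x = 1:6, y = 1:6, f = rep(letters[1:2], 3))

# Number of facets for facet_grid(f ~ .)
n_facets <- length(unique(DATA$f))

# Repeat the colour vector once per facet instead of hard-coding the 2
line_colours <- rep(c("goldenrod3", "dodgerblue3"), times = n_facets)

# Hypothetical second faceting variable: facet_grid(f ~ g) draws one panel
# per combination of levels, so the counts multiply
DATA$g <- rep(c("u", "v", "w"), each = 2)
n_panels_grid <- length(unique(DATA$f)) * length(unique(DATA$g))
```

The data-frame approach above remains the less fragile option, since it needs no count at all.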

Plotting a bivariate to multiple factors in R

First of all, I'm still a beginner. I'm trying to interpret and draw a stack bar plot with R. I already took a look at a number of answers but some were not specific to my case and others I simply didn't understand:
https://stats.stackexchange.com/questions/31597/graphing-a-probability-curve-for-a-logit-model-with-multiple-predictors
https://stats.stackexchange.com/questions/47020/plotting-logistic-regression-interaction-categorical-in-r
Plot the results of a multivariate logistic regression model in R
I've got a dataset dvl that has five columns, Variant, Region, Time, Person and PrecededByPrep. I'd like to make a multivariate comparison of Variant to the other four predictors. Every column can have one of two possible values:
Variant: elk or ieder.
Region: VL or NL.
Time: time or no time
Person: person or no person
PrecededByPrep: 1 or 0
Here's the logistic regression
From the answers I gathered that the library ggplot2 might be the best drawing library to go with. I've read its documentation but for the life of me I can't figure out how to plot this: how can I get a comparison of Variant with the other four factors?
It took me a while, but I made something similar in Photoshop to what I'd like (fictional values!).
Dark gray/light gray: possible values of Variant
y-axis: frequency
x-axis: every column, subdivided into its possible values
I know to make individual bar plots, both stacked and grouped, but basically I do not know how to have stacked, grouped bar plots. ggplot2 can be used, but if it can be done without I'd prefer that.
I think this can be seen as a sample dataset, though I'm not entirely sure. I am a beginner with R and I read about creating a sample set.
t <- data.frame(Variant = sample(c("iedere","elke"),size = 50, replace = TRUE),
Region = sample(c("VL","NL"),size = 50, replace = TRUE),
PrecededByPrep = sample(c("1","0"),size = 50, replace = TRUE),
Person = sample(c("person","no person"),size = 50, replace = TRUE),
Time = sample(c("time","no time"),size = 50, replace = TRUE))
I'd like to have the plot to be aesthetically pleasing as well. What I had in mind:
Plot colours (i.e. for the bars): col=c("paleturquoise3", "palegreen3")
A bold font for the axis labels font.lab=2 but not for the value labels (e.g. region in bold, but VL and NL not in bold)
#404040 as a colour for the font, axis and lines
Labels for the axes: x: factors, y: frequency
Here is one possibility: start with the 'un-tabulated' data frame, melt it, plot it with geom_bar in ggplot2 (which does the counting per group), and separate the plot by variable using facet_wrap.
Create toy data:
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
Person = sample(c("person", "no person"), size = 50, replace = TRUE),
Time = sample(c("time", "no time"), size = 50, replace = TRUE))
Reshape data:
library(reshape2)
df2 <- melt(df, id.vars = "Variant")
Plot:
library(ggplot2)
ggplot(data = df2, aes(factor(value), fill = Variant)) +
geom_bar() +
facet_wrap(~variable, nrow = 1, scales = "free_x") +
scale_fill_grey(start = 0.5) +
theme_bw()
There are lots of opportunities to customize the plot, such as setting order of factor levels, rotating axis labels, wrapping facet labels on two lines (e.g. for the longer variable name "PrecededByPrep"), or changing spacing between facets.
Customization (following updates in question and comments by OP)
# labeller used in facet_grid to wrap "PrecededByPrep" on two lines
# see http://www.cookbook-r.com/Graphs/Facets_%28ggplot2%29/#modifying-facet-label-text
# as_labeller() (ggplot2 >= 2.0.0) wraps a function that takes and returns
# character vectors; the older function(var, value) form was written for
# earlier ggplot2 versions
my_lab <- as_labeller(function(value) {
  ifelse(value == "PrecededByPrep", "Preceded\nByPrep", value)
})
ggplot(data = df2, aes(factor(value), fill = Variant)) +
geom_bar() +
facet_grid(~variable, scales = "free_x", labeller = my_lab) +
scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
theme_bw() +
theme(axis.text = element_text(face = "bold"), # axis tick labels bold
axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
line = element_line(colour = "gray25"), # line colour gray25 = #404040
strip.text = element_text(face = "bold")) + # facet labels bold
xlab("factors") + # set axis labels
ylab("frequency")
Add counts to each bar (edit following comments from OP).
The basic principles to calculate the y coordinates can be found in this Q&A. Here I use dplyr to calculate counts per bar (i.e. label in geom_text) and their y coordinates, but this could of course be done in base R, plyr or data.table.
# calculate counts (i.e. labels for geom_text) and their y positions.
library(dplyr)
df3 <- df2 %>%
group_by(variable, value, Variant) %>%
summarise(n = n()) %>%
mutate(y = cumsum(n) - (0.5 * n))
# plot
ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
geom_bar() +
geom_text(data = df3, aes(y = y, label = n)) +
facet_grid(~variable, scales = "free_x", labeller = my_lab) +
scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
theme_bw() +
theme(axis.text = element_text(face = "bold"), # axis tick labels bold
axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
line = element_line(colour = "gray25"), # line colour gray25 = #404040
strip.text = element_text(face = "bold")) + # facet labels bold
xlab("factors") + # set axis labels
ylab("frequency")
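The label positions computed with cumsum(n) - (0.5 * n) can be checked by hand. The sketch below uses made-up counts for a single stacked bar and assumes the segments are listed in the order they are stacked from the bottom, which may differ between ggplot2 versions.

```r
# Counts of the two Variant values within one bar (made-up numbers)
n <- c(10, 15)

# Upper edge of each stacked segment
top <- cumsum(n)

# Vertical midpoint of each segment: upper edge minus half the segment height
y_mid <- top - 0.5 * n
y_mid  # 5.0 17.5
```

If the labels land in the wrong segment, reversing the order of the rows within each bar before taking the cumulative sum is one thing to try.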
Here is my proposal for a solution with the barplot function of base R:
1. calculate the counts
l_count_df<-lapply(colnames(t)[-1],function(nomcol){table(t$Variant,t[,nomcol])})
count_df<-l_count_df[[1]]
for (i in 2:length(l_count_df)){
count_df<-cbind(count_df,l_count_df[[i]])
}
2. draw the barplot without axis names, saving the bar coordinates
par(las=1,col.axis="#404040",mar=c(5,4.5,4,2),mgp=c(3.5,1,0))
bp<-barplot(count_df,width=1.2,space=rep(c(1,0.3),4),col=c("paleturquoise3", "palegreen3"),border="#404040", axisname=F, ylab="Frequency",
legend=row.names(count_df),ylim=c(0,max(colSums(count_df))*1.2))
3. label the bars
mtext(side=1,line=0.8,at=bp,text=colnames(count_df))
mtext(side=1,line=2,at=(bp[seq(1,8,by=2)]+bp[seq(2,8,by=2)])/2,text=colnames(t)[-1],font=2)
4. add values inside the bars
for(i in 1:ncol(count_df)){
val_elke<-count_df[1,i]
val_iedere<-count_df[2,i]
text(bp[i],val_elke/2,val_elke)
text(bp[i],val_elke+val_iedere/2,val_iedere)
}
Here is what I get (with my random data) :
I'm basically answering a different question. I suppose this can be seen as perversity on my part, but I really dislike barplots of pretty much any sort. They have always seemed to waste space, because the numerical values they present are less useful than an appropriately constructed table. The vcd package offers an extended mosaicplot function that seems to me to be more accurately called a "multivariate barplot" than any of the ones I have seen so far. It does require that you first construct a contingency table, for which the xtabs function seems a perfect fit.
install.packages("vcd")
library(vcd)
help(package=vcd,mosaic)
col=c("paleturquoise3", "palegreen3")
vcd::mosaic(xtabs(~Variant+Region + PrecededByPrep + Time, data=ttt)
,highlighting="Variant", highlighting_fill=col)
That was the 4-way plot and this is the 5-way plot:
png(); vcd::mosaic( xtabs(
~Variant+Region + PrecededByPrep + Person + Time,
data=ttt)
,highlighting="Variant", highlighting_fill=col); dev.off()
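For readers who have not used xtabs, here is a small sketch of the contingency table that mosaic() receives. The toy data frame is made up and has only two of the columns; the real call above simply adds more terms to the formula.

```r
# Made-up rows with two of the question's columns
toy <- data.frame(
  Variant = c("elke", "iedere", "elke", "elke", "iedere"),
  Region  = c("VL", "VL", "NL", "VL", "NL")
)

# One cell per combination of Variant and Region, holding the row count
tab <- xtabs(~ Variant + Region, data = toy)
tab
```

Each extra variable on the right-hand side of the formula adds one dimension to the table, which mosaic() then draws as a further level of subdivision.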

symmetric (same axis) Heatmap with ggplot2

I'd like to use ggplot2 to create a symmetrical heatmap. The x-axis should show exactly the same labels as the y-axis. Unfortunately, the ddply() step affects the order.
The input.csv looks like this:
Names,Peter,Tom,Marc
Peter,1,6,1
Tom,2,4,12
Marc,3,0,21
I'm using the following code so far:
library(ggplot2)
library(plyr)
library(reshape2)
library (scales)
dat <- read.csv("input.csv")# read input
dat.m <- melt(dat)# to "melt" the dataset
dat.s <- ddply(dat.m, .(variable), transform, rescale = scale(value)) #pairwise format
file <- ggplot(dat.s, aes(Names,variable)) + geom_tile(aes(fill = value),colour = "white") + theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.position="top")
pdf(file=paste("output",".pdf",sep="")) # write to file
plot(file) # make plot
dev.off()
This results in a plot where the y-axis (from top to bottom) has the labels Marc-Tom-Peter, but the x-axis (left to right) has the labels Marc-Peter-Tom.
Does anyone know how I can get a plot where the labels on both axes have the same (original) order (Peter, Tom, Marc)? Note that this is just a toy example: the real data has more than 100 labels, so it would not help to define the pairs manually.
Thanks in advance
First create a vector of the names ordered the way you like it:
lvls <- as.character(dat$Names)
Next order variable so it matches Names:
dat.s$variable <- factor(dat.s$variable, levels = lvls)
Now try plotting.
You could also just add the limits to your scales. Note that the default is from bottom-to-top, so if I understand you correctly, you also have to use rev to reverse the order. Here's a possible solution:
ggplot(dat.s, aes(Names,variable)) +
geom_tile(aes(fill = value),colour = "white") +
theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.position="top") +
scale_x_discrete(limits = dat$Names) +
scale_y_discrete(limits = rev(dat$Names))
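Both suggestions come down to giving the two axes the same level order. The sketch below rebuilds the toy input in base R instead of reading input.csv and melting it, so the data frame long is a hypothetical stand-in for dat.s. Reversing the levels on the y side is optional and only matters if the first name should appear at the top.

```r
# Toy input equivalent to input.csv
dat <- data.frame(Names = c("Peter", "Tom", "Marc"),
                  Peter = c(1, 2, 3), Tom = c(6, 4, 0), Marc = c(1, 12, 21))

# Desired order, taken from the row order of the input
lvls <- as.character(dat$Names)

# Long format: one row per (Names, variable) pair, columns stacked in turn
long <- data.frame(Names = rep(dat$Names, times = 3),
                   variable = rep(names(dat)[-1], each = 3),
                   value = unlist(dat[-1], use.names = FALSE))

# Same order on x; reversed on y because a discrete y-axis runs bottom to top
long$Names <- factor(long$Names, levels = lvls)
long$variable <- factor(long$variable, levels = rev(lvls))
```

Because the order is taken from the data, this should scale to the 100+ labels mentioned in the question without listing them by hand.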
