transform y axis to percents ggplot - r

I have used a stacked bar chart (with coord_flip) to try to compare distributions (this is one a several techniques I'm playing with) for a control and treatment group for pre and post test. Here is the plot:
and here is the code (Sorry it's not reproducible with no data set. If this is a problem I'll make up a reproducible data set as I can't share the real data):
m4 <- ggplot(data=v, aes(x=trt, fill=value))
m5 <- m4 + geom_bar() + coord_flip() +
facet_grid(time~type) + scale_fill_grey()
How can I change the y axis (which is actually on the bottom dues to coord_flip) to percents so every bar is equal in length? So I want counts to become percents. I need some sort of transformation that I'm betting ggplot has or could easily be created and applied some how.

You probably just want position_fill, by setting,
+ geom_bar(position = "fill")

Related

How to make stacked bar chart with count values on y axis>

I'm trying to create a stacked barchart with gene sequencing data, where for each gene there is a tRF.type and Amino.Acid value. An example data set looks like this:
tRF <- c('tRF-26-OB1690PQR3E', 'tRF-27-OB1690PQR3P', 'tRF-30-MIF91SS2P46I')
tRF.type <- c('5-tRF', 'i-tRF', '3-tRF')
Amino.Acid <- c('Ser', 'Lys', 'Ser')
tRF.data <- data.frame(tRF, tRF.type, Amino.Acid)
I would like the x-axis to represent the amino acid type, the y-axis the number of counts of each tRF type and the the fill of the bars to represent each tRF type.
My code is:
ggplot(chart_data, aes(x = Amino.Acid, y = tRF.type, fill = tRF.type)) +
geom_bar(stat="identity") +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("tRF type")
However, it generates this graph, where the y-axis is labelled with the categories of tRF type. How can I change my code so that the y-axis scale is numerical and represents the counts of each tRF type?
Barchart
OP and Welcome to SO. In future questions, please, be sure to provide a minimal reproducible example - meaning provide code, an image (if possible), and at least a representative dataset that can demonstrate your question or problem clearly.
TL;DR - don't use stat="identity", just use geom_bar() without providing a stat, since default is to use the counts. This should work:
ggplot(chart_data, aes(x = Amino.Acid, fill = tRF.type)) + geom_bar()
The dataset provided doesn't adequately demonstrate your issue, so here's one that can work. The example data herein consists of 100 observations and two columns: one called Capitals for randomly-selected uppercase letters and one Lowercase for randomly-selected lowercase letters.
library(ggplot2)
set.seed(1234)
df <- data.frame(
Capitals=sample(LETTERS, 100, replace=TRUE),
Lowercase=sample(letters, 100, replace=TRUE)
)
If I plot similar to your code, you can see the result:
ggplot(df, aes(x=Capitals, y=Lowercase, fill=Lowercase)) +
geom_bar(stat="identity")
You can see, the bars are stacked, but the y axis is all smooshed down. The reason is related to understanding the difference between geom_bar() and geom_col(). Checking the documentation for these functions, you can see that the main difference is that geom_col() will plot bars with heights equal to the y aesthetic, whereas geom_bar() plots by default according to stat="count". In fact, using geom_bar(stat="identity") is really just a complicated way of saying geom_col().
Since your y aesthetic is not numeric, ggplot still tries to treat the discrete levels numerically. It doesn't really work out well, and it's the reason why your axis gets smooshed down like that. What you want, is geom_bar(stat="count").... which is the same as just using geom_bar() without providing a stat=.
The one problem is that geom_bar() only accepts an x or a y aesthetic. This means you should only give it one of them. This fixes the issue and now you get the proper chart:
ggplot(df, aes(x=Capitals, fill=Lowercase)) + geom_bar()
You want your y-axis to be a count, not tRF.type. This code should give you the correct plot: I've removed the y = tRF.type from ggplot(), and stat = "identity from geom_bar() (it is using the default value of stat = "count instead).
ggplot(tRF.data, aes(x = Amino.Acid, fill = tRF.type)) +
geom_bar() +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("tRF type")

R: Visualization of grade point average as (sort of) violin plot

I would like to visualize a data frame much like the following in a plot:
grade number
A 2
B 6
C 1
D 0
E 1
The idea is to have the grades on the x-axis as categories and the number of pupils who received the respective grade on the y-axis.
My task is to display them not as points like in a line chart, but as thickness above the category like in a violin plot. This is really about the pure visuals of it.
I tried ggplot2's violin, but It always takes the values of the number column for the y-axis. But the y-axis is supposed to have just one single dimension: the level around which the density-plot is rotated.
I'd be very happy If someone had a hint at how I should maybe restructure my data or maybe if I am completely mistaken with my approach.
Ah, yes: on top I'd like to display the grade-point-average as a small bar.
Thank you very much in advance for taking your time. I'm sure the solution is very obvious, but I just don't see it.
As #Gregor mentioned, a smoothed density estimate (which is what a violin plot is) with just five ordinal values isn't really appropriate here. Even if you had plus/minus grades, you'd still probably be better off with bars or lines. See below for a few options:
library(ggplot2)
# Fake data
dat = data.frame(grades=LETTERS[c(1:4,6)],
count=c(5,12,11,5,3), stringsAsFactors=FALSE)
# Reusable plot elements
thm = list(theme_bw(),
scale_y_continuous(limits=c(0,max(dat$count)), breaks=seq(0,20,2)),
labs(x="Grade", y="Count"))
ggplot(dat, aes(grades, count)) +
geom_bar(stat="identity", fill=hcl(240,100,50)) +
geom_text(aes(y=0.5*count, label=paste0(count, " (", sprintf("%1.1f", count/sum(count)*100),"%)")),
colour="white", size=3) +
thm
ggplot(dat, aes(grades, count)) +
geom_line(aes(group=1),alpha=0.4) +
geom_point() +
thm
ggplot(dat, aes(x=as.numeric(factor(grades)))) +
geom_ribbon(aes(ymin=0, ymax=count), fill="grey80") +
geom_text(aes(y=count, label=paste0(sprintf("%1.1f", count/sum(count)*100),"%")), size=3) +
scale_x_continuous(labels=LETTERS[c(1:4,6)]) +
thm

Adding text to facetted histogram

Using ggplot2 I have made facetted histograms using the following code.
library(ggplot2)
library(plyr)
df1 <- data.frame(monthNo = rep(month.abb[1:5],20),
classifier = c(rep("a",50),rep("b",50)),
values = c(seq(1,10,length.out=50),seq(11,20,length.out=50))
)
means <- ddply (df1,
c(.(monthNo),.(classifier)),
summarize,
Mean=mean(values)
)
ggplot(df1,
aes(x=values, colour=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo,ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1)
The vertical line showing means per month is to stay.
But I want to also add text over these vertical lines displaying the mean values for each month. These means are from the 'means' data frame.
I have looked at geom_text and I can add text to plots. But it appears my circumstance is a little different and not so easy. It's a lot simpler to add text in some cases where you just add values of the plotted data points. But cases like this when you want to add the mean and not the value of the histograms I just can't find the solution.
Please help. Thanks.
Having noted the possible duplicate (another answer of mine), the solution here might not be as (initially/intuitively) obvious. You can do what you need if you split the geom_text call into two (for each classifier):
ggplot(df1, aes(x=values, fill=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo, ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="a",]) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="b",])
I'm assuming you can format the numbers to the appropriate precision and place them on the y-axis where you need to with this code.

5 dimensional plot in r

I am trying to plot a 5 dimensional plot in R. I am currently using the rgl package to plot my data in 4 dimensions, using 3 variables as the x,y,z, coordinates, another variable as the color. I am wondering if I can add a fifth variable using this package, like for example the size or the shape of the points in the space. Here's an example of my data, and my current code:
set.seed(1)
df <- data.frame(replicate(4,sample(1:200,1000,rep=TRUE)))
addme <- data.frame(replicate(1,sample(0:1,1000,rep=TRUE)))
df <- cbind(df,addme)
colnames(df) <- c("var1","var2","var3","var4","var5")
require(rgl)
plot3d(df$var1, df$var2, df$var3, col=as.numeric(df$var4), size=0.5, type='s',xlab="var1",ylab="var2",zlab="var3")
I hope it is possible to do the 5th dimension.
Many thanks,
Here is a ggplot2 option. I usually shy away from 3D plots as they are hard to interpret properly. I also almost never put in 5 continuous variables in the same plot as I have here...
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12))
While this is a bit messy, you can actually reasonably read all 5 dimensions for most points.
A better approach to multi-dimensional plotting opens up if some of your variables are categorical. If all your variables are continuous, you can turn some of them to categorical with cut and then use facet_wrap or facet_grid to plot those.
For example, here I break up var3 and var4 into quintiles and use facet_grid on them. Note that I also keep the color aesthetics as well to highlight that most of the time turning a continuous variable to categorical in high dimensional plots is good enough to get the key points across (here you'll notice that the fill and border colors are pretty uniform within any given grid cell):
df$var4.cat <- cut(df$var4, quantile(df$var4, (0:5)/5), include.lowest=T)
df$var3.cat <- cut(df$var3, quantile(df$var3, (0:5)/5), include.lowest=T)
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12)) +
facet_grid(var3.cat ~ var4.cat)

ggplot2 stacked barplots, formatting, and grids

In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are couple things I need to accomplish. And I don't know where to start.
I would like the bars not to be placed at its corresponding nseqs value, rather placed next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried to set scales and size to for facet_grid to free_x, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))

Resources