Adding percent labels to a stacked barplot in R

I have a stacked barplot that shows the data the way I want, but I am unsure how to add percentage labels showing how much of each sample belongs to each color/category.
I just need to know which packages or code would help add the percentages, so I can see the specific breakdown of the last two columns in particular.
I have the following code and plot:
tbl10 = read.table("combinedtest.Q")
names1 = read.table("names_sorted.txt")[,1]
bp=barplot(t(as.matrix(tbl10)), col=c("aliceblue","antiquewhite","aquamarine","black","blue","blue4","blueviolet","brown1","brown4","cadetblue","chartreuse","chocolate1","coral1","cornflowerblue","cyan","darkgoldenrod1","darkgray","darkgreen","darkmagenta","darkolivegreen1","hotpink","darkseagreen2","darkslateblue","deeppink","firebrick1","khaki1"),xlab="", ylab="Ancestry", border=NA)
text(cex=1, x=bp, y=-.08, names1, xpd=TRUE, srt=90)
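One possible base-R approach, sketched here on made-up data rather than the real combinedtest.Q file: barplot() returns the bar centres, and the vertical midpoint of each stacked segment is the cumulative sum of the column minus half the segment height. The matrix q, the three colours and the 5% cutoff for labelling are assumptions for illustration only.

```r
# Hypothetical stand-in for tbl10: rows are samples, columns are ancestry proportions
set.seed(1)
q <- matrix(runif(12), nrow = 4)
q <- q / rowSums(q)                 # make every sample sum to 1
m <- t(q)                           # barplot() stacks the rows within each column

bp <- barplot(m, col = c("aquamarine", "cornflowerblue", "khaki1"),
              border = NA, ylab = "Ancestry")

# Midpoint of each segment: top of the segment minus half its height
mid <- apply(m, 2, cumsum) - m / 2
xs  <- bp[col(m)]                   # bar centre for every cell of m

# Label only segments tall enough to hold text (the 5% cutoff is arbitrary)
show <- m >= 0.05
text(xs[show], mid[show], labels = sprintf("%.0f%%", 100 * m[show]), cex = 0.8)
```

With 26 categories many segments will be thin, so a cutoff like the one above (or labelling only the last two bars) keeps the plot readable.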

Related

Contour plot via Scatter plot

Scatter plots are useless when the number of points is large.
So, e.g., using a normal approximation, we can get a contour plot instead.
My question: is there a package that draws a contour plot from scatter-plot data?
Thank you @G5W!! I can do it!!
You don't offer any data, so I will respond with some artificial data,
constructed at the bottom of the post. You also don't say how much data
you have although you say it is a large number of points. I am illustrating
with 20000 points.
You used the group number as the plotting character to indicate the group.
I find that hard to read. But just plotting the points doesn't show the
groups well. Coloring each group a different color is a start, but does
not look very good.
plot(x,y, pch=20, col=rainbow(3)[group])
Two tricks that can make a lot of points more understandable are:
1. Make the points transparent. The dense places will appear darker. AND
2. Reduce the point size.
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
That looks somewhat better, but did not address your actual request.
Your sample picture seems to show confidence ellipses. You can get
those using the function dataEllipse from the car package.
library(car)
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
dataEllipse(x,y,factor(group), levels=c(0.70,0.85,0.95),
plot.points=FALSE, col=rainbow(3), group.labels=NA, center.pch=FALSE)
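For anyone curious what such an ellipse is under the normal approximation, here is a minimal base-R sketch for a single group. It uses the large-sample chi-square radius, which is an assumption on my part; dataEllipse may compute its radius slightly differently. The data and all variable names are made up for the illustration.

```r
set.seed(42)
x <- rnorm(5000, 1, 1)
y <- rnorm(5000, 5, 1) - 0.5 * x

level  <- 0.95
centre <- c(mean(x), mean(y))
S      <- cov(cbind(x, y))
r      <- sqrt(qchisq(level, df = 2))       # radius in Mahalanobis units

# Map the unit circle through the Cholesky factor of S, then shift to the centre
theta  <- seq(0, 2 * pi, length.out = 200)
circle <- cbind(cos(theta), sin(theta))
ell    <- sweep(r * circle %*% chol(S), 2, centre, "+")

plot(x, y, pch = 20, cex = 0.5, col = rgb(0, 0, 1, 0.1))
lines(ell, lwd = 2)

# Share of points inside the ellipse; should be close to `level`
inside <- mahalanobis(cbind(x, y), centre, S) <= r^2
mean(inside)
```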
But if there are really a lot of points, the points can still overlap
so much that they are just confusing. You can also use dataEllipse
to create what is basically a 2D density plot without showing the points
at all. Just plot several ellipses of different sizes over each other filling
them with transparent colors. The center of the distribution will appear darker.
This can give an idea of the distribution for a very large number of points.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
You can get a more continuous look by plotting more ellipses and leaving out the border lines.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=seq(0.11,0.99,0.02),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.05, lty=0)
Please try different combinations of these to get a nice picture of your data.
Additional response to comment: Adding labels
Perhaps the most natural place to add group labels is the centers of the
ellipses. You can get that by simply computing the centroids of the points in each group. So for example,
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
## Now add labels
for(i in unique(group)) {
text(mean(x[group==i]), mean(y[group==i]), labels=i)
}
Note that I just used the number as the group label, but if you have a more elaborate name, you can change labels=i to something like
labels=GroupNames[i].
Data
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
You can use hexbin::hexbin() to show very large datasets.
@G5W gave a nice dataset:
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
If you don't know the group information, then the ellipses are inappropriate; this is what I'd suggest:
library(hexbin)
plot(hexbin(x,y))
which produces
If you really want contours, you'll need a density estimate to plot. The MASS::kde2d() function can produce one; see the examples in its help page for plotting a contour based on the result. This is what it gives for this dataset:
library(MASS)
contour(kde2d(x,y))
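A slightly longer sketch of the same idea, in case you want the contours drawn over faint points. It uses a smaller made-up dataset; the grid size n = 50 and nlevels = 8 are arbitrary choices, not recommendations.

```r
library(MASS)

set.seed(7)
x <- c(rnorm(2000, 0, 1), rnorm(3000, 5, 1))
y <- c(rnorm(2000, 0, 1), rnorm(3000, 6, 1))

# kde2d() returns a list with grid coordinates x, y and a density matrix z
dens <- kde2d(x, y, n = 50)

plot(x, y, pch = 20, cex = 0.4, col = rgb(0, 0, 0, 0.1))
contour(dens, add = TRUE, col = "red", nlevels = 8)
```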

Box plot of two groups add regression line to each group

I want to make a plot that draws box plots for two groups and adds a regression line for each group. I have seen a few examples, but none that achieves my goal.
My dataframe is like so:
df<- data.frame(cont.burnint= c(rep(2,10), rep(12, 10), rep(25, 10)),
variable= rep(c("divA","divC"), 30),
value= sample(x = seq(-1,4,0.5), size = 60, replace =
TRUE))
I would like to produce a graph like:
However, I want to change the points to a box plot for each group. I have not found helpful examples in the following:
Add geom_smooth to boxplot
Adding a simple lm trend line to a ggplot boxplot
The code I have found so far changes my continuous variable cont.burnint to a factor and reorders the x-values from c(2,12,25) to c(12,2,25). Also, the regression lines in the ggplot examples (see the links above) do not extend to the y-axis, and I would like them to. Thirdly, the box plots become offset from each other, and I would like an option that keeps the box plots for both groups at the same x value.
So basically, I want to change the points in the example above to box-and-whisker plots and keep everything else the same. I wouldn't mind adding a legend below the plot and making the text and lines bolder too.
Here is the code for the example above:
plot(as.numeric(as.character(manovadata$cont.burnint)),manovadata$divA,type="p",col="black", xlab="Burn Interval (yr)", ylab="Interaction Diversity", bty="n", cex.lab=1.5)
points(as.numeric(as.character(manovadata$cont.burnint)),manovadata$divC,col="grey")
abline(lm(manovadata$divA~as.numeric(as.character(manovadata$cont.burnint)), manovadata),col="black",lty=1)
abline(lm(manovadata$divC~as.numeric(as.character(manovadata$cont.burnint)), manovadata),col="grey",lty=1)
I can't imagine why you want overlapping boxplots, but I think this does it:
library(ggplot2)
df$cont.burnint <- as.factor(df$cont.burnint)
ggplot(df, aes(x=cont.burnint, y=value, col=variable))+
geom_boxplot(position=position_dodge(width=0), alpha=0.5)+
geom_smooth(aes(group=variable), method="lm")
I added some transparency to the boxplots using alpha to make them visible on top of each other.
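Update (this version needs cont.burnint left numeric, i.e. without the as.factor() conversion above, because xlim(0,30) sets a continuous x scale):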
ggplot(df, aes(x=cont.burnint, y=value, col=variable))+
geom_boxplot(aes(group=paste(variable,cont.burnint)))+
geom_smooth(aes(group=variable), method="lm", fullrange=T, se=F)+xlim(0,30)
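If you would rather stay in base graphics, as in the question's own plot, something along these lines might work. It is only a sketch: it keeps cont.burnint numeric by placing the boxes with at=, and the colours, boxwex value and legend position are my own assumptions.

```r
set.seed(1)
df <- data.frame(cont.burnint = c(rep(2, 10), rep(12, 10), rep(25, 10)),
                 variable     = rep(c("divA", "divC"), 30),
                 value        = sample(seq(-1, 4, 0.5), 60, replace = TRUE))

at   <- sort(unique(df$cont.burnint))          # true x positions: 2, 12, 25
cols <- c(divA = "black", divC = "grey50")

# Empty canvas whose x-axis starts at 0, so the lines reach the y-axis
plot(NA, xlim = c(0, 30), ylim = range(df$value), xaxs = "i",
     xlab = "Burn Interval (yr)", ylab = "value", bty = "n")

fits <- list()
for (v in names(cols)) {
  sub <- df[df$variable == v, ]
  # Both groups are drawn at the same x positions, so the boxes overlap
  boxplot(value ~ cont.burnint, data = sub, at = at, add = TRUE,
          boxwex = 3, border = cols[[v]], col = NA, axes = FALSE)
  fits[[v]] <- lm(value ~ cont.burnint, data = sub)
  abline(fits[[v]], col = cols[[v]], lwd = 2)
}
legend("topright", legend = names(cols), col = cols, lwd = 2, bty = "n")
```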

R stacked barchart label mismatch

I have molecular sequencing data on the relative abundance (in %) of various phyla in 9 different samples, and I am trying to plot it as a colour-coded barchart (where each phylum corresponds to a different colour). This is simple enough in Excel, but as a complete newbie in R I am struggling quite a bit. My data is in an Excel file (tab-formatted), where the first line holds the labels (e.g. sample names). When I plot it, the bar labels are misplaced and do not match, and R plots the first line of my Excel file (the names) as a separate value (pictures attached). What I have so far is:
attach(data)
data.1<-as.matrix(data)
par(mfrow=c(1,1))
barplot(data.1, col=c("aquamarine3","azure2","blue2","brown3","cadetblue3","deepskyblue3","firebrick3","gold3","darkorange3","darkorchid3","darkseagreen","darkslateblue","darkviolet","deeppink4"), main=".", xlab="Unit/Treatment", ylab="% Relative abundance")
detach(data)
legend("topright", inset=c(-0.2,0),
legend = c("Unassigned", "Acidobacteria","Actinobacteria","Bacteroidetes","Chlorobi","Chloroflexi","Firmicutes","Gemmatimonadetes","Planctomycetes","Proteobacteria","Verrucromicrobia","Euryarchaeota","Crenarchaeota","Parvarchaeota"),
fill = c("aquamarine3","azure2","blue2","brown3","cadetblue3","deepskyblue3","firebrick3","gold3","darkorange3","darkorchid3","darkseagreen","darkslateblue","darkviolet","deeppink4"))
par(mar=c(5.1, 4.1, 4.1, 8.1), xpd=TRUE)
layout(mat, widths = rep.int(1, ncol(mat)),
heights = rep.int(1, nrow(mat)), respect = FALSE)
As a result, I get this:
In my barchart attempt, R plots my sample names as x_1 and thus shifts the other labels. Also, my legend covers most of the barchart and I cannot seem to adjust it.
Thanks very much in advance; any help getting the barchart to look decent would be highly appreciated.
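A hedged sketch of one way this could be set up, using a small made-up matrix instead of the real file. The idea is to keep the names out of the numeric data (for a tab-separated export, read.delim() with header = TRUE and row.names = 1 would be one way to get there, assuming phyla are in rows and samples in columns), and to widen the right margin before plotting so the legend can sit outside the bars. The sample names S1 to S3, the four phyla and the inset value are illustrative assumptions.

```r
set.seed(2)
phyla <- c("Unassigned", "Acidobacteria", "Actinobacteria", "Bacteroidetes")
abund <- matrix(runif(4 * 3), nrow = 4,
                dimnames = list(phyla, paste0("S", 1:3)))
abund <- 100 * sweep(abund, 2, colSums(abund), "/")   # each sample sums to 100%

cols <- c("aquamarine3", "azure2", "blue2", "brown3")

# Set the margins BEFORE plotting; xpd = TRUE allows drawing in the margin
op <- par(mar = c(5.1, 4.1, 4.1, 9.1), xpd = TRUE)
bp <- barplot(abund, col = cols,
              xlab = "Unit/Treatment", ylab = "% Relative abundance")
# A negative inset pushes the legend to the right of the plot region
legend("topright", inset = c(-0.35, 0), legend = rownames(abund),
       fill = cols, bty = "n")
par(op)
```

Because the matrix carries its sample names as column names, barplot() uses them as bar labels and no name row ends up being plotted as data.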

R plot and barplot how to fix ylim not alike?

I am trying to use base R to plot a time series both as a bar plot and as an ordinary line plot. I want to write a flexible function for this, so I would like to draw the plots without axes and then add a common axis manually.
Now I am hampered by a strange problem: the same ylim values result in different axes. Consider the following example:
data(presidents)
# shorten this series a bit
pw <- window(presidents,start=c(1965))
barplot(t(pw),ylim = c(0,80))
par(new=T)
plot(pw,ylim = c(0,80),col="blue",lwd=3)
I intentionally draw the y-axes from both plots here to show that they are not the same. I know I can achieve the intended result by drawing the bar plot first and then adding lines using the x and y arguments of lines().
But I am looking for a flexible solution that lets you add lines to barplots the way you add lines to points or other line plots. So is there a way to make sure the y-axes are the same?
EDIT: adding the usr parameter to par doesn't help me here either.
par(new=T,usr = par("usr"))
Add yaxs="i" to your line plot, like this:
plot(pw,ylim = c(0,80),col="blue",lwd=3, yaxs="i")
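barplot() draws its y-axis with yaxs="i", so the axis spans exactly ylim and the bars start at the bottom of the plot region. plot() uses the default yaxs="r", which extends the range by 4% at each end, so that a line at y=0 stays visible instead of coinciding with the x-axis line.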
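You can check the difference yourself by looking at par("usr") after each call. A small sketch with a made-up vector h instead of the presidents series:

```r
h <- c(20, 55, 40, 70)

barplot(h, ylim = c(0, 80))
usr_bar <- par("usr")[3:4]        # 0 80: exactly ylim

plot(h, type = "l", ylim = c(0, 80))
usr_r <- par("usr")[3:4]          # -3.2 83.2: ylim padded by 4% at each end

plot(h, type = "l", ylim = c(0, 80), yaxs = "i")
usr_i <- par("usr")[3:4]          # 0 80: now matches the barplot
```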

R lattice barchart: How to write the total sum on each bar in multiple panels?

I have a lattice bar chart with multiple panels and I would like to add the sum of each bar on top of the bars (e.g. (70) on top of the first bar on the top left, (20) on the second one, (150) on the third one etc.).
There is a similar question here, but I could not find a way to adapt that code to my plot. Unlike in that example, I would like to add the total sum of men and women on top of each vertical bar. I also could not label them separately using ltext as shown here. Any suggestion, using ltext or any other way, would be very helpful.
civ1<-c("Single","Single","Marr","Marr","Single","Single","Marr","Marr","Single","Single","Marr","Marr","Single","Single","Marr","Marr")
Sex<-rep(c("women","men"),8)
Year<-rep(c(rep(1990,4),rep(2000,4)),2)
Type1<-c(rep("Traditional",8),rep("Dual-earner",8))
Earn1<-c(seq(10, 160, by = 10))
df<-as.data.frame(cbind(civ1,Sex,Year,Type1,Earn1))
df$Earn1<-as.numeric(levels(df$Earn1))[df$Earn1]
my.key<-list(space="bottom",text=list(c("Women","Men"),col=c("black","black")), columns=2,points=T,pch=15,col=c("darkgray","lightgray"),cex=0.8)
labels=c("70","20","150","110")
print(figure1<-barchart(Earn1~civ1|Year+Type1,df,groups=Sex, ylim=c(0,350),horizontal=F,col=c("darkgray","lightgray"),cex=0.8,ylab="Earnings",stack=T,layout=c(2,2),key=my.key,
par.settings = list(strip.background=list(col=c("white","lightyellow")),
panel=function(x,y,subscripts...){
panel.grid(h=-1,v=0)
panel.barchart(...)
ltext(1,200, labels[subscripts]) #not working!
})))
I see several problems. First, your panel= parameter is inside your par.settings parameter, which is incorrect; it should be passed to barchart directly. You also have a syntax problem with a missing comma, and I'm not sure how your labels were intended to work with only 4 values. Anyway, the following code should work.
barchart(
Earn1~civ1|Year+Type1,df,
groups=Sex,
ylim=c(0,350), cex=0.8, ylab="Earnings",
horizontal=F, stack=T, layout=c(2,2),
col=c("darkgray","lightgray"),
key=my.key,
par.settings = list(strip.background=list(col=c("white","lightyellow"))),
panel=function(x,y,subscripts,...){
panel.grid(h=-1,v=0)
panel.barchart(x,y,subscripts=subscripts,...)
t <- aggregate(y~x, data.frame(x,y), FUN=sum)
panel.text(t$x,t$y, labels=t$y, pos=3)
}
)
Aside from fixing the problems described above, I've used aggregate() to calculate the total for each column and used those values to place the text labels at the appropriate spot. The resulting plot is below.
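To see what the aggregate() step does inside one panel, here is a tiny standalone sketch. The values mimic the 1990/Traditional panel from the question's data; the variable name tot is mine.

```r
# One panel's worth of data: two stacked segments (women, men) per bar
x <- factor(c("Single", "Single", "Marr", "Marr"))
y <- c(10, 20, 30, 40)

# Sum the stacked segments per bar; one row per level of x
tot <- aggregate(y ~ x, data.frame(x, y), FUN = sum)
tot
#        x  y
# 1   Marr 70
# 2 Single 30
```

Those per-bar totals are both the height at which the label is placed and the text of the label, which is why the panel function passes t$y twice.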
