I have molecular sequencing data of the relative abundance (in %) of various phyla in 9 different samples, and I am trying to plot it as a colour-coded bar chart (where each phylum corresponds to a different colour). This is simple enough in Excel, but as a complete newbie to R I am struggling quite a bit. My data is in an Excel file (formatted as tab-delimited), where the first line holds the labels (e.g. sample names). When I plot it, the bar labels are misplaced and do not match, and R plots the first line of my Excel file (the names) as a separate value (pictures attached). What I have so far is:
attach(data)
data.1<-as.matrix(data)
par(mfrow=c(1,1))
barplot(data.1, col=c("aquamarine3","azure2","blue2","brown3","cadetblue3","deepskyblue3","firebrick3","gold3","darkorange3","darkorchid3","darkseagreen","darkslateblue","darkviolet","deeppink4"), main=".", xlab="Unit/Treatment", ylab="% Relative abundance")
detach(data)
legend("topright", inset=c(-0.2,0),
legend = c("Unassigned", "Acidobacteria","Actinobacteria","Bacteroidetes","Chlorobi","Chloroflexi","Firmicutes","Gemmatimonadetes","Planctomycetes","Proteobacteria","Verrucromicrobia","Euryarchaeota","Crenarchaeota","Parvarchaeota"),
fill = c("aquamarine3","azure2","blue2","brown3","cadetblue3","deepskyblue3","firebrick3","gold3","darkorange3","darkorchid3","darkseagreen","darkslateblue","darkviolet","deeppink4"))
par(mar=c(5.1, 4.1, 4.1, 8.1), xpd=TRUE)
layout(mat, widths = rep.int(1, ncol(mat)),
heights = rep.int(1, nrow(mat)), respect = FALSE)
As a result, I get this:
Barchart attempt, where R plots my sample names as x_1 and thus moves the other labels. Also, my legend covers the majority of my barchart and I cannot seem to adjust it.
Thanks very much in advance. Any help with getting the bar chart to look decent would be highly appreciated.
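One possible way to approach this, as a minimal sketch rather than a definitive fix: it assumes the table has phyla as rows and samples as columns, with the phylum names in the first column. The toy table, its values, and the names `txt`, `dat`, `m`, and `cols` are invented for illustration. The two ideas are to read the name column as row names (so it is not treated as data) and to set the margins with `par()` before calling `barplot()`, so that the legend has room outside the plot region.

```r
# Toy tab-separated table standing in for the real file (values are invented).
# Rows are phyla, columns are samples; the first column holds the phylum names.
txt <- "Phylum\tS1\tS2\tS3
Unassigned\t10\t20\t5
Acidobacteria\t30\t25\t40
Proteobacteria\t60\t55\t55"

# row.names = 1 turns the first column into row names instead of data
dat <- read.delim(text = txt, header = TRUE, row.names = 1, check.names = FALSE)
m <- as.matrix(dat)   # numeric matrix: one column per sample, i.e. one bar each

cols <- c("aquamarine3", "azure2", "blue2")

# Widen the right margin BEFORE plotting so the legend fits outside the bars
op <- par(mar = c(5.1, 4.1, 4.1, 10.1), xpd = TRUE)
barplot(m, col = cols, xlab = "Unit/Treatment", ylab = "% Relative abundance")
legend("topright", inset = c(-0.45, 0), legend = rownames(m), fill = cols, cex = 0.8)
par(op)               # restore the previous graphics settings
```

With a real file, `read.delim("yourfile.txt", row.names = 1)` would replace the `text =` argument (the file name here is a placeholder). The `inset` value will likely need adjusting to the device size.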
Related
I currently have a stacked barplot with the data how I want it, other than the fact I am unsure of how to add percentage labels to break down how much of each sample belongs to each color/category.
I just need to know what packages/code would be helpful in adding the percentages so I can see the specific breakdown of the last two columns in particular.
I have the following code and plot:
tbl10 = read.table("combinedtest.Q")
names1 = read.table("names_sorted.txt")[,1]
bp=barplot(t(as.matrix(tbl10)), col=c("aliceblue","antiquewhite","aquamarine","black","blue","blue4","blueviolet","brown1","brown4","cadetblue","chartreuse","chocolate1","coral1","cornflowerblue","cyan","darkgoldenrod1","darkgray","darkgreen","darkmagenta","darkolivegreen1","hotpink","darkseagreen2","darkslateblue","deeppink","firebrick1","khaki1"),xlab="", ylab="Ancestry", border=NA)
text(cex=1, x=bp, y=-.08, names1, xpd=TRUE, srt=90)
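One base-R approach, sketched here with invented toy proportions rather than the contents of combinedtest.Q (the names `q`, `h`, `mid`, and `lab` are hypothetical): no extra package should be needed, because `barplot()` returns the x positions of the bars, and the y position of each label can be computed as the cumulative top of a segment minus half its height.

```r
# Toy ancestry proportions: 3 samples (rows) x 3 clusters (columns); values invented
q <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.6, 0.3,
              0.5, 0.5, 0.0), ncol = 3, byrow = TRUE)
h <- t(q)                                  # barplot() stacks each COLUMN into one bar
bp <- barplot(h, col = c("aliceblue", "cornflowerblue", "darkgreen"),
              ylab = "Ancestry", border = NA)

# Vertical midpoint of each segment: cumulative top minus half the segment height
mid <- apply(h, 2, cumsum) - h / 2
# Percentage text; very thin segments are left unlabelled to avoid clutter
lab <- ifelse(h > 0.05, sprintf("%.0f%%", 100 * h), "")

# as.vector() runs down each column (bar), so repeat each bar's x once per segment
text(x = rep(bp, each = nrow(h)), y = as.vector(mid),
     labels = as.vector(lab), cex = 0.8)
```

To label only the last two bars, the same call could be restricted to the corresponding columns of `mid` and `lab` and the last two elements of `bp`.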
I have been able to plot several pie charts on top of a map, representing different populations. However, I would also like to represent the sample size for each of the pie charts, as it differs between populations. I have a loop that adds each population present in the dataset as a pie chart:
library(maps)      # map()
library(mapdata)   # "worldHires" database
library(mapplots)  # add.pie()
map("worldHires", xlim=c(-140, -110), ylim=c(48, 64), col="lightgray", fill=TRUE)
points(x=-120.43, y=50.34, col="black", pch=19)
segments(x0=dataframe$Long, y0=dataframe$Lat, x1=dataframe$Long2, y1=dataframe$Lat2, col="black")
add.pie(z=c(2, 5, 6), x=-122.43, y=52.34, labels="", radius=1)
for (i in 1:nrow(dataframe)) {
  add.pie(as.integer(dataframe[i, c("Cat1", "Cat2", "Cat3")]*100),
          x=dataframe$Long2[i], y=dataframe$Lat2[i], labels="", radius=0.08,
          col=c("red", "blue", "green"))
}
title(ylab="Latitude")
title(xlab="Longitude")
box(which="plot")
I would like to add the sample size data (dataframe$n) somehow. I've seen examples of scaled radius pie charts, which could work here, or even just adding the sample size above the pie chart. To get the sample size above the pie chart I tried adding 'main=dataframe$n' between labels and radius in the add.pie portion of the code, but this did not work. Does anyone have any ideas on how to add this to my script? Thank you.
The size of each pie is plotted according to each value in your data frame. A suitable data frame for this has the stations as rows and the class types as columns.
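Both ideas from the question (a radius scaled by sample size, and the sample size printed above each pie) can be sketched as follows. This is only an illustration under assumptions: the data frame values are invented, `max_radius` is a hypothetical choice, and the lower-resolution "world" database from the maps package is swapped in for "worldHires". Scaling the radius by the square root of n keeps the pie area, rather than the radius, proportional to sample size.

```r
# Hypothetical data frame with the columns used in the question (values invented)
dataframe <- data.frame(
  Long2 = c(-125, -118), Lat2 = c(55, 60),
  Cat1 = c(0.2, 0.5), Cat2 = c(0.3, 0.25), Cat3 = c(0.5, 0.25),
  n = c(12, 48)
)

# Radius proportional to sqrt(n), so pie AREA is proportional to sample size
max_radius <- 1                       # assumed largest radius, in map units
radius <- max_radius * sqrt(dataframe$n / max(dataframe$n))

# Only draw if the mapping packages are installed
if (requireNamespace("maps", quietly = TRUE) &&
    requireNamespace("mapplots", quietly = TRUE)) {
  maps::map("world", xlim = c(-140, -110), ylim = c(48, 64),
            col = "lightgray", fill = TRUE)
  for (i in seq_len(nrow(dataframe))) {
    mapplots::add.pie(z = as.numeric(dataframe[i, c("Cat1", "Cat2", "Cat3")]),
                      x = dataframe$Long2[i], y = dataframe$Lat2[i],
                      labels = "", radius = radius[i],
                      col = c("red", "blue", "green"))
  }
  # Print n just above the top edge of each pie
  text(x = dataframe$Long2, y = dataframe$Lat2 + radius,
       labels = paste0("n = ", dataframe$n), pos = 3, cex = 0.8)
}
```

`add.pie()` draws onto an existing plot, so it has no `main` argument to set a title; a separate `text()` call after the loop is one way to place the sample size.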
I'm trying to look at the conditional distributions of some data to compare how they look using a barplot. I would like to change the variable of the x-axis when I look at a different conditional distribution of a contingency table but R does not do so. It keeps the x axis variable and the plotted variable the same (with frequency distribution on the y axis).
Here is my code:
eyecolour<-matrix(c(43, 62, 48, 27,35, 26, 30, 29,27,39,61,33), ncol=4, byrow=T)
colnames(eyecolour)<-c("Blue", "Brown", "Green", "Other")
rownames(eyecolour)<-c("Glasgow", "Sheffield", "London")
barplot(prop.table(eyecolour, 1), legend=T, beside=T)
barplot(prop.table(eyecolour, 2), legend=T, beside=T)
I was expecting the two barplots to show Cities on the x axis for one plot and eye colours on the x axis for the other. I wasn't sure which - I'm just learning.
Can anyone help me to produce that result?
To answer your first question, you can simply use t() so that it plots the cities rather than the eye colours. You might notice that the two outputs of prop.table have the same structure (eye colour in the columns); they just contain different numbers depending on the margin you specify. The documentation for ?barplot says:
If height is a matrix and beside is FALSE then each bar of the plot corresponds to a column of height, with the values in the column giving the heights of stacked sub-bars making up the bar. If height is a matrix and beside is TRUE, then the values in each column are juxtaposed rather than stacked.
This suggests that barplot uses the columns as the heights in the plot which is why you need to transpose your matrix so that the columns are cities instead of eyecolours.
Something like:
barplot(t(prop.table(eyecolour, 1)), legend=T, beside=T)
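As a sketch of both conditional distributions side by side (the names `by_city` and `by_colour` are introduced here for illustration): margin 1 makes each row sum to 1, margin 2 makes each column sum to 1, and in both cases barplot() groups the bars by column, so only the row-wise table needs transposing.

```r
eyecolour <- matrix(c(43, 62, 48, 27,
                      35, 26, 30, 29,
                      27, 39, 61, 33), ncol = 4, byrow = TRUE,
                    dimnames = list(c("Glasgow", "Sheffield", "London"),
                                    c("Blue", "Brown", "Green", "Other")))

by_city   <- prop.table(eyecolour, 1)  # each ROW sums to 1: eye colour given city
by_colour <- prop.table(eyecolour, 2)  # each COLUMN sums to 1: city given eye colour

op <- par(mfrow = c(1, 2))
# Transposed: columns are cities, so cities appear on the x axis
barplot(t(by_city), beside = TRUE, legend.text = TRUE,
        main = "Eye colour within city")
# Not transposed: columns are eye colours, so eye colours appear on the x axis
barplot(by_colour, beside = TRUE, legend.text = TRUE,
        main = "City within eye colour")
par(op)
```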
I'm working with TraMineR to do a sequence analysis of educational data. I can get R to produce a plot of the 10 most frequent sequences in the data using code similar to the following:
library(TraMineR)
##Loading the data
data(actcal)
##Creating the labels and defining the sequence object
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels=actcal.lab)
## 10 most frequent sequences in the data
actcal.freq <- seqtab(actcal.seq)
actcal.freq
## Plotting the object
seqfplot(actcal.seq, pbarw=FALSE, yaxis="pct", tlim=10:1, cex.legend=.75, withlegend="right")
However, I'd also like to have the frequencies of each sequence (which are in the object actcal.freq) along the right side of the plot. For example, the first sequence in the plot created by the code above represents 37.9% of the data (as the plot currently shows). Per the seqtab, this is 757 subjects. I'd like the number 757 to appear on the right y-axis (and so on for the other sequences).
Is this possible? I've played around with axis(side=4, ...) but never been able to get it to reproduce the spacing of the left y-axis.
OK. This is a bit of a mess, but the function resets the par setting if you include a legend by default, so you need to turn that off. Then you can set the axis a bit more easily, and then we can go back for the legend. This should work with your test data above.
#add padding to the right for axis and legend
par("mar"=c(5,4,4,8)+.1)
#plot w/o axis
seqfplot(actcal.seq, pbarw=FALSE, yaxis="pct", tlim=10:1, withlegend=F)
#plot right axis with freqs
axis(4, at = seq(.7, by=1.2, length.out=length(attr(actcal.freq,"freq")$Freq)),
     labels = rev(attr(actcal.freq,"freq")$Freq),
     mgp = c(1.5, 0.5, 0), las = 1, tick = FALSE)
#now put the legend on
legend("right", legend=attr(actcal.seq, "labels"),
       fill=attr(actcal.seq, "cpal"),
       inset=-.3, bty="o", xpd=NA, cex=.75)
You may need to play a bit with the margins, and especially the inset= parameter of the legend, to get it placed correctly. I hope your real data isn't too different from this, because you really have to dig through the function to see how it does the formatting to get things to match up.
I'm using prcomp to do PCA in R, and I want to plot PC1 vs PC2 with different-colored text labels for each of the two categories.
I do the plot with:
plot(pca$x, main = "PC1 Vs PC2", xlim=c(-120,+120), ylim = c(-70,50))
then to draw in all the text with the different colors I've tried:
text(pca$x[,1][1:18], pca$[,1][1:18], labels=rownames(cava), col="green",
adj=c(0.3,-0.5))
text(pca$x[,1][19:35], pca$[,1][19:35], labels=rownames(cava), col="red",
adj=c(0.3,-0.5))
But R seems to plot two numbers over each other instead of one. I know that pca$x[,1][1:18] selects the correct points, because if I use it to plot the points it works and produces the same plot as plot(pca$x).
It would be great if anyone could help me plot the labels for the two categories, or
even plot the points in different colors to make the two categories easy to tell apart.
You need to specify your x and y coordinates a bit differently:
text(pca$x[1:18,1], pca$x[1:18,2] ...)
This means take the first 18 rows and the first column (which is PC1) for the x coord, etc.
I'm surprised what you did doesn't throw an error.
If you want the points themselves colored, you can do it this way:
plot(pca$x, main = "PC1 Vs PC2", col = c(rep("green", 18), rep("red", 17)))
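For coloured text labels rather than points, here is a minimal self-contained sketch. It is an illustration only: the built-in mtcars data stands in for the asker's cava data, and the `am` column stands in for the two categories. Note that each text() call needs labels subset to the same rows as its coordinates (e.g. rownames(cava)[1:18]); building one colour vector in row order avoids having to split the call at all.

```r
# Built-in mtcars stands in for the asker's data; 'am' provides two categories
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
pca  <- prcomp(vars, scale. = TRUE)

grp  <- factor(mtcars$am, labels = c("automatic", "manual"))
cols <- c("green", "red")[grp]         # one colour per row, in row order

# Empty plot first, then draw the coloured labels at (PC1, PC2)
plot(pca$x[, 1:2], type = "n", main = "PC1 vs PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(vars), col = cols, cex = 0.7)
legend("topright", legend = levels(grp), text.col = c("green", "red"), bty = "n")
```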