text annotation to a graph in ggplot - r

I am drawing a PCA plot using ggplot2.
I know this question has been answered in previous posts, but I still could not solve my problem.
I have a data set called tab, which is the output of a PCA:
sample.id pop EV1 EV2
HT185_MK8-2.sort.bam HA_27 -0.03796869 0.046369552
HT48_SD1A-37.sort.bam HA_14 0.04208393 0.032961404
HT53_IA1A-10.sort.bam HA_1 -0.02580365 0.005262476
HT260_MK1-4.sort.bam HA_20 -0.06090545 0.005578504
HT170_SD2W-14.sort.bam HA_17 0.01288395 0.012117833
Q093_MK7-13.sort.bam HA_26 0.06310162 0.188558067
I want to add a label to each dot in the plot. These dots are individuals from several populations, so I want to label them with their population ID (the pop column in the data set).
I am using something like this:
ggplot(data=tab,aes(EV1,EV2, label=tab[,2])) + geom_point(aes(color=as.factor(pop))) + ylab("Principal component 2") + xlab("Principal component 1")
But I do not get my desired output.
This is my PC plot!
Could anyone help me add a population label to each dot in the plot?
Thanks

Try geom_text:
geom_text(aes(label=as.character(pop)),hjust=0,vjust=0)
Also consider looking into plotly, or setting a threshold on the labels, because labeling every point will lead to a very crowded plot, and probably very little additional useful information.
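Put together with the question's own data, the suggestion might look like the sketch below. It uses four of the rows posted above; the `EV2 > 0.1` cutoff in the second plot is an arbitrary threshold chosen only to illustrate labeling a subset, not something from the original post.

```r
library(ggplot2)

# Four rows copied from the PCA output shown in the question
tab <- data.frame(
  sample.id = c("HT185_MK8-2.sort.bam", "HT48_SD1A-37.sort.bam",
                "HT53_IA1A-10.sort.bam", "Q093_MK7-13.sort.bam"),
  pop = c("HA_27", "HA_14", "HA_1", "HA_26"),
  EV1 = c(-0.03796869, 0.04208393, -0.02580365, 0.06310162),
  EV2 = c(0.046369552, 0.032961404, 0.005262476, 0.188558067)
)

# Label every point with its population ID
p <- ggplot(tab, aes(EV1, EV2)) +
  geom_point(aes(color = pop)) +
  geom_text(aes(label = pop), hjust = 0, vjust = 0) +
  xlab("Principal component 1") +
  ylab("Principal component 2")

# Label only points above an (arbitrary, illustrative) threshold
p_sparse <- ggplot(tab, aes(EV1, EV2)) +
  geom_point(aes(color = pop)) +
  geom_text(data = subset(tab, EV2 > 0.1), aes(label = pop),
            hjust = 0, vjust = 0)

print(p)
```

Mapping `label` inside `geom_text()` rather than passing `tab[,2]` in the top-level `aes()` keeps the label tied to the column by name.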

Related

Can one use ggMarginal on a plot combining points and density lines?

I have been trying to add marginal graphs to my current plot, which displays some data with density lines and some data with points. However, ggMarginal seems to pick up only the data belonging to the first layer, or the first subset that is called within geom_point. Does anyone have an idea how to still achieve my goal using ggMarginal?
I have come across workarounds with cowplot; however, in my case they would require a lot of additional work, as I produce loads of figures of varying size (each of which would need specific adjustments for perfect alignment).
Thanks for any ideas!
Code to reproduce current Output:
library(ggplot2)
library(ggExtra)  # provides ggMarginal
data=iris
PC_Data=prcomp(data[,1:4])
# cbind() turns Species into its integer codes (1, 2, 3)
data2Plot=as.data.frame(cbind(PC_Data$x,Species=data$Species))
data2Plot$Species=as.factor(data2Plot$Species)
p<-ggplot(data2Plot,aes(x=PC1,y=PC2,color=Species,fill=Species))+
stat_density2d(data=subset(data2Plot, Species != "3"),geom="polygon",size=0.2,alpha=0.1) +
geom_point(data=subset(data2Plot, Species == "3"),size=1)+theme(legend.position = "none")
ggMarginal(p,type="density",groupColour = TRUE,groupFill = TRUE)
Current Output
Wanted Output:

How to prevent geom_text_repel from labeling points on scatter plot with default number ordering list?

My dataset looks like this:
I'm trying to create a simple scatter plot with data labels that are names (first and last name).
I used geom_text_repel in ggrepel to create data labels, but the labels on the plot are just numbers in the order of the data points in my dataset.
For example, if you look at the first datapoint, instead of the label being "Stephen Curry" it is "1"
I have no idea why this is happening and I can't find anyone else who even has my problem, let alone a solution.
Code:
ggplot(gravity,
aes(TS., USG., label = rownames(gravity))) +
geom_point(aes(TS., USG.), color='black') +
geom_text_repel(aes(TS., USG., label = rownames(gravity)))
The image above shows the plot created by the code. As you can see, the labels are just the ordering numbers instead of the names. I don't see why this is happening, considering those ordering numbers are not part of the dataset I imported.
Thanks in advance
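One likely explanation, offered as a sketch rather than a confirmed diagnosis: when a data frame has no explicit row names, `rownames()` returns the row indices as strings, so `label = rownames(gravity)` prints "1", "2", and so on. The data frame below is a made-up stand-in for `gravity`; the `Player` column name and all numeric values are assumptions, since the original dataset was only shown as an image.

```r
library(ggplot2)

# Hypothetical stand-in for the question's `gravity` data frame.
# The column name `Player` and the numbers are illustrative assumptions.
gravity <- data.frame(
  Player = c("Stephen Curry", "Player B", "Player C"),
  TS.    = c(0.65, 0.58, 0.61),
  USG.   = c(30.1, 25.4, 28.0)
)

# With no row names set, rownames() is just the row index as text
default_labels <- rownames(gravity)

# Map the label to the column that actually holds the names
p <- ggplot(gravity, aes(TS., USG.)) +
  geom_point(color = "black") +
  geom_text(aes(label = Player), vjust = -0.5)

print(p)
```

`ggrepel::geom_text_repel()` takes the same `label` aesthetic, so it can be swapped in for `geom_text()` here. Alternatively, assigning `rownames(gravity) <- gravity$Player` would make the original code show names.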

R: how to make multiple plots from one CSV, grouping by a column

I'd like to put multiple plots onto a single visual output in R, based on data that I have in a CSV that looks something like this:
user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852
I can read this just fine, using, e.g.:
data = read.csv('data.csv')
For the moment, a fairly simple plot is fine, so I'm just trying plot(), without much to it (setting type='o' to get lines and points), and from solving a past problem, I know that I can do, e.g., the following, to get data for just fred:
plot(data$time[which(data$user == 'fred')], data$size[which(data$user == 'fred')], type='o')
What I'd like, though, is to have the data for each user all showing up on one set of axes, with color coding (and a legend to match users to colors) to identify different user data.
And if another user shows up, I'd like another line to show up, with another color (perhaps recycling if I have too many users at once).
However, just this doesn't do it:
plot(data$size, data$time, type='o',col=c("red", "blue", "green"))
Because it doesn't seem to group by the user.
And just this:
plot(data, type='o')
gives me an error:
Error in plot.default(...) :
formal argument "type" matched by multiple actual arguments
This:
plot(data)
does do something, but not what I want.
I've poked around, but I'm new enough to R that I'm not quite sure how best to search for this, nor where to look for examples that would hit a use-case like this.
I even got somewhat closer with this:
plot(data$size[which(data$user == 'wilma')], data$time[which(data$user == 'wilma')], type='o', col=c('red'))
lines(data$size[which(data$user == 'fred')], data$time[which(data$user == 'fred')], type='o', col=c('green'))
lines(data$size[which(data$user == 'barney')], data$time[which(data$user == 'barney')], type='o', col=c('blue'))
This gives me a plot (which I'd post inline, but as a new user, I'm not allowed to yet):
not-quite-right plot
which is kind of close to what I want, except that it:
doesn't have a legend
has ugly axis labels, instead of just time and size
is scaled to the first plot, and thus is missing data from some of the others
isn't sorted by x-axis, which I could do externally, though I'm guessing I could do it fairly easily in R.
So, the question, ultimately, is this:
What's an easy way to plot data like this which:
has multiple lines based on the labels in the first column of the CSV
uses the same set of axes for the data in columns 2 and 3, regardless of the label
has a legend and color-coding for which label is being used for a particular line (or set of points)
will adapt to adding new labels to the data file, hopefully without change to the R code.
Thanks in advance for any help or pointers on this.
P.S. I looked around for similar questions, and found one that's sort of close, but it's not quite the same, and I failed to figure out how to adapt it to what I'm trying to do.
Good question. This is doable in base plot, but it's even easier and more intuitive using ggplot2. Below is an example of how to do this with random data in ggplot2.
First download and install the package
install.packages("ggplot2",repos='http://cran.us.r-project.org')
require(ggplot2)
Next generate the data
a <- c(rep('a',3),rep('b',3),rep('c',3))
b <- rnorm(9,50,30)
c <- rep(seq(1,3),3)
dat <- data.frame(a,b,c)
Finally, make the plot
ggplot(data=dat, aes(x=c, y=b , group=a, colour=a)) + geom_line() + geom_point()
Basically, you are telling ggplot that your x axis corresponds to the c column (dat$c), your y axis corresponds to the b column (dat$b), and to group (draw separate lines) by the a column (dat$a). Colour specifies that you want to colour by the a column as well.
The resulting graph looks like this:
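Applied to the CSV from the question, the same pattern might look like the sketch below. The data is read inline with `read.csv(text = ...)` only to keep the example self-contained; in practice `read.csv('data.csv')` from the question would be used instead.

```r
library(ggplot2)

# The rows posted in the question, read inline for a self-contained example
data <- read.csv(text = "user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852")

# One line per user, coloured by user; the legend is generated automatically
p <- ggplot(data, aes(x = size, y = time, colour = user, group = user)) +
  geom_line() +
  geom_point()

print(p)
```

`geom_line()` connects points in order of the x variable, so the data does not need to be sorted beforehand, and a new user in the file gets a new line and legend entry without changes to the code.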

How to plot heatmap with multiple categories in a single cell with ggplot2?

How can I plot a heatmap with multiple categories in a single cell with ggplot2? A heatmap of categorical variables can be drawn with this code:
# data
datf <- data.frame(indv = factor(paste("ID", 1:20), levels = rev(paste("ID", 1:20))),
                   matrix(sample(LETTERS[1:7], 400, TRUE), ncol = 20))
library(ggplot2)
library(reshape2)
# converting data to long form for ggplot2 use
datf1 <- melt(datf, id.var = 'indv')
ggplot(datf1, aes(variable, indv)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_manual(values = rainbow(7))
The code came from here:
http://rgraphgallery.blogspot.com/2013/04/rg54-heatmap-plot-of-categorical.html
But what about multiple categories in a single cell like this? Is it possible to use triangle or other shape as a cell?
http://postimg.org/image/4dudrv0nz/
Copied from Biostar, as Alex Reynolds suggested.
For those interested, this appears to be Figure 2 from "Exome sequencing identifies mutation in CNOT3 and ribosomal genes RPL5 and RPL10 in T-cell acute lymphoblastic leukemia".
I wanted to create a similar plot with ggplot and geom_tile for a bigger collection of genes (a few hundred), but finally decided to use geom_point instead to provide additional information per cell (tile). It also looks to me a lot like this plot was generated in Excel or some other spreadsheet software (maybe along the lines of https://www.youtube.com/watch?v=0s5OiRMMzuY). The colors in the cells (tiles) do not match those in the legend (suggesting that they were added separately and not automatically), and there appears to be an erroneous cell (its diagonal separating the colors runs from upper left to lower right, unlike the black diagonals, which run from lower left to upper right).
Hence, my concluding two cents: doing this automatically is probably very time-consuming and, in my opinion, only makes sense if you want to do it repeatedly, e.g., on data that is subject to change or on multiple datasets, and/or if you have a larger collection of genes.
Otherwise, following the instructions in the YouTube video for a rather small number of cells is likely to be more efficient. Or use geom_point (similar to "Adding points to a geom_tile layer in ggplot2" or "Marking specific tiles in geom_tile() / geom_raster()") to represent information about an additional category (variable).
In any case, should anyone have other suggestions on how to automatically create such a figure, I am more than happy to hear about that.
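The geom_point approach mentioned above could be sketched roughly as follows. This is not a reproduction of the figure with diagonally split cells; it only shows a second categorical variable drawn as a point shape on top of each tile. The data is random, and the `second` column is a hypothetical extra category introduced for illustration.

```r
library(ggplot2)

set.seed(1)  # reproducible random categories

# A small 5 x 5 grid of cells
cells <- expand.grid(indv = paste("ID", 1:5), variable = paste0("X", 1:5))
cells$value  <- sample(LETTERS[1:3], nrow(cells), replace = TRUE)    # first category: tile fill
cells$second <- sample(c("yes", "no"), nrow(cells), replace = TRUE)  # hypothetical second category

# Tiles carry the first category, point shapes carry the second
p <- ggplot(cells, aes(variable, indv)) +
  geom_tile(aes(fill = value), colour = "white") +
  geom_point(aes(shape = second), size = 3) +
  scale_fill_manual(values = rainbow(3))

print(p)
```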

How to make 3D plots with categorical data in R?

I've been trying to create a 3D bar plot based on categorical data, but have not found a way.
It is simple to explain. Consider the following example data (the real example is more complex, but it reduces to this), showing the relative risk of incurring something broken down by income and age, both categorical data.
I want to display this in a 3D bar plot (similar in idea to http://demos.devexpress.com/aspxperiencedemos/NavBar/Images/Charts/ManhattanBar.jpg). I looked at the scatterplot3d package, but it's only for scatter plots and doesn't handle categorical data well. I was able to make a 3d chart, but it shows dots instead of 3d bars. There is no chart type for what I need. I've also tried the rgl package, but no luck either. I've been googling for more than an hour now and haven't found a solution. I have a copy of the ggplot2 - Elegant Graphics for Data Analysis book as well, but ggplot2 doesn't have this kind of chart.
Is there another freeware app I could use? OpenOffice 3.2 doesn't have this chart either.
Thank you for any hints.
Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
I'm not sure how to make a 3d chart in R, but there are other, better ways to represent this data than with a 3d bar chart. 3d charts make interpretation difficult, because the heights of the bars are then skewed by the 3d perspective. In that example chart, it's hard to tell if Wisconsin in 2004 is really higher than Wisconsin in 2001, or if that's an effect of the perspective. And if it is higher, by how much?
Since both Age and Income have meaningful orders, it wouldn't be awful to make a line graph. ggplot2 code:
ggplot(data, aes(Age, Risk, color = Income))+
geom_line(aes(group = Income))
Or, you could make a heatmap.
ggplot(data, aes(Age, Income, fill = Risk)) +
geom_tile()
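For a version that runs on its own, the two snippets above need the data loaded first. The sketch below reads the table from the question inline and, as an added assumption about what is wanted, converts Age and Income to factors with explicit level order so the axes follow young/adult/old and low/medium/high rather than alphabetical order.

```r
library(ggplot2)

# The table posted in the question
data <- read.csv(text = "Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11")

# Fix the category order (otherwise ggplot2 sorts the labels alphabetically)
data$Age    <- factor(data$Age, levels = c("young", "adult", "old"))
data$Income <- factor(data$Income, levels = c("low", "medium", "high"))

# Line graph: one line per income level
p_line <- ggplot(data, aes(Age, Risk, color = Income)) +
  geom_line(aes(group = Income))

# Heatmap: one tile per Age/Income combination
p_tile <- ggplot(data, aes(Age, Income, fill = Risk)) +
  geom_tile()

print(p_line)
print(p_tile)
```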
As the others suggested, there are better ways to present this, but this should get you started if you want something similar to what you had.
df <- read.csv(textConnection("Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
"))
df$Age <- ordered(df$Age, levels=c('young', 'adult', 'old'))
df$Income <- ordered(df$Income, levels=c('low', 'medium', 'high'))
library(rgl)
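# pass the columns explicitly; the ordered factors are converted to integer codes for the x and y axes
with(df, plot3d(as.integer(Age), as.integer(Income), Risk, type='h', lwd=10, col=rainbow(3), xlab='Age', ylab='Income', zlab='Risk'))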
This will just produce flat rectangles. For an example to create nice looking bars, see demo(hist3d).
You can find a starting point here but you need to add in more lines and some rectangles to get a plot like you posted.
