How to label 5 specific points on PCA plot - r

I have used the tidyverse package and just want to add labels to 5 specific points on this plot. I have tried the code below, but it does not label any points. The data set has about 2000 observations over 21 variables.
BOTTOM=which(interest2$ID%in%project.pca$ID);
text(which(interest2$ID%in%project.pca$ID)[BOTTOM,1], text(which(interest2$ID%in%project.pca$ID))[BOTTOM,2],text(which(interest2$ID%in%project.pca$ID)[BOTTOM,3],rownames(input)[BOTTOM],pos=1)

With ggplot2, make the PCA plot first, then add an additional layer that uses a data frame containing only those 5 points. See this post for an example:
https://datavizpyr.com/how-to-add-labels-to-select-points-with-ggrepel/
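A minimal sketch of that idea, not the asker's actual data: it uses the built-in USArrests data as a stand-in, and the five IDs in ids_of_interest are hypothetical. Plain geom_text() is used here so that only ggplot2 is needed; ggrepel::geom_text_repel() from the linked post could be swapped in to keep labels from overlapping.

```r
library(ggplot2)

# Stand-in data: PCA scores for the built-in USArrests data set
pca <- prcomp(USArrests, scale. = TRUE)
scores <- data.frame(ID = rownames(USArrests), pca$x[, 1:2])

# Hypothetical IDs of the 5 points to label
ids_of_interest <- c("Florida", "Texas", "Vermont", "Alaska", "Ohio")
to_label <- scores[scores$ID %in% ids_of_interest, ]

# All points first, then extra layers that only see the 5 selected rows
p <- ggplot(scores, aes(x = PC1, y = PC2)) +
  geom_point(colour = "grey60") +
  geom_point(data = to_label, colour = "red") +
  geom_text(data = to_label, aes(label = ID), vjust = -0.8)
print(p)
```

The key point is that each layer can take its own data argument, so the label layer never sees the other observations.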

Related

ggplot2 violin plot for columns with less than 3 samples

I am wondering if anyone has found a way to display violin plots through ggplot2 with variables of 1 or 2 samples.
example code:
library(ggplot2)
testData <- data.frame(x=c("a","a","a","b","b"), y=c(1,2,2,1,2))
ggplot(data=testData ) + geom_violin(aes(x=x,y=y))
As you can see, the violin for a is drawn because it has 3 samples, but the one for b is not, because it has only 2 samples.
I saw "geom_violin produces error when all values in a series are the same", but no answer has been given and it's been 7 years.
I know it is possible to display a violin plot with the vioplot package, but I'd really prefer to stay with ggplot2 if possible.
Thanks,
HY
Thanks to @MarcoSandri and others.
I was on ggplot2 3.3.3, it now works on 3.3.6.

Grouped bar chart not working with lattice in R

I'm having trouble creating grouped bar plots. I have explored base graphics and lattice.
My data looks like
compound detection LUtype
a 50 ag
a 75 urban
a 34 mixed
b 89 ag
......
I'd like to create a plot with compounds on the y axis (horizontal bar plot) with the bars colored to represent the land use type and detection on the x axis.
These data are stored in a data frame, which I tried converting to a matrix with as.matrix, but this doesn't work and from what I can tell, the matrix is only the row of compounds. This does not produce a plot.
bars<-data.frame(data6$compound,data6$detection,data6$LUtype)
barsM<-as.matrix(data6$compound,data6$detection,data6$LUtype)
barplot(barsM,horiz=TRUE,beside=TRUE)
I also tried to bypass the matrix by using lattice, but no plot here either.
library(lattice)
require(lattice)
barchart(data6$detection~data6$compound,groups=data6$LUtype,bars)
I'm reading the article "plotting grouped bar charts in R", and I have basically the same setup, but those solutions aren't working for me.
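One thing that may explain the empty matrix: as.matrix() converts only its first argument, so barsM above holds just the compound column. A possible sketch of both approaches follows, assuming data6 has the columns compound, detection and LUtype as described. The data frame below is a hypothetical stand-in, and every row beyond the four shown in the question is made up for illustration.

```r
library(lattice)

# Hypothetical stand-in for data6, using the column names from the question
data6 <- data.frame(
  compound  = rep(c("a", "b"), each = 3),
  detection = c(50, 75, 34, 89, 60, 42),
  LUtype    = rep(c("ag", "urban", "mixed"), times = 2)
)

# lattice: a factor on the left of the formula gives horizontal bars,
# and groups= colours the bars by land use type
p <- barchart(factor(compound) ~ detection, data = data6,
              groups = factor(LUtype), auto.key = TRUE, origin = 0,
              xlab = "detection", ylab = "compound")
print(p)

# base graphics: barplot() needs a numeric matrix, here LUtype x compound
barsM <- xtabs(detection ~ LUtype + compound, data = data6)
barplot(barsM, beside = TRUE, horiz = TRUE, legend.text = TRUE)
```

Calling print() explicitly matters when the lattice call sits inside a function, loop or sourced script, because lattice plots are only drawn when the returned object is printed.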

Plot the relationship of each column to a singular column in a table

I have one table of derived vegetation indices for 63 sample sites from different satellites. This gives me a table with 63 observations (sample sites) and 56 variables (1 sample ID, 50 vegetation indices, 4 biomass and 1 LAI). The first column of the table is the sample ID, and the last 5 columns are the biomass and LAI.
I want to generate a plot showing the relationship between a single vegetation index and one of the biomass parameters.
I am able to do this using the plot function, for one observation and variable at a time.
plot(data$Dry10, data$X8047EVImea)
I don't want to run this code 50 times and again by 5 sets for each biomass and LAI parameter.
Is there a way to loop or nested loop this plot function so that I can generate 200 graphs at once?
Also, I will place a regression line in each plot to see what vegetation index will best represent the amount of biomass present at the sample site.
This is my first post on stackoverflow, so please don't hesitate to request more information on the problem if I have missed something.
As noted in my comment you can accomplish this with a faceted plot in the ggplot2 package. This does require a little bit of data re-arrangement that can be accomplished with the reshape2 package. Here is some code that will be close to what you want to do but since I don't completely know your data formats it might take some fixes:
library(ggplot2)
library(reshape2)
## keep the sample ID (column 1) so values stay matched to their site
idCol <- names(data)[1]
vegDat <- data[,1:51]
bioDat <- data[,c(1,52:55)]
## melt the data.frames so the biomass and vegetation headers are now variables
vegDatM <- melt(vegDat, id.vars=idCol, variable.name='vegInd', value.name='vegVal')
bioDatM <- melt(bioDat, id.vars=idCol, variable.name='bioInd', value.name='bioVal')
## Join these datasets by sample ID to create all comparisons to be made
gdat <- merge(vegDatM, bioDatM, by=idCol)
## plot the data in a faceted grid
ggplot(gdat) + geom_point(aes(x=vegVal, y=bioVal)) + facet_grid(vegInd ~ bioInd)
Note that since the grid has 50 rows of plots you may want to open a device with a large height (or width if you swap the facets), e.g. pdf('foo.pdf', height=20). Hope this gets you on the right track.
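If you would rather keep base graphics, the loop asked about in the question could look roughly like this. It is only a sketch: dat, the VI1..VI3 columns and the two response columns are hypothetical stand-ins for the real table, and in practice vegCols and bioCols would be taken from names(data). Each page of the PDF gets one scatter plot with its fitted regression line, and the R-squared values are collected so the indices can be compared.

```r
# Hypothetical stand-in for the real table: ID, 3 indices, 2 response columns
set.seed(1)
dat <- data.frame(ID = 1:20, VI1 = runif(20), VI2 = runif(20), VI3 = runif(20))
dat$Dry10 <- 2 * dat$VI1 + rnorm(20, sd = 0.1)
dat$LAI   <- dat$VI2 + rnorm(20, sd = 0.1)

vegCols <- c("VI1", "VI2", "VI3")   # with the real data: names(data)[2:51]
bioCols <- c("Dry10", "LAI")        # with the real data: names(data)[52:56]

# One R-squared per index/response pair
r2 <- matrix(NA_real_, nrow = length(vegCols), ncol = length(bioCols),
             dimnames = list(vegCols, bioCols))

outFile <- tempfile(fileext = ".pdf")
pdf(outFile)                        # one plot per page
for (b in bioCols) {
  for (v in vegCols) {
    fit <- lm(dat[[b]] ~ dat[[v]])
    plot(dat[[v]], dat[[b]], xlab = v, ylab = b)
    abline(fit)                     # regression line from the fitted model
    r2[v, b] <- summary(fit)$r.squared
  }
}
dev.off()

round(r2, 2)
```

Sorting a column of r2 afterwards gives a quick ranking of which index tracks a given biomass parameter most closely.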

PCA biplot one variables shown R

I ran a PCA on a set of 45000 genes across 5 different samples, and when I draw a biplot, all I see is a mass of text (corresponding to the observation names), so I cannot see the location of my samples. Is there a way to plot the location of the samples only, and not the observations, in a biplot?
Using built in data from R
usa <- USArrests
pca1 <- prcomp(usa)
biplot(pca1)
This generates a biplot where all the states (observation names) overlap the variables (my different samples) rape, etc. Is it possible to plot only the variables (samples), and not the states (observation names)?
biplot.default uses text to write the categorical variable name of the observation. As it doesn't use points you need to modify the source if you only want the points (and not the labels) to be plotted.
However, you could "hack" it by doing something like:
biplot(pca1, xlabs = rep(".", nrow(usa)))
I hope this is what you're looking for!
Edit: If this is not satisfactory, you can modify the source shown when running stats:::biplot.default so that it uses points.
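Another option, sketched here as an assumption rather than as what biplot() does internally, is to skip biplot() and draw only the variable loadings from the rotation matrix returned by prcomp(). Note that biplot() rescales both sets of coordinates, so the arrow lengths will not match its output exactly.

```r
pca1 <- prcomp(USArrests)

# Loadings of the variables on the first two principal components
loadings <- pca1$rotation[, 1:2]

# Empty plot sized to hold the arrows, then one arrow and label per variable
plot(loadings, type = "n", xlab = "PC1", ylab = "PC2",
     xlim = range(c(0, loadings[, 1])) * 1.2,
     ylim = range(c(0, loadings[, 2])) * 1.2)
arrows(0, 0, loadings[, 1], loadings[, 2], length = 0.1, col = "red")
text(loadings, labels = rownames(loadings), pos = 3)
```

With 5 samples as the columns of the input, loadings would have 5 rows, so only the 5 sample names are drawn.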

How to draw box plots for each cluster in same diagram?

I have done the clustering using Rattle and at the end I have the following format of data:
I need to draw the boxplot for each cluster in the same diagram. (i.e there are 5 clusters. So, in X axis, I need 1 to 5 cluster numbers and in Y axis I need age.)
I tried the following, but I couldn't get the result I expected.
Can anybody suggest the correct settings to get 5 box plots side by side, one for each cluster?
You can do this in R. First read the file into a data frame.
reviews <- read.csv("abc.csv", stringsAsFactors=FALSE)
boxplot(reviews$age ~ reviews$Cluster)
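Since abc.csv is not available, here is a self-contained sketch with simulated data. The column names age and Cluster are taken from the answer above, while the values themselves are made up.

```r
# Simulated stand-in for the clustered data: 5 clusters, 20 rows each
set.seed(42)
reviews <- data.frame(
  Cluster = rep(1:5, each = 20),
  age = round(rnorm(100, mean = rep(c(25, 32, 40, 48, 60), each = 20), sd = 4))
)

# The formula interface draws one box per distinct Cluster value
bp <- boxplot(age ~ Cluster, data = reviews, xlab = "Cluster", ylab = "Age")
```

The object returned by boxplot() holds the group names and the five-number summaries, which is handy for checking that all 5 clusters were picked up.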
