R - PCA axis lines

I'm trying to figure out why the axis lines are not showing up in my final PCA plot (screenshot: PCA plot with no axis lines). I've gone through the code but nothing stands out, and commenting out parts of it hasn't helped. Any help is appreciated!
Data: https://pastebin.com/MWNfUz3S
#not all of these are used for the pca portion of my code but all are used at some point
library(readxl)
library(dplyr)
library(broom)
library(tidyr)
library(ggthemes)
library(ggplot2)
library(gridExtra)
library(cowplot)
library(gtable)
library(ggfortify)
library(data.table)
library(vegan)
library(ggrepel)
library(forcats)
library(scales)
input <-"End Members 2020_03.xls"
mydata<-read_excel(input,sheet=1)
rm(input)
df<-mydata[,c(5:24)]
pca<-rda(df,scale=TRUE) #Conducting a correlation PCA (variables standardized to unit variance)
#Percent variance explained by PCs
pervar<-round(pca$CA$eig/sum(pca$CA$eig)*100,2)
#Plot values and limits for all LOP plots
scrs<-scores(pca,display=c("sites","species"))
xlim<-with(scrs,range(species[,1],sites[,1]))*1.1
ylim<-with(scrs,range(species[,2],sites[,2]))*1.1
ordlab<-ordipointlabel(pca,display="species")$labels
factor.scores<-as.data.frame(pca$CA$v) %>% select(PC1,PC2) %>% mutate(variable=row.names(.))
factor.scores.alt<-as.data.frame(scrs$species)
site.scores<-as.data.frame((pca$CA$u)) %>% select(PC1,PC2)
pca.scores<-as.data.frame(scrs$sites)
mydata$PC1<-pca.scores$PC1
mydata$PC2<-pca.scores$PC2
#Generate better labels in PCA plot
variables<-factor.scores.alt %>% mutate(PC1.adj=PC1*1.7,PC2.adj=PC2*1.7,labels=rownames(factor.scores.alt))
landuse.color.and.season.shape.pca<-ggplot(mydata,aes(PC1,PC2))+
geom_vline(xintercept=0.0)+
geom_hline(yintercept=0.0)+
#geom_point(aes(shape=Season,color=reorder(Landuse,Classification)),size=2)+
geom_point(aes(shape=Season,color=Landuse),size=3)+
geom_text_repel(data=variables,aes(PC1.adj,PC2.adj,label=labels),size=3,fontface="plain",family="serif")+
scale_color_gdocs(name="Landuse")+
#scale_shape_discrete(solid=T,name="Seasons")+
xlim(-3,3)+
ylim(3,-3)+ #larger value given first, so the y-axis is reversed
xlab("PC1 [56.4%]")+
ylab("PC2 [20.2%]")+
theme(legend.text=element_text(size=10,face="plain",family="serif"),
legend.title=element_text(size=10,face="plain",family="serif"),
axis.title=element_text(size=10,face="plain",family="serif"),
axis.text=element_text(size=10,face="plain",family="serif"),
plot.background=element_blank())
pca.legend<-get_legend(landuse.color.and.season.shape.pca + theme(legend.text=element_text(size=8),
legend.title=element_text(size=10),
text=element_text(family="serif"),
legend.position="bottom",
legend.title.align=0.5,
legend.box="horizontal",
legend.box.just="top",
legend.background=element_rect(fill="white",colour=NA)))
pca.plot.final<-plot_grid(pca.legend,
landuse.color.and.season.shape.pca + theme(legend.position="none",
panel.background=element_blank()),
nrow=2,rel_heights=c(0.1,1.1))
pca.plot.final
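A likely cause, assuming "axis lines" means the x/y axis lines along the panel edges (not the zero lines drawn by geom_vline()/geom_hline()): ggplot2's default theme_grey() draws no axis lines at all (its axis.line is element_blank()), and the plot edges are only visible because of the grey panel background. Setting panel.background=element_blank() in the final plot removes that background, so nothing marks the axes any more. The sketch below restores them with axis.line. It uses the built-in USArrests data and base R's prcomp(scale. = TRUE) as a stand-in for the original spreadsheet and vegan::rda(), so the data and the PCA call are substitutions, not the setup from the question.

```r
library(ggplot2)

# Stand-in data: USArrests ships with base R (the original .xls is not available).
# prcomp(scale. = TRUE) is a correlation PCA, used here in place of vegan::rda().
pca <- prcomp(USArrests, scale. = TRUE)

# Percent variance explained by each PC (same idea as pervar in the question)
pervar <- round(pca$sdev^2 / sum(pca$sdev^2) * 100, 2)

scores_df <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2])

p <- ggplot(scores_df, aes(PC1, PC2)) +
  geom_vline(xintercept = 0, colour = "grey50") +  # zero reference lines
  geom_hline(yintercept = 0, colour = "grey50") +
  geom_point(size = 2) +
  # build axis titles from pervar instead of hard-coding the percentages
  labs(x = sprintf("PC1 [%.1f%%]", pervar[1]),
       y = sprintf("PC2 [%.1f%%]", pervar[2])) +
  theme(panel.background = element_blank(),
        # theme_grey() has no axis lines; once the grey panel is blanked,
        # the axes have to be drawn explicitly
        axis.line = element_line(colour = "black"))
```

Printing p shows the panel with black axis lines on the left and bottom. Adding the same axis.line setting to the theme() call on landuse.color.and.season.shape.pca should behave the same way; theme_classic() is another option, since it includes axis lines by default.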

Related

How can I use column labels as Y axis in ggplot?

Hello,
I have a dataset structured as shown in the linked screenshot. I am extremely new to R, and this is probably super easy to do, but I cannot figure out how to plot this dataset using ggplot.
Could anyone guide me or give me some hints?
I basically want to color the lines according to socioeconomic level and show each year's value...
You need to reshape your data into long format before plotting it with ggplot.
library(reshape)
library(dplyr)
library(ggplot2)
df_long <- melt(df) # reshape the dataframe to a long format
df_long %>%
ggplot( aes(x=variable, y=value, group=group, color=group)) +
geom_line()
Note: You will get better answers if you post your code with a reproducible dataset.
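A self-contained sketch of the same reshape-then-plot approach, using a small hypothetical data frame (one row per socioeconomic group, one column per year, since the original dataset is not shown) and tidyr::pivot_longer() in place of melt(). The group names, years, and values are made up for illustration.

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide data: one row per socioeconomic group, one column per year
df <- data.frame(group = c("low", "middle", "high"),
                 `2018` = c(10, 14, 20),
                 `2019` = c(11, 15, 22),
                 `2020` = c(12, 15, 23),
                 check.names = FALSE)

# Reshape to long format: one row per (group, year) pair
df_long <- pivot_longer(df, cols = -group, names_to = "year", values_to = "value")
df_long$year <- as.integer(df_long$year)  # column names come back as text

# One line per group, coloured by group, with years on the x-axis
p <- ggplot(df_long, aes(x = year, y = value, group = group, colour = group)) +
  geom_line()
```

Converting year to an integer makes the x-axis continuous, so the years are spaced by their actual values rather than treated as unordered labels.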

Cannot add statistics in ggplot

I am trying to add Wilcoxon test statistics to my graph, but stat_compare_means does not work...
I have tried both ggplot and ggplot2.
library(readxl)
library(dplyr)
library(tidyverse)
library(ggpubr)
library(tidyr)
library(ggplot2)
library(Rtsne)
#excel sheet resolution, voxel size comparison
data<-read_excel("res_all.xlsx", sheet="resolution")
# transform to long format using dplyr (included in tidyverse)
data_long <- as_tibble(data) %>%
gather(key, value,-parameter) %>%
mutate(cohort=ifelse(grepl("per",key), "per", "val"))
# plot graph
graph <- ggplot(data_long) +
aes(x=parameter, y=value, fill=cohort)+
geom_boxplot()+
stat_compare_means(method= "wilcox.test")
graph + ggtitle("Resolution comparison")+
theme_minimal()
The error is:
Error in stat_compare_means(method = "wilcox.test") :
could not find function "stat_compare_means"
Is there any other way to add the W statistic and p-values to my graph?
Thank you in advance.
I think you forgot a "+" after theme_minimal().
Oh, and stat_compare_means() is from the ggpubr package, not ggplot2, so be sure it is loaded: check that you have library(ggpubr) or require(ggpubr) in your R session. It helps if you include your full code and the output of sessionInfo() for further troubleshooting.
stat_compare_means() was introduced in ggpubr version 0.1.3, so check the installed version with packageVersion("ggpubr"), and use lsf.str("package:ggpubr") to list all functions in the package.
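For the follow-up question (other ways to show W and p-values), one option that needs only base R and ggplot2 is to run wilcox.test() yourself and draw the results with geom_text(). The sketch below uses the built-in ToothGrowth data as a stand-in for res_all.xlsx: supp plays the role of cohort and dose plays the role of parameter. Warnings about ties are suppressed; where ties exist, R reports p-values from the normal approximation instead of exact ones.

```r
library(ggplot2)

# Built-in stand-in data: tooth length by supplement (OJ/VC) at three doses
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# One Wilcoxon rank-sum test per dose, comparing the two supplement groups
stats <- do.call(rbind, lapply(split(tg, tg$dose), function(d) {
  # suppressWarnings: with ties, exact p-values cannot be computed
  wt <- suppressWarnings(wilcox.test(len ~ supp, data = d))
  data.frame(dose = d$dose[1],
             W = unname(wt$statistic),
             p = wt$p.value,
             y = max(d$len) * 1.05)  # label height just above the tallest point
}))
stats$label <- sprintf("W = %.1f\np = %.3g", stats$W, stats$p)

p <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +
  # inherit.aes = FALSE: the stats table has no len or supp columns
  geom_text(data = stats, aes(x = dose, y = y, label = label),
            inherit.aes = FALSE, vjust = 0, size = 3) +
  ggtitle("Wilcoxon test per dose") +
  theme_minimal()
```

Computing the tests yourself keeps the W statistic available (ggpubr's default labels show p-values), and the stats table can be printed or exported alongside the figure.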

How to use ggplot to plot T-SNE clustering

Here is the t-SNE code using IRIS data:
library(Rtsne)
iris_unique <- unique(iris) # Remove duplicates
iris_matrix <- as.matrix(iris_unique[,1:4])
set.seed(42) # Set a seed if you want reproducible results
tsne_out <- Rtsne(iris_matrix) # Run TSNE
# Show the objects in the 2D tsne representation
plot(tsne_out$Y,col=iris_unique$Species)
This produces a scatter plot of the two t-SNE dimensions, colored by species.
How can I make that figure with ggplot2?
I think the easiest/cleanest ggplot way would be to store all the info you need in a data.frame and then plot it. From your code pasted above, this should work:
library(ggplot2)
tsne_plot <- data.frame(x = tsne_out$Y[,1], y = tsne_out$Y[,2], col = iris_unique$Species)
ggplot(tsne_plot) + geom_point(aes(x=x, y=y, color=col))
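The same pattern (put the embedding coordinates and the grouping column into one data frame, then map them in aes()) works for any two-column embedding matrix. Because Rtsne may not be installed everywhere, the sketch below substitutes base R's classical MDS, cmdscale(), for the t-SNE step; using tsne_out$Y in place of emb gives the t-SNE version. It also adds axis titles and a named colour legend.

```r
library(ggplot2)

iris_unique <- unique(iris)                  # remove duplicate rows
iris_matrix <- as.matrix(iris_unique[, 1:4])

# Stand-in 2-D embedding: classical MDS (swap in Rtsne(iris_matrix)$Y for t-SNE)
emb <- cmdscale(dist(iris_matrix), k = 2)

# One row per flower: the two embedding coordinates plus the species label
plot_df <- data.frame(x = emb[, 1], y = emb[, 2],
                      Species = iris_unique$Species)

p <- ggplot(plot_df, aes(x = x, y = y, colour = Species)) +
  geom_point() +
  labs(x = "Dimension 1", y = "Dimension 2")
```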

ggplot: boxplot number of observations as x-axis labels

I have successfully created a very nice boxplot (for my purposes) categorized by a factor and binned, according to the answer in my previous post here:
ggplot: arranging boxplots of multiple y-variables for each group of a continuous x
Now, I would like to customize the x-axis labels according to the number of observations in each boxplot.
require (ggplot2)
require (plyr)
library(reshape2)
set.seed(1234)
x<- rnorm(100)
y.1<-rnorm(100)
y.2<-rnorm(100)
y.3<-rnorm(100)
y.4<-rnorm(100)
df<- (as.data.frame(cbind(x,y.1,y.2,y.3,y.4)))
dfmelt<-melt(df, measure.vars = 2:5)
dfmelt$bin <- factor(round_any(dfmelt$x,0.5))
dfmelt.sum<-summary(dfmelt$bin)
ggplot(dfmelt, aes(x=bin, y=value, fill=variable))+
geom_boxplot()+
facet_grid(.~bin, scales="free")+
labs(x="number of observations")+
scale_x_discrete(labels= dfmelt.sum)
dfmelt.sum only gives me the total number of observations for each bin, not for each boxplot.
The boxplot statistics give me the number of observations for each boxplot:
dfmelt.stat<-boxplot(value~variable+bin, data=dfmelt)
dfmelt.n<-dfmelt.stat$n
But how do I add tick marks and labels for each boxplot?
Thanks, Sina
UPDATE
I have continued working on this. The biggest problem is that in the code above, only one tick mark is provided per facet. Since I also wanted to plot the means for each boxplot, I have used interaction to plot each boxplot individually, which also adds tick marks on the x-axis for each boxplot:
require (ggplot2)
require (plyr)
library(reshape2)
set.seed(1234)
x<- rnorm(100)
y.1<-rnorm(100)
y.2<-rnorm(100)
y.3<-rnorm(100)
y.4<-rnorm(100)
df<- (as.data.frame(cbind(x,y.1,y.2,y.3,y.4)))
dfmelt<-melt(df, measure.vars = 2:5)
dfmelt$bin <- factor(round_any(dfmelt$x,0.5))
dfmelt$f2f1<-interaction(dfmelt$variable,dfmelt$bin)
dfmelt_mean<-aggregate(value~variable*bin, data=dfmelt, FUN=mean)
dfmelt_mean$f2f1<-interaction(dfmelt_mean$variable, dfmelt_mean$bin)
dfmelt_length<-aggregate(value~variable*bin, data=dfmelt, FUN=length)
dfmelt_length$f2f1<-interaction(dfmelt_length$variable, dfmelt_length$bin)
On the side: maybe there is a more elegant way to combine all those interactions. I'd be happy to improve.
ggplot(aes(y = value, x = f2f1, fill=variable), data = dfmelt)+
geom_boxplot()+
geom_point(aes(x=f2f1, y=value),data=dfmelt_mean, color="red", shape=3)+
facet_grid(.~bin, scales="free")+
labs(x="number of observations")+
scale_x_discrete(labels=dfmelt_length$value)
This gives me tick marks for each boxplot, which could potentially be labeled. However, using labels in scale_x_discrete only repeats the first four values of dfmelt_length$value in each facet.
How can that be circumvented?
Thanks, Sina
Look at this answer; it does not put the counts on the axis labels, but it works (I have used it):
Modify x-axis labels in each facet
You can also do the following (I have used this too):
library(ggplot2)
df <- data.frame(group=sample(c("a","b","c"),100,replace=TRUE),x=rnorm(100),y=rnorm(100)*rnorm(100))
# one "group (N=count)" label per group; table() sorts groups alphabetically, like the discrete x-axis
tab <- table(df$group)
xlabs <- paste0(names(tab),"\n(N=",tab,")")
ggplot(df,aes(x=group,y=x,color=group))+geom_boxplot()+scale_x_discrete(labels=xlabs)
This also works
library(ggplot2)
library(reshape2)
library(plyr) # for ddply()
df <- data.frame(group=sample(c("a","b","c"),100,replace=T),x=rnorm(100),y=rnorm(100)*rnorm(100))
df1 <- melt(df)
df2 <- ddply(df1,.(group,variable),transform,N=length(group))
df2$label <- paste0(df2$group,"\n","(n=",df2$N,")")
ggplot(df2,aes(x=label,y=value,color=group))+geom_boxplot()+facet_grid(.~variable)
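Another workaround for the per-facet labelling problem in the update is to leave the x-axis labels alone and print each box's count as text inside the panel. Because the counts table carries the facet variable (bin), facet_grid() places each label in the right facet, so nothing is repeated across facets the way scale_x_discrete(labels=...) repeats its values. This is a sketch on simulated data shaped like dfmelt; the bin values and sample sizes are made up.

```r
library(ggplot2)

set.seed(1234)
# Simulated long-format data shaped like dfmelt: one binned x, four y-variables
d <- data.frame(bin = factor(sample(c(-1, 0, 1), 400, replace = TRUE)),
                variable = rep(paste0("y.", 1:4), each = 100),
                value = rnorm(400))

# One row per (bin, variable) box, holding its number of observations
counts <- aggregate(value ~ bin + variable, data = d, FUN = length)
names(counts)[names(counts) == "value"] <- "n"

p <- ggplot(d, aes(x = variable, y = value, fill = variable)) +
  geom_boxplot() +
  # y = -Inf pins each label to the bottom of its panel; vjust lifts it slightly
  geom_text(data = counts, aes(x = variable, label = paste0("n=", n)),
            y = -Inf, vjust = -0.5, size = 3, inherit.aes = FALSE) +
  facet_grid(. ~ bin) +
  labs(x = NULL)
```

Because aggregate() groups by both bin and variable, each label matches exactly one box, which is the per-boxplot count that summary(dfmelt$bin) could not provide.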

Showing variable labels under the segments of dendrogram with ggdendro

My question is related to Andrie's answer to my earlier question. Is it possible to display the variable labels and car labels under the corresponding segments of the dendrogram?
library(ggplot2)
library(ggdendro)
data(mtcars)
x <- as.matrix(scale(mtcars))
dd.row <- as.dendrogram(hclust(dist(t(x))))
ddata_x <- dendro_data(dd.row)
p2 <- ggplot(segment(ddata_x)) +
geom_segment(aes(x=x, y=y, xend=xend, yend=yend)) #current ggdendro column names; early releases used x0, y0, x1, y1
print(p2)
Make sure you have version 0.0-7 of ggdendro and then use the convenience function ggdendrogram:
library(ggplot2)
library(ggdendro)
ggdendrogram(dd.row)
If you want full control over how the labels are displayed, you can extract and manipulate these from ddata_x using either:
ddata_x$labels
label(ddata_x)
To add to your plot:
p2 + geom_text(data=label(ddata_x), aes(label=label, x=x, y=0)) #older ggdendro releases named this column text
You can find more information in the vignette, vignette("ggdendro")
