Get the results of principal component analysis in R - r

I want to get the PC1 and PC2 results so that I can plot both curves on the same graph in Tableau Desktop.
How can I do this?
data = read.csv(file="data.csv",header=TRUE, sep=";")
data.active <- data[, 1:30]
library(factoextra)
res.pca <- prcomp(data.active,center = TRUE, scale. = TRUE)
fviz_eig(res.pca)

I think you need to write a CSV file with the results as an intermediate step between R and Tableau. The code for that is below:
# Principal Components Analysis
res.pca <- stats::prcomp(iris[,-5],center = TRUE, scale. = TRUE)
# Choose number of dimension kept
factoextra::fviz_eig(res.pca)
# Some visualisation
factoextra::fviz_pca_var(res.pca)
factoextra::fviz_pca_ind(res.pca)
factoextra::fviz_pca_biplot(res.pca)
# access transformed points
str(res.pca)
res.pca$x
# save points in csv to use outside of R
utils::write.csv(x = res.pca$x, file = "path/data_pca.csv")
# Load your data and do graphs the usual way with tableau
I used ?prcomp to find where the results are stored in the returned object. You can also take the analysis further in R with richer graphics (biplots of individuals and variables, clustering, ...) and import only the images into Tableau: link
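As a minimal sketch of the export step, the snippet below keeps only PC1 and PC2, adds a row identifier so the points can be matched back to the original rows in Tableau, and writes the file without R's row-name column. The `id` column, the `scores` and `var_explained` names, and the temporary output path are assumptions for illustration, not part of the original answer.

```r
# PCA on the numeric columns of iris (same example data as above)
res.pca <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# Keep PC1 and PC2 only, with an identifier and the grouping column
scores <- data.frame(
  id      = seq_len(nrow(iris)),          # hypothetical row identifier
  Species = iris$Species,
  res.pca$x[, c("PC1", "PC2")]
)

# Share of variance carried by each component (useful for axis labels)
var_explained <- res.pca$sdev^2 / sum(res.pca$sdev^2)

# Write to a temporary location; replace with your own path
out_file <- file.path(tempdir(), "data_pca.csv")
write.csv(scores, out_file, row.names = FALSE)
```

If Tableau is set up for a locale that expects semicolon-separated files (as the `sep=";"` in the question suggests), `write.csv2()` may be the more convenient writer.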

Related

Partial Effect Plots from spaMM

I am working on a project looking at how private docks are distributed along the coast with respect to a few demographic and landscape variables at the census block group level. Sample data and a .txt file describing the variable names can be found at (https://1drv.ms/u/s!AvVJV9wE-yCEiqQ-qZkfm2DVfXyb2w?e=cXNCTy), and the code is below. I am trying to fit a CAR model using the spaMM package. I have a working model on which I can run an LRT and get confidence intervals for my covariates, but when I try to plot marginal effects using spaMM’s built-in plot_effects() function I get this error:
Error in .make_new_corr_lists(object = object, locdata = locdata, which_mats = which_mats, :
Found new levels for a 'adjacency' random effect.
I’m at a loss for what this is telling me and am unsure how to resolve this error as I’ve found nothing about this in spaMM vignettes or on any other help forums. Additionally, I recognize the plot_effects() function makes plots in base R and I was hoping to plot them using ggplot2. I’ve tried the ggeffects package and it doesn’t seem to recognize the spaMM model object – any suggestions on packages that can handle spaMM objects for plotting in ggplot2? Or will I need to do it by hand?
# Load packages/data and make neighbor list/weight matrix #
library(tidyverse)
library(spaMM)
library(rgdal)
library(rgeos)
library(sp)
library(sf)
library(spdep)
library(ggeffects)
load("Salt1KM.Rda") # This is a .Rda of my data
Salt1KM <- subset(Salt1KM, Block.Group != 252) # This removes one weird outlier
AllFiltersShape <- readOGR(dsn = getwd(), layer = "All_Filter_BGs") # Load the shapefile of block groups
Neighbors.QN <- poly2nb(AllFiltersShape, queen = TRUE, row.names = AllFiltersShape$BlockID, snap = 1) # Identify neighbors
Neighbors.QN # Check output
QN.adjMat.B <- nb2mat(Neighbors.QN, zero.policy = TRUE, style = "B") #Make binary weights matrix
# Model call and attempt to plot #
QN.B.Dock <- fitme(Dock ~ MHIE + PercentWE + PercentHomeE + Population.Density + Cutoff.Extend.Shore.Estuary.km + (1|Unique.Tract.ID) + adjacency(1|Block.Group), adjMatrix = QN.adjMat.B, data = Salt1KM, family = negbin())
summary(QN.B.Dock)
plot_effects(QN.B.Dock, focal_var = "PercentWE", add = TRUE) # This gives the error in the post
ggpredict(QN.B.Dock, terms = c("PercentWE", "MHIE")) # Attempt at using ggpredict, does not recognize the object at all

Plotting R2 of each/certain PCA component per wavelength with R

I have some experience in using PCA, but this is the first time I am attempting to use PCA for spectral data...
I have a large dataset of spectra, and I used the prcomp command to calculate a PCA for the whole dataset. My results show that 3 components explain 99% of the variance.
I would like to plot the contribution of each of the three PCA components at every wavelength (200-1000 nm in steps of 4), like plot 2 in the example I found on this site:
https://learnche.org/pid/latent-variable-modelling/principal-component-analysis/pca-example-analysis-of-spectral-data
Does anyone have code showing how I could do this in R?
Thank you
I believe the matrix of variable loadings is found in model.pca$rotation, see prcomp documentation.
So something like this should do (using the example on your linked website):
file <- 'http://openmv.net/file/tablet-spectra.csv'
spectra <- read.csv(file, header = FALSE)
n.comp <- 4
model.pca <- prcomp(spectra[,2:651],
                    center = TRUE,
                    scale. = TRUE,
                    rank. = n.comp)
summary(model.pca)
par(mfrow=c(n.comp,1))
sapply(1:n.comp, function(comp){
  plot(2:651, model.pca$rotation[,comp], type='l', lwd=2,
       main=paste("Comp.", comp), xlab="Wavelength INDEX")
})
I don't have the wavelength values, so I used the column indices of the array on the x-axis here.
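If the goal is literally an R² per component and wavelength, one possible sketch is below. It relies on the fact that, for a PCA on standardized variables (`scale. = TRUE`), the correlation between variable j and component k equals `rotation[j, k] * sdev[k]`, so squaring it gives the share of that wavelength's variance explained by that component. The synthetic spectra, the `wavelength` grid built with `seq(200, 1000, by = 4)`, and the names `pca` and `r2` are assumptions for illustration only.

```r
set.seed(1)
wavelength <- seq(200, 1000, by = 4)   # assumed grid: 200-1000 nm in steps of 4
n <- 50                                # number of simulated spectra

# Synthetic spectra: two latent peaks plus a little noise (illustrative only)
peak1 <- dnorm(wavelength, mean = 400, sd = 60)
peak2 <- dnorm(wavelength, mean = 750, sd = 80)
spectra <- outer(rnorm(n), peak1) + outer(rnorm(n), peak2) +
  matrix(rnorm(n * length(wavelength), sd = 1e-4), nrow = n)

pca <- prcomp(spectra, center = TRUE, scale. = TRUE)
n.comp <- 3

# Squared variable-component correlations: one column per component
r2 <- sweep(pca$rotation[, 1:n.comp], 2, pca$sdev[1:n.comp], `*`)^2

# All components on one plot, with real wavelengths on the x-axis
matplot(wavelength, r2, type = "l", lty = 1,
        xlab = "Wavelength (nm)", ylab = "R2 per component")
legend("topright", legend = paste0("PC", 1:n.comp), col = 1:n.comp, lty = 1)
```

With real data, replace `spectra` with your own matrix and `wavelength` with the actual wavelength vector; the identity used for `r2` holds only when the variables are scaled.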

Dendrogram and HistDAWass package

I am using the HistDAWass package (https://cran.r-project.org/web/packages/HistDAWass/index.html) to perform clustering using a script partially provided by the package author.
Because the Data1.csv file does not include a column with the sample names (labels), I get a dendrogram whose leaves are labelled I1...I6.
I therefore tried to work with a new file (Data2.csv) whose first column contains the labels, but I get an error.
I would appreciate it if someone could explain how to generate the dendrogram with the new labels.
Script:
library(HistDAWass)
data=read.csv('D:/Data1.csv', header = FALSE)
data=t(data)
Hdata=MatH(nrows=6,ncols = 1)
for (i in 1:get.MatH.nrows(Hdata)){
  tmp=data2hist(as.vector(data[,i]))
  Hdata@M[i,1][[1]]=tmp
}
results=WH_hclust(x = Hdata,simplify = TRUE, method="complete")
plot(results) # it plots the dendrogram
Data files (in zip):
http://ge.tt/8yVsiQS2/v/0
The script shows one way to build a matrix in which each cell holds a distributionH object. Inside the for loop, a distributionH is created from the raw data of each sample (each row of the CSV file) and stored in a new MatH (a matrix of distributions).
To build the same object from the Data2.csv file, run the following script:
library(HistDAWass)
#read data
data=read.csv('Data2.csv', header = FALSE)
#initialize an empty MatH matrix using names from the first column of data
Hdata=MatH(nrows=nrow(data),rownames=as.list(as.character(data[,1])),ncols = 1)
#Fill the matrix
for (i in 1:get.MatH.nrows(Hdata)){
  tmp=data2hist(as.vector(t(data[i,2:ncol(data)])))
  Hdata@M[i,1][[1]]=tmp
}
#Do hierarchical clustering
results=WH_hclust(x = Hdata,simplify = TRUE, method="complete")
plot(results) # it plots the dendrogram

Extracting values from a graph

I have a graph that is created from complex numbers by the function below. I would like to extract the data points that correspond to the line in the plot so that I can work with them as a vector.
library(multitaper)
NW<-10
K<-5
x<-c(2,3,1,3,4,6,7,8,5,4,3,2,4,5,7,8,6,4,3,2,4,5,7,8,6,4,5,3,2,5,7,8,6,4,5,3,6,7,8,8,9,7,6,5,4,7)
resSpec <- spec.mtm(as.ts(x), k= K, nw=NW, nFFT = length(x),
                    centreWithSlepians = TRUE, Ftest = TRUE,
                    jackknife = FALSE, maxAdaptiveIterations = 100,
                    plot =FALSE, na.action = na.fail)
plot(resSpec)
What would be the best procedure? I have tried saving the plot as EMF. I wanted to use the ReadImages package, which I believe was the right one, but it is not available for R version 3.0.2, so I could not use it. What is the correct procedure for saving the plot and extracting the values? Are there other packages, and in which file types can I save the graph? As far as I can see, R on Windows only permits EMF.
Any help is welcome.
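One approach that avoids image files entirely is to read the numbers from the object that the plot is drawn from. The sketch below uses base R's `spec.pgram()` as a stand-in, because its returned object is known to carry `freq` and `spec` components. Whether the object returned by `spec.mtm()` exposes components with the same names is an assumption here, and `str(resSpec)` is the way to check; `spec_df` is a hypothetical name.

```r
x <- c(2,3,1,3,4,6,7,8,5,4,3,2,4,5,7,8,6,4,3,2,4,5,7,8,6,4,5,3,2,5,7,8,
       6,4,5,3,6,7,8,8,9,7,6,5,4,7)

# Base R stand-in for spec.mtm(): estimate the spectrum without drawing it
res <- spec.pgram(as.ts(x), plot = FALSE)

# Inspect what the object contains before relying on any component name
str(res, max.level = 1)

# The plotted curve is just these two vectors
spec_df <- data.frame(freq = res$freq, spec = as.numeric(res$spec))
head(spec_df)

# Redraw the curve from the extracted values to confirm they match the plot
plot(spec_df$freq, spec_df$spec, type = "l",
     xlab = "Frequency", ylab = "Spectrum")
```

Once the values are in a data frame, they can be saved with `write.csv()` rather than recovered from a saved graphic.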

R programming - Graphic edges too large error while using clustering.plot in EMA package

I'm an R programming beginner and I'm trying to use the clustering.plot function available in the R package EMA. My clustering works fine and I can see the results as well. However, when I try to generate a heat map using clustering.plot, it gives me the error "Error in plot.new (): graphic edges too large". My code is below:
#Loading library
library(EMA)
library(colonCA)
#Some information about the data
data(colonCA)
summary(colonCA)
class(colonCA) #Expression set
#Extract expression matrix from colonCA
expr_mat <- exprs(colonCA)
#Applying average linkage clustering on colonCA data using Pearson correlation
expr_genes <- genes.selection(expr_mat, thres.num=100)
expr_sample <- clustering(expr_mat[expr_genes,],metric = "pearson",method = "average")
expr_gene <- clustering(data = t(expr_mat[expr_genes,]),metric = "pearson",method = "average")
expr_clust <- clustering.plot(tree = expr_sample,tree.sup=expr_gene,data=expr_mat[expr_genes,],title = "Heat map of clustering",trim.heatmap =1)
I do not get any error when it comes to actually executing the clustering process. Could someone help?
In your example, some of the row names of expr_mat are very long (max(nchar(rownames(expr_mat))) returns 271 characters). The clustering.plot function tries to make a margin large enough for all the names, but because the names are so long, there isn't room for anything else.
The really long names seem to contain long stretches of periods. One way to condense these gene names is to replace runs of 2 or more periods with just one, so I would add this line:
#Extract expression matrix from colonCA
expr_mat <- exprs(colonCA)
rownames(expr_mat)<-gsub("\\.{2,}","\\.", rownames(expr_mat))
Then you can run all the other commands and plot like normal.
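To see what the substitution does, here is a small sketch on made-up names (the `long_names` vector is hypothetical and stands in for rownames(expr_mat)). It also checks whether collapsing the periods creates duplicate names, which may be worth verifying on the real data.

```r
# Hypothetical row names with long runs of periods
long_names <- c("gene.A", "Human.mRNA......for.....protein", "x..y")

# Replace every run of two or more periods with a single period
short_names <- gsub("\\.{2,}", ".", long_names)

print(short_names)
print(max(nchar(long_names)))    # longest name before the substitution
print(max(nchar(short_names)))   # longest name after the substitution

# Collapsing can in principle make two names identical; 0 means no duplicates
print(anyDuplicated(short_names))
```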
