Multidimensional Scaling - r

I've 5x14 data matrix. I'm using the MDS to get a perceptual map. I can do the MDS properly & get the result.
But my problem is in MDS we can map either row or column variables. Is it possible to map both row & column variable using MDS.
The code I used is the following:
perp<-read.csv("E:\\Projects\\Combined_3.csv")
ads.dis<-dist(perp)
perp_mds <- cmdscale(ads.dis, k = 2,eig=TRUE)
x <- perp_mds$points[,1]
y <- perp_mds$points[,2]
plot(x,y, xlab = "Coordinate 1", ylab = "Coordinate 2", type = "n")
text(x,y, labels = rownames(perp))
I'll be grateful if somebody can help me with the coding.
Regards,
Ari

In general, the answer is no, not with cmdscale(). All that cmdscale() has knowledge of is the dissimilarity between objects. In the vegan package, there is function capscale() which is a constrained version of principal coordinates analysis (PCoA aka MDS), but can be used for normal PCoA. It can place both the objects and the variables in a biplot-like figure:
require(vegan)
data(varespec)
mod <- capscale(varespec ~ 1)
plot(mod)
But do note that PCoA with the euclidean distance is the same as PCA, which also could be used and will naturally plot both the objects and the variables:
plot(rda(varespec))
or using base R functions
mod2 <- prcomp(varespec)
biplot(mod2)
Or did you mean the non-metric version of MDS?

Related

Is there a way to add species to an ISOMAP plot in R?

I am using the isomap-function from vegan package in R to analyse community data of epiphytic mosses and lichens. I started analysing the data using NMDS but due to the structure of the data ran into problems which is why I switched to ISOMAP which works perfectly well and returns very nice results. So far so good... However, the output of the function does not support plotting of species within the ISOMAP plot as species scores are not available. Anyway, I would really like to add species information to enhance the interpretability of the output.
Does anyone of you has a solution or hint to this problem? Is there a way to add species kind of post hoc to the plot as it can be done with environmental data?
I would greatly appreciate any help on this topic!
Thank you and best regards,
Inga
No, there is no function to add species scores to isomap. It would look like this:
`sppscores<-.isomap` <-
function(object, value)
{
value <- scale(value, center = TRUE, scale = FALSE)
v <- crossprod(value, object$points)
attr(v, "data") <- deparse(substitute(value))
object$species <- v
object
}
Or alternatively:
`sppscores<-.isomap` <-
function(object, value)
{
wa <- vegan::wascores(object$points, value, expand = TRUE)
attr(wa, "data") <- deparse(substitute(value))
object$species <- wa
object
}
If ord is your isomap result and comm are your community data, you can use these as:
sppscores(ord) <- comm # either alternative
I have no idea (yet) which of these alternatives is more correct. The first adds species scores as vectors of their linear increase, the second as their weighted averages in ordination space, but expanded so that we allow some species be more extreme than the site units where they occur.
These will add new element species to the result object ord. However, using these in vegan would need more coding, but you can extract the species scores with vegan::scores, but their scaling is based on the original scale of community data, and may be badly scaled with respect to points of site units, and working on this would require more work. However, you can plot them separately, or then multiply with a constant giving similar scaling as site unit scores.
sp <- scores(ord, display="species", choices=1:2)
plot(sp, type = "n", asp = 1) # does not allow plotting text
text(sp, labels = rownames(sp)) # so we must add text

Plotting R2 of each/certain PCA component per wavelength with R

I have some experience in using PCA, but this is the first time I am attempting to use PCA for spectral data...
I have a large data with spectra where I used prcomp command to calculated PCA for the whole dataset. My results show that 3 components explain 99% of the variance.
I would like to plot the contribution of each of the three PCA components at every wavelength (in steps of 4, 200-1000 nm) like the example of a plot 2 I found on this site:
https://learnche.org/pid/latent-variable-modelling/principal-component-analysis/pca-example-analysis-of-spectral-data
Does anyone have a code how I could do this in R?
Thank you
I believe the matrix of variable loadings is found in model.pca$rotation, see prcomp documentation.
So something like this should do (using the example on your linked website):
file <- 'http://openmv.net/file/tablet-spectra.csv'
spectra <- read.csv(file, header = FALSE)
n.comp <- 4
model.pca <- prcomp(spectra[,2:651],
center = TRUE,
scale =TRUE,
rank. = n.comp)
summary(model.pca)
par(mfrow=c(n.comp,1))
sapply(1:n.comp, function(comp){
plot(2:651, model.pca$rotation[,comp], type='l', lwd=2,
main=paste("Comp.", comp), xlab="Wavelength INDEX")
})
I don't have the wavelength values, so I used the indices of the array here ; output below.

R superimposing bivariate normal density (ellipses) on scatter plot

There are similar questions on the website, but I could not find an answer to this seemingly very simple problem. I fit a mixture of two gaussians on the Old Faithful Dataset:
if(!require("mixtools")) { install.packages("mixtools"); require("mixtools") }
data_f <- faithful
plot(data_f$waiting, data_f$eruptions)
data_f.k2 = mvnormalmixEM(as.matrix(data_f), k=2, maxit=100, epsilon=0.01)
data_f.k2$mu # estimated mean coordinates for the 2 multivariate Gaussians
data_f.k2$sigma # estimated covariance matrix
I simply want to super-impose two ellipses for the two Gaussian components of the model described by the mean vectors data_f.k2$mu and the covariance matrices data_f.k2$sigma. To get something like:
For those interested, here is the MatLab solution that created the plot above.
If you are interested in the colors as well, you can use the posterior to get the appropriate groups. I did it with ggplot2, but first I show the colored solution using #Julian's code.
# group data for coloring
data_f$group <- factor(apply(data_f.k2$posterior, 1, which.max))
# plotting
plot(data_f$eruptions, data_f$waiting, col = data_f$group)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]], col=i)
And for my version using ggplot2.
# needs ggplot2 package
require("ggplot2")
# ellipsis data
ell <- cbind(data.frame(group=factor(rep(1:length(data_f.k2$mu), each=250))),
do.call(rbind, mapply(ellipse, data_f.k2$mu, data_f.k2$sigma,
npoints=250, SIMPLIFY=FALSE)))
# plotting command
p <- ggplot(data_f, aes(color=group)) +
geom_point(aes(waiting, eruptions)) +
geom_path(data=ell, aes(x=`2`, y=`1`)) +
theme_bw(base_size=16)
print(p)
You can use the ellipse-function from package mixtools. The initial problem was that this function swaps x and y from your plot. I'll try to figure this out and update the answe. (I'll leave the colors to somebody else...)
plot( data_f$eruptions,data_f$waiting)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]])
Using mixtools internal plotting function:
plot.mixEM(data_f.k2, whichplots=2)

How to get a good dendrogram using R

I am using R to do a hierarchical cluster analysis using the Ward's squared euclidean distance. I have a matrix of x columns(stations) and y rows(numbers in float), the first row contain the header(stations' names). I want to have a good dendrogram where the name of the station appear at the bottom of the tree as i am not able to interprete my result. My aim is to find those stations which are similar. However using the following codes i am having numbers (100,101,102,...) for the lower branches.
Yu<-read.table("yu_s.txt",header = T, dec=",")
library(cluster)
agn1 <- agnes(Yu, metric = "euclidean", method="ward", stand = TRUE)
hcd<-as.dendrogram(agn1)
par(mfrow=c(3,1))
plot(hcd, main="Main")
plot(cut(hcd, h=25)$upper,
main="Upper tree of cut at h=25")
plot(cut(hcd, h=25)$lower[[2]],
main="Second branch of lower tree with cut at h=25")
A nice collection of examples are present here (http://gastonsanchez.com/blog/how-to/2012/10/03/Dendrograms.html)
Two methods:
with hclust from base R
hc<-hclust(dist(mtcars),method="ward")
plot(hc)
Default plot
ggplot
with ggplot and ggdendro
library(ggplot2)
library(ggdendro)
# basic option
ggdendrogram(hc, rotate = TRUE, size = 4, theme_dendro = FALSE)

1-D conditional slice from a 2-D probability density function in R using np package

consider the included example in the np-package for r,
page 21 of the Vignettes for np package.
npcdens returns a conditional density object and is able to plot 2d-pdf and 2d-cdf, as shown. I wanted to know if I can somehow extract the 1-D information (pdf / cdf) from the object if I were to specify one of the two parameters, like in a vector or something ?? I am new to R and was not able to find out the format of the object.
Thanks for the help.
-Egon.
Here is the code as requested:
require(np)
data("Italy")
attach(Italy)
bw <- npcdensbw(formula=gdp~ordered(year), tol=.1, ftol=.1)
fhat <- npcdens(bws=bw)
summary(fhat)
npplot(bws=bw)
npplot(bws=bw, cdf=TRUE)
detach(Italy)
The fhat object contains all the needed info plus a whole lot more. To see what all is in there, do a str( fhat ) to see the structure.
I believe the values you are interested in are xeval, yeval, and condens (PDF density).
There are lots of ways to get at the values but I tend to like data frames. I'd pop the three vectors in a single data frame:
denDf <- cbind( year=as.character( fhat$xeval[,1] ), fhat$yeval, fhat$condens )
## had to do a dance around the year variable because it's a factor
then I'd select the values I want with a subset():
subset( denDf, year==1951 & gdp > 8 & gdp < 8.2)
since gdp is a floating point value it's very hard to select with a == operator.
The method suggested by JD Long will only extract density for data points in the existing training set. If you want the density at other points (conditioning or conditional variables) you will need to use the predict()
function. The following code extracts and plots the 1-D density distribution conditioned on year ==1999, a value not contained in the original data set.
First construct a data frame with the same components as the Italy data set, with gdp regularly spaced and with "1999" an ordered factor.
yr1999<- rep("1999", 100)
gdpVals <-seq(1,35, length.out=100)
nD1999 <- data.frame(year = ordered(yr1999), gdp = gdpVals)
Next use the predict function to extract the densities.
gdpDens1999 <-predict(fhat,newdata = nD1999)
The following code plots the density.
plot(gdpVals, gdpDens1999, type='l', col='red', xlab='gdp', ylab = 'p(gdp|yr = 1999)')

Resources