define population level for PCA analysis in adegenet - r

I want to perform a PCA analysis in adegenet starting from a genepop file without defined populations.
I imported the data like this:
datapop <- read.genepop('tous.gen', ncode=3, quiet = FALSE)
it works, and I can perform a PCA after scaling the data.
But I would like to plot the individuals on the PCA axes according to their population of origin using s.class. I have a CSV file with a three-letter code for each individual. I imported it into R:
pops_list <- read.csv('liste_pops.csv', header=FALSE)
but now how can I use it to define population levels in the genind object datapop?
I tried something like this:
setPop(datapop, formula = NULL)
setPop(datapop) <- pops_list
but it doesn't work; even the first line fails with this message:
"Error: formula must be a valid formula object."
And then how should I use it in s.class?
thanks
Didier

Without a working example it is kind of hard to tell but perhaps you can find the solution to your problem here: How to add strata information to a genind
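Either way, given how setPop works, your line setPop(datapop, formula = NULL) cannot work because it does not define anything: setPop expects a formula that refers to columns of the strata slot. To assign populations directly from a vector, use the pop accessor instead:
pop(datapop) <- pops_list$V1
while making sure the values form a factor (or character vector) with one entry per individual, in the same order as the individuals in datapop. V1 is the default name read.csv gives the first column when header=FALSE.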

I know this is a bit late, but the way to do this is to add pops_list as the strata and then use setPop() to select a certain column:
strata(datapop) <- pops_list
setPop(datapop) <- ~myPop # set the population to the column called "myPop" in the data frame
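The question also asks how to use the populations in s.class. A minimal sketch, assuming the adegenet example data set nancycats in place of 'tous.gen', with a character vector pop_codes standing in for the imported list of codes (both are stand-ins, not the asker's data):

```r
library(adegenet)
library(ade4)      # provides dudi.pca() and s.class()

data(nancycats)    # example genind object shipped with adegenet
x <- nancycats

# Stand-in for pops_list$V1: one population code per individual,
# in the same order as the individuals in the genind object.
pop_codes <- as.character(pop(x))

# Define the population of each individual.
pop(x) <- factor(pop_codes)

# Scale allele frequencies, replacing missing values by the mean.
X <- scaleGen(x, NA.method = "mean")

# PCA on the already scaled table, keeping 3 axes without the interactive prompt.
pca1 <- dudi.pca(X, center = FALSE, scale = FALSE, scannf = FALSE, nf = 3)

# Plot individuals on the first two axes, grouped by population.
s.class(pca1$li, fac = pop(x), col = funky(nPop(x)))
```

With real data, pop_codes would come from the imported file, and it is worth checking that its length equals nInd(datapop) before assigning it.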

Related

Is there a way to add species to an ISOMAP plot in R?

I am using the isomap-function from vegan package in R to analyse community data of epiphytic mosses and lichens. I started analysing the data using NMDS but due to the structure of the data ran into problems which is why I switched to ISOMAP which works perfectly well and returns very nice results. So far so good... However, the output of the function does not support plotting of species within the ISOMAP plot as species scores are not available. Anyway, I would really like to add species information to enhance the interpretability of the output.
Does anyone have a solution or a hint for this problem? Is there a way to add species to the plot post hoc, as can be done with environmental data?
I would greatly appreciate any help on this topic!
Thank you and best regards,
Inga
No, there is no function to add species scores to isomap. If there were one, it could look like this:
`sppscores<-.isomap` <-
    function(object, value)
{
    value <- scale(value, center = TRUE, scale = FALSE)
    v <- crossprod(value, object$points)
    attr(v, "data") <- deparse(substitute(value))
    object$species <- v
    object
}
Or alternatively:
`sppscores<-.isomap` <-
    function(object, value)
{
    wa <- vegan::wascores(object$points, value, expand = TRUE)
    attr(wa, "data") <- deparse(substitute(value))
    object$species <- wa
    object
}
If ord is your isomap result and comm are your community data, you can use these as:
sppscores(ord) <- comm # either alternative
I have no idea (yet) which of these alternatives is more correct. The first adds species scores as vectors of their linear increase, the second as their weighted averages in ordination space, but expanded so that some species are allowed to be more extreme than the site units where they occur.
Both add a new element, species, to the result object ord. Using it inside vegan functions would need more coding, but you can extract the species scores with vegan::scores. Their scaling is based on the original scale of the community data and may be badly matched to the points of the site units; fixing that would require more work. You can, however, plot them separately, or multiply them by a constant that gives them a scaling similar to the site unit scores.
sp <- scores(ord, display="species", choices=1:2)
plot(sp, type = "n", asp = 1) # does not allow plotting text
text(sp, labels = rownames(sp)) # so we must add text
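The "multiply with a constant" idea can be sketched in base R. This is only one possible heuristic (matching the largest absolute coordinate of the two sets of scores), and both matrices below are made-up stand-ins; with real data they would be ord$points[, 1:2] and the extracted species scores sp:

```r
set.seed(1)

# Hypothetical site scores: 10 site units on 2 axes.
sites <- matrix(rnorm(20), ncol = 2)

# Hypothetical species scores on a much larger scale.
sp <- matrix(rnorm(12, sd = 50), ncol = 2,
             dimnames = list(paste0("sp", 1:6), NULL))

# Constant that shrinks (or stretches) the species scores so that their
# largest absolute coordinate equals that of the site scores.
const <- max(abs(sites)) / max(abs(sp))
sp_scaled <- sp * const

# Sites as points, rescaled species as labels in the same plot.
plot(sites, asp = 1, pch = 16, xlab = "Dim1", ylab = "Dim2")
text(sp_scaled, labels = rownames(sp_scaled), col = "red")
```

The constant only changes the relative size of the species cloud, not the directions of the species, so the plot should be read qualitatively.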

How to control plot layout for lmerTest output results?

I am using lme4 and lmerTest to run a mixed model and then use backward variable elimination (step) for my model. This seems to work well. After running the 'step' function in lmerTest, I plot the final model. The 'plot' results appear similar to ggplot2 output.
I would like to change the layout of the plot. The obvious answer is to do it manually by creating the plot(s) myself with ggplot2. If possible, I would like to simply change the layout of the output, so that each plot (i.e. each plotted dependent variable in the final model) is in its own row.
See the code and plot below for my results. Note that the plot has three columns and I would like three rows. I have not provided sample data (let me know if I need to!).
library(lme4)
library(lmerTest)
# Full model
Female.Survival.model.1 <- lmer(Survival.Female ~ Location + Substrate + Location:Substrate + (1|Replicate), data = Transplant.Survival, REML = TRUE)
# lmerTest - backward stepwise elimination of dependent variables
Female.Survival.model.ST <- step(Female.Survival.model.1, reduce.fixed = TRUE, reduce.random = FALSE, ddf = "Kenward-Roger" )
Female.Survival.model.ST
plot(Female.Survival.model.ST)
The function that creates these plots is called plotLSMEANS. You can look at the code for the function via lmerTest:::plotLSMEANS. The reason to look at the code is 1) to verify that, indeed, the plots are based on ggplot2 code and 2) to see if you can figure out what needs to be changed to get what you want.
In this case, it sounds like you'd want facet_wrap to have one column instead of three. I tested with the example from the help page of the lmerTest function step, and it looks like you can simply add a new facet_wrap layer to the plot.
library(ggplot2)
plot(Female.Survival.model.ST) +
facet_wrap(~namesforplots, scales = "free", ncol = 1)
Try this: plot(difflsmeans(Female.Survival.model.ST$model, test.effs = "Location"))

R programming - Graphic edges too large error while using clustering.plot in EMA package

I'm an R programming beginner and I'm trying to use the clustering.plot function from the R package EMA. My clustering works fine and I can see the results as well. However, when I try to generate a heat map using clustering.plot, I get the error "Error in plot.new (): graphic edges too large". My code is below:
#Loading library
library(EMA)
library(colonCA)
#Some information about the data
data(colonCA)
summary(colonCA)
class(colonCA) #Expression set
#Extract expression matrix from colonCA
expr_mat <- exprs(colonCA)
#Applying average linkage clustering on colonCA data using Pearson correlation
expr_genes <- genes.selection(expr_mat, thres.num=100)
expr_sample <- clustering(expr_mat[expr_genes,],metric = "pearson",method = "average")
expr_gene <- clustering(data = t(expr_mat[expr_genes,]),metric = "pearson",method = "average")
expr_clust <- clustering.plot(tree = expr_sample,tree.sup=expr_gene,data=expr_mat[expr_genes,],title = "Heat map of clustering",trim.heatmap =1)
I do not get any error when it comes to actually executing the clustering process. Could someone help?
In your example, some of the rownames of expr_mat are very long (max(nchar(rownames(expr_mat))) is 271 characters). The clustering.plot function tries to make a margin large enough for all the names, but because the names are so long, there isn't room for anything else.
The really long names seem to have long stretches of periods in them. One way to condense the names of these genes is to replace runs of 2 or more periods with just one, so I would add in this line
#Extract expression matrix from colonCA
expr_mat <- exprs(colonCA)
rownames(expr_mat)<-gsub("\\.{2,}","\\.", rownames(expr_mat))
Then you can run all the other commands and plot like normal.
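To see what the substitution does, here is a small base R sketch on made-up names (they are illustrative, not actual colonCA row names):

```r
# Hypothetical gene names containing long runs of periods.
gene_names <- c("Hsa.3004",
                "Hsa.627....M26383",
                "H.sapiens..mRNA........for.x")

# Replace every run of two or more periods with a single period.
short_names <- gsub("\\.{2,}", ".", gene_names)

print(short_names)

# Compare the longest name before and after the substitution.
print(max(nchar(gene_names)))
print(max(nchar(short_names)))
```

Single periods are left untouched, so names that were already short do not change.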

Stacked bar in R

I have a table exported in csv from PostgreSQL and I'd like to create a stacked bar graph in R. It's my first project in R.
Here's my data and what I want to do:
It is the quality of the feeder bus service for a certain provider in the area. For each user of the train, we assign a service quality based on the synchronization between the bus and the train at the train stations, and calculate the percentage of users that have an ideal or very good service, a correct service, a deficient service or no service at all (linked to that question in gis.stackexchange)
So, I'd like to use my first column as my x-axis labels and my headers as my categories. The data is already normalized to 100% for each row.
In Excel, it's a couple of clicks, and I wouldn't mind typing a couple of lines of code since it's the final result of an already quite long plpgsql script... I'd prefer to continue to code instead of moving to Excel (I also have dozens of those to do).
So, I tried to create a stacked bar using the examples in Nathan Yau's "Visualize This" and the book "R in Action" and wasn't quite successful. Normally, their examples use data that they aggregate with R and use that. Mine is already aggregated.
So, I've finally come up with something that works in R:
but I had to transform my data quite a bit:
I had to transpose my table and remove my now-row (ex-column) identifier.
Here's my code:
# load libraries
library(ggplot2)
library(reshape2)
# load data
stl <- read.csv("D:/TEMP/rabat/_stl_rabattement_stats_mtl.csv", sep=";", header=TRUE)
# reshape for plotting
stl_matrix <- as.matrix(stl)
# make a quick plot
barplot(stl_matrix, border=NA, space=0.1, ylim=c(0, 100), xlab="Trains", ylab="%",
main="Qualité du rabattement, STL", las = 3)
Is there any way that I could use my original csv and have the same result?
I'm a little lost here...
Thanks!!!!
Try the ggplot2 and reshape2 libraries. You should be able to get the chart you want with
stl$train_order <- as.numeric(rownames(stl))
stl.r <- melt(stl, id.vars = c("train_no", "train_order"))
stl.r$train_no <- factor(
    stl.r$train_no,
    levels = stl$train_no[order(stl$train_order)])
ggplot(stl.r, aes(x = factor(train_no), y = value, fill = variable)) + geom_bar(stat = 'identity')
It appears that you transposed the matrix manually. This can be done in R with the t() function.
Add the following line after the as.matrix(stl) line:
stl_matrix <- t(stl_matrix)
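A minimal base R sketch of the whole path from the original layout (one row per train, one column per category) to the stacked bar plot. The data frame is a made-up stand-in for the CSV, and the category column names are assumptions; only train_no is taken from the answer above:

```r
# Hypothetical stand-in for the CSV: each row sums to 100.
stl <- data.frame(
  train_no  = c("T101", "T102", "T103"),
  ideal     = c(40, 25, 10),
  correct   = c(30, 35, 20),
  deficient = c(20, 25, 30),
  none      = c(10, 15, 40)
)

# Drop the identifier column, convert to a numeric matrix and transpose,
# so that each column is one train and each row one service category.
stl_matrix <- t(as.matrix(stl[, -1]))

# Use the train numbers as bar labels.
colnames(stl_matrix) <- stl$train_no

# barplot() stacks the rows of a matrix within each column.
barplot(stl_matrix, border = NA, space = 0.1, ylim = c(0, 100),
        xlab = "Trains", ylab = "%", las = 3,
        legend.text = rownames(stl_matrix))
```

Removing the identifier column before as.matrix() keeps the matrix numeric; with the identifier included, as.matrix() would return a character matrix that barplot() cannot draw.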

1-D conditional slice from a 2-D probability density function in R using np package

Consider the example included in the np package for R,
on page 21 of the vignette for the np package.
npcdens returns a conditional density object and is able to plot the 2-D PDF and 2-D CDF, as shown. I wanted to know if I can somehow extract the 1-D information (PDF / CDF) from the object if I specify one of the two parameters, for example as a vector. I am new to R and was not able to find out the format of the object.
Thanks for the help.
-Egon.
Here is the code as requested:
require(np)
data("Italy")
attach(Italy)
bw <- npcdensbw(formula=gdp~ordered(year), tol=.1, ftol=.1)
fhat <- npcdens(bws=bw)
summary(fhat)
npplot(bws=bw)
npplot(bws=bw, cdf=TRUE)
detach(Italy)
The fhat object contains all the needed info plus a whole lot more. To see what all is in there, do a str( fhat ) to see the structure.
I believe the values you are interested in are xeval, yeval, and condens (PDF density).
There are lots of ways to get at the values but I tend to like data frames. I'd pop the three vectors in a single data frame:
denDf <- cbind( year=as.character( fhat$xeval[,1] ), fhat$yeval, fhat$condens )
## had to do a dance around the year variable because it's a factor
then I'd select the values I want with a subset():
subset( denDf, year==1951 & gdp > 8 & gdp < 8.2)
since gdp is a floating point value it's very hard to select with a == operator.
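The point about == and floating point values can be illustrated in base R. The gdp vector and the tolerance below are made up for the illustration and are not taken from the Italy data:

```r
# Classic example: the sum is not exactly 0.3 in binary floating point.
x <- 0.1 + 0.2
print(x == 0.3)                    # exact comparison
print(isTRUE(all.equal(x, 0.3)))   # comparison with a small tolerance

# Selecting a "known" value from a numeric vector with a tolerance.
gdp <- c(7.95, 8.1, 8.15, 8.3)     # hypothetical values
target <- 8.1
tol <- 1e-8                        # assumed tolerance, adjust to the data
selected <- gdp[abs(gdp - target) < tol]
print(selected)
```

A range condition such as gdp > 8 & gdp < 8.2, as used in the subset() call above, is the same idea with a wider tolerance.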
The method suggested by JD Long will only extract the density for data points in the existing training set. If you want the density at other points (conditioning or conditional variables) you will need to use the predict() function. The following code extracts and plots the 1-D density distribution conditioned on year == 1999, a value not contained in the original data set.
First construct a data frame with the same components as the Italy data set, with gdp regularly spaced and with "1999" an ordered factor.
yr1999<- rep("1999", 100)
gdpVals <-seq(1,35, length.out=100)
nD1999 <- data.frame(year = ordered(yr1999), gdp = gdpVals)
Next use the predict function to extract the densities.
gdpDens1999 <-predict(fhat,newdata = nD1999)
The following code plots the density.
plot(gdpVals, gdpDens1999, type='l', col='red', xlab='gdp', ylab = 'p(gdp|yr = 1999)')
