How can I plot a biplot for LDA in R?

I ran a linear discriminant analysis using the function lda() from the package MASS. Now I would like to draw a biplot like the one the ade4 package produces for LDA. How can I do this?
The biplot() function does not work on the result. For example, if I fit an LDA to the iris data:
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)
then I can plot it using the function plot(), but if I use the function biplot() it doesn't work:
biplot(dis2)
Error in nrow(y) : argument "y" is missing, with no default
How can I plot the arrows of variables?

I wrote the following function to do this:
lda.arrows <- function(x, myscale = 1, tex = 0.75, choices = c(1,2), ...){
## adds `biplot` arrows to an lda using the discriminant function values
heads <- coef(x)
arrows(x0 = 0, y0 = 0,
x1 = myscale * heads[,choices[1]],
y1 = myscale * heads[,choices[2]], ...)
text(myscale * heads[,choices], labels = row.names(heads),
cex = tex)
}
For your example:
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)
plot(dis2, asp = 1)
lda.arrows(dis2, col = 2, myscale = 2)
The length of the arrows is arbitrary relative to the lda plot (but not to each other, of course!). If you want longer or shorter arrows, change the value of myscale accordingly. By default, this plots arrows for the first and second axes. If you want to plot other axes, change choices to reflect this.
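To see why the discriminant coefficients are a sensible choice for the arrow heads, it may help to note that the scores drawn by plot() are, as far as I can tell from predict.lda(), just the centered data multiplied by those coefficients. The following is a minimal sketch under that assumption, using the iris data and the default priors. The names fit, X, center and manual are only for illustration, and fit$scaling is the matrix that coef(x) appears to return for an lda fit.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)
X <- as.matrix(iris[, 1:4])

# predict.lda() centers on the prior-weighted mean of the group means
center <- colSums(fit$prior * fit$means)

# project the centered data onto the discriminant coefficients
manual <- scale(X, center = center, scale = FALSE) %*% fit$scaling

# compare with the scores that MASS computes itself
scores <- predict(fit)$x
max_diff <- max(abs(manual - scores))
print(max_diff)

# each row of the coefficient matrix is one arrow head: how far a unit
# change in that variable moves an observation along LD1 and LD2
print(fit$scaling)
```

If the difference printed is essentially zero, each arrow shows the direction in which its variable pushes an observation in the LD1/LD2 plane.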

Biplots of linear discriminant analyses can be done. They are implemented in the R package ggbiplot (see https://github.com/vqv/ggbiplot/tree/experimental) and in the package ggord (see https://github.com/fawda123/ggord). For your example:
install.packages("devtools")
library(devtools)
install_github("fawda123/ggord")
library(MASS) # provides lda()
library(ggord)
ord <- lda(Species ~ ., iris, prior = rep(1, 3)/3)
ggord(ord, iris$Species)
The book "Biplots in Practice" by M. Greenacre also has a chapter on this (chapter 11); its Figure 11.5 shows a biplot of a linear discriminant analysis of the iris dataset.

You can achieve this using the ggord package from GitHub. The dataset used is the iris dataset.
library(MASS) # provides lda()
IR <- iris # assumption: the original post does not show how IR was created
# --- data partition --- #
set.seed(555)
IRSam <- sample.int(n = nrow(IR), size = floor(.60*nrow(IR)), replace = FALSE, prob = NULL)
IRTrain <- IR[IRSam,]
IRTest <- IR[-IRSam,]
# --- model fit --- #
# assumption: the original post does not show the fit; an LDA on the training split is the likely intent
IR.lda <- lda(Species ~ ., data = IRTrain)
# --- prediction --- #
p <- predict(IR.lda, IRTrain)
# --- plotting a biplot --- #
library(devtools)
# install_github('fawda123/ggord') # run once to install ggord from GitHub (needs devtools)
library(ggord)
ggord(IR.lda, IRTrain$Species, ylim=c(-5,5), xlim=c(-10,10))

Related

Trying to find a way to combine IRT info plots from 3 different mirt models in R in the same

I am looking to combine all three "test information function" lines (one for each model) into one graph. I have a data set of Likert responses (categories 1-5) in 400 rows, in sets of 8 columns (one for each item). I have run three IRT models on these sets using the mirt package in R and produced test info plots. I would like to combine the test info plots from the three (graded response) models, as three lines, in one and the same grid.
plot(PFgrmodel29, type = 'info', xlim = c(-4, 4), ylim=c(0,85))
plot(PFgrmodel43, type = 'info', xlim = c(-4, 4), ylim=c(0,85))
plot(PFgrmodel57, type = 'info', xlim = c(-4, 4), ylim=c(0,85))
Example of test info plot:
How can I achieve this with mirt, lattice, ggplot2 or similar?
Plots from the mirt package are lattice objects, so you can try combining them with latticeExtra. Since you did not provide your dataset, the example code below uses the example dataset in the package:
library(mirt)
library(latticeExtra)
fulldata <- expand.table(LSAT7)
mod1 <- mirt(fulldata,1,SE=TRUE)
mod2 <- mirt(fulldata,1, itemtype = 'Rasch')
mod3 <- mirt(fulldata,1,itemtype='ideal')
key=list(columns=2,
text=list(lab=c("mod1","mod2","mod3")),
lines=list(lwd=4, col=c("blue","orange","red"))
)
p1 = plot(mod1,type="info",key=key)
p2 = update(plot(mod2,type="info"),col="orange")
p3 = update(plot(mod3,type="info"),col="red")
p1+p2+p3
That is just beautiful! Works like a charm, except I needed to add ylim=c(0,100) to make the y axis tall enough to fit the data. I thought that placing the model with the highest info curve first (as mod1) would do it, but no. Thank you Stupidwolf so much for providing the code!! No need for latticeExtra package.
Also, I had to retain the "model" part of the code for this to work:
model <- 'F = 1-5 PRIOR = (5, g, norm, -1.5, 3)'
My code looks like this now:
library(mirt)
library(latticeExtra)
model <- 'F = 1-5 PRIOR = (5, g, norm, -1.5, 3)'
mod1 <- mirt(PFdata57,1,itemtype="graded", SE=TRUE)
mod2 <- mirt(PFdata43,1,itemtype="graded", SE=TRUE)
mod3 <- mirt(PFdata29,1,itemtype="graded", SE=TRUE)
key=list(columns=1,
text=list(lab=c("P57/PF Short form 8a","P43/PF Short form 6a","P29/PF Short form 4a")),
lines=list(lwd=4, col=c("blue","orange","red")))
p1 = plot(mod1,type="info",key=key,xlim=c(-4,4),ylim=c(0,85))
p2 = update(plot(mod2,type="info"),col="orange")
p3 = update(plot(mod3,type="info"),col="red")
p1+p2+p3
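If the shared y axis is the main nuisance, another route is to compute the test information values directly and draw them with base graphics, which picks an axis range that fits all curves at once. This is only a rough sketch: it assumes mirt's testinfo() function and the LSAT7 example data used in the answer above, and the grid Theta and the object names are my own.

```r
library(mirt)

dat  <- expand.table(LSAT7)
mod1 <- mirt(dat, 1, verbose = FALSE)
mod2 <- mirt(dat, 1, itemtype = "Rasch", verbose = FALSE)

# grid of latent trait values at which to evaluate the information
Theta <- matrix(seq(-4, 4, length.out = 200))

# one column of test information per model
info <- cbind(mod1 = testinfo(mod1, Theta),
              mod2 = testinfo(mod2, Theta))

# matplot() chooses a y range that covers every column
matplot(Theta, info, type = "l", lty = 1, lwd = 2,
        col = c("blue", "orange"),
        xlab = expression(theta), ylab = "Test information")
legend("topright", legend = colnames(info), lty = 1, lwd = 2,
       col = c("blue", "orange"))
```

Having the values in a plain matrix also makes it easy to hand them to ggplot2 or lattice instead of matplot().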

How to make Partial Dependence plots into a scatterplot instead of a line graph from xgboost data

library(pdp)
library(xgboost)
param.list <- list(max_depth = 5, eta = 0.01, objective = "binary:logistic",
eval_metric = "auc")
house.xgb <- xgb.train(params = param.list, # the original passed `params`, which is never defined
data = xgb.DMatrix(as.matrix(house[ ,1:3]), label =
house$SalePrice, missing = NA),
nrounds = 500)
partial(house.xgb, pred.var = "MSSubClass", plot = TRUE, train = house[ ,1:3])
The graph I want comes from https://slundberg.github.io/shap/notebooks/NHANES+I+Survival+Model.html, which uses Python. I don't care about the colors or the y axis; I just want the scatterplot part rather than the line graph.
I've been using partial() from the pdp R package, but if anyone can point me in the direction of another package that would be great. The pdp package only gives a few data points.
In the call to partial() just set plot = FALSE (which is the default). For example,
pd <- partial(house.xgb, pred.var = "MSSubClass",
train = house[ ,1:3])
head(pd) # print the data frame
plot(pd) # basic plot or use ggplot2, lattice, plotly, etc.
will return a data frame from which you can make a scatterplot!
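Since the house data is not available here, the idea can be tried on a stand-in model. The sketch below assumes an lm fit on mtcars in place of the xgboost model. The grid.resolution argument of partial() is one way to ask for more grid points, and the returned data frame holds the predictor values next to a yhat column.

```r
library(pdp)

# stand-in model; any model type that partial() supports would do
fit <- lm(mpg ~ wt + hp, data = mtcars)

# plot = FALSE (the default) returns the grid and the averaged predictions
pd <- partial(fit, pred.var = "wt", train = mtcars, grid.resolution = 20)
head(pd)

# draw points instead of the default line
plot(pd$wt, pd$yhat, pch = 19,
     xlab = "wt", ylab = "partial dependence")
```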

Plotting quantile regression by variables in a single page

I am running quantile regressions for several independent variables separately (same dependent). I want to plot only the slope estimates over several quantiles of each variable in a single plot.
Here's a toy data:
set.seed(1988)
y <- rnorm(50, 5, 3)
x1 <- rnorm(50, 3, 1)
x2 <- rnorm(50, 1, 0.5)
# Running Quantile Regression
require(quantreg)
fit1 <- summary(rq(y~x1, tau=1:9/10), se="boot")
fit2 <- summary(rq(y~x2, tau=1:9/10), se="boot")
I want to plot only the slope estimates over quantiles. Hence, I am giving parm=2 in plot.
plot(fit1, parm=2)
plot(fit2, parm=2)
Now, I want to combine both these plots in a single page.
What I have tried so far:
I tried setting par(mfrow=c(2,2)) before plotting, but it produces a blank page.
I tried gridExtra and gridGraphics without success, converting the base graphs into grob objects.
I tried the layout() function.
I am trying to read the source code of plot.rqs, but I cannot work out how it draws the confidence bands (I can only plot the coefficients over quantiles) or how to change the mfrow parameter there.
Can anybody point out where I am going wrong? Should I look into the source code of plot.rqs and change any parameters there?
While quantreg::plot.summary.rqs has an mfrow parameter, it uses it to override par('mfrow') so as to facet over parm values, which is not what you want to do.
One alternative is to parse the objects and plot manually. You can pull the tau values and coefficient matrix out of fit1 and fit2, which are just lists of values for each tau, so in tidyverse grammar,
library(tidyverse)
c(fit1, fit2) %>% # concatenate lists, flattening to one level
# iterate over list and rbind to data.frame
map_dfr(~cbind(tau = .x[['tau']], # from each list element, cbind the tau...
coef(.x) %>% # ...and the coefficient matrix,
data.frame(check.names = TRUE) %>% # cleaned a little
rownames_to_column('term'))) %>%
filter(term != '(Intercept)') %>% # drop intercept rows
# initialize plot and map variables to aesthetics (positions)
ggplot(aes(x = tau, y = Value,
ymin = Value - Std..Error,
ymax = Value + Std..Error)) +
geom_ribbon(alpha = 0.5) +
geom_line(color = 'blue') +
facet_wrap(~term, nrow = 2) # make a plot for each value of `term`
Pull more out of the objects if you like, add the horizontal lines of the original, and otherwise go wild.
Another option is to use magick to capture the original images (or save them with any device and reread them) and manually combine them:
library(magick)
plots <- image_graph(height = 300) # graphics device to capture plots in image stack
plot(fit1, parm = 2)
plot(fit2, parm = 2)
dev.off()
im1 <- image_append(plots, stack = TRUE) # attach images in stack top to bottom
image_write(im1, 'rq.png')
The plot function used by the quantreg package has its own mfrow parameter. If you do not specify it, it enforces a layout that it chooses on its own (and thus overrides your par(mfrow = c(2,2))).
Using the mfrow parameter within plot.rqs:
# make one plot, change the layout
plot(fit1, parm = 2, mfrow = c(2,1))
# add a new plot
par(new = TRUE)
# create a second plot
plot(fit2, parm = 2, mfrow = c(2,1))
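If neither route behaves, a base-graphics fallback is to pull the slope estimates out of the summary objects and draw the panels by hand, which leaves par(mfrow) fully under your control. This is a minimal sketch on the toy data from the question. The helper slope_table() is hypothetical, and the dashed band is a rough estimate plus or minus two bootstrap standard errors, not the interval that quantreg itself draws.

```r
library(quantreg)

set.seed(1988)
y  <- rnorm(50, 5, 3)
x1 <- rnorm(50, 3, 1)
x2 <- rnorm(50, 1, 0.5)
taus <- 1:9 / 10

# slope estimate and bootstrap standard error at every tau
slope_table <- function(x) {
  s <- summary(rq(y ~ x, tau = taus), se = "boot")
  est <- sapply(s, function(el) el$coefficients[2, "Value"])
  se  <- sapply(s, function(el) el$coefficients[2, "Std. Error"])
  data.frame(tau = taus, est = est, se = se)
}

tabs <- list(x1 = slope_table(x1), x2 = slope_table(x2))

old <- par(mfrow = c(1, 2))          # two panels on one page
for (nm in names(tabs)) {
  tb <- tabs[[nm]]
  lo <- tb$est - 2 * tb$se
  hi <- tb$est + 2 * tb$se
  plot(tb$tau, tb$est, type = "b", ylim = range(lo, hi),
       xlab = "tau", ylab = "slope", main = nm)
  lines(tb$tau, lo, lty = 2)         # rough band, see the note above
  lines(tb$tau, hi, lty = 2)
  abline(h = 0, col = "grey")
}
par(old)
```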

Visualize data using histogram in R

I am trying to visualize some data and in order to do it I am using R's hist.
Below are my data:
jancoefabs <- as.numeric(as.vector(abs(Janmodelnorm$coef)))
jancoefabs
[1] 1.165610e+00 1.277929e-01 4.349831e-01 3.602961e-01 7.189458e+00
[6] 1.856908e-04 1.352052e-05 4.811291e-05 1.055744e-02 2.756525e-04
[11] 2.202706e-01 4.199914e-02 4.684091e-02 8.634340e-01 2.479175e-02
[16] 2.409628e-01 5.459076e-03 9.892580e-03 5.378456e-02
As the more cunning of you might have guessed, these are the absolute values of some model's coefficients.
What I need is a histogram whose axes are:
x: the coefficients (19 in total), labelled with their names.
y: the value of each column (as breaks?), with ylim set according to the min and max of those values (or something similar).
Note that Janmodelnorm$coef simply produces the following
  (Intercept)           LON           LAT            ME           RAT
 1.165610e+00 -1.277929e-01 -4.349831e-01 -3.602961e-01 -7.189458e+00
           DS           DSA           DSI          DRNS          DREW
-1.856908e-04  1.352052e-05  4.811291e-05 -1.055744e-02 -2.756525e-04
        ASPNS         ASPEW            SI           CUR     W_180_270
-2.202706e-01 -4.199914e-02  4.684091e-02 -8.634340e-01 -2.479175e-02
      W_0_360      W_90_180       W_0_180          NDVI
 2.409628e-01  5.459076e-03 -9.892580e-03 -5.378456e-02
So far, after consulting ?hist, I have been playing with the code below without success, so I am starting from scratch.
# hist(jancoefabs, col="lightblue", border="pink",
# breaks=8,
# xlim=c(0,10), ylim=c(20,-20), plot=TRUE)
When plot=FALSE is set, I get a bunch of somewhat useful info about the set. I also find it hard to use the breaks argument effectively.
Any suggestion will be appreciated. Thanks.
Rather than using hist, why not use a barplot or a standard plot. For example,
## Generate some data
set.seed(1)
y = rnorm(19, sd=5)
names(y) = c("Inter", LETTERS[1:18])
Then plot the coefficients
barplot(y)
Alternatively, you could use a scatter plot
plot(1:19, y, axes=FALSE, ylim=c(-10, 10))
axis(2)
axis(1, 1:19, names(y))
and add error bars to indicate the standard errors (see for example Add error bars to show standard deviation on a plot in R)
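Applied to the values from the question, a bar chart might look like the sketch below. The names and absolute values are copied from the printed output in the question. The log-scaled y axis is my own addition, included because the values span roughly six orders of magnitude and the small ones would otherwise vanish.

```r
# coefficient names and absolute values as printed in the question
jancoefabs <- c(
  "(Intercept)" = 1.165610e+00, LON = 1.277929e-01, LAT = 4.349831e-01,
  ME = 3.602961e-01, RAT = 7.189458e+00, DS = 1.856908e-04,
  DSA = 1.352052e-05, DSI = 4.811291e-05, DRNS = 1.055744e-02,
  DREW = 2.756525e-04, ASPNS = 2.202706e-01, ASPEW = 4.199914e-02,
  SI = 4.684091e-02, CUR = 8.634340e-01, W_180_270 = 2.479175e-02,
  W_0_360 = 2.409628e-01, W_90_180 = 5.459076e-03, W_0_180 = 9.892580e-03,
  NDVI = 5.378456e-02
)

# log = "y" keeps the tiny coefficients visible next to the largest one (RAT);
# las = 2 turns the names so that all 19 fit under the bars
mids <- barplot(jancoefabs, log = "y", las = 2, cex.names = 0.7,
                col = "lightblue", ylab = "|coefficient| (log scale)")
```

barplot() returns the bar midpoints (stored in mids here), which can be reused to position error bars or text labels.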
Are you sure you want a histogram for this? A lattice barchart might be pretty nice. An example with the mtcars built-in data set.
coef <- lm(mpg ~ ., data = mtcars)$coef
library(lattice)
barchart(coef, col = 'lightblue', horizontal = FALSE,
ylim = range(coef), xlab = '',
scales = list(y = list(labels = coef),
x = list(labels = names(coef))))
A base R dotchart might be good too,
dotchart(coef, pch = 19, xlab = 'value')
text(coef, seq(coef), labels = round(coef, 3), pos = 2)

How do I extract the Correlation of fixed effects part of the lmer output

When you have a multilevel model with lots of factors and interactions the size of the correlation of fixed effects matrix can become quite big and unclear.
I can use the symbolic.cor=T parameter in the print method to make a clearer print of the summary like below:
ratbrain <-
within(read.delim("http://www-personal.umich.edu/~bwest/rat_brain.dat"),
{
treatment <- factor(treatment,
labels = c("Basal", "Carbachol"))
region <- factor(region,
labels = c("BST", "LS", "VDB"))
})
print(mod<-lmer(activate ~ region * treatment + (0 + treatment | animal),ratbrain),symbolic.cor=T)
This prints a somewhat clearer correlation matrix for large matrices, although this example's matrix isn't so big.
But it would be nice if I could plot a heatmap of the correlations.
How do I extract the correlation of fixed effects so I can make this heatmap?
EDIT:
Here's the function I created thanks to the answers.
fixeff.plotcorr<-function(mod,...)
{
#require(GGally) # contains another correlation plot using ggplot2
require(lme4)
fixNames<-names(fixef(mod))
# Simon O'Hanlon's answer:
# so <- summary(mod)
# df<-as.matrix(so@vcov@factors$correlation) # lme4 < 1.0
# df<-as.matrix(so$vcov@factors$correlation) # lme4 >= 1.0
df<-as.matrix(cov2cor(vcov(mod))) #Ben Bolker's solution
rownames(df)<-fixNames
colnames(df)<-abbreviate(fixNames, minlength = 11)
colsc=c(rgb(241, 54, 23, maxColorValue=255), 'white', rgb(0, 61, 104, maxColorValue=255))
colramp = colorRampPalette(colsc, space='Lab')
colors = colramp(100)
cols=colors[((df + 1)/2) * 100]
# I'm using function my.plotcorr which you can download here:
# http://hlplab.wordpress.com/2012/03/20/correlation-plot-matrices-using-the-ellipse-library/
my.plotcorr(df, col=cols, diag='none', upper.panel="number", mar=c(0,0.1,0,0),...)
# Another possibility is the corrplot package:
# cols <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582", "#FDDBC7",
# "#FFFFFF", "#D1E5F0", "#92C5DE", "#4393C3", "#2166AC", "#053061"))
# require(corrplot,quiet=T)
# corrplot(df, type="upper", method="number", tl.pos='tl', tl.col='black', tl.cex=0.8, cl.pos='n', col=cols(50))
# corrplot(df,add=TRUE, method='ellipse', type='lower', tl.pos='n', tl.col='black', cl.pos='n', col=cols(50), diag=FALSE)
}
You have to download the my.plotcorr function from the URL given in the comment inside the function.
The resulting plot of the example above using command fixeff.plotcorr(mod) now looks like this:
How about using the built-in
cov2cor(vcov(mod))
?
I don't know of a direct method, but this is a workaround:
diag(1/sqrt(diag(vcov(mod)))) %*% vcov(mod) %*% diag(1/sqrt(diag(vcov(mod))))
Using the S4 method listing function we can return the function which is dispatched when print is called on an object of class "mer":
selectMethod( print , "mer" )
Looking at the source code that is returned we can find the lines applicable to you:
if (correlation) {
corF <- so@vcov@factors$correlation
so is defined as the summary of your object, so in your case you should just need to extract:
so <- summary(mod)
so@vcov@factors$correlation
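Once the correlation matrix is in hand, a heatmap needs nothing beyond base graphics. The sketch below uses the cov2cor(vcov(mod)) suggestion from above. Because the rat brain file may not be reachable, it assumes the cake data that ships with lme4 as a stand-in, and the red/white/blue palette is an arbitrary choice.

```r
library(lme4)

# `cake` ships with lme4; this model has 18 fixed-effect coefficients
mod <- lmer(angle ~ recipe * temperature + (1 | recipe:replicate), cake)

# correlation of the fixed effects as a plain matrix
corr <- cov2cor(as.matrix(vcov(mod)))
n <- nrow(corr)

pal <- colorRampPalette(c("red", "white", "blue"))(21)
old <- par(mar = c(8, 8, 1, 1))      # leave room for the long term names
# reverse the columns so the diagonal runs from top-left to bottom-right
image(1:n, 1:n, corr[, n:1], zlim = c(-1, 1), col = pal,
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:n, labels = rownames(corr), las = 2, cex.axis = 0.6)
axis(2, at = 1:n, labels = rev(colnames(corr)), las = 2, cex.axis = 0.6)
par(old)
```

Fixing zlim at c(-1, 1) keeps white at zero correlation, so the colors stay comparable across models.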
