Understanding the output from a factor analysis using the FAMD function - r

I have some survey data where people were asked questions and given a yes or no option (1=yes, 0=no). I would like to be able to pick out some patterns in this data.
The questions are:
Do you enjoy XX work?
Do you do XX work alone?
Has your workload increased?
Do you have a backlog of work?
I would like to know whether people who work alone are more likely to have an increased workload, a backlog of work and not enjoy their job. To answer this, I think factor analysis is the way to go but I'm struggling to interpret the output.
Here is an example of my data:
enjoy <- c(1,1,0,1,0,1,0,0,0,1)
alone <- c(0,0,1,1,1,0,0,1,1,0)
workload <- c(0,0,1,1,0,1,0,0,0,1)
backlog <- c(0,0,1,1,0,1,0,0,0,0)
data <- data.frame(enjoy, alone, workload, backlog)
library(dplyr) ## provides %>% and mutate_if
data <- data %>% mutate_if(is.numeric, as.factor) ## convert from numeric to categorical (factor)
I'm using the FAMD function in FactoMineR, as it can handle categorical data.
library(FactoMineR)
data_famd <- FAMD(data, graph = FALSE)
Then using factoextra, I can see which variables contribute to each axis
library(factoextra)
# Contribution to the first dimension
fviz_contrib(data_famd, "var", axes = 1) ## backlog & workload
# Contribution to the second dimension
fviz_contrib(data_famd, "var", axes = 2) ## enjoy and alone
Then I can make this plot:
fviz_famd_ind(data_famd,
habillage = "alone", # color by groups
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
addEllipses = TRUE, ellipse.type = "confidence",
repel = TRUE) # Avoid text overlapping
It looks as though people who work alone answer the questions differently from people who don't. But I don't understand which answers people who work alone (yellow) are giving compared with people who don't work alone. The two groups are clearly distinct, so they must be answering something differently.
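One way to see which answers each group gives, independent of the factor map, might be a plain cross-tabulation of the raw 0/1 answers. The sketch below is not part of the original analysis; it rebuilds the example vectors from above under a new, hypothetical name (answers) and uses only base R.

```r
# Rebuild the example answers from the question (1 = yes, 0 = no).
enjoy    <- c(1, 1, 0, 1, 0, 1, 0, 0, 0, 1)
alone    <- c(0, 0, 1, 1, 1, 0, 0, 1, 1, 0)
workload <- c(0, 0, 1, 1, 0, 1, 0, 0, 0, 1)
backlog  <- c(0, 0, 1, 1, 0, 1, 0, 0, 0, 0)
answers  <- data.frame(enjoy, alone, workload, backlog)  # hypothetical name

# Share of "yes" answers to each question, split by whether people work alone.
yes_share <- aggregate(cbind(enjoy, workload, backlog) ~ alone,
                       data = answers, FUN = mean)
print(yes_share)

# A two-way count table for a single pair of questions.
print(table(alone = answers$alone, backlog = answers$backlog))
```

With only ten respondents these shares are descriptive at best, but they show directly what each group answered.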
My main question is: what do the axes mean? I've done PCAs on continuous data before, and by using the loadings I can figure out what the axes mean and therefore interpret these graphs. How do you do this for a factor analysis? Is there a different package?
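As a possible starting point, the object returned by FAMD carries numeric tables that play a role similar to PCA loadings. The sketch below assumes the result contains the elements eig, var$contrib and quali.var$coord, as described in the FactoMineR documentation, and rebuilds the example data under hypothetical names (survey, survey_famd).

```r
library(FactoMineR)

# Rebuild the example data and treat every 0/1 answer as a category.
enjoy    <- c(1, 1, 0, 1, 0, 1, 0, 0, 0, 1)
alone    <- c(0, 0, 1, 1, 1, 0, 0, 1, 1, 0)
workload <- c(0, 0, 1, 1, 0, 1, 0, 0, 0, 1)
backlog  <- c(0, 0, 1, 1, 0, 1, 0, 0, 0, 0)
survey   <- data.frame(enjoy, alone, workload, backlog)  # hypothetical name
survey[] <- lapply(survey, factor)

survey_famd <- FAMD(survey, graph = FALSE)

# Share of variance carried by each dimension.
print(round(survey_famd$eig, 2))

# Contribution (in percent) of each variable to the first two dimensions.
var_contrib <- survey_famd$var$contrib
print(round(var_contrib[, 1:2], 1))

# Coordinates of each answer category on the first two dimensions:
# categories at opposite ends of an axis are the ones that axis separates.
cat_coord <- survey_famd$quali.var$coord
print(round(cat_coord[, 1:2], 2))
```

Reading the category coordinates next to the contributions is one way to attach a meaning to each axis, much as loadings are read in a PCA.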
Thanks for any help.

Related

Is there a way to add species to an ISOMAP plot in R?

I am using the isomap-function from vegan package in R to analyse community data of epiphytic mosses and lichens. I started analysing the data using NMDS but due to the structure of the data ran into problems which is why I switched to ISOMAP which works perfectly well and returns very nice results. So far so good... However, the output of the function does not support plotting of species within the ISOMAP plot as species scores are not available. Anyway, I would really like to add species information to enhance the interpretability of the output.
Does anyone have a solution or a hint for this problem? Is there a way to add species to the plot post hoc, as can be done with environmental data?
I would greatly appreciate any help on this topic!
Thank you and best regards,
Inga
No, there is no function to add species scores to isomap. It would look like this:
`sppscores<-.isomap` <-
function(object, value)
{
value <- scale(value, center = TRUE, scale = FALSE)
v <- crossprod(value, object$points)
attr(v, "data") <- deparse(substitute(value))
object$species <- v
object
}
Or alternatively:
`sppscores<-.isomap` <-
function(object, value)
{
wa <- vegan::wascores(object$points, value, expand = TRUE)
attr(wa, "data") <- deparse(substitute(value))
object$species <- wa
object
}
If ord is your isomap result and comm are your community data, you can use these as:
sppscores(ord) <- comm # either alternative
I have no idea (yet) which of these alternatives is more correct. The first adds species scores as vectors of their linear increase; the second adds them as weighted averages in ordination space, expanded so that some species can be more extreme than the site units where they occur.
Either alternative adds a new element, species, to the result object ord. Using it directly in vegan plotting functions would need more coding, but you can extract the species scores with vegan::scores. Their scaling is based on the original scale of the community data, so they may be badly scaled relative to the site unit points, and fixing that would require more work. However, you can plot them separately, or multiply them by a constant that gives a scaling similar to the site unit scores.
sp <- scores(ord, display="species", choices=1:2)
plot(sp, type = "n", asp = 1) # does not allow plotting text
text(sp, labels = rownames(sp)) # so we must add text
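The "multiply with a constant" idea could be sketched as follows. This is only an illustration under assumptions: it uses the BCI data shipped with vegan instead of the moss and lichen data, computes the first alternative (linear increase) directly rather than through the replacement function, and picks the constant so that the longest species vector matches the most extreme site coordinate, which is one arbitrary choice among many.

```r
library(vegan)

data(BCI)
dis <- vegdist(BCI)
ord <- isomap(dis, k = 3)

# Site coordinates on the first two ISOMAP dimensions.
sites <- ord$points[, 1:2]
comm  <- as.matrix(BCI)
# Assumes isomap kept every site (no disconnected sites were dropped).
stopifnot(nrow(comm) == nrow(sites))

# First alternative from above: species scores as linear increase.
v <- crossprod(scale(comm, center = TRUE, scale = FALSE), sites)

# Rescale so the largest species coordinate equals the largest site coordinate.
const    <- max(abs(sites)) / max(abs(v))
v_scaled <- v * const

# Plot sites, then overlay the ten species with the longest vectors.
plot(sites, asp = 1, pch = 16, col = "grey50")
top <- order(rowSums(v_scaled^2), decreasing = TRUE)[1:10]
text(v_scaled[top, ], labels = rownames(v_scaled)[top], col = "red", cex = 0.7)
```

Only the relative directions and lengths of the species vectors are meaningful here; the constant is a display choice.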

Can I arrange my study labels in a forest plot using study years after specifying the byvar to be something different? (Meta package-R)

Can I arrange all my study labels within subgroups in my forest plot with the year of publication after specifying that I want subgroups divided by a certain variable?
Here is the code I am currently using.
brugia.forest <- metaprop(event = no.positive, n = no.tested, studlab = studylabel,
  data = brugia, byvar = diagnostics,
  bylab = c("direct detection", "direct and indirect detection", "indirect detection"),
  print.byvar = FALSE, sm = "PLO", method.tau = "REML", title = "", hakn = TRUE)
I would like the studies within the "diagnostics" groups to be arranged from the oldest to the most recent and not alphabetically as is currently the case. I am using the meta package of R because of its user-friendliness and would like to continue using it (so, metafor suggestions may not be too helpful)
Thanks.
I would like to answer this question because the creator of the meta package, Dr. Guido Schwarzer, was kind enough to answer it via email. Here is the way forward:
By default, the forest function does not sort the studies at all; it uses the order of the dataset. One can therefore order the dataset before running the meta-analysis and drawing the forest plot.
Alternatively, one can use the 'sortvar' argument of the forest function to specify the variable by which the studies should be sorted.
Hope this helps.
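Both suggestions might look roughly like this. The data frame toy and all its values are invented purely for illustration; the column names mirror the ones in the question, plus a hypothetical year column. The byvar argument is kept as in the question (recent meta releases call it subgroup).

```r
library(meta)

# Invented toy data, for illustration only.
toy <- data.frame(
  studylabel  = c("Smith 2015", "Adams 2003", "Lee 2010",
                  "Brown 2018", "Chen 1999", "Diaz 2007"),
  year        = c(2015, 2003, 2010, 2018, 1999, 2007),
  diagnostics = c("direct", "direct", "direct",
                  "indirect", "indirect", "indirect"),
  no.positive = c(12, 5, 20, 8, 15, 3),
  no.tested   = c(100, 80, 150, 60, 120, 40)
)

# Option 1: order the dataset (by subgroup, then year) before fitting.
toy_sorted <- toy[order(toy$diagnostics, toy$year), ]
m1 <- metaprop(event = no.positive, n = no.tested, studlab = studylabel,
               data = toy_sorted, byvar = diagnostics, sm = "PLO")
forest(m1)

# Option 2: leave the data as they are and sort in the plot via 'sortvar'.
m2 <- metaprop(event = no.positive, n = no.tested, studlab = studylabel,
               data = toy, byvar = diagnostics, sm = "PLO")
forest(m2, sortvar = toy$year)
```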
Another option is to use ggplot2, if you are familiar with it. It gives a lot of flexibility in arranging and modifying plots.

Visualizing PCA with large number of variables in R using ggbiplot

I am trying to visualize a PCA that includes 87 variables.
prc <-prcomp(df[,1:87], center = TRUE, scale. = TRUE)
ggbiplot(prc, labels = rownames(df[,1:87]), var.axes = TRUE)
When I create the biplot, many of the vectors overlap with each other, making it impossible to read the labels. I was wondering if there is any way to only show some of the labels at a time. For example, I think it'd be useful if I could create a few separate biplots with each one showing only a subset of the labels on the vectors.
This question seems closely related, but I don't know if it translates to the latest version of ggbiplot. I'm also not sure how to modify the original functions.
A potential solution is to use the factoextra package to visualize your PCA results. The fviz_pca_biplot() function includes a repel argument; when repel = TRUE, the plot labels are spread out to minimize overlap. The documentation also describes select.var options, such as select.var = list(contrib = 5) to display only the 5 variables with the highest contributions. There is also a select.var = list(name = ...) option that lets you specify exactly which subset of variables to show.
# read data
df <- mtcars[, c(1:7,10:11)]
# perform PCA
library("FactoMineR")
res.pca <- PCA(df, graph = FALSE)
# visualize
library(factoextra)
fviz_pca_biplot(res.pca, repel = TRUE, select.var = list(contrib = 5))
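To get several biplots that each show only some of the variables, the name option of select.var could be applied to one chunk of variable names at a time. This is a sketch on the same mtcars subset; the chunk size of 3 and the names var_groups and biplots are arbitrary choices made for the example.

```r
library(FactoMineR)
library(factoextra)

df <- mtcars[, c(1:7, 10:11)]
res.pca <- PCA(df, graph = FALSE)

# Split the variable names into chunks of three (arbitrary chunk size).
vars <- colnames(df)
var_groups <- split(vars, ceiling(seq_along(vars) / 3))

# One biplot per chunk; each shows all individuals but only that chunk's vectors.
biplots <- lapply(var_groups, function(v) {
  fviz_pca_biplot(res.pca, repel = TRUE, select.var = list(name = v))
})

# Display the first of the biplots.
biplots[[1]]
```

The PCA itself is computed once on all variables, so the axes are identical across the plots and only the displayed vectors change.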

Using Likert Package in R for analyzing real survey data

I conducted a survey with 138 questions on it, of which only a few are likert type questions with some having different scales.
I have been trying to use the Likert package in R to analyze and graphically portray the data, however, I am seriously struggling to make sense of any of it.
I have gone through the "demos" which are only useful if you already know what is going on with the package. It doesn't explain any of the steps you have to take before being able to apply the likert package, what can actually be applied to the package, how you rename the variables etc.. All you get is a bunch of code and a rabbit hole to crawl down trying to figure it all out.
I have scoured google for a step by step guide to using the likert package but found nothing.
Can anyone please direct me to a guide or at least perhaps provide the steps I have to take with my dataframe before I can try to use the likert package?
I am hoping to fit a few of my columns(containing the likert responses) to stacked barplots using this package.
Once I figure out what exactly the Likert package will accept in terms of a cleaned up data frame, I should be able to follow the demo... maybe..
This is what I have done so far, based on my limited knowledge of R and trying to figure things out on my own.
library(likert)
library(dplyr)
fdaff_likert <- select(f2f, RESPID, daff_rate)
fdaff_likert <- data.frame(fdaff_likert)
fdaff_likert <- likert(items=fdaff_likert[,2, drop = FALSE], nlevels = 5)
the output of my likert is:
summary(fdaff_likert)
Item low neutral high mean sd
1 daff_rate 9.977827 37.91574 52.10643 3.802661 1.302508
The plot, however, is all over the place.. (unordered)
plot (fdaff_likert)
The likert scale is out of order and not properly centered. In addition, how do I rename the y-axis to the question?
For later analysis, how can I break it up by group (based on another column specifying a region in the original data frame)?
library(likert)
set.seed(1)
n <- 138
# An illustrative example
fdaff_likert <- data.frame(
RESPID=sample(1:5,n, replace=T),
daff_rate=factor(sample(1:5,n, replace=T), labels=c("Good","Neither","Poor","Very Good","Very Poor"))
)
fdaff_likert1 <- likert(items=fdaff_likert[,2, drop = FALSE], nlevels = 5)
# Plot with unordered categories
plot(fdaff_likert1)
# Reorder levels of daff_rate factor
fdaff_likert$daff_rate <- factor(fdaff_likert$daff_rate,
levels=levels(fdaff_likert$daff_rate)[c(5,3,2,1,4)])
fdaff_likert2 <- likert(items=fdaff_likert[,2, drop = FALSE], nlevels = 5)
# Plot with ordered categories
plot(fdaff_likert2)
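If the responses are stored as numeric codes rather than labelled factors (an assumption, since the question does not show the raw column), the level order can be fixed when the factor is created, and the column name can carry the question text that appears on the axis. The names raw_codes, scale_labels and the question wording below are hypothetical.

```r
library(likert)

set.seed(1)
# Hypothetical raw responses coded 1 (worst) to 5 (best).
raw_codes    <- sample(1:5, 138, replace = TRUE)
scale_labels <- c("Very Poor", "Poor", "Neither", "Good", "Very Good")

# Setting levels and labels together keeps the scale in its intended order.
items <- data.frame(
  daff_rate = factor(raw_codes, levels = 1:5, labels = scale_labels)
)

# The column name is what the plot shows next to the bar.
names(items) <- "How would you rate the service?"  # hypothetical wording

lik <- likert(items = items)
plot(lik)
```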
Here is an illustrative example for creating a plot with grouped items.
set.seed(1)
fdaff_likert <- data.frame(
country=factor(sample(1:3, n, replace=T), labels=c("US","Mexico","Canada")),
item1=factor(sample(1:5,n, replace=T), labels=c("Very Poor","Poor","Neither","Good","Very Good")),
item2=factor(sample(1:5,n, replace=T), labels=c("Very Poor","Poor","Neither","Good","Very Good")),
item3=factor(sample(1:5,n, replace=T), labels=c("Very Poor","Poor","Neither","Good","Very Good"))
)
names(fdaff_likert) <- c("Country",
"1. I read only if I have to",
"2. Reading is one of my favorite hobbies",
"3. I find it hard to finish books")
fdaff_likert3 <- likert(items=fdaff_likert[,2:4], grouping=fdaff_likert[,1])
plot(fdaff_likert3)

Node labels on circular phylogenetic tree

I am trying to create a circular phylogenetic tree. I have this part of the code:
fit<- hclust(dist(Data[,-4]), method = "complete", members = NULL)
nclus= 3
color=c('red','blue','green')
color_list=rep(color,nclus/length(color))
clus=cutree(fit,nclus)
plot(as.phylo(fit),type='fan',tip.color=color_list[clus],label.offset=0.2,no.margin=TRUE, cex=0.70, show.node.label = TRUE)
And this is the result:
I am also trying to show a label for each node and to color the branches. Any suggestions on how to do that?
Thanks!
When you say "color branches" I assume you mean color the edges. This seems to work, but I have to think there's a better way.
Using the built-in mtcars dataset here, since you did not provide your data.
plot.fan <- function(hc, nclus=3) {
palette <- c('red','blue','green','orange','black')[1:nclus]
clus <-cutree(hc,nclus)
X <- as.phylo(hc)
edge.clus <- sapply(1:nclus,function(i)max(which(X$edge[,2] %in% which(clus==i))))
order <- order(edge.clus)
edge.clus <- c(min(edge.clus),diff(sort(edge.clus)))
edge.clus <- rep(order,edge.clus)
plot(X,type='fan',
tip.color=palette[clus],edge.color=palette[edge.clus],
label.offset=0.2,no.margin=TRUE, cex=0.70)
}
fit <- hclust(dist(mtcars[,c("mpg","hp","wt","disp")]))
plot.fan(fit,3); plot.fan(fit,5)
Regarding "label the nodes", if you mean label the tips, it looks like you've already done that. If you want different labels, unfortunately, unlike plot.hclust(...) the labels=... argument is rejected. You could experiment with the tiplabels(....) function, but it does not seem to work very well with type="fan". The labels come from the row names of Data, so your best bet IMO is to change the row names prior to clustering.
If you actually mean label the nodes (the connection points between the edges), have a look at nodelabels(...). I don't provide a working example because I can't imagine what labels you would put there.
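For what it is worth, a minimal sketch of nodelabels(...) on a fan tree could look like the following. It simply numbers the internal nodes as placeholder labels, since the meaningful labels depend on the data, and it uses mtcars again.

```r
library(ape)

hc <- hclust(dist(mtcars[, c("mpg", "hp", "wt", "disp")]))
X  <- as.phylo(hc)

plot(X, type = "fan", label.offset = 0.2, no.margin = TRUE, cex = 0.7)

# Placeholder labels: number the internal nodes 1..Nnode.
nodelabels(text = seq_len(X$Nnode), frame = "circle", bg = "white", cex = 0.5)
```

Any character vector with one entry per internal node can be passed as text in place of the numbers.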
