Why does metaMDS() produce a horizontal distribution of our data? - r

We have a species presence table (so binary: 1=present, 0=absent). When using metaMDS of the vegan package, it produces a horizontal distribution of our data when plotted, instead of clusters.
We tried using different distance methods (Euclidean, Bray, Jaccard), but they all seem to produce the same plot.
myfungi.all looks like this:
structure(list(Sample = 1:12, Habitat = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Dune", "Forest"
), class = "factor"), OTU88 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L), OTU28 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), OTU165 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU178 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L), OTU97 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L
), OTU39 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
OTU104 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L
), OTU95 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L,
0L), OTU90 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU119 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU451 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L), OTU98 = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU45 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
1L), OTU2 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
1L), OTU24 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), OTU169 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU29 = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU85 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L), OTU140 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L), OTU42 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L), OTU70 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L), OTU25 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU34 = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L), OTU181 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU201 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU17 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU1146 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L), OTU14 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L,
1L, 1L), OTU72 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L,
0L, 0L), OTU13 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
1L, 1L), OTU20 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L), OTU63 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU170 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU262 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU48 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU6 = c(0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
0L, 0L), OTU3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 1L), OTU31 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU73 = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
0L, 0L), OTU32 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU37 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU196 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU5 = c(1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU11 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
0L, 1L), OTU16 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU41 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU71 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU109 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU233 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L)), class = "data.frame", row.names = c(NA, -12L))
Our script looks like this:
myfungi.all = read.csv("soil_fungi.csv",header=T)
myfungi = myfungi.all[,c(3:51)]
myfungi.nmds.bc <- metaMDS(myfungi, distance = "bray", k = 2, binary = TRUE)
plot(myfungi.nmds.bc, type="t", main=paste("NMDS/Bray-Curtis -?? Stress =", round(myfungi.nmds.bc$stress,10)))
Does anyone have suggestions as what seems to be the problem?
At the moment our plot looks like this:

The solution you reported gives a perfect fit (stress nearly 0), and also gives a warning because of this dubious stress. The solution effectively puts your sampling units into two points so that you have absolutely dichotomous data. As Ben Bolker demonstrated, Principal Coordinates Analysis, PCoA (which you also can perform with stats::cmdscale, vegan::wcmdscale or vegan::dbrda) still has points in two major cluster, but spreads points within these clusters. PCoA is a linear method, but NMDS is non-linear and therefore often needs more data. It seems that in this case the weak ties (read the documentation ?monoMDS or Kruskal's papers cited in that documentation) is the stage that puts most demand on the data, and setting weakties = FALSE will prevent collapsing non-identical observations into two points:
m3 <- metaMDS(myfungi, weakties = FALSE)
m3 # stress 0.04124
stressplot(m3) # compare this to your result stressplot(myfungi.nmds.bc)
plot(m3)
The default monoMDS with weakties = TRUE (like Kruskal recommended) will consider the dichotomy of two groups as the only important non-linear difference, but with weakties = FALSE the solutions cannot proceed to zero stress. You still have a dichotomy, but with scatter.

Best guess is that you simply don't have enough data to distinguish two separate environmental axes: when I run your code I get
Warning message: In metaMDS(myfungi[, -(1:2)], distance = "bray", k = 2, binary = TRUE) : stress is (nearly) zero: you may have insufficient data
Out of your 53 species, only 35 are informative (the others appear either at none or at all of the sites):
m2 <- myfungi[,apply(myfungi,2,var)>0]
ncol(m2) ## 35
vv <- function(x) (image(Matrix(as.matrix(x))))
How many distinct distribution patterns are there?
nrow(unique(t(m2))) ## 27
You could try PCoA instead:
library(ape)
biplot(pcoa(vegdist(m2,"bray"))
As Jari Oksanen points out, you could also do this with cmdscale() in base R:
plot(cmdscale(vegdist(mm,"bray")),
col=as.numeric(myfungi$Habitat))

Related

I don't know how to create a DGEList (or anyway how to setup my data) to solve the tasks in the image below?

I have this data: (Design contains several tissues and the ones I'll need to consider are pancreas and lung)
head(Design)
Individual sex age RNA.quality..max10. organ tissue
GTEX-Y5V6-0526-SM-4VBRV GTEX-Y5V6 1 60-69 7.1 Thyroid Thyroid
GTEX-1KXAM-1726-SM-D3LAE GTEX-1KXAM 1 60-69 8.1 Thyroid Thyroid
GTEX-18A67-0826-SM-7KFTI GTEX-18A67 1 50-59 7.2 Thyroid Thyroid
GTEX-14BMU-0226-SM-5S2QA GTEX-14BMU 2 20-29 7.2 Thyroid Thyroid
GTEX-13PVR-0626-SM-5S2RC GTEX-13PVR 2 60-69 7.3 Thyroid Thyroid
GTEX-1211K-0726-SM-5FQUW GTEX-1211K 2 60-69 7.0 Thyroid Thyroid
dput(counts[1:10,])
structure(list(`GTEX-Y5V6-0526-SM-4VBRV` = c(0L, 1L, 2L, 1L,
0L, 0L, 0L, 0L, 0L, 214L), `GTEX-1KXAM-1726-SM-D3LAE` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 205L), `GTEX-18A67-0826-SM-7KFTI` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 164L), `GTEX-14BMU-0226-SM-5S2QA` = c(0L,
0L, 0L, 12L, 0L, 0L, 0L, 0L, 0L, 108L), `GTEX-13PVR-0626-SM-5S2RC` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 100L), `GTEX-1211K-0726-SM-5FQUW` = c(0L,
0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 174L), `GTEX-1KXAM-0926-SM-CXZKA` = c(2L,
1L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 99L), `GTEX-18A67-2626-SM-718AD` = c(7L,
3L, 7L, 2L, 0L, 1L, 5L, 0L, 0L, 116L), `GTEX-14BMU-1126-SM-5RQJ8` = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 44L), `GTEX-1211K-1426-SM-5FQTF` = c(4L,
0L, 5L, 2L, 0L, 0L, 0L, 0L, 0L, 143L), `GTEX-11TT1-0726-SM-5GU5A` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 57L), `GTEX-1HCUA-1626-SM-A9SMG` = c(0L,
0L, 0L, 22L, 0L, 0L, 0L, 0L, 0L, 53L), `GTEX-1KXAM-0226-SM-EV7AP` = c(0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 75L), `GTEX-18A67-1726-SM-7KFT9` = c(0L,
0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 73L), `GTEX-14BMU-0726-SM-73KXS` = c(0L,
0L, 0L, 40L, 0L, 0L, 0L, 0L, 0L, 74L), `GTEX-13PVR-0726-SM-5S2PX` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 54L), `GTEX-1211K-1126-SM-5EGGB` = c(0L,
1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 25L), `GTEX-11TT1-0326-SM-5LUAY` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 54L), `GTEX-1KXAM-2426-SM-DIPFC` = c(1L,
0L, 2L, 1L, 0L, 0L, 2L, 0L, 0L, 29L), `GTEX-18A67-0326-SM-7LG5X` = c(0L,
0L, 5L, 4L, 0L, 0L, 2L, 0L, 1L, 91L), `GTEX-14BMU-2026-SM-5S2W6` = c(0L,
0L, 2L, 5L, 0L, 0L, 0L, 0L, 0L, 30L), `GTEX-13PVR-2526-SM-5RQIT` = c(0L,
0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 14L), `GTEX-1211K-2126-SM-59HJZ` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 51L), `GTEX-Y3I4-2326-SM-4TT81` = c(0L,
0L, 3L, 0L, 0L, 0L, 1L, 0L, 0L, 38L), `GTEX-1KXAM-0426-SM-DHXKG` = c(0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 105L), `GTEX-18A67-1126-SM-7KFSB` = c(1L,
0L, 0L, 4L, 0L, 0L, 1L, 0L, 0L, 76L), `GTEX-14BMU-0526-SM-73KW4` = c(0L,
0L, 0L, 11L, 0L, 0L, 0L, 0L, 0L, 53L), `GTEX-1211K-0826-SM-5FQUP` = c(1L,
0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 104L), `GTEX-11TT1-1626-SM-5EQL7` = c(0L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 113L), `GTEX-ZYFG-0226-SM-5GIDT` = c(1L,
0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 54L), `GTEX-1KXAM-0826-SM-CXZK9` = c(0L,
0L, 0L, 5L, 0L, 0L, 2L, 0L, 0L, 97L), `GTEX-18A67-2426-SM-7LT95` = c(1L,
0L, 2L, 0L, 0L, 1L, 3L, 0L, 0L, 69L), `GTEX-14BMU-0926-SM-5S2QB` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 29L), `GTEX-13PVR-1826-SM-5Q5CC` = c(1L,
0L, 0L, 3L, 0L, 1L, 2L, 0L, 0L, 32L), `GTEX-1211K-0926-SM-5FQTL` = c(0L,
0L, 0L, 3L, 0L, 0L, 1L, 0L, 0L, 99L), `GTEX-11TT1-0526-SM-5P9JO` = c(0L,
1L, 2L, 4L, 0L, 0L, 2L, 0L, 0L, 52L), `GTEX-1KXAM-0726-SM-E9U5I` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 45L), `GTEX-18A67-2526-SM-7LG5Z` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 91L), `GTEX-14BMU-1026-SM-5RQJ5` = c(1L,
0L, 1L, 8L, 0L, 0L, 0L, 0L, 0L, 47L), `GTEX-13PVR-2026-SM-73KXT` = c(0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 27L), `GTEX-1211K-1326-SM-5FQV2` = c(0L,
0L, 3L, 0L, 0L, 0L, 1L, 1L, 0L, 57L), `GTEX-11TT1-0626-SM-5GU4X` = c(1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 90L), `GTEX-ZYFG-1826-SM-5GZWX` = c(0L,
0L, 3L, 2L, 0L, 0L, 2L, 0L, 0L, 91L), `GTEX-1KXAM-1926-SM-D3LAG` = c(0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 103L), `GTEX-18A67-2226-SM-7LT9Z` = c(0L,
0L, 2L, 2L, 0L, 0L, 1L, 0L, 1L, 157L), `GTEX-13PVR-1726-SM-5Q5EC` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 34L), `GTEX-1211K-1826-SM-5EGJ2` = c(0L,
0L, 1L, 3L, 0L, 0L, 0L, 0L, 0L, 49L), `GTEX-11TT1-0926-SM-5GU5M` = c(0L,
2L, 0L, 3L, 1L, 0L, 0L, 0L, 1L, 49L), `GTEX-1KXAM-1026-SM-CY8IA` = c(0L,
0L, 1L, 3L, 0L, 0L, 0L, 0L, 0L, 93L), `GTEX-14BMU-1626-SM-5TDE7` = c(0L,
1L, 3L, 13L, 0L, 0L, 1L, 0L, 0L, 84L), `GTEX-13PVR-2226-SM-7DHKP` = c(0L,
0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 75L), `GTEX-1211K-1926-SM-5EQLB` = c(0L,
1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 114L), `GTEX-11TT1-2126-SM-5GU5Y` = c(2L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 49L), `GTEX-ZT9W-2026-SM-51MRA` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 70L), `GTEX-1KXAM-2326-SM-CYPTD` = c(0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 20L), `GTEX-18A67-0226-SM-7LG67` = c(0L,
0L, 5L, 2L, 0L, 0L, 1L, 0L, 0L, 94L), `GTEX-14BMU-2126-SM-5S2TS` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 50L), `GTEX-13PVR-2426-SM-5RQHN` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 59L), `GTEX-1211K-2226-SM-5FQU6` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 81L), `GTEX-11TT1-2426-SM-5EQMK` = c(0L,
1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 60L)), row.names = c("ENSG00000243485",
"ENSG00000237613", "ENSG00000186092", "ENSG00000238009", "ENSG00000222623",
"ENSG00000241599", "ENSG00000236601", "ENSG00000235146", "ENSG00000223181",
"ENSG00000237491"), class = "data.frame")
I need to create a DGEList with only some of the genes: Pancreas and lung genes (if I am right), in order to do the tasks in the image below: Tasks
I need to do a PCA to check if there's separation among male and female genes, and after I need to do a differential expression analysis with the function exactTest(), and since I need a DGEList for exactTest to compare Pancreas sex1 genes with pancreas sex 2 genes, lungsex1-lungsex2 I suppose that I can do both after creating the DGEList.
In the end my problem is that I dont know how to setup the data.
If you need anything else I'll be here, thank you in advance.
PancreasLungDesign=Design[13:30,1:6]
PancreasLungDesign=PancreasLungDesign[-c(7:12),]
Counts2=counts[,13:30]
Counts2= Counts2[,-(7:12)]
rownames(PancreasLungDesign) == colnames(Counts2)
Expressedgenes2=Counts2>=10
NumExpressedgenes2=apply(Expressedgenes2,1,sum)
FilteredCounts2=Counts2[NumExpressedgenes2>0,]
NumExpressedgenes2=apply(Expressedgenes2,1,sum)
FilteredCounts2=Counts2[NumExpressedgenes2>0,]
y2=DGEList(counts=FilteredCounts2, group = PancreasLungDesign$tissue)
y2=calcNormFactors(y2)
apply(cpm(y2,normalized.lib.sizes = T),2,sum)
plotMDS(y2,table(PancreasLungDesign$sex),labels = PancreasLungDesign$tissue,col=rep(c("green","green","blue","blue","blue","green","yellow","yellow","red","red","yellow","red")),cex=0.5,main="Principal component analysis sex specific expression")

mlogit : using varying alternatives for mlogit in R

I am trying to use varying alternatives for each person. However not able to get it working. If I make the alternatives same for each person, it works fine. How to make it varying and work.
Data :
> dput( df1 )
structure(list(Choice = c(1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L), A = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), B = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 1L, 0L, 0L, -1L, 0L,
0L), C = c(1L, 0L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L,
1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L), D = c(0L, 1L, 0L, 0L,
0L, -1L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), E = c(0L, 0L, 1L, 0L, 0L, 0L, -1L, 0L, 0L, 0L, 1L,
0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L), F = c(0L, 0L,
0L, 1L, 0L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, -1L), Alternative = c(1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L)), row.names = c(NA, -38L), class = "data.frame")
Code :
model = mlogit( Choice ~ B + C + D + E + F | 0, data = df1,
alt.levels = unique( df1$Alternative ),
shape = "long")
Error
Error in dfidx::dfidx(data = data, dfa$idx, drop.index = dfa$drop.index, :
the data must be balanced in order to use the levels argument
You need to provide mlogit with an explicit ID variable denoting which participant made the choice. It can't infer them from the data.frame you've provided.
I'm assuming in your reproducible example that the alternatives in rows running sequentially from [1 - 4] or [1 - 3] represent the choice sets presented to a unique individual. If so, then you can fit a model like so:
library(mlogit)
# Explicitly create an ID variable
df1$ID <- rep(1:12, times = c(rep(4, 2), rep(3, 10)))
#Convert to dfidx data
dfx1 <- mlogit.data(df1,
shape = "long",
choice = "Choice",
id.var = "ID")
# Fit a model
m0 <- mlogit(Choice ~ B + C + D + E + F | 0,
data = dfx1)

system is computationally singular : mlogit

When I am trying to add all variables in formula, I am getting this error. If I omit A then, the model runs fine.
dput(df1)
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Choice = c(1L, 0L, 0L,
0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
1L), A = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), B = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 1L, 0L, 0L, -1L, 0L, 0L),
C = c(1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 1L,
0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L), D = c(0L, 1L, 0L, 0L,
-1L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), E = c(0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 1L, 0L,
0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L), F = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, -1L), Alternative = c(1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L)), row.names = c(NA, -36L), class = "data.frame")
> model = mlogit( Choice ~ A + B + C + D + E + F | 0, data = df1,
+
+ alt.var = 'Alternative',
+
+ shape = "long")
Error in solve.default(H, g[!fixed]) :
system is computationally singular: reciprocal condition number = 4.85723e-17
I have seen these two questions and the documentation however still not able to figure it out completely. Any help will be highly appreciated.
R mlogit model, computationally singular
Error in mlogit: Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number = 3.4767e-18

Export ftable factors to html

I have a table created from ftable()
structure(c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L), .Dim = c(12L, 7L), class = "ftable", row.vars = list(
ï..petal_size = c("large ", "small", "small "), stem_length = c("long",
"long ", "short", "short ")), col.vars = list(flow_color = c("blue",
"green", "indigo ", "orange", "red ", "violet", "yellow")))
I would like to export it using htmlTable, but when I use htmlTableon this i get this result with no factors and just numbers like in the picture here
How do I recover the factor names for the htmltable? Please note the final output should have the same number of rows and columns as the picture's output, but it needs to have the factor names on the rows and columns.
I will convert it first to data.frame and the add the necessary tweaks to obtain the desired output:
tableToHtml <-structure(c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L), .Dim = c(12L, 7L), class = "ftable", row.vars = list(
ï..petal_size = c("large ", "small", "small "), stem_length = c("long",
"long ", "short", "short ")), col.vars = list(flow_color = c("blue",
"green", "indigo ", "orange", "red ", "violet", "yellow")))
library(htmlTable)
htmlTable(as.data.frame(tableToHtml),rnames=F, header=rep("", length(colnames(as.data.frame(tableToHtml)))))

Barplots with percentages of different variables

Given following example:
structure(list(jdgcbrbR = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L),
ctprpwrR = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), hrshsntaR = c(0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L), dbctvrdR = c(0L, 0L, 1L, 1L,
0L, 0L, 0L, 0L), lwstrobR = c(0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L), rgbrklwR = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), ctinpltR = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L), stcbg2tR = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), tmprsR = c(NA, NA, NA, 0L, 0L, NA, NA, NA
), caplcstR = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L), widprsnR = c(0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L), wevdctR = c(0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L)), .Names = c("jdgcbrbR", "ctprpwrR", "hrshsntaR",
"dbctvrdR", "lwstrobR", "rgbrklwR", "ctinpltR", "stcbg2tR", "tmprsR",
"caplcstR", "widprsnR", "wevdctR"), row.names = 747:754, class = "data.frame")
How could I create a set of barplots, graphing the percentage of 1-values in each variable, thus giving a nice overview of the evolution of the percentage of 1's.
So far I tried creating the percentages (which failed because nrow is perceived as NULL):
pct_jdgcbrbR <- (sum(jdgcbrbR) / nrow(jdgcbrbR) * 100)
pct_jdgcbrbR
And I found the barplot function which could be usefull:
barplot(percentages, main="INR",
xlab="varnames")
The result should look something like this example I made in Excel:
The following works for me, assuming your data.frame is called "temp":
barplot((colSums(temp, na.rm=TRUE)/nrow(temp))*100)

Resources