Related
I have this data: (Design contains several tissues and the ones I'll need to consider are pancreas and lung)
head(Design)
Individual sex age RNA.quality..max10. organ tissue
GTEX-Y5V6-0526-SM-4VBRV GTEX-Y5V6 1 60-69 7.1 Thyroid Thyroid
GTEX-1KXAM-1726-SM-D3LAE GTEX-1KXAM 1 60-69 8.1 Thyroid Thyroid
GTEX-18A67-0826-SM-7KFTI GTEX-18A67 1 50-59 7.2 Thyroid Thyroid
GTEX-14BMU-0226-SM-5S2QA GTEX-14BMU 2 20-29 7.2 Thyroid Thyroid
GTEX-13PVR-0626-SM-5S2RC GTEX-13PVR 2 60-69 7.3 Thyroid Thyroid
GTEX-1211K-0726-SM-5FQUW GTEX-1211K 2 60-69 7.0 Thyroid Thyroid
dput(counts[1:10,])
structure(list(`GTEX-Y5V6-0526-SM-4VBRV` = c(0L, 1L, 2L, 1L,
0L, 0L, 0L, 0L, 0L, 214L), `GTEX-1KXAM-1726-SM-D3LAE` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 205L), `GTEX-18A67-0826-SM-7KFTI` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 164L), `GTEX-14BMU-0226-SM-5S2QA` = c(0L,
0L, 0L, 12L, 0L, 0L, 0L, 0L, 0L, 108L), `GTEX-13PVR-0626-SM-5S2RC` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 100L), `GTEX-1211K-0726-SM-5FQUW` = c(0L,
0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 174L), `GTEX-1KXAM-0926-SM-CXZKA` = c(2L,
1L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 99L), `GTEX-18A67-2626-SM-718AD` = c(7L,
3L, 7L, 2L, 0L, 1L, 5L, 0L, 0L, 116L), `GTEX-14BMU-1126-SM-5RQJ8` = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 44L), `GTEX-1211K-1426-SM-5FQTF` = c(4L,
0L, 5L, 2L, 0L, 0L, 0L, 0L, 0L, 143L), `GTEX-11TT1-0726-SM-5GU5A` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 57L), `GTEX-1HCUA-1626-SM-A9SMG` = c(0L,
0L, 0L, 22L, 0L, 0L, 0L, 0L, 0L, 53L), `GTEX-1KXAM-0226-SM-EV7AP` = c(0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 75L), `GTEX-18A67-1726-SM-7KFT9` = c(0L,
0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 73L), `GTEX-14BMU-0726-SM-73KXS` = c(0L,
0L, 0L, 40L, 0L, 0L, 0L, 0L, 0L, 74L), `GTEX-13PVR-0726-SM-5S2PX` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 54L), `GTEX-1211K-1126-SM-5EGGB` = c(0L,
1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 25L), `GTEX-11TT1-0326-SM-5LUAY` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 54L), `GTEX-1KXAM-2426-SM-DIPFC` = c(1L,
0L, 2L, 1L, 0L, 0L, 2L, 0L, 0L, 29L), `GTEX-18A67-0326-SM-7LG5X` = c(0L,
0L, 5L, 4L, 0L, 0L, 2L, 0L, 1L, 91L), `GTEX-14BMU-2026-SM-5S2W6` = c(0L,
0L, 2L, 5L, 0L, 0L, 0L, 0L, 0L, 30L), `GTEX-13PVR-2526-SM-5RQIT` = c(0L,
0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 14L), `GTEX-1211K-2126-SM-59HJZ` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 51L), `GTEX-Y3I4-2326-SM-4TT81` = c(0L,
0L, 3L, 0L, 0L, 0L, 1L, 0L, 0L, 38L), `GTEX-1KXAM-0426-SM-DHXKG` = c(0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 105L), `GTEX-18A67-1126-SM-7KFSB` = c(1L,
0L, 0L, 4L, 0L, 0L, 1L, 0L, 0L, 76L), `GTEX-14BMU-0526-SM-73KW4` = c(0L,
0L, 0L, 11L, 0L, 0L, 0L, 0L, 0L, 53L), `GTEX-1211K-0826-SM-5FQUP` = c(1L,
0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 104L), `GTEX-11TT1-1626-SM-5EQL7` = c(0L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 113L), `GTEX-ZYFG-0226-SM-5GIDT` = c(1L,
0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 54L), `GTEX-1KXAM-0826-SM-CXZK9` = c(0L,
0L, 0L, 5L, 0L, 0L, 2L, 0L, 0L, 97L), `GTEX-18A67-2426-SM-7LT95` = c(1L,
0L, 2L, 0L, 0L, 1L, 3L, 0L, 0L, 69L), `GTEX-14BMU-0926-SM-5S2QB` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 29L), `GTEX-13PVR-1826-SM-5Q5CC` = c(1L,
0L, 0L, 3L, 0L, 1L, 2L, 0L, 0L, 32L), `GTEX-1211K-0926-SM-5FQTL` = c(0L,
0L, 0L, 3L, 0L, 0L, 1L, 0L, 0L, 99L), `GTEX-11TT1-0526-SM-5P9JO` = c(0L,
1L, 2L, 4L, 0L, 0L, 2L, 0L, 0L, 52L), `GTEX-1KXAM-0726-SM-E9U5I` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 45L), `GTEX-18A67-2526-SM-7LG5Z` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 91L), `GTEX-14BMU-1026-SM-5RQJ5` = c(1L,
0L, 1L, 8L, 0L, 0L, 0L, 0L, 0L, 47L), `GTEX-13PVR-2026-SM-73KXT` = c(0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 27L), `GTEX-1211K-1326-SM-5FQV2` = c(0L,
0L, 3L, 0L, 0L, 0L, 1L, 1L, 0L, 57L), `GTEX-11TT1-0626-SM-5GU4X` = c(1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 90L), `GTEX-ZYFG-1826-SM-5GZWX` = c(0L,
0L, 3L, 2L, 0L, 0L, 2L, 0L, 0L, 91L), `GTEX-1KXAM-1926-SM-D3LAG` = c(0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 103L), `GTEX-18A67-2226-SM-7LT9Z` = c(0L,
0L, 2L, 2L, 0L, 0L, 1L, 0L, 1L, 157L), `GTEX-13PVR-1726-SM-5Q5EC` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 34L), `GTEX-1211K-1826-SM-5EGJ2` = c(0L,
0L, 1L, 3L, 0L, 0L, 0L, 0L, 0L, 49L), `GTEX-11TT1-0926-SM-5GU5M` = c(0L,
2L, 0L, 3L, 1L, 0L, 0L, 0L, 1L, 49L), `GTEX-1KXAM-1026-SM-CY8IA` = c(0L,
0L, 1L, 3L, 0L, 0L, 0L, 0L, 0L, 93L), `GTEX-14BMU-1626-SM-5TDE7` = c(0L,
1L, 3L, 13L, 0L, 0L, 1L, 0L, 0L, 84L), `GTEX-13PVR-2226-SM-7DHKP` = c(0L,
0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 75L), `GTEX-1211K-1926-SM-5EQLB` = c(0L,
1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 114L), `GTEX-11TT1-2126-SM-5GU5Y` = c(2L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 49L), `GTEX-ZT9W-2026-SM-51MRA` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 70L), `GTEX-1KXAM-2326-SM-CYPTD` = c(0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 20L), `GTEX-18A67-0226-SM-7LG67` = c(0L,
0L, 5L, 2L, 0L, 0L, 1L, 0L, 0L, 94L), `GTEX-14BMU-2126-SM-5S2TS` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 50L), `GTEX-13PVR-2426-SM-5RQHN` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 59L), `GTEX-1211K-2226-SM-5FQU6` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 81L), `GTEX-11TT1-2426-SM-5EQMK` = c(0L,
1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 60L)), row.names = c("ENSG00000243485",
"ENSG00000237613", "ENSG00000186092", "ENSG00000238009", "ENSG00000222623",
"ENSG00000241599", "ENSG00000236601", "ENSG00000235146", "ENSG00000223181",
"ENSG00000237491"), class = "data.frame")
I need to create a DGEList with only some of the genes: Pancreas and lung genes (if I am right), in order to do the tasks in the image below: Tasks
I need to do a PCA to check if there's separation among male and female genes, and after I need to do a differential expression analysis with the function exactTest(), and since I need a DGEList for exactTest to compare Pancreas sex1 genes with pancreas sex 2 genes, lungsex1-lungsex2 I suppose that I can do both after creating the DGEList.
In the end my problem is that I dont know how to setup the data.
If you need anything else I'll be here, thank you in advance.
PancreasLungDesign=Design[13:30,1:6]
PancreasLungDesign=PancreasLungDesign[-c(7:12),]
Counts2=counts[,13:30]
Counts2= Counts2[,-(7:12)]
rownames(PancreasLungDesign) == colnames(Counts2)
Expressedgenes2=Counts2>=10
NumExpressedgenes2=apply(Expressedgenes2,1,sum)
FilteredCounts2=Counts2[NumExpressedgenes2>0,]
NumExpressedgenes2=apply(Expressedgenes2,1,sum)
FilteredCounts2=Counts2[NumExpressedgenes2>0,]
y2=DGEList(counts=FilteredCounts2, group = PancreasLungDesign$tissue)
y2=calcNormFactors(y2)
apply(cpm(y2,normalized.lib.sizes = T),2,sum)
plotMDS(y2,table(PancreasLungDesign$sex),labels = PancreasLungDesign$tissue,col=rep(c("green","green","blue","blue","blue","green","yellow","yellow","red","red","yellow","red")),cex=0.5,main="Principal component analysis sex specific expression")
I have to identify genes showing sex specific expression in 2 tissues: "pancreas" and "lung".
To do it first of all i need to do a PCA to ascertain whether there is separation between tissues of different sexes (in particular there are 3 individuals of sex 1 and 3 of sex2 for each tissue)
I suppose that i should classify the genes in counts for sex by using the sex column in the Design list and after I should perform a PCA where different colors are assigned to sex 1 and sex 2 genes.
The problem is that even if I know what I should do to perform the PCA (if what i tought is right) I don't know how to write the codes required to do it: how can i create a new dataframe made by only the genes in count that correspond to lung and pancreas rows in Design?
I thought to do in this way in order to color the genes with different colors depending by sex (information shown in Design), if there's a simplier way is well accepted any suggestion.
dput(Design[1:10,]):
Design = structure(list(Individual = c("GTEX-Y5V6", "GTEX-1KXAM", "GTEX-18A67",
"GTEX-14BMU", "GTEX-13PVR", "GTEX-1211K", "GTEX-1KXAM", "GTEX-18A67",
"GTEX-14BMU", "GTEX-1211K"), sex = c(1L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 2L, 2L), age = c("60-69", "60-69", "50-59", "20-29",
"60-69", "60-69", "60-69", "50-59", "20-29", "60-69"), RNA.quality..max10. = c(7.1,
8.1, 7.2, 7.2, 7.3, 7, 7.2, 7.3, 7.4, 8.2), organ = c("Thyroid",
"Thyroid", "Thyroid", "Thyroid", "Thyroid", "Thyroid", "Stomach",
"Stomach", "Stomach", "Stomach"), tissue = c("Thyroid", "Thyroid",
"Thyroid", "Thyroid", "Thyroid", "Thyroid", "Stomach", "Stomach",
"Stomach", "Stomach")), row.names = c("GTEX-Y5V6-0526-SM-4VBRV",
"GTEX-1KXAM-1726-SM-D3LAE", "GTEX-18A67-0826-SM-7KFTI", "GTEX-14BMU-0226-SM-5S2QA",
"GTEX-13PVR-0626-SM-5S2RC", "GTEX-1211K-0726-SM-5FQUW", "GTEX-1KXAM-0926-SM-CXZKA",
"GTEX-18A67-2626-SM-718AD", "GTEX-14BMU-1126-SM-5RQJ8", "GTEX-1211K-1426-SM-5FQTF"
), class = "data.frame")
dput(counts[1:10,]):
structure(list(`GTEX-Y5V6-0526-SM-4VBRV` = c(0L, 1L, 2L, 1L,
0L, 0L, 0L, 0L, 0L, 214L), `GTEX-1KXAM-1726-SM-D3LAE` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 205L), `GTEX-18A67-0826-SM-7KFTI` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 164L), `GTEX-14BMU-0226-SM-5S2QA` = c(0L,
0L, 0L, 12L, 0L, 0L, 0L, 0L, 0L, 108L), `GTEX-13PVR-0626-SM-5S2RC` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 100L), `GTEX-1211K-0726-SM-5FQUW` = c(0L,
0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 174L), `GTEX-1KXAM-0926-SM-CXZKA` = c(2L,
1L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 99L), `GTEX-18A67-2626-SM-718AD` = c(7L,
3L, 7L, 2L, 0L, 1L, 5L, 0L, 0L, 116L), `GTEX-14BMU-1126-SM-5RQJ8` = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 44L), `GTEX-1211K-1426-SM-5FQTF` = c(4L,
0L, 5L, 2L, 0L, 0L, 0L, 0L, 0L, 143L), `GTEX-11TT1-0726-SM-5GU5A` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 57L), `GTEX-1HCUA-1626-SM-A9SMG` = c(0L,
0L, 0L, 22L, 0L, 0L, 0L, 0L, 0L, 53L), `GTEX-1KXAM-0226-SM-EV7AP` = c(0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 75L), `GTEX-18A67-1726-SM-7KFT9` = c(0L,
0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 73L), `GTEX-14BMU-0726-SM-73KXS` = c(0L,
0L, 0L, 40L, 0L, 0L, 0L, 0L, 0L, 74L), `GTEX-13PVR-0726-SM-5S2PX` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 54L), `GTEX-1211K-1126-SM-5EGGB` = c(0L,
1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 25L), `GTEX-11TT1-0326-SM-5LUAY` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 54L), `GTEX-1KXAM-2426-SM-DIPFC` = c(1L,
0L, 2L, 1L, 0L, 0L, 2L, 0L, 0L, 29L), `GTEX-18A67-0326-SM-7LG5X` = c(0L,
0L, 5L, 4L, 0L, 0L, 2L, 0L, 1L, 91L), `GTEX-14BMU-2026-SM-5S2W6` = c(0L,
0L, 2L, 5L, 0L, 0L, 0L, 0L, 0L, 30L), `GTEX-13PVR-2526-SM-5RQIT` = c(0L,
0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 14L), `GTEX-1211K-2126-SM-59HJZ` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 51L), `GTEX-Y3I4-2326-SM-4TT81` = c(0L,
0L, 3L, 0L, 0L, 0L, 1L, 0L, 0L, 38L), `GTEX-1KXAM-0426-SM-DHXKG` = c(0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 105L), `GTEX-18A67-1126-SM-7KFSB` = c(1L,
0L, 0L, 4L, 0L, 0L, 1L, 0L, 0L, 76L), `GTEX-14BMU-0526-SM-73KW4` = c(0L,
0L, 0L, 11L, 0L, 0L, 0L, 0L, 0L, 53L), `GTEX-1211K-0826-SM-5FQUP` = c(1L,
0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 104L), `GTEX-11TT1-1626-SM-5EQL7` = c(0L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 113L), `GTEX-ZYFG-0226-SM-5GIDT` = c(1L,
0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 54L), `GTEX-1KXAM-0826-SM-CXZK9` = c(0L,
0L, 0L, 5L, 0L, 0L, 2L, 0L, 0L, 97L), `GTEX-18A67-2426-SM-7LT95` = c(1L,
0L, 2L, 0L, 0L, 1L, 3L, 0L, 0L, 69L), `GTEX-14BMU-0926-SM-5S2QB` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 29L), `GTEX-13PVR-1826-SM-5Q5CC` = c(1L,
0L, 0L, 3L, 0L, 1L, 2L, 0L, 0L, 32L), `GTEX-1211K-0926-SM-5FQTL` = c(0L,
0L, 0L, 3L, 0L, 0L, 1L, 0L, 0L, 99L), `GTEX-11TT1-0526-SM-5P9JO` = c(0L,
1L, 2L, 4L, 0L, 0L, 2L, 0L, 0L, 52L), `GTEX-1KXAM-0726-SM-E9U5I` = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 45L), `GTEX-18A67-2526-SM-7LG5Z` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 91L), `GTEX-14BMU-1026-SM-5RQJ5` = c(1L,
0L, 1L, 8L, 0L, 0L, 0L, 0L, 0L, 47L), `GTEX-13PVR-2026-SM-73KXT` = c(0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 27L), `GTEX-1211K-1326-SM-5FQV2` = c(0L,
0L, 3L, 0L, 0L, 0L, 1L, 1L, 0L, 57L), `GTEX-11TT1-0626-SM-5GU4X` = c(1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 90L), `GTEX-ZYFG-1826-SM-5GZWX` = c(0L,
0L, 3L, 2L, 0L, 0L, 2L, 0L, 0L, 91L), `GTEX-1KXAM-1926-SM-D3LAG` = c(0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 103L), `GTEX-18A67-2226-SM-7LT9Z` = c(0L,
0L, 2L, 2L, 0L, 0L, 1L, 0L, 1L, 157L), `GTEX-13PVR-1726-SM-5Q5EC` = c(1L,
0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 34L), `GTEX-1211K-1826-SM-5EGJ2` = c(0L,
0L, 1L, 3L, 0L, 0L, 0L, 0L, 0L, 49L), `GTEX-11TT1-0926-SM-5GU5M` = c(0L,
2L, 0L, 3L, 1L, 0L, 0L, 0L, 1L, 49L), `GTEX-1KXAM-1026-SM-CY8IA` = c(0L,
0L, 1L, 3L, 0L, 0L, 0L, 0L, 0L, 93L), `GTEX-14BMU-1626-SM-5TDE7` = c(0L,
1L, 3L, 13L, 0L, 0L, 1L, 0L, 0L, 84L), `GTEX-13PVR-2226-SM-7DHKP` = c(0L,
0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 75L), `GTEX-1211K-1926-SM-5EQLB` = c(0L,
1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 114L), `GTEX-11TT1-2126-SM-5GU5Y` = c(2L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 49L), `GTEX-ZT9W-2026-SM-51MRA` = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 70L), `GTEX-1KXAM-2326-SM-CYPTD` = c(0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 20L), `GTEX-18A67-0226-SM-7LG67` = c(0L,
0L, 5L, 2L, 0L, 0L, 1L, 0L, 0L, 94L), `GTEX-14BMU-2126-SM-5S2TS` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 50L), `GTEX-13PVR-2426-SM-5RQHN` = c(0L,
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 59L), `GTEX-1211K-2226-SM-5FQU6` = c(0L,
0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 81L), `GTEX-11TT1-2426-SM-5EQMK` = c(0L,
1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 60L)), row.names = c("ENSG00000243485",
"ENSG00000237613", "ENSG00000186092", "ENSG00000238009", "ENSG00000222623",
"ENSG00000241599", "ENSG00000236601", "ENSG00000235146", "ENSG00000223181",
"ENSG00000237491"), class = "data.frame")
I am trying to use varying alternatives for each person. However not able to get it working. If I make the alternatives same for each person, it works fine. How to make it varying and work.
Data :
> dput( df1 )
structure(list(Choice = c(1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L), A = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), B = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 1L, 0L, 0L, -1L, 0L,
0L), C = c(1L, 0L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L,
1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L), D = c(0L, 1L, 0L, 0L,
0L, -1L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), E = c(0L, 0L, 1L, 0L, 0L, 0L, -1L, 0L, 0L, 0L, 1L,
0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, -1L, 0L), F = c(0L, 0L,
0L, 1L, 0L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, -1L, 0L, 0L, 1L, 0L, 0L, -1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, -1L), Alternative = c(1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L)), row.names = c(NA, -38L), class = "data.frame")
Code :
model = mlogit( Choice ~ B + C + D + E + F | 0, data = df1,
alt.levels = unique( df1$Alternative ),
shape = "long")
Error
Error in dfidx::dfidx(data = data, dfa$idx, drop.index = dfa$drop.index, :
the data must be balanced in order to use the levels argument
You need to provide mlogit with an explicit ID variable denoting which participant made the choice. It can't infer them from the data.frame you've provided.
I'm assuming in your reproducible example that the alternatives in rows running sequentially from [1 - 4] or [1 - 3] represent the choice sets presented to a unique individual. If so, then you can fit a model like so:
library(mlogit)
# Explicitly create an ID variable
df1$ID <- rep(1:12, times = c(rep(4, 2), rep(3, 10)))
#Convert to dfidx data
dfx1 <- mlogit.data(df1,
shape = "long",
choice = "Choice",
id.var = "ID")
# Fit a model
m0 <- mlogit(Choice ~ B + C + D + E + F | 0,
data = dfx1)
I have a table created from ftable()
structure(c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L), .Dim = c(12L, 7L), class = "ftable", row.vars = list(
ï..petal_size = c("large ", "small", "small "), stem_length = c("long",
"long ", "short", "short ")), col.vars = list(flow_color = c("blue",
"green", "indigo ", "orange", "red ", "violet", "yellow")))
I would like to export it using htmlTable, but when I use htmlTableon this i get this result with no factors and just numbers like in the picture here
How do I recover the factor names for the htmltable? Please note the final output should have the same number of rows and columns as the picture's output, but it needs to have the factor names on the rows and columns.
I will convert it first to data.frame and the add the necessary tweaks to obtain the desired output:
tableToHtml <-structure(c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L), .Dim = c(12L, 7L), class = "ftable", row.vars = list(
ï..petal_size = c("large ", "small", "small "), stem_length = c("long",
"long ", "short", "short ")), col.vars = list(flow_color = c("blue",
"green", "indigo ", "orange", "red ", "violet", "yellow")))
library(htmlTable)
htmlTable(as.data.frame(tableToHtml),rnames=F, header=rep("", length(colnames(as.data.frame(tableToHtml)))))
We have a species presence table (so binary: 1=present, 0=absent). When using metaMDS of the vegan package, it produces a horizontal distribution of our data when plotted, instead of clusters.
We tried using different distance methods (Euclidean, Bray, Jaccard), but they all seem to produce the same plot.
myfungi.all looks like this:
structure(list(Sample = 1:12, Habitat = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Dune", "Forest"
), class = "factor"), OTU88 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L), OTU28 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), OTU165 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU178 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L), OTU97 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L
), OTU39 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
OTU104 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L
), OTU95 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L,
0L), OTU90 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU119 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU451 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L), OTU98 = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU45 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
1L), OTU2 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
1L), OTU24 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), OTU169 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU29 = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU85 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L), OTU140 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L), OTU42 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L), OTU70 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L), OTU25 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU34 = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L), OTU181 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU201 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU17 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), OTU1146 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L), OTU14 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L,
1L, 1L), OTU72 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L,
0L, 0L), OTU13 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
1L, 1L), OTU20 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L), OTU63 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU170 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU262 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU48 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU6 = c(0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
0L, 0L), OTU3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 1L), OTU31 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU73 = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
0L, 0L), OTU32 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU37 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU196 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU5 = c(1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU11 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
0L, 1L), OTU16 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU41 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU71 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), OTU109 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), OTU233 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L)), class = "data.frame", row.names = c(NA, -12L))
Our script looks like this:
myfungi.all = read.csv("soil_fungi.csv",header=T)
myfungi = myfungi.all[,c(3:51)]
myfungi.nmds.bc <- metaMDS(myfungi, distance = "bray", k = 2, binary = TRUE)
plot(myfungi.nmds.bc, type="t", main=paste("NMDS/Bray-Curtis -?? Stress =", round(myfungi.nmds.bc$stress,10)))
Does anyone have suggestions as what seems to be the problem?
At the moment our plot looks like this:
The solution you reported gives a perfect fit (stress nearly 0), and also gives a warning because of this dubious stress. The solution effectively puts your sampling units into two points so that you have absolutely dichotomous data. As Ben Bolker demonstrated, Principal Coordinates Analysis, PCoA (which you also can perform with stats::cmdscale, vegan::wcmdscale or vegan::dbrda) still has points in two major cluster, but spreads points within these clusters. PCoA is a linear method, but NMDS is non-linear and therefore often needs more data. It seems that in this case the weak ties (read the documentation ?monoMDS or Kruskal's papers cited in that documentation) is the stage that puts most demand on the data, and setting weakties = FALSE will prevent collapsing non-identical observations into two points:
m3 <- metaMDS(myfungi, weakties = FALSE)
m3 # stress 0.04124
stressplot(m3) # compare this to your result stressplot(myfungi.nmds.bc)
plot(m3)
The default monoMDS with weakties = TRUE (like Kruskal recommended) will consider the dichotomy of two groups as the only important non-linear difference, but with weakties = FALSE the solutions cannot proceed to zero stress. You still have a dichotomy, but with scatter.
Best guess is that you simply don't have enough data to distinguish two separate environmental axes: when I run your code I get
Warning message: In metaMDS(myfungi[, -(1:2)], distance = "bray", k = 2, binary = TRUE) : stress is (nearly) zero: you may have insufficient data
Out of your 53 species, only 35 are informative (the others appear either at none or at all of the sites):
m2 <- myfungi[,apply(myfungi,2,var)>0]
ncol(m2) ## 35
vv <- function(x) (image(Matrix(as.matrix(x))))
How many distinct distribution patterns are there?
nrow(unique(t(m2))) ## 27
You could try PCoA instead:
library(ape)
biplot(pcoa(vegdist(m2,"bray"))
As Jari Oksanen points out, you could also do this with cmdscale() in base R:
plot(cmdscale(vegdist(mm,"bray")),
col=as.numeric(myfungi$Habitat))