Related
I'm analyzing a survey and I need to do an interaction.plot() between variable disclosure_1 and TYPE_1 to see how they affect a third variable ADTRUST.
The survey randomly showed a different scenario to each participant. DISCLOSURE_1 is the code to indicate the type of disclosure that was shown to a respondent (B = Before, D = During, A = After, N = None).
TYPE_1 indicates the terminology used (DF = Deepfake, SM = Sythetic Media).
When creating the survey I dumbly only used DF for N because I thought there was no need to create an SM scenario if there was no disclosure shown (so no difference in terminology used). It still makes no sense logically, but when plotting the interaction plot the variable N does not appear. And since the study wants to analyze:
how disclosure impacts ADTRUST
how disclosure positioning impacts ADTRUST
how different terminology used in disclosure impact ADTRUST
I need to randomly substitute 50% of the observations with SM instead of DF under TYPE_1 but ONLY if the column DISCLOSURE_1 is == N.
I have no clue how to do that. Could somebody please help?
NOTE!!!! The structure is part of a bigger dataset. the dput was only done only for [25:26], so keep in mind I need to be able to precisely select the columns in the code.
Thank youu
structure(list(DISCLOSURE_1 = structure(c(4L, 3L, 1L, 3L, 1L,
1L, 4L, 3L, 4L, 3L, 4L, 1L, 3L, 3L, 4L, 1L, 3L, 3L, 1L, 3L, 3L,
4L, 2L, 1L, 1L, 4L, 2L, 3L, 3L, 1L, 4L, 1L, 1L, 4L, 4L, 1L, 3L,
2L, 2L, 1L, 4L, 1L, 1L, 1L, 4L, 3L, 4L, 3L, 2L, 3L, 1L, 1L, 3L,
1L, 1L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 1L, 3L, 3L, 3L, 3L, 2L, 2L,
3L, 3L, 2L, 1L, 2L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, 2L, 3L,
3L, 2L, 3L, 3L, 2L, 1L, 3L, 1L, 1L, 4L, 4L, 1L, 2L, 4L, 3L, 1L,
1L, 1L, 3L, 2L, 3L, 1L, 2L, 2L, 3L, 4L, 2L, 4L, 2L, 3L, 3L, 2L,
4L, 4L, 3L, 4L, 1L, 1L, 3L, 3L, 1L, 3L, 2L, 3L, 3L, 1L, 2L, 1L,
4L, 1L, 2L, 3L, 3L, 1L, 4L, 2L, 3L, 2L, 1L, 1L, 2L, 2L, 1L, 4L,
2L, 4L, 1L, 4L, 2L, 1L, 1L, 1L, 3L, 2L, 3L, 2L, 4L, 3L, 1L, 3L,
1L, 1L, 3L, 2L, 3L, 2L, 4L, 3L, 3L, 1L, 1L, 3L, 2L, 2L, 1L, 1L,
3L, 3L, 4L, 2L, 2L, 3L, 3L, 1L, 2L, 1L, 2L, 3L, 2L, 3L, 3L, 3L,
3L, 4L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 4L, 1L, 2L, 2L, 3L, 3L,
1L, 4L, 4L, 1L, 2L, 4L, 2L, 1L, 1L, 4L, 1L, 1L, 2L, 1L, 2L, 2L,
3L, 3L, 3L, 3L, 2L, 1L, 4L, 1L, 1L, 2L, 2L, 4L, 2L, 3L, 1L, 2L,
3L, 3L, 2L, 4L, 3L, 2L, 2L, 4L, 4L, 2L, 1L, 1L, 2L, 2L, 3L, 1L,
1L, 4L, 3L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 4L, 2L, 4L, 4L, 4L, 1L,
3L, 1L, 3L, 1L, 3L, 1L, 2L, 4L, 3L, 2L, 2L, 2L, 1L, 2L, 2L, 3L,
2L, 1L, 2L, 4L, 1L, 2L, 3L, 2L, 1L, 2L, 4L, 4L, 2L, 3L, 2L, 1L,
2L, 2L, 2L, 4L, 2L, 1L, 3L, 1L, 3L, 3L, 4L, 1L, 2L, 3L, 2L, 2L,
3L, 4L, 3L, 2L, 3L, 3L, 2L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 1L, 3L, 3L, 1L, 2L, 1L, 1L, 3L, 2L, 1L, 2L, 3L,
2L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 1L, 3L), .Label = c("A", "B",
"D", "N"), class = "factor"), TYPE_1 = structure(c(1L, 1L, 2L,
1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L,
1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L), .Label = c("DF",
"SM"), class = "factor")), row.names = c(NA, -367L), class = c("tbl_df",
"tbl", "data.frame"))
With data as your data.frame, this will replace exactly half (rounded down) of the N's with DF with SM:
blnN <- data$DISCLOSURE_1 == "N" & data$TYPE_1 == "DF"
data$TYPE_1[sample(which(blnN), sum(blnN)/2)] <- "SM"
If the 50% requirement is approximate, you can use runif() > 0.5
library(dplyr)
table(df)
TYPE_1
DISCLOSURE_1 DF SM
A 52 51
B 52 53
D 55 51
N 53 0
mut <- df |>
mutate(TYPE_1 = ifelse(DISCLOSURE_1 == "N" &
TYPE_1 == "DF" &
runif(n()) > 0.5,
"SM",
as.character(TYPE_1)))
table(mut)
TYPE_1
DISCLOSURE_1 DF SM
A 52 51
B 52 53
D 55 51
N 27 26
I am making a map of a dataset with 20 million data points in ggplot. A single map (with facetting) takes 10-15 mins, so I checked if using multiple cores in parallel mode would work better. Using foreach, maps took more time to make. Some foreach runs were running for 30min-1 hour without producing any results. I know that the answer to this question (Why is the parallel package slower than just using apply?) shows that in parallel computation, sometimes it takes more time to combine the result from the separate parallel processes than running the task itself. But I saw some run examples when I was searching that minute-runs can be improved. Do you think it is possible?
Due to the sheer amount of data I have, I sampled my dataset:
structure(list(lat = c(46.791667, 52.958333, 57.375, 62.625,
74.041667, 60.208333, 30.208333, 56.791667, 57.375, 40.958333,
56.541667, 38.958333, 35.291667, 43.625, 71.375, 66.875, 74.375,
66.458333, 47.791667, 48.041667, 41.541667, 40.875, 57.208333,
64.375, 42.625, 43.958333, 69.958333, 72.375, 36.875, 66.958333,
39.791667, 36.625, 52.625, 65.708333, 42.208333, 53.708333, 35.458333,
58.625, 34.875, 57.291667, 59.708333, 61.708333, 72.041667, 59.958333,
32.208333, 43.625, 39.541667, 62.625, 41.208333, 32.291667, 48.958333,
47.291667, 60.375, 49.458333, 37.208333, 65.708333, 57.958333,
31.041667, 63.875, 43.625, 54.541667, 55.541667, 45.458333, 72.375,
54.708333, 37.958333, 32.375, 60.125, 59.041667, 37.875, 42.958333,
44.375, 59.791667, 49.208333, 34.375, 53.208333, 59.458333, 53.375,
45.458333, 72.125, 66.208333, 60.958333, 47.625, 60.291667, 41.125,
67.541667, 54.625, 55.541667, 37.541667, 44.291667, 44.458333,
40.041667, 49.458333, 39.625, 73.375, 41.458333, 71.375, 31.041667,
66.791667, 42.541667), lon = c(-6.125, -19.541667, -29.291667,
-9.2083333, -11.541667, -6.625, -25.708333, -14.458333, -48.291667,
-63.541667, -41.291667, -12.541667, -48.291667, -58.708333, 6.625,
-2.7083333, -69.375, -19.291667, -27.208333, -36.625, -17.791667,
-50.541667, -38.708333, 9.375, -56.208333, -44.958333, -59.041667,
8.875, -21.125, -24.791667, -40.375, -26.208333, -31.875, -11.875,
-60.958333, -39.125, -32.458333, -54.791667, -44.541667, -37.958333,
-48.625, -10.541667, 3.5416667, -17.791667, -16.041667, -9.9583333,
-32.708333, 0.875, -18.625, -54.208333, -63.125, -56.458333,
-55.125, -22.708333, -40.958333, -56.208333, -33.625, -69.125,
-58.125, -17.541667, -23.541667, -17.041667, -18.458333, -64.791667,
-25.208333, -35.875, -44.791667, -11.291667, -58.291667, -46.208333,
-41.208333, -5.0416667, -38.208333, -13.875, -55.291667, -17.291667,
-48.625, -38.791667, -59.375, -19.291667, 5.5416667, -19.625,
-41.375, -66.291667, -17.625, -14.208333, -39.291667, -48.875,
-16.541667, -21.375, -46.375, 2.625, -60.291667, -40.375, 2.4583333,
-16.458333, 4.875, -66.291667, -4.2916667, -36.458333), entity = structure(c(2L,
2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 1L), .Label = c("Cc", "Li"), class = "factor"), watcon = structure(c(1L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 2L), .Label = c("calm", "stormy"), class = "factor"),
step = structure(c(1L, 3L, 2L, 4L, 4L, 2L, 3L, 3L, 3L, 4L,
2L, 2L, 3L, 2L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L,
4L, 1L, 2L, 2L, 4L, 2L, 4L, 4L, 3L, 4L, 2L, 2L, 1L, 2L, 2L,
1L, 4L, 3L, 2L, 3L, 3L, 2L, 4L, 1L, 3L, 2L, 2L, 3L, 4L, 4L,
2L, 2L, 3L, 4L, 1L, 3L, 2L, 1L, 4L, 2L, 3L, 1L, 3L, 1L, 2L,
1L, 4L, 4L, 3L, 1L, 1L, 1L, 3L, 1L, 2L, 2L, 4L, 3L, 4L, 4L,
4L, 1L, 2L, 1L, 4L, 3L, 4L, 2L, 4L, 1L, 4L, 1L, 2L, 2L, 4L
), .Label = c("abundance", "enccomb", "adscomb", "infcomb"
), class = "factor")), row.names = c(NA, -100L), class = "data.frame")
The code I am using is:
library(oceanmap)
library(sf)
library(ggmap)
library(rnaturalearth)
library(rnaturalearthdata)
library(rgeos)
library(tidyverse)
world <- ne_countries(scale = "medium", returnclass = "sf")
library(doParallel)
library(foreach)
cl <- makeCluster(5)
doParallel::registerDoParallel(cl)
entities <- unique(sample$entity)
foreach(i=1:length(entities), .packages = c("tidyverse", "dplyr")) %dopar% {
ggplot (data=world) + geom_sf(color="white", fill="white") + coord_sf(xlim = c(-34,-30), ylim = c(53,62) , expand = FALSE) + geom_point(subset(sample, sample$entity==entities[i]), mapping=aes(x = lon, y = lat, color = log10(value))) + facet_grid(watcon~step2) +
ggtitle(entities[i])
ggsave(filename = paste0(i,".png"))
}
stopCluster(cl)
P.S. I am using ggplot because of the opportunities for editing the map itself. Lattice is faster but lacks the customization that I am looking for.
Any help on how to improve my foreach code or if it's even possible is greatly appreciated!
I have very little experience with R and am trying to make a stacked barplot using ggplot2.
I have 2 groups - control and experimental, and 2 choices - red and green. I'm not sure how to organise my data.
There were 80 animals in my trial (control n=40, experimental n=40) and they were given the choice of red and green substrate, I noted which substrate they chose, and that's the data I'm trying to plot.
I would essentially want 'Experimental' and 'Control on the x-axis, and the number of choices on the y-axis (e.g. Control, Red n=20, Control, Green = 12 etc).
Any help would be appreciated!
Edited to add:
This is the graph it's outputting
This is the code I'm using (including suggested adjustments):
df <- data.frame(group = rep(c("control", "experimental"), each = 40),
substrate = sample (c("red","green"), 80, TRUE))
ggplot(df, aes(x = group, y = substrate, fill = substrate)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("red", "green"))
This is the output:
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("control", "experimental"
), class = "factor"), substrate = structure(c(1L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L), .Label = c("green",
"red"), class = "factor")), class = "data.frame", row.names = c(NA,
-80L))
output from df(behaviour) - original dataframe
structure(list(group = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Control", "Experimental"
), class = "factor"), substrate = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Green",
"Red"), class = "factor")), class = "data.frame", row.names = c(NA,
-80L))
Your data:
behaviour=structure(list(group = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Control", "Experimental"
), class = "factor"), substrate = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Green",
"Red"), class = "factor")), class = "data.frame", row.names = c(NA,
-80L))
We can tabulate your data:
table(behaviour$group,behaviour$substrate)
Green Red
Control 10 30
Experimental 27 13
So you can only specify fill or y with geom_bar. In your case, you specify the fill, the geom_bar() function will do the counting for you:
ggplot(behaviour,aes(x=group,fill=substrate))+
geom_bar() + scale_fill_manual(values=c("#29c7ac","#c02739"))
You could have your data like this, with one row for each observation (i.e. each animal), with the group and the substrate recorded for each:
df <- data.frame(group = rep(c("control", "experimental"), each = 40),
substrate = rep(c("green", "red", "green", "red"), c(10, 30, 27, 13)))
Now define your plot using ggplot, specifying group as your x axis, and ..count.. as your y axis. Use geom_bar to get the stacked bars you are looking for, and finally use scale_fill_manual to set the colours:
library(ggplot2)
ggplot(df, aes(x = group, y = ..count.., fill = substrate)) +
geom_bar(colour = "black") +
scale_fill_manual(values = c("green", "red"))
When I make a barplot with significance letters from anova above the bars, I use following code:
anova_NDW_geel<-aov(nodule_dry_weight~treatment,inoculatieproef_geel_variety2)
HSD_NDW_geel <- HSD.test(anova_NDW_geel,"treatment",alpha=0.05,group=TRUE)$groups
HSD_NDW_means_geel <- HSD.test(anova_NDW_geel,"treatment",alpha=0.05,group=TRUE)$means
HSD_NDW_means_geel <- HSD_NDW_means_geel[order(-HSD_NDW_means_geel$nodule_dry_weight),]
p_HSD_NDW_geel <- ggplot(aes(x=treatment, y=NDW_mean_geel, width=0.6), data=inoculatieproef_mean_geel)+
geom_bar(stat="identity", data=HSD_NDW_geel, aes(x=trt, y=means), fill="gray40")+
geom_text(data=HSD_NDW_geel, aes(x=trt, y=means, label=M), size=5, vjust=-1, hjust=1)+
ggtitle("Zand")+
ylab("Droog gewicht wortelknolletjes (g)")+
xlab("Behandeling")+
geom_errorbar(aes(ymin=NDW_mean_geel-NDW_sd_geel,ymax=NDW_mean_geel+NDW_sd_geel),
position=position_dodge(width=0.5),width=0.1,size=0.3)+
theme_bw() +
theme(axis.line = element_line(colour="black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())+
scale_y_continuous(expand = c(0, 0))+
theme(axis.text.x = element_text(angle = 0, hjust = 1, vjust = 0.5))+
theme(text = element_text(size=12))
which results in following graph: http://i.stack.imgur.com/bZidZ.png
This is probably not the best way to do this and when I want to add the letters to the barplots with facet wrap.
Here is a sample of the data I want to make a facet wrap with significance letters with:
structure(list(treatment = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("1", "2",
"3", "4", "5", "6", "7", "8"), class = "factor"), block = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("I",
"II", "III", "IV"), class = "factor"), position = structure(c(2L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("b",
"gem(ab)"), class = "factor"), variety = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor"), location = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Geel",
"Merelbeke"), class = "factor"), year = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2014",
"2015"), class = "factor"), nodule_dry_weight = c(0, 0.0467,
0.0328, 0.0885, 0.0081, 0.1086, 0.0788, 0.0267, 0, 0.0128, 0.0143,
0.0333, 0.006, 0.098, 0.0286, 0.011, 0, 0.0627, 0.0769, 0.0784,
0.023, 0.1504, 0.1026, 0.0254, 0, 0.0597, 0.0158, 0.0354, 0.0226,
0.3261, 0.0436, 0, 0, 0.0203, 0.0469, 0.0904, 0.1593, 0.0836,
0.056, 0.0037, 0, 0.0534, 0.0901, 0.0435, 0.0248, 0.0435, 0.0279,
0.0029, 0, 0.0545, 0.038, 0.0991, 0.0099, 0.1453, 0.1096, 0.0272,
0, 0.0319, 0.0624, 0.0508, 0.0415, 0.11, 0.0079, 0, 0, 0.1257,
0.1242, 0.2899, 0.024, 0.2175, 0.2979, 0.0396, 0, 0.1583, 0.2935,
0.2541, 0.1027, 0.4196, 0.2059, 0.0396, 0, 0.0891, 0.167, 0.0907,
0.2153, 0.3063, 0.2921, 0.0528, 0, 0.0928, 0.2109, 0.1514, 0.0821,
0.3607, 0.0996, 0.0069, 0, 0.0685, 0.3109, 0.1862, 0.0393, 0.286,
0.3418, 0.0459, 0, 0.0765, 0.3486, 0.3988, 0.1155, 0.6341, 0.3653,
0.039, 0, 0.0766, 0.3112, 0.1988, 0.05, 0.2856, 0.34, 0.0862,
0, 0.2621, 0.1146, 0.393, 0.1644, 0.3415, 0.1343, 0.019, 0, 0.0976,
0.1853, 0.0691, 0.0248, 0.1764, 0.1244, 0.1525, 0, 0.1529, 0.1069,
0.2833, 0.0204, 0.2966, 0.2371, 0.1464, 0, 0.0691, 0.2094, 0.1633,
0.0264, 0.1344, 0.0694, 0.1175, 0, 0.1783, 0.1434, 0.2136, 0.0873,
0.19, 0.1683, 0.1927, 0, 0.0571, 0.0599, 0.1061, 0.0244, 0.1256,
0.0894, 0.0123, 0, 0.1696, 0.1046, 0.2164, 0.0939, 0.1552, 0.2942,
0.1652, 0, 0.0844, 0.102, 0.0227, 0.025, 0.0654, 0.1234, 0.0702,
0, 0.0979, 0.1246, 0.0958, 0.0867, 0.1104, 0.1969, 0.227, 0,
0.3704, 0.4727, 0.2527, 0.2078, 0.3377, 0.308, 0.1293, 0, 0.2417,
0.3744, 0.2916, 0.1773, 0.433, 0.2446, 0.1382, 0, 0.4718, 0.4271,
0.4882, 0.1799, 0.4178, 0.518, 0.3915, 0, 0.3421, 0.3804, 0.2112,
0.4292, 0.3829, 0.1315, 0.2719, 0, 0.3197, 0.6867, 0.414, 0.3112,
0.2914, 0.4994, 0.369, 0.0256, 0.1494, 0.5577, 0.2538, 0.3854,
0.4151, 0.544, 0.4009, 0, 0.5208, 0.2962, 0.4175, 0.2689, 0.3374,
0.5075, 0.3601, 0, 0.704, 0.4631, 0.4573, 0.154, 0.5087, 0.4319,
0.4155)), .Names = c("treatment", "block", "position", "variety",
"location", "year", "nodule_dry_weight"), row.names = c(NA, -256L
), class = "data.frame")
I use following code for my graph with facet wrap:
inoculatieproef <- inoculatieproef %>%
group_by(treatment, location, variety, year) %>%
mutate(NDW_mean = mean(nodule_dry_weight),
NDW_sd = sd(nodule_dry_weight))
ggplot(data=inoculatieproef,aes(x=treatment, y=NDW_mean))+
facet_wrap(~location*variety*year,ncol=2)+
geom_bar(position="dodge", stat="identity")+
geom_errorbar(aes(ymin = NDW_mean - NDW_sd,
ymax = NDW_mean + NDW_sd),
width=0.1,size=0.3,
color = "darkgrey")+
theme_bw() +
theme(axis.line = element_line(colour="black"),
panel.grid.minor = element_blank(),
panel.background = element_blank())
How do I add on each barplot the significance letters (anova) in de the facet wrap graph?
No idea if the test fits your data distribution, but you can start with that:
library(tidyverse)
stat_pvalue <- dd %>%
group_by(location, variety, year) %>%
rstatix::t_test(nodule_dry_weight~treatment) %>%
filter(p < 0.05) %>%
group_by(location, variety, year) %>%
rstatix::add_significance("p") %>%
rstatix::add_y_position() %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n())*1.1) %>%
ungroup()
ggplot(data=dd,aes(x=treatment, y=nodule_dry_weight))+
geom_boxplot() +
facet_wrap(~location + variety + year,ncol=2, scales = "free_y") +
ggpubr::stat_pvalue_manual(stat_pvalue, label = "p")
When calculating a polychoric correlation in R (library(polycor), function hetcor) I get the warning message In log(P) : NaNs produced. I wasn't able to figure out what this warning message might constitute. I suppose it has to do with the calculation of the p-values for testing bivariate normality.
Thus my questions are:
What characteristics of this dataset result in this warning?
What's the meaning of this warning?
Is this warning problematic in terms of using the polychoric correlation matrix for further analyses?
Data subset:
foo <- structure(list(item1 = structure(c(4L, 4L, 4L, 2L, 2L, 2L,
2L, 2L, 4L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 1L,
2L, 2L, 3L, 3L, 3L, 2L, 2L, 1L, 1L, 2L, 3L, 2L, 2L, 3L, 2L, 3L,
2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 2L, 2L, 4L, 2L, 4L, 2L, 2L, 3L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
2L, 2L, 3L, 2L, 2L, 2L, 3L, 1L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L,
2L, 2L, 2L, 4L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 3L, 3L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 3L, 3L
), .Label = c("0", "1", "2", "3"), class = c("ordered", "factor"
)), item2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L, 3L,
1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 3L, 2L, 2L, 1L,
3L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L,
2L, 3L, 2L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L,
2L, 1L, 2L, 1L, 2L, 1L, 3L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L,
2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 4L, 1L, 1L, 3L), .Label = c("0",
"1", "2", "3"), class = c("ordered", "factor")), item3 = structure(c(4L,
4L, 4L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 4L, 1L, 2L, 1L, 1L, 1L,
1L, 2L, 1L, 4L, 2L, 2L, 1L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 2L, 1L, 1L,
2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 3L,
1L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 3L, 2L, 1L), .Label = c("0", "1", "2", "3"), class = c("ordered",
"factor")), item4 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 3L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 3L, 2L, 1L,
1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 2L, 2L, 2L, 3L, 1L, 1L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L, 1L, 2L, 3L), .Label = c("0",
"1", "2", "3"), class = c("ordered", "factor")), item5 = structure(c(4L,
4L, 4L, 1L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 4L, 2L, 3L, 2L, 1L, 1L,
3L, 3L, 3L, 4L, 3L, 2L, 1L, 3L, 3L, 4L, 1L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 4L, 2L, 1L, 2L, 2L, 2L, 2L,
3L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 3L, 1L,
2L, 1L, 1L, 3L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 2L, 1L, 1L, 2L,
1L, 2L, 4L, 2L, 2L, 1L, 2L, 2L, 4L, 2L, 4L, 1L, 1L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L, 3L, 3L,
1L, 4L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 2L, 1L, 3L, 2L, 1L, 1L,
1L, 1L, 2L, 3L, 4L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 3L, 1L,
3L, 3L, 4L, 3L, 3L), .Label = c("0", "1", "2", "3"), class = c("ordered",
"factor"))), .Names = c("item1", "item2", "item3", "item4",
"item5"))
Computation of correlation matrix:
hetcor(foo)
Comment: the real dataset contains about 2500 rows (and more variables), but when evaluating the contingency tables a sparse matrix doesn't seem to be an issue.
A short (and belated) answer to a very old question. The warning is because some of the cells in the cross tabulation of the variables (for example, variables 1 and 2) have 0 values in the cells. This can lead to problems in estimation.
The polychoric (and tetrachoric) correlations are normal theory approximations of what would happen if bivariate normal (and continuous) data were converted into categorical (dichotomous for tetrachorics, polytomous for polychorics) data. The normal theory approximation assumes that all cells have some value. However, the correlations can be found with 0 cell values, but with a warning. The resulting correlations are correct, but unstable, in that if we add a small correction for continuity (i.e., add .1 or .5 to the 0 cells), the values change a great deal. This problem is discussed by Gunther and Hofler for the case of tetrachoric correlations where they compare solutions with and with the correction for continuity.
(See the article by A. Gunther and M. Hofler. Different results on tetrachorical correlations in mplus and stata-stata announces modified procedure. Int J Methods Psychiatr Res, 15(3):157-66, 2006. for a discussion of this problem with tetrachoric correlations.)
Using the polychoric function in the psych package, we find the same answer as the hetcor function from polycor if we do not apply the correction for continuity, but somewhat different values if we do correct for continuity. I recommend the correction.
See the help function for polychoric in psych for a longer discussion of this problem.