Print factor analysis from factanal() with item labels - r
EDIT
So it looks like it's something in my call to library(reshape) that's breaking the labeling of factors. This was not included in the minimal example, but will be added now. It's not needed to create the example, but it's needed to recreate the issue. I need the library to get my data in shape to even do factanal(). Any ideas what part of reshape is breaking it and how to fix it?
Original question
I have been running factor analyses on my data and have been having an intermittent issue with the way results are printed.
If I create a data set like the following:
library(reshape)
mock <- data.frame(
sample_name1 = sample(1:100),
sample_name2 = sample(1:100),
sample_name3 = sample(1:100),
s_amplename_4 = sample(1:100),
samplename5 = sample(1:100),
sa_mplen_a_me_6 = sample(1:100),
samplename7 = sample(1:100),
samplename8 = sample(1:100)
)
and run a factor analysis with
factanal(mock, factors = 2)
I get the output to print out very prettily with item names as labels for the rows, e.g.:
# Snip snip
Loadings:
Factor1 Factor2
sample_name1 -0.126 -0.105
sample_name2 -0.414
sample_name3 0.665
s_amplename_4 -0.314
samplename5 0.850
sa_mplen_a_me_6 -0.117
samplename7 0.442
samplename8 -0.139
This kind of output is exactly what I am looking for. However, when I run the same type of analysis on my own data (and I apologize for the length here):
miniset <- structure(list(`clarity1` = c(2, 2, 2, 3, 4.5, 1.5, 1.5, 3.5,
2, 6, 2.5, 4, 1, 1.5, 6, 2, 5.5, 2, 2, 3, 1.5, 5, 3.5, 2, 1.5,
2.5, 3, 3, 2, 1),
`clarity2` = c(1.5, 2, 2, 2, 3.5, 5, 3, 5,
2, 4, 2, 2.5, 1, 1.5, 2, 4, 5, 2, 2, 3.5, 6, 1, 2, 1.5, 1, 2,
2, 3, 6.5, 1),
`clarity3` = c(3, 3.5, 2, 3.5, 5.5, 4, 6, 5.5,
2, 3, 3, 3.5, 1, 2.5, 2, 5, 5, 5, 2, 6.5, 5.5, 5, 5.5, 6, 3,
2, 2, 5, 4.5, 5.5),
`detail1` = c(3, 4, 2, 6, 5, 6.5, 5.5,
4, 3, 6, 2.5, 4, 1, 4, 2, 4.5, 7, 6.5, 2, 6.5, 6, 2, 6, 5, 2.5,
5.5, 4, 5.5, 6, 1.5),
`detail2` = c(3.5, 4, 4, 6.5, 4.5, 6,
4, 4.5, 2, 6, 2.5, 5, 2, 4, 3, 6, 7, 7, 2, 6.5, 6, 3, 6, 6, 2.5,
6, 3, 5, 6.5, 2.5),
`detail3` = c(2.5, 4, 2, 6, 5, 6, 6, 4,
2, 6, 2, 5, 2, 3, 3, 5, 6.5, 6, 2, 6.5, 7, 7, 5.5, 5, 3.5, 2,
3, 5, 6, 2),
`complete1` = c(2, 2.5, 2, 3, 3.5, 5.5, 2.5, 2.5,
2, 3, 3, 3.5, 2, 4, 3, 3, 7, 4, 2, 3, 6, 3, 5.5, 2, 3, 2, 2,
3, 6, 3),
`complete2` = c(3, 4.5, 2, 3, 4.5, 6, 6, 4.5, 3,
3, 3.5, 4, 2, 5, 3, 4, 7, 4, 2, 6, 7, 5, 5, 6, 3, 3, 5, 5, 6,
2),
`complete3` = c(3, 4.5, 2, 2.5, 4.5, 6.5, 5, 5, 2, 6.5,
3.5, 3.5, 1, 3, 3, 2.5, 7, 4, 2, 6, 1.5, 7, 5.5, 6.5, 3.5, 5.5,
3, 3, 2.5, 1),
`truthful1` = c(2.5, 2, 2, 3, 3.5, 2, 2, 2.5,
2, 3, 3, 2.5, 2, 3, 2, 2, 3.5, 3, 2, 3.5, 1.5, 1, 3.5, 2.5, 3,
2, 2, 3, 1.5, 1.5),
`truthful2` = c(2.5, 1.5, 2, 2, 3, 1.5,
2, 1, 1, 5.5, 3, 3.5, 1, 4.5, 2, 2, 5, 2, 2, 1.5, 4.5, 1, 3.5,
2, 3.5, 2.5, 2, 2, 4.5, 1),
`truthful3` = c(2, 1.5, 2, 3.5,
2.5, 2, 2, 2.5, 2, 2, 3.5, 2.5, 1, 1.5, 3, 2, 5, 3, 3, 2, 3.5,
1, 2, 1, 3.5, 2, 2, 2.5, 4.5, 1),
`relevant1` = c(1.5, 1.5,
2, 5, 2.5, 1.5, 2, 3.5, 2, 4.5, 2.5, 3.5, 1, 3.5, 3, 1.5, 5.5,
3.5, 2, 2, 6, 3, 3.5, 3, 1.5, 2, 3, 3, 6, 1),
`relevant2` = c(1.5,
3, 2, 2, 3.5, 1.5, 2.5, 5.5, 1, 2, 3.5, 2, 1, 1.5, 2, 4, 5.5,
2, 3, 5.5, 5.5, 1, 4, 5, 1.5, 2, 3, 2.5, 3, 1),
`relevant3` = c(1.5,
2, 2, 3, 2, 1, 2, 2, 1, 2, 1.5, 2.5, 1, 1.5, 2, 1.5, 5.5, 5,
2, 1, 7, 1, 1, 2, 1, 2, 3, 3, 2.5, 1)),
.Names = c("clarity1",
"clarity2", "clarity3", "detail1", "detail2", "detail3",
"complete1", "complete2", "complete3", "truthful1", "truthful2",
"truthful3", "relevant1", "relevant2", "relevant3"),
row.names = c(NA, 30L), class = c("cast_df", "data.frame"))
factanal(miniset, factors = 3)
the result is much less pretty, e.g.:
Loadings:
Factor1 Factor2 Factor3
[1,] 0.222 0.664
[2,] 0.559 0.524
[3,] 0.824
[4,] 0.740 0.361 0.282
[5,] 0.698 0.374 0.251
[6,] 0.783 0.278 0.265
[7,] 0.498 0.598 0.140
[8,] 0.796 0.227 0.204
[9,] 0.490 -0.240 0.835
[10,] 0.147 0.156 0.348
[11,] 0.697 0.324
[12,] 0.756
[13,] 0.319 0.811 0.204
[14,] 0.567 0.252 0.108
[15,] 0.320 0.690
So rather than having the nice item names as labels for the loadings, I now get indices. While that's fine for me, I'll be working with a professor tomorrow who is less familiar with R and will probably get frustrated by the lack of labels. So what happens to the labels in the second case? And how can I get them back?
The issue is that miniset is a cast_df, and factanal() calls as.matrix(x). The as.matrix.cast_df method uses rrownames and rcolnames (both reshape functions) to extract "special dimension names".
For miniset these are NULL, hence the row labels are lost. Without knowing how you constructed miniset I can't help further here. (You must have used reshape to construct miniset at some point, since you have created a cast_df object.)
The good news is that
factanal(as.data.frame(miniset), factors = 3)
works as you wish.
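To see where the labels come from without needing reshape, here is a minimal sketch on made-up data (the data frame `items` and its column names are hypothetical, not the asker's miniset). It relies only on what the answer states: factanal() works on as.matrix(x), so the loadings take their row labels from the column names of that matrix.

```r
# Hypothetical data: four items driven by one latent factor, so that
# factanal() has something to fit. None of this is the original miniset.
set.seed(1)
n <- 200
latent <- rnorm(n)
items <- data.frame(
  clarity1 = latent + rnorm(n),
  clarity2 = latent + rnorm(n),
  detail1  = latent + rnorm(n),
  detail2  = latent + rnorm(n)
)

# A matrix with column names gives labelled loadings.
fit_named <- factanal(as.matrix(items), factors = 1)
rownames(loadings(fit_named))

# Dropping the column names reproduces the symptom from the question:
# the loadings print with [1,], [2,], ... instead of item names.
fit_unnamed <- factanal(unname(as.matrix(items)), factors = 1)
rownames(loadings(fit_unnamed))

# If the fit already exists, the labels can also be put back by hand.
relabelled <- loadings(fit_unnamed)
rownames(relabelled) <- names(items)
relabelled
```

If the object you pass in still prints without labels, checking class(x) and colnames(as.matrix(x)) is a quick way to see whether a special as.matrix method is involved.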
Related
R: How to plot a boxplot with numeric x-axis for according spacing (not ggplot)
I want to plot a boxplot with the regular boxplot() function of R, not ggplot. The Y and X axes are continuous numeric variables (x-axis: 6 forces, 1.0, 1.3, 1.6, 2.0, 2.5, 3.1 [N]). On the Y-axis are the participants' ratings (1 to 7). I would like to plot it with quantified spacing on the x-axis, and also later add a regression line to the plot. I can't find anything for the regular boxplot() function.
Code so far:
kraft_ou <- data.frame(VR1_100$ou_kraft, VR1_125$ou_kraft, VR1_160$ou_kraft, VR1_200$ou_kraft, VR1_250$ou_kraft, VR1_310$ou_kraft)
colnames(kraft_ou) <- c("kraft_100", "kraft_125", "kraft_160", "kraft_200", "kraft_250", "kraft_310")
kraft_ou
boxplot(kraft_ou, names=c("1,0 [N]", "1,3 [N]","1,6 [N]","2,0 [N]","2,5 [N]","3,1 [N]"), col = "bisque", ylim = c(1, 7))
points(1:6, meanskraftou, pch=4)
text(1:6, meanskraftou + 0.24, labels = meanskraftou)
abline(h=4)
data (n=30 ratings from 1 to 7 for each of the 6 forces):
dput(kraft_ou)
structure(list(
kraft_100 = c(4, 3, 5, 5, 3, 4, 2, 4, 4, 5, 4, 5, 5, 4, 4, 3, 4, 4, 5, 4, 6, 5, 4, 5, 5, 5, 4, 4, 4, 4),
kraft_125 = c(4, 4, 5, 6, 4, 3, 4, 4, 4, 5, 4, 5, 4, 5, 4, 3, 4, 4, 4, 6, 6, 4, 4, 5, 3, 5, 4, 4, 4, 5),
kraft_160 = c(5, 6, 6, 6, 6, 4, 6, 5, 6, 5, 4, 3, 6, 6, 6, 5, 5, 5, 5, 6, 6, 6, 5, 5, 4, 6, 4, 5, 5, 5),
kraft_200 = c(6, 5, 6, 6, 5, 4, 5, 5, 6, 7, 5, 3, 5, 5, 5, 4, 7, 6, 5, 5, 7, 6, 5, 6, 6, 6, 5, 4, 5, 3),
kraft_250 = c(5, 6, 6, 7, 6, 6, 6, 6, 7, 7, 6, 5, 7, 7, 5, 5, 6, 6, 7, 7, 6, 6, 5, 5, 5, 7, 4, 6, 6, 5),
kraft_310 = c(7, 7, 7, 7, 6, 5, 6, 6, 6, 7, 4, 5, 7, 6, 5, 5, 7, 6, 5, 6, 6, 6, 5, 6, 5, 6, 5, 6, 6, 6)),
class = "data.frame", row.names = c(NA, -30L))
You can use the at argument to specify x locations for your boxplots, though to get them narrow enough to avoid overplotting, you need to add an invisible box and set the relative widths of the visible boxes to a smaller value:
boxplot(cbind(kraft_ou, n = rep(NA, nrow(kraft_ou))),
        names = c("1,0 [N]", "1,3 [N]", "1,6 [N]", "2,0 [N]", "2,5 [N]", "3,1 [N]", " "),
        col = "bisque", ylim = c(1, 7),
        width = c(0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1),
        at = c(1, 1.3, 1.6, 2.0, 2.5, 3.1, 3.1))
abline(h = 4)
To add a regression line, you would need to have all your data frame values in a single y variable, and a vector of their corresponding x axis positions:
abline(lm(unlist(kraft_ou) ~ rep(c(1, 1.3, 1.6, 2.0, 2.5, 3.1), each = 30)))
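The last step, getting all values into a single y variable with matching x positions, can be sketched as follows. The data frame `ratings` and the vector `force` are small made-up stand-ins for kraft_ou and its six forces, used only so the example runs on its own.

```r
# Hypothetical stand-in for kraft_ou: 3 forces, 4 ratings each.
ratings <- data.frame(
  kraft_100 = c(4, 3, 5, 5),
  kraft_160 = c(5, 6, 6, 6),
  kraft_310 = c(7, 7, 7, 6)
)
force <- c(1.0, 1.6, 3.1)  # x position of each column, in N

# unlist() stacks the columns one after another, so each force value
# has to be repeated once per row to line up with its ratings.
long <- data.frame(
  rating = unlist(ratings, use.names = FALSE),
  force  = rep(force, each = nrow(ratings))
)

# Ordinary least-squares fit of rating on force.
fit <- lm(rating ~ force, data = long)
coef(fit)
```

After a boxplot(..., at = force) call, abline(fit) would draw this line on the same x scale. Keeping the long data frame around also makes it easy to check that each rating is paired with the right force.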
Math Symbols within for loop of GGplots in R
I'm currently trying to develop a similar result as this link. I have a significant number of columns and several different labels for the x-axis.
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4, 2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3, 3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4, 1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3, 3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3, 2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3, 3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4, 2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2, 3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")
> x.axis.list
[[1]]
expression(beta[paste(1, ",", 1L)])
[[2]]
expression(beta[paste(1, ",", 2L)])
[[3]]
expression(beta[paste(1, ",", 3L)])
[[4]]
expression(beta[paste(1, ",", 4L)])
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
  message(i)
  myplots[[i]] <- local({
    i <- i
    p1 <- ggplot(data2, aes(x = data2[[i]])) +
      geom_histogram(fill = "lightgreen") +
      xlab(x.axis.list[[i]])
    print(p1)
  })
}
In the past, I've been able to do something similar to this, where I could just put x.axis.list[[i]] in my loop and change the symbols. However, I continue to get the term expression on the axis. So the symbol for beta is correct, as is the subscript, but the word "expression" remains. I'm not sure exactly what I'm doing wrong. For a moment I was able to produce a plot without "expression", but it has since stayed in the ggplot. I want to be able to produce this plot, or one with the title on the y-axis, without the word "expression".
I'm not worried about this example data or the resulting plot; I'm wondering how to get rid of "expression" so that only the math symbol shows. Thanks in advance.
You can do:
for (i in seq_along(data2)) {
  df <- data2[i]
  names(df)[1] <- "x"
  myplots[[i]] <- local({
    p1 <- ggplot(df, aes(x = x)) +
      geom_bar(fill = "lightgreen", stat = "count") +
      xlab(x.axis.list[[i]])
  })
}
And we can show all the plots together:
library(patchwork)
(myplots[[1]] + myplots[[2]]) / (myplots[[3]] + myplots[[4]])
Note I created the expression list like this:
x.axis.list <- lapply(1:4, function(i) {
  parse(text = paste0("beta[paste(1, \",\", ", i, ")]"))
})
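As an alternative way to build the label list (a sketch, not what the answer above uses), bquote() can substitute the loop index into a plotmath template without pasting strings together. The name `axis_labels` is made up for this example. The result is a list of unevaluated calls, which base graphics accept as plotmath labels; passing them to xlab() in ggplot2 is a common pattern as well, but treat that part as an assumption to verify with your ggplot2 version.

```r
# Build one plotmath label per column index; .(i) is replaced by the value of i.
axis_labels <- lapply(1:4, function(i) bquote(beta[paste(1, ",", .(i))]))

# Each element is a language object (a call), not a character string.
class(axis_labels[[2]])
deparse(axis_labels[[2]])

# Base graphics usage, for a quick visual check of the subscript:
# plot(1, xlab = axis_labels[[2]])
```

Compared with parse(text = ...), this keeps the template readable and avoids escaping the quotes around the comma.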
R Multiple T-test: Grouping factor must have 2 variables
I'm trying to compare a control group with an experimental group on a range of variable to show that they are similar (baseline). I thus need to do multiple t-test (unpaired/ Welch t-test). My data is in a long format with the first variable called "Group" with either a number 1 or a number 2. There are some missing values in some of my other variables but it's pretty random. So when I run t-test manually using this line of code: t.test(variable_1 ~ Group,df) it works. I then tried to do it all at once using this line of code: sapply(df[,2:71], function(i) t.test(i ~ df$Group)$p.value) But I get the following error: grouping factor must have exactly 2 levels Could anyone help? Here is what the structure looks like structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2), EM_Accuracy_Time_Airport = c(3, 3, 0, 1, 1, 2, 2, 1, 1, 3, 3, 2, 2, 2, 1, 3, 1, 3, 1, 1), EM_Accuracy_Place_Airport = c(2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 0, 2, 2, 0, 2, 2, 2, 1, 1, 1), EM_Accuracy_Expl_Airport = c(2, 2, 2, 0, 2, 2, 2, 1, 2, 2, 2, 2, 2, 0, 0, 1, 0, 2, 2, 1), EM_Accuracy_Death_Airport = c(0, 2, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0), EM_Accuracy_Time_Metro = c(3, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 2, 1, 3, 1, 1, 2, 1, 3, 3), EM_Accuracy_Death_Metro = c(3, 0, 1, 0, 1, 1, 0, 0, 0, 3, 0, 0, 1, 0, 3, 1, 1, 1, 0, 0), EM_Accuracy_PC_Time_Airpot = c(100, 100, 0, 33.3333333333333, 33.3333333333333, 66.6666666666667, 66.6666666666667, 33.3333333333333, 33.3333333333333, 100, 100, 66.6666666666667, 66.6666666666667, 66.6666666666667, 33.3333333333333, 100, 33.3333333333333, 100, 33.3333333333333, 33.3333333333333 ), EM_Accuracy_PC_Place_Airport = c(100, 100, 50, 100, 50, 100, 100, 50, 50, 100, 0, 100, 100, 0, 100, 100, 100, 50, 50, 50), EM_Accuracy_PC_Expl_Airport = c(100, 100, 100, 0, 100, 100, 100, 50, 100, 100, 100, 100, 100, 0, 0, 50, 0, 100, 100, 50), EM_Accuracy_PC_Death_Airport = c(0, 66.6666666666667, 0, 0, 33.3333333333333, 66.6666666666667, 0, 0, 0, 
0, 0, 0, 66.6666666666667, 0, 0, 0, 100, 0, 0, 0), EM_Accuracy_PC_Time_Metro = c(100, 33.3333333333333, 0, 0, 33.3333333333333, 33.3333333333333, 0, 33.3333333333333, 33.3333333333333, 33.3333333333333, 33.3333333333333, 66.6666666666667, 33.3333333333333, 100, 33.3333333333333, 33.3333333333333, 66.6666666666667, 33.3333333333333, 100, 100), EM_Accuracy_PC_Death_Metro = c(100, 0, 33.3333333333333, 0, 33.3333333333333, 33.3333333333333, 0, 0, 0, 100, 0, 0, 33.3333333333333, 0, 100, 33.3333333333333, 33.3333333333333, 33.3333333333333, 0, 0), EM_ACCURACY_PC = c(83.3333333333333, 66.6666666666667, 30.5555555555556, 22.2222222222222, 47.2222222222222, 66.6666666666666, 44.4444444444444, 27.7777777777778, 36.1111111111111, 72.2222222222222, 38.8888888888889, 55.5555555555555, 66.6666666666666, 27.7777777777778, 44.4444444444444, 52.7777777777778, 55.5555555555556, 52.7777777777778, 47.2222222222222, 38.8888888888889), EM_Certainty_Time_Airport = c(3, 1, 1, 1, 2, 2, 1, 1, 2, 3, 3, 2, 2, 2, 4, 2, 3, 3, 2, 2), EM_Certainty__Place_Airport = c(3, 4, 2, 2, 2, 2, 4, 1, 3, 4, 4, 4, 4, 3, 3, 4, 4, 3, 2, 3), EM_Certainty__Expl_Airport = c(4, 2, 3, 1, 2, 3, 2, 1, 2, 4, 1, 3, 2, 2, 1, 3, 1, 2, 2, 3), EM_Certainty__Death_Airport = c(1, 1, NA, 1, 2, 1, 3, 1, 2, 3, NA, 3, 2, 1, 2, 1, 1, 1, 4, 4), EM_Certainty__Time_Metro = c(3, 3, 1, 1, 2, 2, 2, 1, 3, 2, 3, 2, 3, 2, 2, 2, 3, 1, 2, 2), EM_Certainty__Death_Metro = c(2, 1, 1, NA, 2, 1, 1, 1, 2, 1, NA, 3, 2, 1, 1, 1, 1, 1, 1, 4), EM_CERTAINTY = c(2.66666666666667, 2, 1.6, 1.2, 2, 1.83333333333333, 2.16666666666667, 1, 2.33333333333333, 2.83333333333333, 2.75, 2.83333333333333, 2.5, 1.83333333333333, 2.16666666666667, 2.16666666666667, 2.16666666666667, 1.83333333333333, 2.16666666666667, 3), EM_CONFIDENCE = c(5, 5, 1, 2, 2, 4, 5, 2, 3, 4, 5, 5, 3, 3, 4, 4, 3, 2, 3, 2), FBM_CONFIDENCE = c(4, 6, 7, 7, 5, 4, 2, 7, 5, 6, 6, 7, 6, 7, 3, 6, 6, 4, 5, 6), FBM_Vividness_Time = c(3, 3, 1, 4, 3, 2, 4, 3, 4, 4, 1, 3, 4, 4, 3, 3, 3, 2, 4, 3), 
FBM_Vividness_How = c(4, 4, 2, 4, 4, 3, 4, 4, 4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 4, 4), FBM_Vividness_Where = c(4, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), FBM_Vividness_WithWhom = c(4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), FBM_Vividness_WereDoing = c(4, 4, 1, 4, 3, 4, 4, 4, 4, 4, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4), FBM_Vividness_Did_After = c(4, 4, 3, 4, 2, 3, 4, 4, 2, 4, 1, 4, 4, 4, 3, 4, 4, 3, 4, 4), FBM_VIVIDNESS = c(3.83333333333333, 3.83333333333333, 2, 4, 3.16666666666667, 3.33333333333333, 4, 3.83333333333333, 3.66666666666667, 4, 2.33333333333333, 3.83333333333333, 3.83333333333333, 4, 3.66666666666667, 3.83333333333333, 3.83333333333333, 3.5, 4, 3.83333333333333 ), FBM_Details_NB_T2 = c(3, 5, 0, 5, 5, 5, 2, 5, 1, 5, 3, 5, 5, 5, 2, 4, 2, 3, 5, 5), P_Novelty_5 = c(5, 6.2, 6.5, 5.6, 4.8, 5.4, 4, 4.2, 4.4, 5.8, 3.4, 5.8, 6, 5.8, 3.8, 6.4, 6.8, 6.6, 7, 3), P_Suprise_emotion = c(6, 6, 6, 6, 4, 5, 1, 7, 1, 5, 4, 5, 7, 7, 6, 4, 7, 7, 2, 5), P_Surprise_Expected = c(1, 3, 5, 2, 4, 3, 6, 2, 2, 1, 6, 4, 3, 1, 5, 1, 1, 1, 5, 4), P_Surprise_Unbelievable = c(5, 4, 1, 6, 4, 4, 2, 7, 1, 4, 1, 6, 7, 7, 6, 3, 7, 7, 5, 3), `P_Consequence-Importance_5` = c(5.6, 4.8, 3.4, 5, 4.8, 4, 5, 5.4, 3, 5.2, 6.8, 5.4, 4, 4.4, 6, 3.8, 4, 4.8, 5, 5.2), P_Emotional_Intensity_4 = c(5.25, 5.75, 3, 4.75, 4.75, 6, 4, 5.25, 2.5, 5.5, 7, 6.5, 5.75, 6.75, 6.75, 6, 6.25, 6, 5, 2.5), P_Social_Sharing_6 = c(3.66666666666667, 3.83333333333333, 3.4, 3.16666666666667, 3, 3.33333333333333, 3.8, 3.16666666666667, 2.16666666666667, 4.16666666666667, 4, 4.5, 4.5, 4.33333333333333, 4, 3.16666666666667, 3.66666666666667, 4, NA, NA), P_Media_3 = c(4.66666666666667, 4, 3, 2.66666666666667, 2.66666666666667, 2.33333333333333, 3, 2.33333333333333, 2.33333333333333, 3.33333333333333, 4.33333333333333, 5, 4.33333333333333, 5, 4, 2, 3, 3.33333333333333, 2, 1.66666666666667 ), P_Ruminations = c(3, NA, 3, 2, 4, NA, 4, 2, 1, 4, 4, 4, 2, 4, 2, 3, 3, 3, 4, 3), P_Novelty_Common_rev = c(6, 7, 7, 
7, 4, 6, 4, 7, 2, 6, 3, 7, 7, 7, 3, 6, 7, 7, 7, 3), P_Novelty_Unusual = c(2, 5, 7, 7, 3, 5, 3, 3, 5, 6, 1, 4, 7, 1, 4, 6, 6, 6, 7, 2), P_Novelty_Special = c(6, 6, NA, 6, 5, 5, 4, 3, 5, 4, 1, 5, 6, 7, 4, 6, 7, 7, 7, 3), P_Novelty_Singular = c(4, 6, 5, 1, 5, 5, 4, 1, 3, 6, 5, 6, 4, 7, 3, 7, 7, 6, 7, 2), P_Novelty_Ordinary_rev = c(7, 7, 7, 7, 7, 6, 5, 7, 7, 7, 7, 7, 6, 7, 5, 7, 7, 7, 7, 5), P_Consequence = c(6, 7, 5, 4, 5, 4, 5, 3, 5, 5, 7, 5, 5, 2, 6, 6, 1, 4, 6, 3), P_Importance_self = c(4, 3, 3, 4, 4, 3, 5, 6, 1, 5, 7, 5, 3, 3, 5, 2, 2, 4, 5, 3), `P_Importance_friends&family` = c(4, 4, 3, 4, 4, 4, 4, 6, 1, 5, 6, 5, 3, 3, 5, 2, 6, 4, 5, 10), P_Importance_Belgium = c(7, 5, 3, 7, 6, 5, 6, 7, 3, 7, 7, 7, 5, 7, 7, 5, 6, 7, 6, 6), P_Importance_International = c(7, 5, 3, 6, 5, 4, 5, 5, 5, 4, 7, 5, 4, 7, 7, 4, 5, 5, 3, 4), P_Emotional_Intensity_Upset = c(4, 5, NA, 3, 3, 5, 3, 5, 2, 5, 7, 5, 5, 6, 7, 6, 6, 5, 5, 3), P_Emotional_Intensity_Indiferent_rev = c(7, 7, 5, 7, 6, 7, 4, 6, 4, 7, 7, 7, 7, 7, 7, 7, 7, 7, NA, 4), P_Emotional_Intensity_Affected = c(6, 6, 3, 5, 5, 6, 5, 6, 2, 5, 7, 7, 5, 7, 7, 6, 6, 6, NA, 2), P_Emotional_Intensity_Shaken = c(4, 5, 1, 4, 5, 6, 4, 4, 2, 5, 7, 7, 6, 7, 6, 5, 6, 6, 5, 1), P_Rehearsal_Media_TV = c(5, 3, NA, 3, 2, 3, NA, 1, 1, 4, 3, 5, 5, 5, 2, 3, 2, 2, 2, 2), P_Rehearsal_Media_Internet = c(4, 4, 1, 3, 2, 2, 2, 4, 3, 2, 5, 5, 3, 5, 5, 1, 5, 4, 2, 1), P_Rehearsal_Media_Social_Networks = c(5, 5, 5, 2, 4, 2, 4, 2, 3, 4, 5, 5, 5, 5, 5, 2, 2, 4, 2, 2), P_Social_Sharing_How_Often = c(4, 5, 4, 4, 4, 3, 3, 3, 3, 5, 4, 5, 5, 5, 5, 3, 4, 4, 5, NA), P_Social_Sharing_With_How_Many_People = c(5, 4, NA, 3, 3, 3, 3, 3, 2, 5, 3, 5, 5, 3, 5, 3, 3, 4, 3, NA), PK_Shops_YN = c(0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1), PK_Comic = c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0), PK_Hotel = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0), PK_Decoration_Maelbeek = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 
1, 1), PK_Stations_before_after_Maelbeek = c(0, 0.5, 0, 0, 0, 0, 0, 0, 0.5, 1, 0, 0, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0), PK_TOTAL_PC = c(0, 50, 0, 40, 40, 40, 20, 0, 10, 60, 20, 40, 90, 70, 20, 0, 30, 20, 70, 40), SI_Attachment_BXL = c(6, 4, 1, 4, 2, 5, 1, 6, 5, 4, 2, 6, 6, 7, 1, 3, 6, 4, 5, 4), SI_Pride_BXL = c(1, 2, 1, 2, 1, 2, 1, 5, 1, 6, 1, 1, 7, 7, 1, 2, 6, 1, 3, 3), SI_Attachment_Belgium = c(7, 3, 5, 5, 4, 6, 7, 6, 5, 6, 7, 7, 7, 7, 5, 6, 7, 6, 4, 2), SI_Pride_Belgium = c(7, 2, 6, 4, 2, 6, 4, 5, 1, 5, 1, 6, 7, 7, 5, 7, 7, 6, 2, 2), SI_Attachment_EU = c(6, 4, 2, 5, 4, 4, 5, 4, 7, 4, 1, 6, 7, 7, 5, 4, 6, 6, 2, 6), SI_Pride_EU = c(7, 1, 1, 4, 3, 4, 4, 4, 1, 4, 1, 6, 7, 7, 4, 3, 6, 6, 2, 4)), .Names = c("Group", "EM_Accuracy_Time_Airport", "EM_Accuracy_Place_Airport", "EM_Accuracy_Expl_Airport", "EM_Accuracy_Death_Airport", "EM_Accuracy_Time_Metro", "EM_Accuracy_Death_Metro", "EM_Accuracy_PC_Time_Airpot", "EM_Accuracy_PC_Place_Airport", "EM_Accuracy_PC_Expl_Airport", "EM_Accuracy_PC_Death_Airport", "EM_Accuracy_PC_Time_Metro", "EM_Accuracy_PC_Death_Metro", "EM_ACCURACY_PC", "EM_Certainty_Time_Airport", "EM_Certainty__Place_Airport", "EM_Certainty__Expl_Airport", "EM_Certainty__Death_Airport", "EM_Certainty__Time_Metro", "EM_Certainty__Death_Metro", "EM_CERTAINTY", "EM_CONFIDENCE", "FBM_CONFIDENCE", "FBM_Vividness_Time", "FBM_Vividness_How", "FBM_Vividness_Where", "FBM_Vividness_WithWhom", "FBM_Vividness_WereDoing", "FBM_Vividness_Did_After", "FBM_VIVIDNESS", "FBM_Details_NB_T2", "P_Novelty_5", "P_Suprise_emotion", "P_Surprise_Expected", "P_Surprise_Unbelievable", "P_Consequence-Importance_5", "P_Emotional_Intensity_4", "P_Social_Sharing_6", "P_Media_3", "P_Ruminations", "P_Novelty_Common_rev", "P_Novelty_Unusual", "P_Novelty_Special", "P_Novelty_Singular", "P_Novelty_Ordinary_rev", "P_Consequence", "P_Importance_self", "P_Importance_friends&family", "P_Importance_Belgium", "P_Importance_International", "P_Emotional_Intensity_Upset", 
"P_Emotional_Intensity_Indiferent_rev", "P_Emotional_Intensity_Affected", "P_Emotional_Intensity_Shaken", "P_Rehearsal_Media_TV", "P_Rehearsal_Media_Internet", "P_Rehearsal_Media_Social_Networks", "P_Social_Sharing_How_Often", "P_Social_Sharing_With_How_Many_People", "PK_Shops_YN", "PK_Comic", "PK_Hotel", "PK_Decoration_Maelbeek", "PK_Stations_before_after_Maelbeek", "PK_TOTAL_PC", "SI_Attachment_BXL", "SI_Pride_BXL", "SI_Attachment_Belgium", "SI_Pride_Belgium", "SI_Attachment_EU", "SI_Pride_EU"), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
The error you get means that there's a problem in your dataset, with at least one of your variables. Here's a process to help you spot problematic variables:
library(tidyverse)
df %>%
  group_by(Group) %>%                    # for each group value
  summarise_all(~sum(!is.na(.))) %>%     # count non NA values for each variable
  gather(var, value, -Group) %>%         # reshape
  spread(Group, value, sep = "_") %>%    # reshape
  filter(Group_2 < 2)                    # get problematic variables
# # A tibble: 5 x 3
#   var                                   Group_1 Group_2
#   <chr>                                   <int>   <int>
# 1 P_Emotional_Intensity_Affected             18       1
# 2 P_Emotional_Intensity_Indiferent_rev       18       1
# 3 P_Social_Sharing_6                         18       0
# 4 P_Social_Sharing_How_Often                 18       1
# 5 P_Social_Sharing_With_How_Many_People      17       1
0 counts will throw an error about needing two levels in your grouping variables. 1 count will throw an error about needing more observations in one of your groups. After spotting those you have to treat them accordingly and then your original t.test code should work.
So my problem was just missing data in one variable. However, if you are looking to run multiple t-tests on data in this format, this line of code works:
sapply(df[,2:71], function(i) t.test(i ~ df$Group)$p.value)
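Because that sapply() call stops at the first variable whose groups are too sparse, one option is to fold the count check into the function itself, so that such variables give NA instead of an error. This is only a sketch in base R: `safe_p` and the data frame `toy` are made-up names, and the threshold of two non-missing values per group is an assumption based on the two errors described above.

```r
# Return the Welch t-test p-value, or NA when a group has < 2 non-missing values.
safe_p <- function(y, g) {
  n_per_group <- tapply(!is.na(y), g, sum)
  if (length(n_per_group) != 2 || any(n_per_group < 2)) {
    return(NA_real_)
  }
  t.test(y ~ g)$p.value
}

# Hypothetical data: two groups of six, with gaps in two of the variables.
set.seed(42)
toy <- data.frame(
  Group  = rep(c(1, 2), each = 6),
  ok     = rnorm(12),
  sparse = c(rnorm(7), rep(NA, 5)),  # a single value left in group 2
  empty  = c(rnorm(6), rep(NA, 6))   # nothing left in group 2
)

# One p-value per variable; skipped variables show up as NA.
p_values <- sapply(toy[-1], safe_p, g = toy$Group)
p_values
```

The NA entries then double as a list of variables that need attention before any baseline comparison is reported.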
Histogram for diagonal axis in scatterplot
I have a 5 x 5 scatterplot matrix that I created using ggplot. I made histograms for X and Y axis, but I needed an additional histogram for the diagonals of the matrix as well. Edited for data data <- structure(c(5, 5, 5, 3, 4, 4, 2, 4, 4, 4, 5, 4, 5, 4, 5, 1, 4, 3, 5, 4, 5, 2, 3, 3, 3, 4, 2, 5, 2, 4, 3, 3, 3, 3, 5, 4, 3, 4, 4, 4, 3, 3, 5, 3, 1, 3, 4, 5, 5, 3, 2, 4, 5, 4, 4, 5, 3, 5, 1, 3, 4, 5, 3, 2, 4, 3, 4, 1, 4, 3, 5, 2, 3, 3, 4, 5, 5, 5, 4, 3, 1, 1, 4, 2, 5, 4, 4, 1, 5, 3, 4, 2, 4, 3, 4, 4, 5, 4, 5, 1, 4, 5, 5, 5, 3, 4, 4, 2, 4, 4, 4, 5, 4, 5, 4, 5, 1, 4, 3, 5, 4, 5, 2, 3, 3, 3, 4, 2, 5, 2, 4, 3, 3, 3, 3, 5, 4, 3, 4, 4, 4, 3, 3, 5, 3, 1, 3, 4, 5, 5, 3, 2, 4, 5, 4, 4, 5, 3, 5, 1, 3, 3, 5, 2, 1, 1, 4, 5, 4, 5, 1, 1, 5, 4, 5, 3, 1, 3, 5, 5, 5, 5, 2, 1, 1, 1, 2, 3, 5, 1, 2, 5, 3, 5, 4, 5, 2, 2, 5, 2, 3, 5), .Dim = c(101L, 2L)) Here is the code library(ggplot2) library(gridExtra) data <- as.data.frame(data) x <- data$V2 y <- data$V1 xhist <- qplot(x, geom="histogram", binwidth = 0.5) yhist <- qplot(y, geom="histogram", binwidth = 0.5) + coord_flip() none <- ggplot()+geom_point(aes(1,1), colour="white") + theme(axis.ticks=element_blank(), panel.background=element_blank(), axis.text.x=element_blank(), axis.text.y=element_blank(), axis.title.x=element_blank(), axis.title.y=element_blank()) g1 <- ggplot(data, aes(x,y)) + geom_point(size = 1, position = position_jitter(w=0.3, h=0.3)) grid.arrange(yhist, g1, none, xhist, ncol=2, nrow=2, widths=c(1, 4), heights=c(4,1)) Is there a way to directly plot z-axis histogram from this data alone? What I want is to remove the panel of 'none', and instead place a histogram for data points across the diagonal.
R: Error in matrix(0, nrow = N, ncol = n.seq) : non-numeric matrix extent
library(RMallow)
> dput(rankings)
structure(list(
MEMORY1 = c(5.5, 7, 1.5, 6, 4.5, 4.5, 5, 4, 1, 5.5, 2.5, 4.5, 2.5, 5.5, 4, 1, 4, 5, 2.5, 5.5),
MEMORY2 = c(5.5, 3, 1.5, 6, 4.5, 4.5, 5, 4, 5, 5.5, 6.5, 4.5, 2.5, 5.5, 4, 7, 8, 5, 6.5, 5.5),
MEMORY3 = c(5.5, 3, 4.5, 2, 4.5, 4.5, 5, 4, 5, 1.5, 6.5, 4.5, 6.5, 5.5, 4, 7, 4, 5, 6.5, 5.5),
MEMORY4 = c(1.5, 3, 4.5, 2, 1, 4.5, 5, 4, 5, 5.5, 2.5, 4.5, 2.5, 1.5, 4, 2, 4, 5, 2.5, 1.5),
MEMORY5 = c(5.5, 3, 4.5, 6, 4.5, 4.5, 5, 1, 5, 5.5, 6.5, 4.5, 6.5, 5.5, 4, 4, 4, 5, 2.5, 1.5),
MEMORY6 = c(5.5, 7, 7.5, 6, 8, 4.5, 5, 7.5, 5, 5.5, 6.5, 4.5, 6.5, 5.5, 4, 4, 4, 5, 2.5, 5.5),
MEMORY7 = c(1.5, 3, 4.5, 2, 4.5, 4.5, 1, 4, 5, 1.5, 2.5, 4.5, 6.5, 1.5, 4, 7, 4, 1, 6.5, 5.5),
MEMORY8 = c(5.5, 7, 7.5, 6, 4.5, 4.5, 5, 7.5, 5, 5.5, 2.5, 4.5, 2.5, 5.5, 8, 4, 4, 5, 6.5, 5.5)),
.Names = c("MEMORY1", "MEMORY2", "MEMORY3", "MEMORY4", "MEMORY5", "MEMORY6", "MEMORY7", "MEMORY8"),
row.names = c(NA, 20L), class = "data.frame")
abils = ncol(rankings)
R = Rgen(2, hyp = NULL, abils)
AllKendall(ranking, R)
When I run the above code, I get an error saying that
Error in matrix(0, nrow = N, ncol = n.seq) : non-numeric matrix extent.
I read a few other related posts and it seems like the problem is nrow = N is not numeric. What's causing this to happen and how can I fix it?
If you have a look at the examples in ?AllKendall, it appears that your "set of sequences" should be a matrix (see how they have a list of rankings and then they rbind them together?). To this effect, try
AllKendall(do.call(rbind, R), do.call(rbind, rankings))
# for some reason if you put it the other way round there is an error
And the result is a matrix such that output[i, j] is the distance from sequence i in R to sequence j in rankings.
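To make the do.call(rbind, ...) step less mysterious, here is a small base R sketch of what it does to a list and to a data frame. It does not call RMallow at all; `seqs` and `df` are made-up objects.

```r
# A list of sequences: each element becomes one row of the matrix.
seqs <- list(c(1, 2, 3), c(3, 2, 1))
m <- do.call(rbind, seqs)
dim(m)  # 2 rows, 3 columns

# A data frame is a list of columns, so each *column* becomes a row.
df <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6))
from_df <- do.call(rbind, df)
dim(from_df)  # 3 rows, 2 columns
```

In other words, do.call(rbind, df) on a data frame holds the same values as t(as.matrix(df)), which is worth keeping in mind when deciding which index of the output refers to which sequence.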