Predict using psych in R for PCA - r

I have a data set which I divided into training and test sets after first recoding the qualitative variables to integers. I then ran a PCA using the psych package.
For the training set, I ran the below code:
train.scale <- scale(trainagain[, -1:-2])
pcafit <- principal(train.scale, nfactors = 11, rotate = "Varimax")
It extracted the components as below:
RC1 RC4 RC3 RC5 RC2 RC6 RC7 RC8 RC9 RC11 RC10
SS loadings 2.44 1.92 1.90 1.72 1.65 1.46 1.40 1.15 1.10 1.01 1.01
Proportion Var 0.10 0.08 0.08 0.07 0.07 0.06 0.06 0.05 0.05 0.04 0.04
Cumulative Var 0.10 0.18 0.26 0.33 0.40 0.46 0.52 0.57 0.61 0.66 0.70
Proportion Explained 0.15 0.11 0.11 0.10 0.10 0.09 0.08 0.07 0.07 0.06 0.06
Cumulative Proportion 0.15 0.26 0.37 0.48 0.58 0.66 0.75 0.81 0.88 0.94 1.00
For the test set, I ran the below code:
str(testagain)
testagain.scores <- data.frame(predict(pcafit, testagain[, -1:-2]))
str(testagain) shows that its structure matches trainagain, with all contents being integers. However, testagain.scores comes back all NaN.
How can I get predict to work? To my knowledge, I am following:
# S3 method for psych
predict(object, data,old.data,options=NULL,missing=FALSE,impute="none",...)
from:
https://www.rdocumentation.org/packages/psych/versions/2.0.7/topics/predict.psych

I think I stumbled across the solution: remove any feature/column whose value is exactly the same across all samples (i.e., a zero-variance column).
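For anyone hitting the same NaN issue, here is a minimal sketch of the mechanism (toy data, made-up column names): scale() divides each column by its standard deviation, so a zero-variance column becomes NaN, and that NaN then propagates through principal() and predict().

```r
set.seed(1)
# Toy data frame with a constant (zero-variance) column "c"
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rep(5, 10))

scaled <- scale(df)
any(is.nan(scaled))            # TRUE: (5 - 5) / sd = 0 / 0 = NaN for column "c"

# Drop zero-variance columns before scaling
keep <- vapply(df, function(x) sd(x) > 0, logical(1))
scaled_ok <- scale(df[, keep])
any(is.nan(scaled_ok))         # FALSE
```

Running the same check on trainagain/testagain before calling principal() would flag the offending column.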

Related

Grouped boxplot in R - simplest way

I have been struggling to create a very simple grouped boxplot. My data look as follows:
> data
Wörter Sätze Text
P.01 0.15 0.24 0.34
P.02 0.10 0.15 0.08
P.03 0.05 0.18 0.16
P.04 0.55 0.60 0.44
P.05 0.00 0.06 0.26
P.06 0.20 0.65 0.68
P.07 0.15 0.31 0.47
P.08 0.35 0.87 0.69
P.09 0.35 0.75 0.76
N.01 0.40 0.78 0.59
N.02 0.55 0.95 0.76
N.03 0.65 0.96 0.83
N.04 0.60 0.90 0.77
N.05 0.50 0.95 0.82
If I simply execute boxplot(data) I obtain almost what I want: one plot with three boxes, one for each variable in my data.
[Screenshot: boxplot, almost what I want]
What I want is to split each of these into two boxes per variable (one for the P-indexed and one for the N-indexed observations), for a total of six boxes.
I began by introducing a new variable
data$Gruppe <- c(rep("P",9), rep("N",5))
> data
Wörter Sätze Text Gruppe
P.01 0.15 0.24 0.34 P
P.02 0.10 0.15 0.08 P
P.03 0.05 0.18 0.16 P
P.04 0.55 0.60 0.44 P
P.05 0.00 0.06 0.26 P
P.06 0.20 0.65 0.68 P
P.07 0.15 0.31 0.47 P
P.08 0.35 0.87 0.69 P
P.09 0.35 0.75 0.76 P
N.01 0.40 0.78 0.59 N
N.02 0.55 0.95 0.76 N
N.03 0.65 0.96 0.83 N
N.04 0.60 0.90 0.77 N
N.05 0.50 0.95 0.82 N
Now that the data contain a non-numerical variable, I cannot simply call boxplot() as before. What would be a minimal alteration to obtain the six boxes I want? (Colour coding for the two groups would be nice.)
I have encountered some solutions to a grouped boxplot, however the data from which others start tends to be organised differently than my (very simple) one.
Many thanks!
As @teunbrand already mentioned in the comments, you could use pivot_longer to reshape your data into long format. Mapping fill to Gruppe then gives two boxplots per variable, six in total, like this:
library(tidyr)
library(dplyr)
library(ggplot2)
data$Gruppe <- c(rep("P",9), rep("N",5))
data %>%
  pivot_longer(cols = -Gruppe) %>%
  ggplot(aes(x = name, y = value, fill = Gruppe)) +
  geom_boxplot()
Created on 2023-01-10 with reprex v2.0.2
Data used:
data <- read.table(text = " Wörter Sätze Text
P.01 0.15 0.24 0.34
P.02 0.10 0.15 0.08
P.03 0.05 0.18 0.16
P.04 0.55 0.60 0.44
P.05 0.00 0.06 0.26
P.06 0.20 0.65 0.68
P.07 0.15 0.31 0.47
P.08 0.35 0.87 0.69
P.09 0.35 0.75 0.76
N.01 0.40 0.78 0.59
N.02 0.55 0.95 0.76
N.03 0.65 0.96 0.83
N.04 0.60 0.90 0.77
N.05 0.50 0.95 0.82", header = TRUE)
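A base-R alternative, sketched on a subset of the data (umlauts ASCII-ised here so the snippet is portable): reshape() the wide data to long format, then the boxplot() formula interface with an interaction term gives two boxes per variable.

```r
data <- read.table(text = "Woerter Saetze Text
P.01 0.15 0.24 0.34
P.02 0.10 0.15 0.08
P.03 0.05 0.18 0.16
N.01 0.40 0.78 0.59
N.02 0.55 0.95 0.76", header = TRUE)
data$Gruppe <- c(rep("P", 3), rep("N", 2))

# Wide -> long: one row per (observation, variable) pair
long <- reshape(data, direction = "long",
                varying = c("Woerter", "Saetze", "Text"),
                v.names = "value", timevar = "variable",
                times = c("Woerter", "Saetze", "Text"))

# Two boxes (P and N) per variable, colour-coded by group
boxplot(value ~ Gruppe + variable, data = long,
        col = c("tomato", "steelblue"))
```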

Getting a NULL scores vector using the princomp function in R

I'm trying to run a PCA on some data. I'm not given the raw data, only the correlation matrix, in this form:
Tmax Tmin P H PT V Vmax
Tmax 1.00 0.70 -0.08 -0.41 -0.09 -0.23 -0.08
Tmin 0.70 1.00 -0.30 0.07 0.14 -0.03 -0.01
P -0.08 -0.30 1.00 -0.18 -0.13 -0.29 -0.25
H -0.41 0.07 -0.18 1.00 0.32 -0.15 -0.19
PT -0.09 0.14 -0.13 0.32 1.00 0.11 0.07
V -0.23 -0.03 -0.29 -0.15 0.11 1.00 0.83
Vmax -0.08 -0.01 -0.25 -0.19 0.07 0.83 1.00
For this I'm trying to use the princomp() function, since its covmat option lets me supply the data as a correlation matrix. For the PCA I'm using the following code:
pca_prim <- princomp(covmat = Primavera, cor = T, scores = TRUE)
I need the scores in order to plot a biplot in later steps, but the scores vector I get is NULL:
biplot(pca_prim)
Error in biplot.princomp(pca_prim) : object 'pca_prim' has no scores
pca_prim$scores
NULL
I can't seem to find what the problem is in order to get the scores. Any suggestions?
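For what it's worth, this is expected behaviour rather than a bug: princomp() fitted via covmat has no observations to project, so $scores is NULL and biplot() fails. A small sketch with simulated data (the real raw data is not available here):

```r
set.seed(42)
X <- matrix(rnorm(100 * 4), ncol = 4)

pca_raw <- princomp(X)                # fitted from raw data -> scores exist
pca_mat <- princomp(covmat = cor(X))  # fitted from a matrix  -> no scores

is.null(pca_raw$scores)   # FALSE
is.null(pca_mat$scores)   # TRUE

# With raw data in hand, the scores are just the centred data projected
# onto the loadings:
scores_manual <- scale(X, center = TRUE, scale = FALSE) %*%
  unclass(pca_raw$loadings)
max(abs(scores_manual - pca_raw$scores))   # ~0
```

So without the original observations, a biplot of scores is not recoverable; at best you can plot the loadings (the variable arrows).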

`psych::alpha`- detailed interpretation of the output

I am aware that Cronbach's alpha has been extensively discussed here and elsewhere, but I cannot find a detailed interpretation of the output table.
psych::alpha(questionaire)
Reliability analysis
Call: psych::alpha(x = diagnostic_test)
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.69 0.73 1 0.14 2.7 0.026 0.6 0.18 0.12
lower alpha upper 95% confidence boundaries
0.64 0.69 0.74
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
Score1 0.69 0.73 0.86 0.14 2.7 0.027 0.0136 0.12
Score2 0.68 0.73 0.87 0.14 2.7 0.027 0.0136 0.12
Score3 0.69 0.73 0.87 0.14 2.7 0.027 0.0136 0.12
Score4 0.67 0.72 0.86 0.14 2.5 0.028 0.0136 0.11
Score5 0.68 0.73 0.87 0.14 2.7 0.027 0.0134 0.12
Score6 0.69 0.73 0.91 0.15 2.7 0.027 0.0138 0.12
Score7 0.69 0.73 0.85 0.15 2.7 0.027 0.0135 0.12
Score8 0.68 0.72 0.86 0.14 2.6 0.028 0.0138 0.12
Score9 0.68 0.73 0.92 0.14 2.7 0.027 0.0141 0.12
Score10 0.68 0.72 0.90 0.14 2.6 0.027 0.0137 0.12
Score11 0.67 0.72 0.86 0.14 2.5 0.028 0.0134 0.11
Score12 0.67 0.71 0.87 0.13 2.5 0.029 0.0135 0.11
Score13 0.67 0.72 0.86 0.14 2.6 0.028 0.0138 0.11
Score14 0.68 0.72 0.86 0.14 2.6 0.028 0.0138 0.11
Score15 0.67 0.72 0.86 0.14 2.5 0.028 0.0134 0.11
Score16 0.68 0.72 0.88 0.14 2.6 0.028 0.0135 0.12
score 0.65 0.65 0.66 0.10 1.8 0.030 0.0041 0.11
Item statistics
n raw.r std.r r.cor r.drop mean sd
Score1 286 0.36 0.35 0.35 0.21 0.43 0.50
Score2 286 0.37 0.36 0.36 0.23 0.71 0.45
Score3 286 0.34 0.34 0.34 0.20 0.73 0.44
Score4 286 0.46 0.46 0.46 0.33 0.35 0.48
Score5 286 0.36 0.36 0.36 0.23 0.73 0.44
Score6 286 0.29 0.32 0.32 0.18 0.87 0.34
Score7 286 0.33 0.32 0.32 0.18 0.52 0.50
Score8 286 0.42 0.41 0.41 0.28 0.36 0.48
Score9 286 0.32 0.36 0.36 0.22 0.90 0.31
Score10 286 0.37 0.40 0.40 0.26 0.83 0.37
Score11 286 0.48 0.47 0.47 0.34 0.65 0.48
Score12 286 0.49 0.49 0.49 0.37 0.71 0.46
Score13 286 0.46 0.44 0.44 0.31 0.44 0.50
Score14 286 0.44 0.43 0.43 0.30 0.43 0.50
Score15 286 0.48 0.47 0.47 0.35 0.61 0.49
Score16 286 0.39 0.39 0.39 0.26 0.25 0.43
score 286 1.00 1.00 1.00 1.00 0.60 0.18
Warning messages:
1: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
2: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
3: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
As far as I know, r.cor stands for the item-total correlation, or biserial correlation. I have seen that this is usually interpreted together with the corresponding p-value.
1. What is the exact interpretation of r.cor and r.drop?
2. How can the p-value be calculated?
1. Although this is more of a question for Cross Validated, here is a detailed explanation of the 'Item statistics' section:
raw.r: the correlation between the item and the total score from the scale (i.e., the item-total correlation). There is a problem with raw.r: the item itself is included in the total, which means we are partly correlating the item with itself, so of course it will correlate (r.cor and r.drop solve this problem; see ?alpha for details)
r.drop: the item-total correlation with the item itself excluded from the total (i.e., the item-rest correlation or corrected item-total correlation); a low item-total correlation indicates that the item doesn't correlate well with the scale overall
r.cor: the item-total correlation corrected for item overlap and scale reliability
mean and sd: the mean and standard deviation of each item
2. You should not use the p-values corresponding to these correlation coefficients to guide your decisions. I would suggest not bothering to calculate them.
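To make r.drop concrete, here is a sketch that computes the item-rest correlation by hand on simulated 0/1 item scores (simulated data, so the values themselves are meaningless):

```r
set.seed(1)
# 100 respondents, 5 binary items
items <- as.data.frame(matrix(rbinom(100 * 5, 1, 0.6), ncol = 5))

total <- rowSums(items)
# Item-rest correlation: correlate each item with the total *minus* that item
r_drop <- sapply(seq_along(items), function(i)
  cor(items[[i]], total - items[[i]]))
round(r_drop, 2)
```

This is the same quantity psych::alpha reports as r.drop (up to implementation details).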

Correlation with 3D matrices from MATLAB using R

I have two 3D (12x12x10) matrices, in .mat format, obtained from a Functional Connectivity Analysis in the CONN software. Each 3D matrix is composed of 10 individual matrices of 12 regions of interest; one corresponds to a rest condition and the other to a task condition. I want to compare the differences in FC by computing the correlation between the two 3D matrices in R, but I don't know how to make R understand that I have a 3D matrix! It mixes it into an odd 2D matrix. I am using the following code:
# Load connectivity matrix
mat <- read.table("R/Matriz/neural", header = FALSE)
View(mat)
r <- corr.test(mat, mat)
Instead of the expected all-1s correlation matrix (the matrix correlated with itself), I got a completely different matrix:
Call:corr.test(x = mat, y = mat)
Correlation matrix
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
V1 1.00 0.84 0.43 -0.14 0.02 -0.30 -0.20 -0.08 -0.04 -0.20 -0.46 -0.51
V2 0.84 1.00 0.55 -0.03 0.02 -0.23 -0.12 -0.02 -0.04 -0.13 -0.49 -0.50
V3 0.43 0.55 1.00 0.15 0.20 -0.03 0.14 0.35 0.09 -0.08 -0.31 -0.23
V4 -0.14 -0.03 0.15 1.00 0.54 0.45 0.57 0.51 0.23 -0.09 0.20 0.19
V5 0.02 0.02 0.20 0.54 1.00 -0.18 0.04 0.16 0.80 0.12 0.37 0.39
V6 -0.30 -0.23 -0.03 0.45 -0.18 1.00 0.68 0.51 -0.44 -0.31 -0.20 -0.25
V7 -0.20 -0.12 0.14 0.57 0.04 0.68 1.00 0.69 -0.20 -0.11 0.01 0.02
V8 -0.08 -0.02 0.35 0.51 0.16 0.51 0.69 1.00 -0.04 -0.11 -0.13 0.02
V9 -0.04 -0.04 0.09 0.23 0.80 -0.44 -0.20 -0.04 1.00 0.40 0.55 0.60
V10 -0.20 -0.13 -0.08 -0.09 0.12 -0.31 -0.11 -0.11 0.40 1.00 0.45 0.51
V11 -0.46 -0.49 -0.31 0.20 0.37 -0.20 0.01 -0.13 0.55 0.45 1.00 0.87
V12 -0.51 -0.50 -0.23 0.19 0.39 -0.25 0.02 0.02 0.60 0.51 0.87 1.00
I assume you want the correlation between two different matrices (unlike in your code), treating all the values as independent data points. The way you describe your data hints that a correlation between the two may not be the best way to compare them, but if you do want a single correlation, one way is to unroll your matrices: the fact that they are 3D is not relevant when computing an overall correlation between them. If you keep them 3D (or 2D), R assumes a variables-and-observations structure.
try
r <- corr.test(as.vector(mat), as.vector(mat2))
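A self-contained sketch of the unrolling idea, with simulated 12x12x10 arrays standing in for the .mat data (in practice you would first load the files, e.g. with R.matlab::readMat(), which returns a named list of the MATLAB variables):

```r
set.seed(7)
rest <- array(rnorm(12 * 12 * 10), dim = c(12, 12, 10))
task <- array(rnorm(12 * 12 * 10), dim = c(12, 12, 10))

# Unroll both 3D arrays to plain vectors; the 3D structure is irrelevant
# for a single overall correlation between the two conditions.
length(as.vector(rest))                   # 1440 values per condition
r <- cor(as.vector(rest), as.vector(task))
```

If you want a p-value rather than just the coefficient, stats::cor.test(as.vector(rest), as.vector(task)) works on two plain vectors.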

'x' must be numeric error in R while trying to create a stem-and-leaf display

I am a beginner at R and I'm just trying to read a text file that contains values and create a stem-and-leaf display, but I keep getting an error. Here is my code:
setwd("C:/Users/Michael/Desktop/ch1-ch9 data/CH01")
gravity=read.table("C:ex01-11.txt", header=T)
stem(gravity)
**Error in stem(gravity) : 'x' must be numeric**
The File contains this:
'spec_gravity'
0.31
0.35
0.36
0.36
0.37
0.38
0.4
0.4
0.4
0.41
0.41
0.42
0.42
0.42
0.42
0.42
0.43
0.44
0.45
0.46
0.46
0.47
0.48
0.48
0.48
0.51
0.54
0.54
0.55
0.58
0.62
0.66
0.66
0.67
0.68
0.75
If you can help, I would appreciate it! Thanks!
gravity is a data frame, but stem expects a numeric vector. You need to select a column of your data set and pass it to stem, i.e.
## The first column
stem(gravity[,1])
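A self-contained sketch (only a few of the values inlined), selecting the column by name instead, assuming the header was read in as spec_gravity:

```r
gravity <- read.table(text = "spec_gravity
0.31
0.35
0.36
0.40
0.55", header = TRUE)

# stem() needs a numeric vector, not a data frame
stem(gravity$spec_gravity)   # equivalent to stem(gravity[, 1])
```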
