Correlation with 3D matrices from MATLAB using R

I have two 3D (12x12x10) matrices obtained from a Functional Connectivity (FC) analysis in the CONN software, in .mat format. Each 3D matrix is composed of 10 individual matrices of 12 regions of interest. One covers a rest condition and the other a task condition. I want to compare the differences in FC by computing the correlation between the two 3D matrices in R, but I don't know how to make R understand that I have a 3D matrix! It gets read in as an odd 2D matrix. I am using the following code:
# Load connectivity matrix
library(psych)   # corr.test() comes from the psych package
mat <- read.table("R/Matriz/neural", header = FALSE)
View(mat)
r <- corr.test(mat, mat)
Trying to compute a correlation matrix that should contain only 1s (since I am correlating the matrix with itself), I got a completely different matrix:
Call:corr.test(x = mat, y = mat)
Correlation matrix
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
V1 1.00 0.84 0.43 -0.14 0.02 -0.30 -0.20 -0.08 -0.04 -0.20 -0.46 -0.51
V2 0.84 1.00 0.55 -0.03 0.02 -0.23 -0.12 -0.02 -0.04 -0.13 -0.49 -0.50
V3 0.43 0.55 1.00 0.15 0.20 -0.03 0.14 0.35 0.09 -0.08 -0.31 -0.23
V4 -0.14 -0.03 0.15 1.00 0.54 0.45 0.57 0.51 0.23 -0.09 0.20 0.19
V5 0.02 0.02 0.20 0.54 1.00 -0.18 0.04 0.16 0.80 0.12 0.37 0.39
V6 -0.30 -0.23 -0.03 0.45 -0.18 1.00 0.68 0.51 -0.44 -0.31 -0.20 -0.25
V7 -0.20 -0.12 0.14 0.57 0.04 0.68 1.00 0.69 -0.20 -0.11 0.01 0.02
V8 -0.08 -0.02 0.35 0.51 0.16 0.51 0.69 1.00 -0.04 -0.11 -0.13 0.02
V9 -0.04 -0.04 0.09 0.23 0.80 -0.44 -0.20 -0.04 1.00 0.40 0.55 0.60
V10 -0.20 -0.13 -0.08 -0.09 0.12 -0.31 -0.11 -0.11 0.40 1.00 0.45 0.51
V11 -0.46 -0.49 -0.31 0.20 0.37 -0.20 0.01 -0.13 0.55 0.45 1.00 0.87
V12 -0.51 -0.50 -0.23 0.19 0.39 -0.25 0.02 0.02 0.60 0.51 0.87 1.00

I assume you actually want the correlation between two different matrices (unlike in your code, which correlates mat with itself), treating all the values as independent data points. The way you describe your data hints that a plain correlation may not be the best way to compare the two conditions, but if you do want to compare them that way, one option is to unroll your matrices: the fact that they are 3D is not relevant for computing the correlation between them. If you keep them as matrices, R assumes the columns are separate variables with structure, which is what produced the 12x12 matrix above.
Try:
r <- corr.test(as.vector(as.matrix(mat)), as.vector(as.matrix(mat2)))
(Note that read.table() returns a data frame, so it needs to be converted to a matrix first; otherwise as.vector() will not unroll it into a single vector.)
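If you would rather work with the data as genuine 3D arrays, the R.matlab package can read the .mat files directly, so nothing gets flattened on import. A minimal sketch, assuming each .mat file contains a single 12x12x10 array (the file names here are placeholders for your CONN exports):
library(R.matlab)
library(psych)
# readMat() returns a list; [[1]] takes the first (and here only) array
rest <- readMat("rest.mat")[[1]]   # 12 x 12 x 10 numeric array
task <- readMat("task.mat")[[1]]
dim(rest)                          # should print: 12 12 10
# Unrolling pairs up corresponding cells across the two conditions
r <- corr.test(as.vector(rest), as.vector(task))
print(r)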

Related

Getting a null score vector using princomp function in R

I'm trying to do a PCA on some data. I'm not given the raw data, just the correlation matrix, in this form:
Tmax Tmin P H PT V Vmax
Tmax 1.00 0.70 -0.08 -0.41 -0.09 -0.23 -0.08
Tmin 0.70 1.00 -0.30 0.07 0.14 -0.03 -0.01
P -0.08 -0.30 1.00 -0.18 -0.13 -0.29 -0.25
H -0.41 0.07 -0.18 1.00 0.32 -0.15 -0.19
PT -0.09 0.14 -0.13 0.32 1.00 0.11 0.07
V -0.23 -0.03 -0.29 -0.15 0.11 1.00 0.83
Vmax -0.08 -0.01 -0.25 -0.19 0.07 0.83 1.00
For this I'm trying to use the princomp() function, since it has the covmat argument that lets me supply the data as a correlation matrix. For the PCA I'm using the following code:
pca_prim <- princomp(covmat=Primavera, cor = T, scores = TRUE)
I need the scores in order to plot a biplot in later steps, but the scores vector I get is NULL:
biplot(pca_prim)
Error in biplot.princomp(pca_prim) : object 'pca_prim' has no scores
pca_prim$scores
NULL
I can't seem to find what the problem is in order to get the scores. Any suggestions?
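For what it's worth, princomp() computes scores by projecting the raw observations onto the components, so when only covmat is supplied there are no observations to project and $scores is NULL. A small sketch with stand-in random data illustrating the difference:
set.seed(1)
X <- matrix(rnorm(100 * 7), ncol = 7)               # stand-in raw data (hypothetical)
pca_data   <- princomp(X, cor = TRUE)               # raw observations supplied
pca_covmat <- princomp(covmat = cor(X), cor = TRUE) # correlation matrix only
dim(pca_data$scores)   # 100 7 -- scores exist, so biplot() works
pca_covmat$scores      # NULL -- nothing to project, hence the biplot error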

Create data frame from EFA output in R

I am working on EFA and would like to customize my tables. There is a function, print.psych(), that suppresses factor loadings below a certain value to make the table easier to read. When I run this function, it prints the loadings table and the summary stats in the console (in an .Rmd document, it produces console text and a separate data frame of the factor loadings with the suppressed loadings blanked out). However, if I attempt to save this as an object, it does not keep this data.
Here is an example:
library(psych)
bfi_data=bfi
bfi_data=bfi_data[complete.cases(bfi_data),]
bfi_cor <- cor(bfi_data)
factors_data <- fa(r = bfi_cor, nfactors = 6)
print.psych(factors_data, cut = .32, sort = TRUE)
In an R script, it produces this:
item MR2 MR3 MR1 MR5 MR4 MR6 h2 u2 com
N2 17 0.83 0.654 0.35 1.0
N1 16 0.82 0.666 0.33 1.1
N3 18 0.69 0.549 0.45 1.1
N5 20 0.47 0.376 0.62 2.2
N4 19 0.44 0.43 0.506 0.49 2.4
C4 9 -0.67 0.555 0.45 1.3
C2 7 0.66 0.475 0.53 1.4
C5 10 -0.56 0.433 0.57 1.4
C3 8 0.56 0.317 0.68 1.1
C1 6 0.54 0.344 0.66 1.3
In R Markdown, it produces a similar table (shown as an image in the original post).
How can I save that data.frame as an object?
Looking at the str() of the object, it doesn't look like what you want is built in. An ugly way would be to use capture.output() and try to convert the character vector to a dataframe using string manipulation. But since the data is being displayed, it must be present somewhere in the object itself. I was able to find vectors of the same length that can be combined to form the dataframe:
loadings <- unclass(factors_data$loadings)
h2 <- factors_data$communalities
#There is also factors_data$communality which has same values
u2 <- factors_data$uniquenesses
com <- factors_data$complexity
data <- cbind(loadings, h2, u2, com)
data
This returns :
# MR2 MR3 MR1 MR5 MR4 MR6 h2 u2 com
#A1 0.11 0.07 -0.07 -0.56 -0.01 0.35 0.38 0.62 1.85
#A2 0.03 0.09 -0.08 0.64 0.01 -0.06 0.47 0.53 1.09
#A3 -0.04 0.04 -0.10 0.60 0.07 0.16 0.51 0.49 1.26
#A4 -0.07 0.19 -0.07 0.41 -0.13 0.13 0.29 0.71 2.05
#A5 -0.17 0.01 -0.16 0.47 0.10 0.22 0.47 0.53 2.11
#C1 0.05 0.54 0.08 -0.02 0.19 0.05 0.34 0.66 1.32
#C2 0.09 0.66 0.17 0.06 0.08 0.16 0.47 0.53 1.36
#C3 0.00 0.56 0.07 0.07 -0.04 0.05 0.32 0.68 1.09
#C4 0.07 -0.67 0.10 -0.01 0.02 0.25 0.55 0.45 1.35
#C5 0.15 -0.56 0.17 0.02 0.10 0.01 0.43 0.57 1.41
#E1 -0.14 0.09 0.61 -0.14 -0.08 0.09 0.41 0.59 1.34
#E2 0.06 -0.03 0.68 -0.07 -0.08 -0.01 0.56 0.44 1.07
#E3 0.02 0.01 -0.32 0.17 0.38 0.28 0.51 0.49 3.28
#E4 -0.07 0.03 -0.49 0.25 0.00 0.31 0.56 0.44 2.26
#E5 0.16 0.27 -0.39 0.07 0.24 0.04 0.41 0.59 3.01
#N1 0.82 -0.01 -0.09 -0.09 -0.03 0.02 0.67 0.33 1.05
#N2 0.83 0.02 -0.07 -0.07 0.01 -0.07 0.65 0.35 1.04
#N3 0.69 -0.03 0.13 0.09 0.02 0.06 0.55 0.45 1.12
#N4 0.44 -0.14 0.43 0.09 0.10 0.01 0.51 0.49 2.41
#N5 0.47 -0.01 0.21 0.21 -0.17 0.09 0.38 0.62 2.23
#O1 -0.05 0.07 -0.01 -0.04 0.57 0.09 0.36 0.64 1.11
#O2 0.12 -0.09 0.01 0.12 -0.43 0.28 0.30 0.70 2.20
#O3 0.01 0.00 -0.10 0.05 0.65 0.04 0.48 0.52 1.06
#O4 0.10 -0.05 0.34 0.15 0.37 -0.04 0.24 0.76 2.55
#O5 0.04 -0.04 -0.02 -0.01 -0.50 0.30 0.33 0.67 1.67
#gender 0.20 0.09 -0.12 0.33 -0.21 -0.15 0.18 0.82 3.58
#education -0.03 0.01 0.05 0.11 0.12 -0.22 0.07 0.93 2.17
#age -0.06 0.07 -0.02 0.16 0.03 -0.26 0.10 0.90 2.05
Ronak Shah answered my question above, and I used his answer to help create the following function, which nearly reproduces the print.psych() data.frame of the fa.sort() output:
library(psych)
library(dplyr)
library(tibble)

fa_table <- function(x, cut) {
  # get sorted loadings
  loadings <- fa.sort(x)$loadings %>% round(3)
  # blank out loadings below the cutoff (in absolute value)
  loadings[abs(loadings) < cut] <- ""
  # get additional info
  add_info <- cbind(x$communalities,
                    x$uniquenesses,
                    x$complexity) %>%
    as.data.frame() %>%
    rename("communality" = V1,
           "uniqueness" = V2,
           "complexity" = V3) %>%
    rownames_to_column("item")
  # build table
  loadings %>%
    unclass() %>%
    as.data.frame() %>%
    rownames_to_column("item") %>%
    left_join(add_info, by = "item") %>%
    mutate(across(where(is.numeric), ~ round(.x, 3)))
}
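With the fa object fitted earlier, a call would look something like this (the cutoff mirrors the one used above):
fa_table(factors_data, cut = .32)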

R: Add column to df where rows have names of element from list

I have a list of all files (dataframes) within a directory:
library("plyr")
library("dplyr")
library("broom")
library("tidyr")
snp_list <- list.files(pattern="*.txt", all.files = T,full.names = F)
I also have a dataframe A obtained through the following function:
pv1 <- lapply(snp_list, function(x) tidy(lm(PV ~ GT*SEX + M + GT*N, read.table(x, header = TRUE)))) %>%
  bind_rows()
Dataframe A has 7 rows ((Intercept), GT, SEX, M, N, GT:SEX, GT:N) for each element in list snp_list. In this toy example the list has 3 elements (rs1406947.txt, rs25904.txt, rs7133579.txt), but in reality there are 1,200,000 elements.
A:
term estimate std.error statistic p.value
(Intercept) 7.68 0.17 44.64 0
GT 0.01 0.01 0.07 0.19
SEX 1.52 0.14 10.87 0.1
M 0.12 0.29 0.41 0.67
N -0.06 0.12 -0.48 0.63
GT:SEX -0.03 0.08 -0.44 0.65
GT:N -0.00 0.06 -0.08 0.93
(Intercept) 9.23 0.20 34.64 0
GT 0.05 0.04 0.12 0.22
SEX 1.67 0.76 10.34 0.1
M 0.14 0.39 0.51 0.55
N -0.08 0.05 -0.46 0.55
GT:SEX -0.19 0.11 -0.34 0.44
GT:N -0.22 0.33 -0.44 0.55
(Intercept) 7.99 0.66 44.44 0
GT 0.01 0.3 0.04 0.33
SEX 1.22 0.22 10.44 0.15
M 0.88 0.22 0.33 0.44
N -0.5 0.5 -0.5 0.6
GT:SEX -0.06 0.09 -0.74 0.35
GT:N -0.00 0.03 -0.04 0.78
I want to add a new column "SNP" to A, where each row has the name of the element the row belongs to (nrows = 7 * 1,200,000). I would get this:
term estimate std.error statistic p.value SNP
(Intercept) 7.68 0.17 44.64 0 rs1406947
GT 0.01 0.01 0.07 0.19 rs1406947
SEX 1.52 0.14 10.87 0.1 rs1406947
M 0.12 0.29 0.41 0.67 rs1406947
N -0.06 0.12 -0.48 0.63 rs1406947
GT:SEX -0.03 0.08 -0.44 0.65 rs1406947
GT:N -0.00 0.06 -0.08 0.93 rs1406947
(Intercept) 9.23 0.20 34.64 0 rs25904
GT 0.05 0.04 0.12 0.22 rs25904
SEX 1.67 0.76 10.34 0.1 rs25904
M 0.14 0.39 0.51 0.55 rs25904
N -0.08 0.05 -0.46 0.55 rs25904
GT:SEX -0.19 0.11 -0.34 0.44 rs25904
GT:N -0.22 0.33 -0.44 0.55 rs25904
(Intercept) 7.99 0.66 44.44 0 rs7133579
GT 0.01 0.3 0.04 0.33 rs7133579
SEX 1.22 0.22 10.44 0.15 rs7133579
M 0.88 0.22 0.33 0.44 rs7133579
N -0.5 0.5 -0.5 0.6 rs7133579
GT:SEX -0.06 0.09 -0.74 0.35 rs7133579
GT:N -0.00 0.03 -0.04 0.78 rs7133579
Here's how to do what you asked:
A$SNP <- character(nrow(A))
for (i in 1:nrow(A)) {
  # each block of 7 rows maps to one element of snp_list
  A$SNP[i] <- sub("\\.txt$", "", snp_list[((i - 1) %/% 7) + 1])
}
Using integer division on the zero-based row index, every 7 consecutive rows map to one element of snp_list; sub() strips the .txt extension so the SNP values match the desired output.
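Since the rows come out of lapply() in the first place, a vectorised alternative is to name the list and let bind_rows() create the column for you. This is a sketch built on the .id argument of dplyr::bind_rows(), which fills a column from the list names:
library(dplyr)
library(broom)
# Tag each model's rows at creation time instead of looping afterwards
pv1 <- snp_list %>%
  setNames(sub("\\.txt$", "", snp_list)) %>%   # element names become SNP labels
  lapply(function(x) tidy(lm(PV ~ GT*SEX + M + GT*N,
                             read.table(x, header = TRUE)))) %>%
  bind_rows(.id = "SNP")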

`psych::alpha`- detailed interpretation of the output

I am aware that Cronbach's alpha has been extensively discussed here and elsewhere, but I cannot find a detailed interpretation of the output table.
psych::alpha(questionaire)
Reliability analysis
Call: psych::alpha(x = diagnostic_test)
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.69 0.73 1 0.14 2.7 0.026 0.6 0.18 0.12
lower alpha upper 95% confidence boundaries
0.64 0.69 0.74
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
Score1 0.69 0.73 0.86 0.14 2.7 0.027 0.0136 0.12
Score2 0.68 0.73 0.87 0.14 2.7 0.027 0.0136 0.12
Score3 0.69 0.73 0.87 0.14 2.7 0.027 0.0136 0.12
Score4 0.67 0.72 0.86 0.14 2.5 0.028 0.0136 0.11
Score5 0.68 0.73 0.87 0.14 2.7 0.027 0.0134 0.12
Score6 0.69 0.73 0.91 0.15 2.7 0.027 0.0138 0.12
Score7 0.69 0.73 0.85 0.15 2.7 0.027 0.0135 0.12
Score8 0.68 0.72 0.86 0.14 2.6 0.028 0.0138 0.12
Score9 0.68 0.73 0.92 0.14 2.7 0.027 0.0141 0.12
Score10 0.68 0.72 0.90 0.14 2.6 0.027 0.0137 0.12
Score11 0.67 0.72 0.86 0.14 2.5 0.028 0.0134 0.11
Score12 0.67 0.71 0.87 0.13 2.5 0.029 0.0135 0.11
Score13 0.67 0.72 0.86 0.14 2.6 0.028 0.0138 0.11
Score14 0.68 0.72 0.86 0.14 2.6 0.028 0.0138 0.11
Score15 0.67 0.72 0.86 0.14 2.5 0.028 0.0134 0.11
Score16 0.68 0.72 0.88 0.14 2.6 0.028 0.0135 0.12
score 0.65 0.65 0.66 0.10 1.8 0.030 0.0041 0.11
Item statistics
n raw.r std.r r.cor r.drop mean sd
Score1 286 0.36 0.35 0.35 0.21 0.43 0.50
Score2 286 0.37 0.36 0.36 0.23 0.71 0.45
Score3 286 0.34 0.34 0.34 0.20 0.73 0.44
Score4 286 0.46 0.46 0.46 0.33 0.35 0.48
Score5 286 0.36 0.36 0.36 0.23 0.73 0.44
Score6 286 0.29 0.32 0.32 0.18 0.87 0.34
Score7 286 0.33 0.32 0.32 0.18 0.52 0.50
Score8 286 0.42 0.41 0.41 0.28 0.36 0.48
Score9 286 0.32 0.36 0.36 0.22 0.90 0.31
Score10 286 0.37 0.40 0.40 0.26 0.83 0.37
Score11 286 0.48 0.47 0.47 0.34 0.65 0.48
Score12 286 0.49 0.49 0.49 0.37 0.71 0.46
Score13 286 0.46 0.44 0.44 0.31 0.44 0.50
Score14 286 0.44 0.43 0.43 0.30 0.43 0.50
Score15 286 0.48 0.47 0.47 0.35 0.61 0.49
Score16 286 0.39 0.39 0.39 0.26 0.25 0.43
score 286 1.00 1.00 1.00 1.00 0.60 0.18
Warning messages:
1: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
2: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
3: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
As far as I know, r.cor stands for the item-total correlation, or biserial correlation. I have seen that this is usually interpreted together with the corresponding p-value.
1. What is the exact interpretation of r.cor and r.drop?
2. How can the p-value be calculated?
1. Although this is more of a question for Cross Validated, here is a detailed explanation of the ‘Item statistics’ section:
raw.r: the correlation between the item and the total score of the scale (i.e., the item-total correlation); there is a problem with raw.r, namely that the item itself is included in the total, so we are partly correlating the item with itself and it will of course correlate (r.cor and r.drop solve this problem; see ?alpha for details)
r.drop: item-total correlation without that item itself (i.e., item-rest correlation or corrected item-total correlation); low item-total correlations indicate that that item doesn’t correlate well with the scale overall
r.cor: item-total correlation corrected for item overlap and scale reliability
mean and sd: the mean and standard deviation of each item
2. You should not use the p-values corresponding to these correlation coefficients to guide your decisions; I would suggest not bothering to calculate them.
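For intuition, r.drop is straightforward to reproduce by hand: correlate each item with the total of the remaining items. A small sketch using the bfi data that ships with psych (the choice of item A1 and the first five columns is arbitrary):
library(psych)
items <- bfi[complete.cases(bfi[, 1:5]), 1:5]    # five agreeableness items
rest_total <- rowSums(items[, -1])               # scale total without item A1
cor(items[["A1"]], rest_total)                   # hand-rolled r.drop for A1
psych::alpha(items)$item.stats["A1", "r.drop"]   # should match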

How to subset a time series in R

In particular, I'd like to subset the temperature measurements from 1960 onwards in the time series gtemp in the package astsa:
require(astsa)
gtemp
Time Series:
Start = 1880
End = 2009
Frequency = 1
[1] -0.28 -0.21 -0.26 -0.27 -0.32 -0.32 -0.29 -0.36 -0.27 -0.17 -0.39 -0.27 -0.32
[14] -0.33 -0.33 -0.25 -0.14 -0.11 -0.25 -0.15 -0.07 -0.14 -0.24 -0.30 -0.34 -0.24
[27] -0.19 -0.39 -0.33 -0.35 -0.33 -0.34 -0.32 -0.30 -0.15 -0.10 -0.30 -0.39 -0.33
[40] -0.20 -0.19 -0.14 -0.26 -0.22 -0.22 -0.17 -0.02 -0.15 -0.12 -0.26 -0.08 -0.02
[53] -0.08 -0.19 -0.07 -0.12 -0.05 0.07 0.10 0.01 0.04 0.10 0.03 0.09 0.19
[66] 0.06 -0.05 0.00 -0.04 -0.07 -0.16 -0.04 0.03 0.11 -0.10 -0.10 -0.17 0.08
[79] 0.08 0.06 -0.01 0.07 0.04 0.08 -0.21 -0.11 -0.03 -0.01 -0.04 0.08 0.03
[92] -0.10 0.00 0.14 -0.08 -0.05 -0.16 0.12 0.01 0.08 0.18 0.26 0.04 0.26
[105] 0.09 0.05 0.12 0.26 0.31 0.19 0.37 0.35 0.12 0.13 0.23 0.37 0.29
[118] 0.39 0.56 0.32 0.33 0.48 0.56 0.55 0.48 0.62 0.54 0.57 0.43 0.57
The individual time points are not labeled in years, so although I can do gtemp[3] (which returns -0.26), I can't do something like gtemp[as.date(1960)] to get the value in 1960.
How can I bring out the correspondence between year and measurements, so as to later subset values?
We can make use of the window() function:
gtemp1 <- window(gtemp, start = 1960)
gtemp1
#Time Series:
#Start = 1960
#End = 2009
#Frequency = 1
#[1] -0.01 0.07 0.04 0.08 -0.21 -0.11 -0.03 -0.01 -0.04 0.08 0.03
#[12] -0.10 0.00 0.14 -0.08 -0.05 -0.16 0.12 0.01 0.08 0.18 0.26
#[23] 0.04 0.26 0.09 0.05 0.12 0.26 0.31 0.19 0.37 0.35 0.12
#[34] 0.13 0.23 0.37 0.29 0.39 0.56 0.32 0.33 0.48 0.56 0.55
#[45] 0.48 0.62 0.54 0.57 0.43 0.57
The time() function can also help to answer your question:
How can I bring out the correspondence between year and measurements, so as to later subset values?
head(time(gtemp))
[1] 1880 1881 1882 1883 1884 1885
If you want the value that corresponds to 1961, you can write
gtemp[time(gtemp) == 1961]
[1] 0.07
As mentioned in the first answer, you can also use the window() function:
window(gtemp, start = 1961, end = 1961)
Time Series:
Start = 1961
End = 1961
Frequency = 1
[1] 0.07
which returns the result as a one-point time series. You can convert it into a number by
as.numeric(window(gtemp, start = 1961, end = 1961))
[1] 0.07
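If you prefer ordinary data frame subsetting, another way to make the year/value correspondence explicit is to pair time() with the values (a sketch; the column names are arbitrary):
library(astsa)
# Pair each measurement with its year, then subset with logical indexing
df <- data.frame(year = as.numeric(time(gtemp)),
                 temp = as.numeric(gtemp))
head(df, 3)            # 1880 -0.28, 1881 -0.21, 1882 -0.26
df[df$year >= 1960, ]  # everything from 1960 onwards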
