Mysterious source of output in R?

I am using the following code with the mtcars data and the factanal function for factor analysis. Printing fit$loadings shows the proportion of variance, but it does not seem to be present in str(fit$loadings):
> fit <- factanal(mtcars, 3, rotation="varimax")
> fit$loadings
Loadings:
Factor1 Factor2 Factor3
mpg 0.643 -0.478 -0.473
cyl -0.618 0.703 0.261
disp -0.719 0.537 0.323
hp -0.291 0.725 0.513
drat 0.804 -0.241
wt -0.778 0.248 0.524
qsec -0.177 -0.946 -0.151
vs 0.295 -0.805 -0.204
am 0.880
gear 0.908 0.224
carb 0.114 0.559 0.719
Factor1 Factor2 Factor3
SS loadings 4.380 3.520 1.578
Proportion Var 0.398 0.320 0.143 <<<<<<<<<<<<< I NEED THESE NUMBERS AS A VECTOR
Cumulative Var 0.398 0.718 0.862
>
> str(fit$loadings)
loadings [1:11, 1:3] 0.643 -0.618 -0.719 -0.291 0.804 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:11] "mpg" "cyl" "disp" "hp" ...
..$ : chr [1:3] "Factor1" "Factor2" "Factor3"
How can I get the Proportion Var vector from fit$loadings? Thanks for your help.

Let obj <- fit$loadings. Here is the complete path to the result.
Typing fit$loadings (or obj) at the console actually calls print(obj). So, after looking at str, you should check what the relevant print method does with obj. To find out which method that is, check class(obj), which returns "loadings".
Typing print.loadings, however, gives an "object not found" error because the method is not exported. Since factanal is in the stats package, we call stats:::print.loadings and get the complete source code of the function. Inspecting it shows that the desired result can be obtained as follows.
colSums(obj^2) / nrow(obj)
# Factor1 Factor2 Factor3
# 0.3982190 0.3199652 0.1434125
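The whole path can be condensed into one script. This sketch reuses the question's model, performs the method lookup described above, and then rebuilds all three summary rows (SS loadings, Proportion Var, Cumulative Var) from obj, mirroring the arithmetic seen in the print.loadings source for this varimax fit. The names ss_loadings, prop_var, cum_var and summary_tab are illustrative only.

```r
# Fit the same model as in the question
fit <- factanal(mtcars, factors = 3, rotation = "varimax")
obj <- fit$loadings

# Step 1: which print method is used? The class tells us.
class(obj)                      # "loadings"

# Step 2: print.loadings is not exported, so reach into the stats namespace.
# getAnywhere("print.loadings") is another way to see it.
src <- stats:::print.loadings   # print(src) shows the full source

# Step 3: rebuild the summary rows printed under the loadings
ss_loadings <- colSums(obj^2)              # "SS loadings": column sums of squares
prop_var    <- ss_loadings / nrow(obj)     # "Proportion Var": divide by number of variables
cum_var     <- cumsum(prop_var)            # "Cumulative Var"

summary_tab <- rbind(`SS loadings`    = ss_loadings,
                     `Proportion Var` = prop_var,
                     `Cumulative Var` = cum_var)
round(summary_tab, 3)
```

prop_var is an ordinary named numeric vector, so it can be used directly wherever the question needs the proportions.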

Related

Problems with full_join in R. no applicable method to "character"

I am new to R and I am struggling with the full_join function. I am pretty sure the problem is easy; I got it working in other situations that I assumed were the same as this one. Hopefully someone can help me. Here is the setup.
I have several datasets that I gather into one big list:
NDVI2003 <- ls(pattern = "x2003_meanNDVI_m.*$")
PixelQa2003 <- ls(pattern = "x2003_meanPixelQa_m.*$")
full_list <- do.call(c, list(NDVI2003,PixelQa2003))
The first two lines just collect the names of some data frames I loaded from files in a folder. These data frames look like this:
> str(x2003_meanNDVI_m1)
'data.frame': 354 obs. of 5 variables:
$ date : chr "2001-12-03" "2001-12-10" "2001-12-19" "2001-12-26" ...
$ 2003_NDVI_1: num 0.441 0.518 0.322 0.311 0.499 0.319 0.163 0.134 0.452 0.536 ...
$ 2003_NDVI_2: num 0.377 0.446 0.075 0.1 0.006 0.279 0.368 0.135 0.423 0.522 ...
$ 2003_NDVI_3: num 0.332 0.397 0.07 0.093 0.006 0.236 0.469 0.127 0.411 0.535 ...
$ 2003_NDVI_4: num 0.653 0.621 0.536 0.064 0.652 0.576 0.52 0.158 0.666 0.663 ...
The third line simply puts all these names together:
> head(full_list,20)
[1] "x2003_meanNDVI_m1" "x2003_meanNDVI_m2" "x2003_meanNDVI_m3" "x2003_meanNDVI_m4" "x2003_meanNDVI_m5"
[6] "x2003_meanNDVI_m6" "x2003_meanPixelQa_m1" "x2003_meanPixelQa_m2" "x2003_meanPixelQa_m3" "x2003_meanPixelQa_m4"
[11] "x2003_meanPixelQa_m5" "x2003_meanPixelQa_m6"
So far, very simple. Now comes the problem: I want to join all these data frames by the column 'date'. This very same procedure works in other scripts I built:
data2003 <- reduce(full_list, full_join, by="date")
But I keep getting an error:
> data2003 <- reduce(full_list, full_join, by="date")
Error in UseMethod("full_join") :
no applicable method for 'full_join' applied to an object of class "character"
What I have tried so far:
- Changing the column type from character to date to number. Nothing.
- Changing the order in which I load the dplyr and plyr packages.
- Changing variable names and so on.
- Using full_lst <- list(NDVI2003,PixelQa2003) instead of full_list <- do.call(c, list(NDVI2003,PixelQa2003)).
- Adding full_list <- mget(full_list).
- Searching Google for hours for an answer.
Any help is very welcome.
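The error message points at the cause: ls() returns a character vector of object names, so reduce() hands strings, not data frames, to full_join(). A minimal sketch of the fix, assuming the objects live in the current environment, is to look them up by name with mget() before reducing. The two toy data frames and their columns ndvi and qa are hypothetical stand-ins for the real x2003_mean* objects.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the real x2003_mean* data frames
x2003_meanNDVI_m1    <- data.frame(date = c("2001-12-03", "2001-12-10"),
                                   ndvi = c(0.441, 0.518))
x2003_meanPixelQa_m1 <- data.frame(date = c("2001-12-03", "2001-12-19"),
                                   qa   = c(66, 130))

full_list <- ls(pattern = "^x2003_mean")  # only the *names* of the objects
class(full_list[[1]])                      # "character": what full_join was receiving

# Fetch the actual data frames by name, then join them pairwise on "date"
dfs <- mget(full_list)
data2003 <- reduce(dfs, full_join, by = "date")
data2003
```

If the objects are created inside a function, pass the same envir to both ls() and mget(). If this route still fails on the real data, checking that every date column has the same type would be a reasonable next step.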

How to keep only one of higly correlated values from a matrix?

I created my correlation matrix with the LDmatrix tab of this website, using the 246 SNPs listed below,
https://ldlink.nci.nih.gov/?tab=ldmatrix and loaded it:
calc.rho=read.table("ro246_matrix.txt")
calc.rho=data.matrix(calc.rho)
What I want to do is extract from this matrix only the pairs whose correlation is below 0.8.
I can do that via:
keeprows<-apply(calc.rho,1,function(x) return(sum(x>0.8)<3))
ro246.lt.8<-calc.rho[keeprows,keeprows]
ro246.lt.8[ro246.lt.8 == 1] <- NA
(mmax <- max(abs(ro246.lt.8), na.rm=TRUE))
[1] 0.566
The problem is that I get only 17 SNPs out of 246, which means I removed both SNPs of every highly correlated pair, when I should actually keep one of them. For example:
rs3764410 and rs56192520 have a correlation of 0.976, so I would keep only one of them at random, say rs56192520.
How can I do this? Please advise.
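One common way to keep a single representative from each highly correlated group is a greedy pass over the variables: keep a variable only if its absolute correlation with everything already kept is below the cutoff. This is an illustrative technique, not the approach taken in the answer below; prune_correlated is a hypothetical helper that assumes a square correlation matrix with matching row and column names and no NA values, and mtcars stands in for calc.rho so the sketch runs on its own.

```r
# Hypothetical helper: keep a variable only if its absolute correlation with
# every variable kept so far is below `cutoff` (greedy, order-dependent).
prune_correlated <- function(rho, cutoff = 0.8) {
  stopifnot(is.matrix(rho), nrow(rho) == ncol(rho), !anyNA(rho))
  keep <- character(0)
  for (v in colnames(rho)) {
    if (length(keep) == 0 || all(abs(rho[v, keep]) < cutoff)) {
      keep <- c(keep, v)
    }
  }
  keep
}

# mtcars stands in for calc.rho here
res <- cor(mtcars[, c(1, 3:7)])
kept <- prune_correlated(res, cutoff = 0.8)
print(kept)            # "mpg" "hp" "drat" "qsec"

# Reduced matrix: no off-diagonal |r| reaches the cutoff
res[kept, kept]
```

With the question's data this would be kept <- prune_correlated(calc.rho) followed by calc.rho[kept, kept]. The result depends on column order; to choose at random as described, permute first, e.g. o <- sample(ncol(calc.rho)) and prune_correlated(calc.rho[o, o]).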
> calc.rho[30:40,30:40]
rs4313843 rs8069610 rs883504 rs8072394 rs4280293 rs4465638
rs4313843 1.000 0.642 0.975 0.642 0.642 0.925
rs8069610 0.642 1.000 0.659 1.000 1.000 0.589
rs883504 0.975 0.659 1.000 0.659 0.659 0.901
rs8072394 0.642 1.000 0.659 1.000 1.000 0.589
rs4280293 0.642 1.000 0.659 1.000 1.000 0.589
rs4465638 0.925 0.589 0.901 0.589 0.589 1.000
rs12602378 0.326 0.519 0.335 0.519 0.519 0.344
rs9899059 0.326 0.519 0.335 0.519 0.519 0.344
rs6502530 0.333 0.530 0.342 0.530 0.530 0.351
rs4380085 0.950 0.605 0.926 0.605 0.605 0.975
rs6502532 0.276 0.439 0.283 0.439 0.439 0.292
rs12602378 rs9899059 rs6502530 rs4380085 rs6502532
rs4313843 0.326 0.326 0.333 0.950 0.276
rs8069610 0.519 0.519 0.530 0.605 0.439
rs883504 0.335 0.335 0.342 0.926 0.283
rs8072394 0.519 0.519 0.530 0.605 0.439
rs4280293 0.519 0.519 0.530 0.605 0.439
rs4465638 0.344 0.344 0.351 0.975 0.292
rs12602378 1.000 1.000 0.980 0.353 0.813
rs9899059 1.000 1.000 0.980 0.353 0.813
rs6502530 0.980 0.980 1.000 0.360 0.833
rs4380085 0.353 0.353 0.360 1.000 0.300
rs6502532 0.813 0.813 0.833 0.300 1.000
Or, as a more direct reproducible example:
calc.rho<-matrix(c(0.903,0.268,0.327,0.327,0.327,0.582,
0.928,0.276,0.336,0.336,0.336,0.598,
0.975,0.309,0.371,0.371,0.371,0.638,
0.975,0.309,0.371,0.371,0.371,0.638,
0.975,0.309,0.371,0.371,0.371,0.638,
0.975,0.309,0.371,0.371,0.371,0.638),ncol=6,byrow=TRUE)
rnames<-c("rs56192520","rs3764410","rs145984817","rs1807401",
"rs1807402","rs35350506")
rownames(calc.rho)<-rnames
cnames<-c("rs9900318","rs8069906","rs9908521","rs9908336",
"rs9908870","rs9895995")
colnames(calc.rho)<-cnames
All 246 SNPs:
rs56192520
rs3764410
rs145984817
rs1807401
rs1807402
rs35350506
rs2089177
rs12325677
rs62064624
rs62064631
rs2349295
rs2174369
rs7218554
rs62064634
rs4360974
rs4527060
rs6502526
rs6502527
rs9900318
rs8069906
rs9908521
rs9908336
rs9908870
rs9895995
rs7211086
rs9905280
rs8073305
rs8072086
rs4312350
rs4313843
rs8069610
rs883504
rs8072394
rs4280293
rs4465638
rs12602378
rs9899059
rs6502530
rs4380085
rs6502532
rs4792798
rs4792799
rs4316813
rs148563931
rs74751226
rs8068857
rs8069441
rs77397878
rs75339756
rs4608391
rs79569548
rs4275914
rs11870422
rs8075751
rs11658904
rs138437542
rs80344434
rs7222311
rs7221842
rs7223686
rs78013597
rs74965036
rs78063986
rs118106233
rs117345712
rs113004656
rs9898995
rs4985718
rs9893911
rs79110942
rs7208929
rs12601453
rs4078062
rs75129280
rs76664572
rs78961289
rs146364798
rs76715413
rs4078534
rs79457460
rs74369938
rs76423171
rs74668400
rs75146120
rs1135237
rs9914671
rs117759512
rs4985696
rs16961340
rs17794159
rs4247118
rs78572469
rs12601193
rs2349646
rs2090018
rs12601424
rs4985701
rs8064550
rs2271521
rs2271520
rs11078374
rs4985702
rs1124961
rs11652674
rs3924340
rs112450164
rs7208973
rs9910857
rs78574480
rs8072184
rs12602196
rs6502563
rs3744135
rs148779543
rs77689691
rs41319048
rs117340532
rs78647096
rs77712968
rs16961396
rs80054920
rs7206981
rs4985740
rs3803762
rs77103270
rs7207485
rs77342773
rs3826304
rs3744126
rs7210879
rs7211576
rs117967362
rs75978745
rs6502564
rs9894565
rs36079048
rs8076621
rs7218795
rs3803761
rs12602675
rs7208065
rs4985705
rs8080386
rs8065832
rs2018781
rs1736221
rs1736220
rs1736217
rs1708620
rs1708619
rs1736216
rs76319098
rs1736215
rs1736214
rs1708617
rs12602831
rs12602871
rs1736213
rs1736212
rs76045368
rs34518797
rs11078378
rs8079562
rs8065774
rs8066090
rs41337846
rs1736209
rs1736208
rs12949822
rs76246042
rs12600635
rs55689224
rs1736207
rs1708626
rs1736206
rs9896078
rs16961474
rs1708627
rs1736205
rs1708628
rs7220577
rs2294155
rs1736204
rs1736203
rs1736202
rs12937908
rs1736200
rs1708623
rs1708624
rs9894884
rs9901894
rs9903294
rs2472689
rs1630656
rs111478970
rs3182911
rs7219012
rs9890657
rs12453455
rs12947291
rs150267386
rs16961493
rs11652745
rs9907107
rs8070574
rs4985759
rs3866959
rs7219248
rs6502568
rs7220275
rs12450037
rs7225876
rs9892352
rs4985760
rs6502569
rs1029830
rs2012954
rs1029832
rs2270180
rs8072402
rs7221553
rs145597919
rs150772017
rs2041393
rs6502578
rs11078382
rs9912109
rs12601631
rs11869054
rs11869079
rs9912599
rs7220057
rs9896970
rs34121330
rs34668117
rs67773570
rs242252
rs955893
rs28583584
rs9944423
rs7217764
rs11651957
rs73978990
rs8071007
rs56044345
rs17804843
Your reproducible example is not a true correlation matrix, so I prefer to use the mtcars dataset that ships with R and build a correlation matrix from it. You should be able to apply the same steps to your dataset.
data(mtcars)
my_data <- mtcars[,c(1,3:7)]
res <- cor(my_data)
res <-data.frame(res)
# Adding rownames as new column to be able to change their format later
res$rnames = rownames(res)
Here is the output of res:
> res
mpg disp hp drat wt qsec rnames
mpg 1.0000000 -0.8475514 -0.7761684 0.68117191 -0.8676594 0.41868403 mpg
disp -0.8475514 1.0000000 0.7909486 -0.71021393 0.8879799 -0.43369788 disp
hp -0.7761684 0.7909486 1.0000000 -0.44875912 0.6587479 -0.70822339 hp
drat 0.6811719 -0.7102139 -0.4487591 1.00000000 -0.7124406 0.09120476 drat
wt -0.8676594 0.8879799 0.6587479 -0.71244065 1.0000000 -0.17471588 wt
qsec 0.4186840 -0.4336979 -0.7082234 0.09120476 -0.1747159 1.00000000 qsec
Now, using tidyverse, we can reshape the data.frame:
library(tidyverse)
res2 = res %>%
pivot_longer(-rnames,names_to = "col",values_to = "Corr")
Now, it looks like:
> head(res2)
# A tibble: 6 x 3
rnames col Corr
<chr> <chr> <dbl>
1 mpg mpg 1
2 mpg disp -0.848
3 mpg hp -0.776
4 mpg drat 0.681
5 mpg wt -0.868
6 mpg qsec 0.419
Then, you can create a character vector, res4, that holds the rnames and col values of each row in sorted order.
res4 <- unlist(res2 %>% rowwise() %>% do(i = sort(c(.$rnames,.$col))))
# Here is the output of res4
> head(res4)
i1 i2 i3 i4 i5 i6
"mpg" "mpg" "disp" "mpg" "hp" "mpg"
We then use res4 to create a new column in res2, called Comparaison, that concatenates the sorted rnames and col. This column lets us drop rows (using distinct) that describe the same comparison. Finally, we filter to keep only values greater than 0.5 (you can use 0.8 if you want) and remove values equal to 1 (self-comparisons).
res2 %>%
mutate(Comparaison = paste0(res4[seq(1,length(res4),by = 2)],res4[seq(2,length(res4),by = 2)])) %>%
distinct(Comparaison, .keep_all = T) %>%
filter(Corr >0.5 & Corr !=1)
Here is the final output
# A tibble: 4 x 4
rnames col Corr Comparaison
<chr> <chr> <dbl> <chr>
1 mpg drat 0.681 dratmpg
2 disp hp 0.791 disphp
3 disp wt 0.888 dispwt
4 hp wt 0.659 hpwt
There may be an easier way to get the same output, but this one should at least work for your data.
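One possibly simpler route, sketched here in base R as an alternative to the answer's sort-and-paste step, is to index the upper triangle of the matrix: each unordered pair then appears exactly once and the diagonal is excluded by position. The object names idx, pairs and strong are illustrative.

```r
# Correlation matrix for the same mtcars columns as in the answer
res <- cor(mtcars[, c(1, 3:7)])

# Row/column indices of the upper triangle: every unordered pair appears once
idx <- which(upper.tri(res), arr.ind = TRUE)

pairs <- data.frame(rnames = rownames(res)[idx[, "row"]],
                    col    = colnames(res)[idx[, "col"]],
                    Corr   = res[idx])

# Same filter as in the answer: correlations above 0.5
strong <- subset(pairs, Corr > 0.5)
strong
```

Because the diagonal is skipped by position rather than by Corr != 1, distinct SNPs with a correlation of exactly 1 (such as rs8069610 and rs8072394 in the question's matrix) are not dropped. Filtering on abs(Corr) would also catch strong negative correlations, such as mpg and disp here.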

R effects package Error: non-conformable arguments

I am using the effect function from the effects R package on a Cox model. There is a default method for this function, so it should work for any model.
When I try to use this function I get this error:
> eff_cf <- effect("TP53:MDM2", model)
Error in mod.matrix %*% mod$coefficients[!is.na(coef(mod))] :
non-conformable arguments
My model looks like this:
> model
Call:
coxph(formula = Surv(times, patient.vital_status) ~ TP53 + MDM2 +
TP53:MDM2, data = clinForPlot)
coef exp(coef) se(coef) z p
TP53Other -0.163 0.850 0.217 -0.752 4.5e-01
TP53WILD -1.086 0.337 0.277 -3.928 8.6e-05
MDM2(1183.7,1674.7] -0.669 0.512 0.235 -2.851 4.4e-03
MDM2(1674.7,2248.5] -0.744 0.475 0.305 -2.444 1.5e-02
MDM2(2248.5,50339] -0.867 0.420 0.375 -2.308 2.1e-02
TP53Other:MDM2(1183.7,1674.7] 0.394 1.483 0.412 0.958 3.4e-01
TP53WILD:MDM2(1183.7,1674.7] 0.133 1.142 0.413 0.323 7.5e-01
TP53Other:MDM2(1674.7,2248.5] -0.192 0.825 0.517 -0.372 7.1e-01
TP53WILD:MDM2(1674.7,2248.5] 0.546 1.726 0.433 1.260 2.1e-01
TP53Other:MDM2(2248.5,50339] -0.140 0.869 0.650 -0.215 8.3e-01
TP53WILD:MDM2(2248.5,50339] 0.786 2.195 0.484 1.623 1.0e-01
Likelihood ratio test=72.8 on 11 df, p=3.54e-11 n= 1321, number of events= 258
The model and the data frame used to fit it can be reproduced with this code:
library(archivist)
model <- loadFromGithub("68eeefba87be70364eb3801cec58eb3d",
user = "MarcinKosinski",
repo = "Museum",
value = TRUE)
clinForPlot <- loadFromGithub("cfa5145e6b98964d5f8b760bf749e426",
user = "MarcinKosinski",
repo = "Museum",
value = TRUE)
Any idea how to fix this and what is wrong?

How to set the level above which to display factor loadings from factanal() in R?

I was performing factor analysis with data state.x77, which is in R by default. After running the analysis, I inspected the factor loadings.
> output = factanal(state.x77, factors=3, rotation="promax")
> ld = output$loadings
> ld
Loadings:
Factor1 Factor2 Factor3
Population 0.161 0.239 -0.316
Income -0.149 0.681
Illiteracy 0.446 -0.284 -0.393
Life Exp -0.924 0.172 -0.221
Murder 0.917 0.103 -0.129
HS Grad -0.414 0.731
Frost 0.107 1.046
Area 0.387 0.585 0.101
Factor1 Factor2 Factor3
SS loadings 2.274 1.519 1.424
Proportion Var 0.284 0.190 0.178
Cumulative Var 0.284 0.474 0.652
By default, R seems to hide all loadings whose absolute value is below 0.1. Is there a way to set this threshold by hand, say 0.3 instead of 0.1?
Try this:
print(output$loadings, cutoff = 0.3)
See ?print.loadings for the details.
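Note that cutoff only changes what is displayed; the loadings themselves are untouched. The sketch below also uses print's sort argument, then builds a plain numeric matrix in which loadings under the same threshold are set to NA. The names ld_mat and the 0.3 threshold are illustrative choices.

```r
output <- factanal(state.x77, factors = 3, rotation = "promax")
ld <- output$loadings

# Display only loadings with |value| >= 0.3, grouped by dominant factor
print(ld, cutoff = 0.3, sort = TRUE)

# print() just blanks small values; to get them as data, mask them yourself
ld_mat <- unclass(ld)               # plain matrix with the same dimnames
ld_mat[abs(ld_mat) < 0.3] <- NA     # hide values below the threshold
round(ld_mat, 3)
```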

Spearman correlation loop in R

A previous post explained how to run a chi-squared test in a loop over all pairs of columns in R: "Chi Square Analysis using for loop in R".
I wanted to use this code to do the same thing for a Spearman correlation.
By altering a few of the variables, I was able to compute Pearson correlation tests with this code:
library(plyr)
combos <- combn(ncol(fullngodata),2)
adply(combos, 2, function(x) {
test <- cor.test(fullngodata[, x[1]], fullngodata[, x[2]])
out <- data.frame("Row" = colnames(fullngodata)[x[1]]
, "Column" = colnames(fullngodata[x[2]])
, "cor" = round(test$statistic,3)
, "df"= test$parameter
, "p.value" = round(test$p.value, 3)
)
return(out)
})
But since I work with data on an ordinal scale, I need to use the Spearman correlation.
I thought I could get these results just by adding the method="spearman" argument, but this does not seem to work. If I use the code:
library(plyr)
combos <- combn(ncol(fullngodata),2)
adply(combos, 2, function(x) {
test <- cor.test(fullngodata[, x[1]], fullngodata[, x[2]], method="spearman")
out <- data.frame("Row" = colnames(fullngodata)[x[1]]
, "Column" = colnames(fullngodata[x[2]])
, "Chi.Square" = round(test$statistic,3)
, "df"= test$parameter
, "p.value" = round(test$p.value, 3)
)
return(out)
})
I get the response:
Error in data.frame(Row = colnames(fullngodata)[x[1]], Column =
colnames(fullngodata[x[2]]), :
arguments imply differing number of rows: 1, 0
In addition: Warning message:
In cor.test.default(fullngodata[, x[1]], fullngodata[, x[2]], method = "spearman") :
Cannot compute exact p-values with ties
What am I doing wrong?
Try the rcor.test function from the ltm package.
library(ltm)
mat <- matrix(rnorm(1000), 100, 10, dimnames = list(NULL, LETTERS[1:10]))
rcor.test(mat, method = "spearman")
A B C D E F G H I J
A ***** -0.035 0.072 0.238 -0.097 0.007 -0.010 -0.031 0.039 -0.090
B 0.726 ***** -0.042 -0.166 0.005 0.025 0.007 -0.231 0.005 0.006
C 0.473 0.679 ***** 0.046 0.074 -0.020 0.091 -0.183 -0.040 -0.084
D 0.017 0.098 0.647 ***** -0.060 -0.151 -0.175 -0.068 0.039 0.181
E 0.338 0.960 0.466 0.553 ***** 0.254 0.055 -0.031 0.072 -0.059
F 0.948 0.805 0.843 0.133 0.011 ***** -0.014 -0.121 0.153 0.048
G 0.923 0.941 0.370 0.081 0.588 0.892 ***** -0.060 -0.050 0.011
H 0.759 0.021 0.069 0.501 0.756 0.230 0.555 ***** -0.053 -0.193
I 0.700 0.963 0.690 0.701 0.476 0.130 0.621 0.597 ***** -0.034
J 0.373 0.955 0.406 0.072 0.561 0.633 0.910 0.055 0.736 *****
upper diagonal part contains correlation coefficient estimates
lower diagonal part contains corresponding p-values
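The problem is that cor.test returns NULL for parameter when you run the Spearman test. From ?cor.test: "parameter: the degrees of freedom of the test statistic in the case that it follows a t distribution." So the "df" entry has no rows while the other columns have one, which matches the error "arguments imply differing number of rows: 1, 0".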
You can see this in the following example:
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
str(cor.test(x, y, method = "spearman"))
List of 8
$ statistic : Named num 48
..- attr(*, "names")= chr "S"
$ parameter : NULL
$ p.value : num 0.0968
$ estimate : Named num 0.6
..- attr(*, "names")= chr "rho"
$ null.value : Named num 0
..- attr(*, "names")= chr "rho"
$ alternative: chr "two.sided"
$ method : chr "Spearman's rank correlation rho"
$ data.name : chr "x and y"
- attr(*, "class")= chr "htest"
Solution: if you remove the following line from your code, it should work:
, "df"= test$parameter
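A base-R sketch of the corrected loop, without plyr: it drops the "df" column, labels the statistic as Spearman's S rather than "Chi.Square", and also reports rho. Passing exact = FALSE requests the asymptotic p-value that cor.test falls back to anyway when there are ties, so the "Cannot compute exact p-value" warning no longer appears. The small ordinal data frame is a hypothetical stand-in for fullngodata.

```r
# Hypothetical ordinal data standing in for fullngodata
set.seed(1)
fullngodata <- data.frame(q1 = sample(1:5, 30, TRUE),
                          q2 = sample(1:5, 30, TRUE),
                          q3 = sample(1:5, 30, TRUE))

combos <- combn(ncol(fullngodata), 2)   # each column holds one pair of indices

results <- do.call(rbind, lapply(seq_len(ncol(combos)), function(k) {
  i <- combos[1, k]
  j <- combos[2, k]
  test <- cor.test(fullngodata[, i], fullngodata[, j],
                   method = "spearman", exact = FALSE)
  # test$parameter is NULL for Spearman, so it is simply not used here
  data.frame(Row     = colnames(fullngodata)[i],
             Column  = colnames(fullngodata)[j],
             rho     = round(unname(test$estimate), 3),
             S       = round(unname(test$statistic), 3),
             p.value = round(test$p.value, 3))
}))
results
```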
