I am currently doing a PCA of some data with 35 rows and 21 columns using the FactoMineR package in R. I'm doing this for my bachelor thesis, and I'm studying forestry, so "I have no clue what I'm doing" :).
It works somehow, and the interpretation is another chapter, but my professors unfortunately also have no clue about this statistics kind of thing, so they expect the results in nice little Word sheets, with the data nicely arranged into tables.
I currently print the text output with the following methods:
capture.output(mydata)
summary.PCA(mydata)
summary(mydata)
summary.PCA is a function directly from the FactoMineR package, and I use it because capture.output keeps giving me errors when I try to capture PCA("whatever") with it.
But this output is impossible to import into a table unless I do it all by hand, which I cannot accept as a solution (I very much hope there is a better one).
Take output like the following; I don't see a way to put it into a table:
Call:
PCA(mydata)
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18 Dim.19 Dim.20 Dim.21
Variance 8.539 2.937 1.896 1.644 1.576 1.071 0.738 0.695 0.652 0.463 0.261 0.184 0.136 0.108 0.049 0.021 0.019 0.010 0.000 0.000 0.000
% of var. 40.662 13.984 9.027 7.830 7.505 5.100 3.517 3.311 3.106 2.203 1.242 0.878 0.650 0.513 0.233 0.102 0.093 0.046 0.000 0.000 0.000
Cumulative % of var. 40.662 54.645 63.672 71.502 79.007 84.107 87.624 90.934 94.041 96.244 97.486 98.363 99.013 99.526 99.759 99.862 99.954 100.000 100.000 100.000 100.000
So is there a way to do this? Do I have to transform the data before I can print it into a table?
I hope very much I have expressed myself clearly!
All the best!
Lukas
All the tables that summary.PCA prints are available as data in the output object.
So you can do:
res <- PCA(mydata)
res$eig ## the table of eigenvalues, as an object
res$ind$coord ## the coordinates of the individuals, as an object
write.infile(res, file="OutputFile.csv") ## writes all the outputs to a csv file
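Putting this together, one way to get a Word-friendly table is to export just the eigenvalue table and open it in Excel. This is a sketch; it assumes mydata is your 35 x 21 data frame, and graph = FALSE simply suppresses the plots:

```r
library(FactoMineR)

res <- PCA(mydata, graph = FALSE)   # mydata: your 35 x 21 data frame
eig <- as.data.frame(res$eig)       # eigenvalues, % of variance, cumulative %
write.csv(eig, "eigenvalues.csv")   # open in Excel, then paste into Word as a table
```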
Hope it helps,
Francois
Hi, I created a Shiny app and I want to display the results from the clogit function in this app.
But I don't know where to start or how to display this list. I was thinking about displaying the console output, but I have not found a solution yet.
I want the results from this script to be displayed: http://lab.agr.hokudai.ac.jp/spmur/examples_ch3.R
clogit(fm, data = ds)
coef exp(coef) se(coef) z p
ASC 0.027622 1.028007 0.041651 0.663 0.507
Green 0.027622 1.028007 0.058689 0.471 0.638
Brown 0.055239 1.056794 0.059322 0.931 0.352
Normal 0.027618 1.028003 0.058697 0.471 0.638
Muscular -0.027622 0.972756 0.058689 -0.471 0.638
Z -0.001381 0.998620 0.002935 -0.471 0.638
Likelihood ratio test=2.28 on 6 df, p=0.8925
n= 5400, number of events= 2310
I want this result to be displayed somehow in the Shiny app.
Can somebody provide me with a hint or a solution?
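One common approach is to capture the printed model output with renderPrint() and show it with verbatimTextOutput(), which reproduces the console-style text inside the app. A minimal sketch, assuming fm and ds are the formula and data from the linked script:

```r
library(shiny)
library(survival)   # clogit() lives in the survival package

ui <- fluidPage(
  verbatimTextOutput("clogit_out")   # monospace block, like the console
)

server <- function(input, output) {
  output$clogit_out <- renderPrint({
    fit <- clogit(fm, data = ds)     # fm, ds: your formula and data
    summary(fit)                     # whatever prints here appears in the app
  })
}

shinyApp(ui, server)
```

For nicer formatting later, the printed coefficients can also be turned into a data frame and shown with renderTable() instead.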
I have one table with four columns: Y, T, D, AF. My intention is to apply the so-called "proportional adjustment method".
library(dplyr)
df <- data.frame(
  Y  = c(2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000),
  T  = c(11742, 10927, 9931, 9477, 7888, 7348, 7318, 6825, 6700, 6688, 6841, 6472, 6228, 5928, 5771, 5780, 5575, 8302),
  D  = c(0, 450, 0, 1540, 0, 0, 0, -314, 0, -1200, 0, 0, 0, 0, 0, 0, -3707, 0),
  AF = c(1.000, 1.043, 1.000, 1.194, 1.000, 1.000, 1.000, 0.956, 1.000, 0.848, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.601, 1.000))
In order to apply the proportional adjustment method to this table, I should apply formulas like the ones in the Excel screenshot below.
For that reason I am trying to solve this with the cumprod function and the dplyr package, using this code:
df1<-mutate(df,
AT= T*lag(cumprod(AF), k=1, default=1)
)
View(df1)
But the results I get in the AT column are very similar to the correct results, yet not identical. Can anybody help me apply the formula from the Excel screenshot with some function, or whatever works?
Your code is fine. The difference is due to rounding errors: the AF values are only given to three decimal places, and those small errors accumulate in the cumulative product.
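A tiny illustration of the effect, with hypothetical factors rather than your data: rounding to three decimals before cumprod() shifts every downstream product slightly.

```r
af_exact   <- c(1.0434821, 1.1937562, 0.9563817)  # hypothetical full-precision factors
af_rounded <- round(af_exact, 3)                  # what ends up in an AF column

# Small but nonzero differences that grow along the cumulative product:
cumprod(af_exact) - cumprod(af_rounded)
```

Using the adjustment factors at full precision (rather than their rounded printouts) makes the R result match the spreadsheet.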
I want to check whether each number in a series, multiplied by a number n, is an integer. However, when I build the series with the seq function, multiply by n, and then check whether the result is an integer, something goes wrong, as in the following example. Please help me figure this out!
x <- seq(from=0.001, to=0.015, by=0.001)
x
[1] 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010 0.011 0.012 0.013 0.014 0.015
n <- 1000
a = x[9]*n
a
[1] 9
a == 9
[1] FALSE
Floating-point operations in R may give surprising results, as in your example.
Using your code, you will see that there is a very small difference between the variable a and 9 (note that the exact value you see may vary):
a-9 # yields 1.776357e-15
You can deal with this by comparing the difference to a very small value:
abs(a-9) < 1e-10 # yields TRUE
You may also find the compare package useful:
library(compare)
compare(a,9) # yields TRUE
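A base R alternative, assuming a comes from the code above, is all.equal(), which compares numbers up to a small tolerance instead of exactly:

```r
x <- seq(from = 0.001, to = 0.015, by = 0.001)
a <- x[9] * 1000            # 9 in exact arithmetic, but not in floating point

a == 9                      # FALSE: exact comparison fails
isTRUE(all.equal(a, 9))     # TRUE: tolerance-based comparison succeeds
```

Wrapping all.equal() in isTRUE() is needed because, when the values differ, all.equal() returns a descriptive string rather than FALSE.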
I am trying to understand PCA by finding practical examples online. Sadly, most tutorials I have found don't really show simple practical applications of PCA. After a lot of searching, I came across this:
http://yatani.jp/HCIstats/PCA
It is a nice, simple tutorial. I want to re-create the results in Matlab, but the tutorial is in R. I have been trying to replicate the results in Matlab, but have so far been unsuccessful; I am new to Matlab. I have created the arrays as follows:
Price = [6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2];
Software = [5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3];
Aesthetics = [3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7];
Brand = [4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7];
Then in his example, he does this
data <- data.frame(Price, Software, Aesthetics, Brand)
I did a quick search online, and this apparently combines the vectors into a data frame in R. So in Matlab I did this
dataTable(:,1) = Price;
dataTable(:,2) = Software;
dataTable(:,3) = Aesthetics;
dataTable(:,4) = Brand;
Now it is the next part I am unsure of.
pca <- princomp(data, cor=TRUE)
summary(pca, loadings=TRUE)
I have tried using Matlab's PCA function
[COEFF SCORE LATENT] = princomp(dataTable)
But my results do not match the ones shown in the tutorial at all. My results are
COEFF =
-0.5958 0.3786 0.7065 -0.0511
-0.1085 0.8343 -0.5402 -0.0210
0.6053 0.2675 0.3179 -0.6789
0.5166 0.2985 0.3287 0.7321
SCORE =
-2.3362 0.0276 0.6113 0.4237
-4.3534 -2.1268 1.4228 -0.3707
-1.1057 -0.2406 1.7981 0.4979
-3.6847 0.4840 -2.1400 1.0586
-1.4218 2.9083 1.2020 -0.2952
-3.3495 -1.3726 0.5049 0.3916
-4.1126 0.1546 -2.4795 -1.0846
-1.7309 0.2951 0.9293 -0.2552
2.8169 0.5898 0.4318 0.7366
3.7976 -2.1655 -0.2402 -1.2622
3.3041 1.0454 -0.8148 0.7667
1.4969 2.9845 0.7537 -0.8187
2.3993 -1.1891 -0.3811 0.7556
1.7836 -0.0072 -0.2255 -0.7276
2.2613 -0.1977 -2.4966 0.0326
4.2350 -1.1899 1.1236 0.1509
LATENT =
9.3241
2.2117
1.8727
0.5124
Yet the results in the tutorial are
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Standard deviation 1.5589391 0.9804092 0.6816673 0.37925777
Proportion of Variance 0.6075727 0.2403006 0.1161676 0.03595911
Cumulative Proportion 0.6075727 0.8478733 0.9640409 1.00000000
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4
Price -0.523 0.848
Software -0.177 0.977 -0.120
Aesthetics 0.597 0.134 0.295 -0.734
Brand 0.583 0.167 0.423 0.674
Could anyone please explain why my results differ so much from the tutorial's? Am I using the wrong Matlab function?
Also, if you can provide any other nice, simple practical applications of PCA, that would be very beneficial. I am still trying to get my head around all the concepts in PCA, and I like examples where I can code it, see the results, and play about with it; I find it easier to learn this way.
Any help would be much appreciated!!
Edit: The issue is purely the scaling.
R code:
summary(princomp(data, cor = FALSE), loadings=T, cutoff = 0.01)
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4
Price -0.596 -0.379 0.706 -0.051
Software -0.109 -0.834 -0.540 -0.021
Aesthetics 0.605 -0.268 0.318 -0.679
Brand 0.517 -0.298 0.329 0.732
According to the Matlab help you should use this if you want scaling:
Matlab code:
princomp(zscore(X))
Old answer (a red herring):
From help(princomp) (in R):
The calculation is done using eigen on the correlation or covariance
matrix, as determined by cor. This is done for compatibility with the
S-PLUS result. A preferred method of calculation is to use svd on x,
as is done in prcomp.
Note that the default calculation uses divisor N for the covariance
matrix.
In the documentation of the R function prcomp (help(prcomp)) you can read:
The calculation is done by a singular value decomposition of the
(centered and possibly scaled) data matrix, not by using eigen on the
covariance matrix. This is generally the preferred method for
numerical accuracy. [...] Unlike princomp, variances are computed with
the usual divisor N - 1.
The Matlab function apparently uses the SVD algorithm. If I use prcomp (without scaling, i.e., not based on correlations) with the example data, I get:
> prcomp(data)
Standard deviations:
[1] 3.0535362 1.4871803 1.3684570 0.7158006
Rotation:
PC1 PC2 PC3 PC4
Price -0.5957661 0.3786184 -0.7064672 0.05113761
Software -0.1085472 0.8342628 0.5401678 0.02101742
Aesthetics 0.6053008 0.2675111 -0.3179391 0.67894297
Brand 0.5166152 0.2984819 -0.3286908 -0.73210631
This is (apart from the irrelevant signs) identical to the Matlab output.
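For completeness, here is a sketch of the scaled analysis in R with prcomp, which should reproduce the tutorial's correlation-based loadings up to sign flips:

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)

pca_scaled <- prcomp(data, scale. = TRUE)  # SVD analogue of princomp(cor = TRUE)
pca_scaled$rotation                        # loadings (signs may differ)
summary(pca_scaled)                        # std devs and proportion of variance
```

This mirrors the Matlab approach of running princomp(zscore(X)): z-scoring the columns first is equivalent to working on the correlation matrix.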
I am trying to get more than 2 decimal places from model summary output when I use nnet package. I read other threads regarding this and none of those solutions seem to work for me. I tried:
options(digits=10)
summary(model)
b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1
0.94 -2.67 0.83 -1.06 -2.51 -0.69
b->o1 h1->o1
1.14 -3.41
b->o2 h1->o2
-0.62 3.92
I also tried:
summary(model,digits=10)
b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1
0.94 -2.67 0.83 -1.06 -2.51 -0.69
b->o1 h1->o1
1.14 -3.41
b->o2 h1->o2
-0.62 3.92
None of those solutions work for me. I have to use capture.output on the summary output. If I print the entire model or use coefnames, I can get more than 2 decimal places, but that does not help me when I use capture.output.
It's likely that the print method for the object returned by summary is where the two decimal places are coming from. As a first attempt, try
print(summary(model),digits=10) ## or whatever other number of digits
If that doesn't work, try the kind of investigation that was done in this answer:
How to make decimal digits of chisq.test four numbers ?
Just use
summary(model)$wts
This will give you the weights at full precision.
If you want other values, e.g. the residuals, see the manual: the help page lists the components of a fitted nnet object.
Just write summary(model), then $, and then e.g. wts to get the weights or residuals to get the residuals.
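Putting it together, a sketch with a hypothetical toy fit (the iris model here is only to have a model object; substitute your own):

```r
library(nnet)

# Hypothetical toy fit, just to produce a nnet object
model <- nnet(Species ~ ., data = iris, size = 1, trace = FALSE)

w <- summary(model)$wts       # weight vector at full precision
print(w, digits = 10)         # print as many digits as you like

res <- model$residuals        # other components are on the fitted object
```

Because $wts is a plain numeric vector, it bypasses the rounding done by the summary print method, so capture.output(print(w, digits = 10)) also preserves the digits.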