Proportional adjustment method in R

I have a table with four columns: Y, T, D, AF. My intention is to apply the so-called "proportional adjustment method".
library(dplyr)
df <- data.frame(
  Y  = c(2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000),
  T  = c(11742, 10927, 9931, 9477, 7888, 7348, 7318, 6825, 6700, 6688, 6841, 6472, 6228, 5928, 5771, 5780, 5575, 8302),
  D  = c(0, 450, 0, 1540, 0, 0, 0, -314, 0, -1200, 0, 0, 0, 0, 0, 0, -3707, 0),
  AF = c(1.000, 1.043, 1.000, 1.194, 1.000, 1.000, 1.000, 0.956, 1.000, 0.848, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.601, 1.000)
)
To apply the proportional adjustment method to this table I need to apply formulas like the ones shown in the Excel screenshot below.
For that reason I am trying to solve this with the cumprod() function and the dplyr package, using this code:
df1 <- mutate(df,
  AT = T * lag(cumprod(AF), k = 1, default = 1)
)
View(df1)
But the results I get in the AT column are only very similar to the correct results, not identical. So can anybody help me apply the formula from the Excel screenshot, with some function or otherwise?

Your code is fine. The difference is due to rounding errors.
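A minimal sketch of how to check this (the only assumption is that the spreadsheet's adjustment factors carry more decimals than the three shown in AF): compute AT as above, using dplyr::lag()'s n argument, and print the result with extra digits.
library(dplyr)
# Same computation; dplyr::lag() takes n (not k), and n = 1 with default = 1
# gives the "previous row's cumulative product" that the formula needs.
df1 <- df %>%
  mutate(AT = T * lag(cumprod(AF), n = 1, default = 1))
print(df1, digits = 7)
If the exact, unrounded adjustment factors from the spreadsheet are substituted for AF, the AT column should agree with Excel to the last digit.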

Related

R lowpass filter using Signal package

I am new to R and am having trouble fitting a low-pass filter to my data. I am measuring the force exerted on a treadmill over a period of 30 seconds with a sample rate of 250 samples per second (250 Hz).
The data contain negative force values, as seen in the image in the question; this is due to ripples in the signal or background noise. I need to be able to filter out any force signal < 0, and for this I have used the butter function from the signal package:
ritLowPass <- function(s, frqCutOff, bPlot = FALSE)
{
  # smpRate (the sampling rate) is taken from the calling environment
  f <- butter(4, frqCutOff / (smpRate / 2), "low")   # low-pass filter
  s.lp <- rev(filter(f, rev(filter(f, s))))          # filter forwards and backwards for zero phase shift
  if (bPlot) {
    idx <- (1 * smpRate):(4 * smpRate)
    plot(x = idx / smpRate, y = s[idx], xlab = "time/s", ylab = "signal", ty = "l")
    lines(x = idx / smpRate, y = s.lp[idx], col = "red", lwd = 2)
  }
  return(data.frame(s.lp))
}
VT_filter <- ritLowPass(guest$Fz, 250, bPlot)
sample data:
Time Fz
0 3.769
0.004 -32.94
0.008 -117.305
0.012 -142.329
0.016 -55.35
0.02 -27.362
0.024 29.039
0.028 73.718
0.032 76.633
0.036 4.482
0.04 -80.949
0.044 -114.279
0.048 -102.968
0.052 -9.76
0.056 35.405
0.06 152.541
0.064 79.249
0.068 50.147
0.072 22.547
0.076 47.757
0.08 -29.123
0.084 57.384
0.088 88.715
0.092 195.115
0.096 118.752
0.1 183.22
0.104 157.957
0.108 37.992
0.112 -7.893
When I run the code I get the following error:
VT_filter <- ritLowPass(guest$Fz, 250, bPlot)
Error in butter.default(4, frqCutOff/(smpRate/2), "low") :
butter: critical frequencies must be in (0 1)
Called from: butter.default(4, frqCutOff/(smpRate/2), "low")
I wonder whether I should be using a high-pass filter instead, or is there another option for attenuating any force signal lower than zero?
Preamble
I'm not sure I can see anything in the data that suggests your "culprit" frequency is 250 Hz, or that you should cut frequencies above this value.
If you're trying to remove signal noise at a specific frequency, you'll need to find the noise frequency first. spectrum is your friend.
However, assuming you actually want to filter frequencies above 250 Hz:
Short Answer
If you want to filter frequencies above 250 Hz, your sampling frequency needs to be at least 500 Hz.
Long Answer
Your filter can only filter between frequencies of 0 and the Nyquist frequency, i.e. 0 to (Sampling Frequency)/2. This is a hard limit of information theory, not an implementation issue.
You're asking it to filter something that is twice the Nyquist Frequency.
help(butter) gives the following about the W parameter:
W: critical frequencies of the filter. ... For digital filters, W must be between 0 and 1 where 1 is the Nyquist frequency.
The cutoff value you are trying to assign to the filter is (250)/(250/2) = 2. The function is telling you this is outside its capabilities (or the capabilities of any digital filter).
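For illustration, a minimal sketch of a valid call, assuming the 250 Hz sampling rate from the question and a purely hypothetical 10 Hz cutoff (the real cutoff should come from inspecting spectrum(), as noted above):
library(signal)
smpRate   <- 250                    # sampling rate from the question, in Hz
frqCutOff <- 10                     # hypothetical cutoff in Hz; must be below smpRate/2 = 125 Hz
W <- frqCutOff / (smpRate / 2)      # normalised cutoff: 10 / 125 = 0.08, inside (0, 1)
f <- butter(4, W, type = "low")     # 4th-order Butterworth low-pass
Fz.lp <- filtfilt(f, guest$Fz)      # zero-phase filtering of the force signal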
From the question it looks like you did not read the whole of the ?butter manual. The frequencies for pretty much all filter-design functions in the package are specified relative to the Nyquist frequency: whenever a function asks you for a frequency f_1, you are expected to provide f_1/(f_sample/2), and that value must lie between 0 and 1 because your signal is assumed to be free of unrecoverable (aliasing) distortion. The manual does not spell out this simple relation, and it contains some mistakes (such as the formula given for the bilinear transform function), but you are of course expected to have some general, basic knowledge of the topic before you attempt to use the package, so it is not a big deal.
Also, if the only thing that worries you is the negative signal values, why bother filtering at all? (Here I use the definition of "filtering" found in DSP-related books, which is probably not what you mean in the question.) You can just do something like guest$Fz[guest$Fz < 0] <- 0. That is generally a better idea than using NAs or removing the samples completely, because missing values, and therefore irregular sampling, create global signal artifacts, which is much worse than the local high-frequency spike you get from replacing a single sample value with another. You can then apply some data-smoothing method if you feel the signal needs to look nicer (see the sketch below).
In fact, my guess is that this is a purely educational test signal and that you really do need to low-pass filter it: the single cutoff frequency required is well below the 250 Hz sampling rate, and the negative values are not a problem in themselves but rather an indication of really bad or nonexistent filtering. But who knows...
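A minimal sketch of the clamp-then-smooth suggestion above (the running median and its window width are just one illustrative choice, not a recommendation for this particular data set):
# Replace negative readings with zero; this keeps the sampling grid regular,
# unlike dropping rows or inserting NAs.
guest$Fz[guest$Fz < 0] <- 0
# Optional smoothing afterwards, e.g. a running median with an arbitrary window.
guest$Fz.smooth <- runmed(guest$Fz, k = 11)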

Export summary.PCA (package FactoMineR) into a table, with R-cran

I am currently doing a PCA of some data with 35 rows and 21 columns using the R package FactoMineR. I'm doing this for my bachelor thesis and I'm studying forestry, so "I have no clue what I'm doing" :).
It works somehow, and the interpretation is another chapter, but my professors unfortunately also have no clue about this kind of statistics, so they expect the results in nice little Word sheets, with the data neatly arranged into tables.
I print the text output with the following methods:
capture.output(mydata)
summary.PCA(mydata)
summary(mydata)
summary.PCA is a function that comes directly from the FactoMineR package; I use it because capture.output keeps giving me errors when I try to capture PCA("whatever") with it.
But this output is impossible to import into a table unless I do it all by hand, which I cannot accept as a solution (I very much hope).
Output like the following... I don't see a way to put this into a table:
Call:
PCA(mydata)
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18 Dim.19 Dim.20 Dim.21
Variance 8.539 2.937 1.896 1.644 1.576 1.071 0.738 0.695 0.652 0.463 0.261 0.184 0.136 0.108 0.049 0.021 0.019 0.010 0.000 0.000 0.000
% of var. 40.662 13.984 9.027 7.830 7.505 5.100 3.517 3.311 3.106 2.203 1.242 0.878 0.650 0.513 0.233 0.102 0.093 0.046 0.000 0.000 0.000
Cumulative % of var. 40.662 54.645 63.672 71.502 79.007 84.107 87.624 90.934 94.041 96.244 97.486 98.363 99.013 99.526 99.759 99.862 99.954 100.000 100.000 100.000 100.000
So is there a way to do this? Do I have to transform the data before I can print it into a table?
I hope very much I have expressed myself clearly!
All the best!
Lukas
The summary.PCA function only prints the tables; all of the data are available in the output object.
So you can do:
res <- PCA(mydata)
res$eig ### and you will have the table with the eigenvalues in an object
res$ind$coord ## and you will have the coordinate of the individuals in an object
write.infile(res,file="OutputFile.csv") ## and all the outputs will be written in a csv file
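If you need the individual tables in separate files (for example to paste them into Word one by one), a minimal sketch along the same lines (the file names are placeholders, and round() is only there to keep the tables readable):
res <- PCA(mydata, graph = FALSE)   # graph = FALSE just suppresses the plots
write.csv(round(res$eig, 3),        file = "eigenvalues.csv")
write.csv(round(res$var$coord, 3),  file = "variable_coordinates.csv")
write.csv(round(res$ind$coord, 3),  file = "individual_coordinates.csv")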
Hope it helps,
Francois

Principal Component Analysis Tutorial - Convert R code to Matlab issues

I am trying to understand PCA by finding practical examples online. Sadly most tutorials I have found don't really seem to show simple practical applications of PCA. After a lot of searching, I came across this
http://yatani.jp/HCIstats/PCA
It is a nice, simple tutorial. I want to re-create the results in Matlab, but the tutorial is in R. I have been trying to replicate the results in Matlab but have so far been unsuccessful; I am new to Matlab. I have created the arrays as follows:
Price = [6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2];
Software = [5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3];
Aesthetics = [3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7];
Brand = [4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7];
Then in his example, he does this
data <- data.frame(Price, Software, Aesthetics, Brand)
I did a quick search online, and this apparently converts the vectors into a data frame in R. So in Matlab I did this:
dataTable(:,1) = Price;
dataTable(:,2) = Software;
dataTable(:,3) = Aesthetics;
dataTable(:,4) = Brand;
Now it is the next part I am unsure of.
pca <- princomp(data, cor=TRUE)
summary(pca, loadings=TRUE)
I have tried using Matlab's PCA function
[COEFF SCORE LATENT] = princomp(dataTable)
But my results do not match the ones shown in the tutorial at all. My results are
COEFF =
-0.5958 0.3786 0.7065 -0.0511
-0.1085 0.8343 -0.5402 -0.0210
0.6053 0.2675 0.3179 -0.6789
0.5166 0.2985 0.3287 0.7321
SCORE =
-2.3362 0.0276 0.6113 0.4237
-4.3534 -2.1268 1.4228 -0.3707
-1.1057 -0.2406 1.7981 0.4979
-3.6847 0.4840 -2.1400 1.0586
-1.4218 2.9083 1.2020 -0.2952
-3.3495 -1.3726 0.5049 0.3916
-4.1126 0.1546 -2.4795 -1.0846
-1.7309 0.2951 0.9293 -0.2552
2.8169 0.5898 0.4318 0.7366
3.7976 -2.1655 -0.2402 -1.2622
3.3041 1.0454 -0.8148 0.7667
1.4969 2.9845 0.7537 -0.8187
2.3993 -1.1891 -0.3811 0.7556
1.7836 -0.0072 -0.2255 -0.7276
2.2613 -0.1977 -2.4966 0.0326
4.2350 -1.1899 1.1236 0.1509
LATENT =
9.3241
2.2117
1.8727
0.5124
Yet the results in the tutorial are
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Standard deviation 1.5589391 0.9804092 0.6816673 0.37925777
Proportion of Variance 0.6075727 0.2403006 0.1161676 0.03595911
Cumulative Proportion 0.6075727 0.8478733 0.9640409 1.00000000
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4
Price -0.523 0.848
Software -0.177 0.977 -0.120
Aesthetics 0.597 0.134 0.295 -0.734
Brand 0.583 0.167 0.423 0.674
Could anyone please explain why my results differ so much from the tutorial? Am I using the wrong Matlab function?
Also, if you are able to provide any other nice, simple practical applications of PCA, that would be very beneficial. I am still trying to get my head around all the concepts in PCA, and I like examples where I can code it up and see the results myself, so I can play about with it; I find it easier to learn that way.
Any help would be much appreciated!
Edit: The issue is purely the scaling.
R code:
summary(princomp(data, cor = FALSE), loadings=T, cutoff = 0.01)
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4
Price -0.596 -0.379 0.706 -0.051
Software -0.109 -0.834 -0.540 -0.021
Aesthetics 0.605 -0.268 0.318 -0.679
Brand 0.517 -0.298 0.329 0.732
According to the Matlab help you should use this if you want scaling:
Matlab code:
princomp(zscore(X))
Old answer (a red herring):
From help(princomp) (in R):
The calculation is done using eigen on the correlation or covariance
matrix, as determined by cor. This is done for compatibility with the
S-PLUS result. A preferred method of calculation is to use svd on x,
as is done in prcomp.
Note that the default calculation uses divisor N for the covariance
matrix.
In the documentation of the R function prcomp (help(prcomp)) you can read:
The calculation is done by a singular value decomposition of the
(centered and possibly scaled) data matrix, not by using eigen on the
covariance matrix. This is generally the preferred method for
numerical accuracy. [...] Unlike princomp, variances are computed with
the usual divisor N - 1.
The Matlab function apparently uses the svd algorithm. If I use prcomp (without scaling, i.e., not based on correlations) with the example data I get:
> prcomp(data)
Standard deviations:
[1] 3.0535362 1.4871803 1.3684570 0.7158006
Rotation:
PC1 PC2 PC3 PC4
Price -0.5957661 0.3786184 -0.7064672 0.05113761
Software -0.1085472 0.8342628 0.5401678 0.02101742
Aesthetics 0.6053008 0.2675111 -0.3179391 0.67894297
Brand 0.5166152 0.2984819 -0.3286908 -0.73210631
This is (apart from the irrelevant signs) identical to the Matlab output.
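For completeness, a short sketch of how to get the scaled (correlation-based) result from prcomp in R as well; the loadings should match the tutorial's princomp(data, cor = TRUE) output up to the signs of the columns:
pca.scaled <- prcomp(data, scale. = TRUE)   # centre and scale, i.e. PCA on the correlation matrix
pca.scaled$rotation                         # loadings, comparable to the tutorial's output up to sign
pca.scaled$sdev                             # standard deviations of the components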

Create a table for N, Min/Max, SD, Mean, and Median in R

I'm very new to R, so please bear with me on this basic question.
I have a dataset, DATA, that I created using the data.table package. I created 200 random numbers between 0 and 1, then did that 10000 times, finally creating a data table with descriptive statistics for each iteration. My code looked like this:
library(data.table)

rndm <- runif(200, min = 0, max = 1)                  # a single sample of 200 draws
reps <- data.table(x = runif(200 * 10000),
                   iter = rep(1:10000, each = 200))   # 10000 iterations of 200 draws each
DATA <- reps[, list(mean = mean(x), median = median(x), sd = sd(x),
                    min = min(x), max = max(x)), by = iter]
The data looks something like this:
Mean Median SD Min Max
1 0.521 0.499 0.287 0.010 0.998
2 0.511 0.502 0.290 0.009 0.996
. ... ...
etc.
What I want to do is create a table with the N, mean, median, standard deviation, minimum, and maximum of the accumulated sample means (not one row per iteration as above). I need the output to look something like this:
N Mean Median SD Min Max
10000 .502 .499 .280 .002 .999
How can I accomplish this?
You could also define a function. This approach allows you to make the same table for a different variable.
summaryfun <- function(x) list(N = length(x), Mean = mean(x), Median = median(x),
                               SD = sd(x), Min = min(x), Max = max(x))
DATA[,summaryfun(mean)]
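Because summaryfun works on any vector, the same pattern gives a table for other columns of DATA too; for example (purely illustrative):
DATA[, summaryfun(median)]           # summary of the per-iteration medians
DATA[iter <= 100, summaryfun(mean)]  # summary of the means for a subset of iterations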
At the moment, you're calculating the functions in the list separately for every different iter. But if you want the aggregate stats, just remove the by clause and your functions will run once, over the whole of the dataset. Then add an item to give N, making use of the .N variable provided by data.table.
DATA <- reps[, list(N = .N, mean = mean(x), median = median(x),
                    sd = sd(x), min = min(x), max = max(x))]

Decimal places in summary(model) output in R

I am trying to get more than 2 decimal places from the model summary output when I use the nnet package. I have read other threads about this and none of those solutions seem to work for me. I tried:
options(digits=10)
summary(model)
b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1
0.94 -2.67 0.83 -1.06 -2.51 -0.69
b->o1 h1->o1
1.14 -3.41
b->o2 h1->o2
-0.62 3.92
I also tried:
summary(model,digits=10)
b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1
0.94 -2.67 0.83 -1.06 -2.51 -0.69
b->o1 h1->o1
1.14 -3.41
b->o2 h1->o2
-0.62 3.92
None of those solutions work for me. I have to use capture.output on the summary output. If I output the entire model or use coefnames I can get more than 2 decimal places, but that does not help me when I use capture.output.
It's likely that the print method for the object returned by summary is where the two decimal places are coming from. As a first attempt, try
print(summary(model),digits=10) ## or whatever other number of digits
If that doesn't work, try the kind of investigation that was done in this answer:
How to make decimal digits of chisq.test four numbers ?
Just use
summary(model)$wts
This will give you the weights at full precision.
If you want other values, e.g. the residuals, see the manual (?nnet), which lists the components of the fitted object.
Just write summary(model), then $, and then e.g. wts to get the weights or residuals to get the residuals.
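A self-contained sketch of that extraction, using a throwaway nnet fit purely so there is something to show (substitute your own model):
library(nnet)
set.seed(1)
model <- nnet(Species ~ ., data = iris, size = 1, trace = FALSE)  # toy model for illustration only
w <- summary(model)$wts          # numeric vector of weights at full precision
print(w, digits = 10)
head(summary(model)$residuals)   # residuals are available the same way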
