Calculate autocorrelation with lag u in R - r

Hi I tried calculating autocorrelation with lag u, u = 1...9
I expect 9x1 autocorrelation functions. However when I try to use this code it always gave me 10x1 autocorrelation function with the first term = 1. I am not sure how to proceed.
# initialize a vector to store autocovariance
maxlag <- 9
varstore <- rep(NA,maxlag)
# Calculate Variance
varstore[1] <- sd(as.vector(sample1),na.rm=T)^2
# Estimate autocovariances for all residuals
for (lag in 1:maxlag)
varstore[lag+1] <- mean(sample1[,1:(10-lag)] *
sample1[,(lag+1):10],na.rm=T)
print(round(varstore,3))
# calculate autocorrelations
corrstore <- varstore/varstore[1]
print(corrstore)
And this is what I get:
[1] 1.0000000 0.6578243 0.5670389 0.5292314 0.5090411 0.4743944 0.4841038 0.4756297
[9] 0.4275208 0.4048436

You get a vector of length 10 because of the recycling.
for lag =maxlog ( the last step of your for loop)
varstore[lag+1]
will create a new entry with NA. To see this clearly, try this for example :
v <- NA ## a vector of length 1
v[10] <- 2
v
[1] NA NA NA NA NA NA NA NA NA 2 ## you get a vector of legnth 10!!
That'said , why do you want a vector of length 9? Why not to use the acf function? Here the output of the acf function:
length(acf(1:10)$lag)
[1] 10

Related

How can I get this nested for loop to populate my matrix p.F2.D with something other than NA values?

# using a nested for loop to calculate the probability of detecting a disease given a positive test for four different prevalence rates of said disease and 100 different false positive rates
# p(D|F2)
p.D.F2 <- 0.7
# p(F2) : {0.2, 0.4, 0.6, 0.8}
p.F2 <- seq(0.2, 0.8, by = 0.2)
# P(D|notF2) : 100 linearly spaced intervals ranging from 0.01 to 0.5
p.D.not.F2 <- seq(0.01, 0.5, by = 0.005)
# create an empty matrix to populate
p.F2.D <- matrix(nrow = length(p.F2), ncol = length(p.D.not.F2))
# Loop across values of one parameter
for (i in 1:length(p.F2)) {
# Loop across values of other parameter
for (j in 1:length(p.D.not.F2)) {
# Fill in your calculation of p.F2.D on the right-hand side (using Bayes theorem)
p.F2.D[i,j] <- c(prod(p.D.F2[j], p.F[i])/p.D)
}
}
# this loop as I have it keeps returning a matrix with only the first run through the loop populated and the others are all "NA"
# 0.9130435 NA NA NA NA NA NA NA NA NA
# please let me know what it is that i'm missing to get this to run!!

Most efficient way in base R to do pairwise correlations between thousands of columns in a matrix [duplicate]

I'm new to R, so I apologize if this is a straightforward question, however I've done quite a bit of searching this evening and can't seem to figure it out. I've got a data frame with a whole slew of variables, and what I'd like to do is create a table of the correlations among a subset of these, basically the equivalent of "pwcorr" in Stata, or "correlations" in SPSS. The one key to this is that not only do I want the r, but I also want the significance associated with that value.
Any ideas? This seems like it should be very simple, but I can't seem to figure out a good way.
Bill Venables offers this solution in this answer from the R mailing list to which I've made some slight modifications:
cor.prob <- function(X, dfr = nrow(X) - 2) {
R <- cor(X)
above <- row(R) < col(R)
r2 <- R[above]^2
Fstat <- r2 * dfr / (1 - r2)
R[above] <- 1 - pf(Fstat, 1, dfr)
cor.mat <- t(R)
cor.mat[upper.tri(cor.mat)] <- NA
cor.mat
}
So let's test it out:
set.seed(123)
data <- matrix(rnorm(100), 20, 5)
cor.prob(data)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0000000 NA NA NA NA
[2,] 0.7005361 1.0000000 NA NA NA
[3,] 0.5990483 0.6816955 1.0000000 NA NA
[4,] 0.6098357 0.3287116 0.5325167 1.0000000 NA
[5,] 0.3364028 0.1121927 0.1329906 0.5962835 1
Does that line up with cor.test?
cor.test(data[,2], data[,3])
Pearson's product-moment correlation
data: data[, 2] and data[, 3]
t = 0.4169, df = 18, p-value = 0.6817
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3603246 0.5178982
sample estimates:
cor
0.09778865
Seems to work ok.
Here is something that I just made, I stumbled on this post because I was looking for a way to take every pair of variables, and get a tidy nX3 dataframe. Column 1 is a variable, Column 2 is a variable, and Column 3 and 4 are their absolute value and true correlation. Just pass the function a dataframe of numeric and integer values.
pairwiseCor <- function(dataframe){
pairs <- combn(names(dataframe), 2, simplify=FALSE)
df <- data.frame(Vairable1=rep(0,length(pairs)), Variable2=rep(0,length(pairs)),
AbsCor=rep(0,length(pairs)), Cor=rep(0,length(pairs)))
for(i in 1:length(pairs)){
df[i,1] <- pairs[[i]][1]
df[i,2] <- pairs[[i]][2]
df[i,3] <- round(abs(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]])),4)
df[i,4] <- round(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]]),4)
}
pairwiseCorDF <- df
pairwiseCorDF <- pairwiseCorDF[order(pairwiseCorDF$AbsCor, decreasing=TRUE),]
row.names(pairwiseCorDF) <- 1:length(pairs)
pairwiseCorDF <<- pairwiseCorDF
pairwiseCorDF
}
This is what the output is:
> head(pairwiseCorDF)
Vairable1 Variable2 AbsCor Cor
1 roll_belt accel_belt_z 0.9920 -0.9920
2 gyros_dumbbell_x gyros_dumbbell_z 0.9839 -0.9839
3 roll_belt total_accel_belt 0.9811 0.9811
4 total_accel_belt accel_belt_z 0.9752 -0.9752
5 pitch_belt accel_belt_x 0.9658 -0.9658
6 gyros_dumbbell_z gyros_forearm_z 0.9491 0.9491
I've found that the R package picante does a nice job dealing with the problem that you have. You can easily pass your dataset to the cor.table function and get a table of correlations and p-values for all of your variables. You can specify Pearson's r or Spearman in the function. See this link for help:
http://www.inside-r.org/packages/cran/picante/docs/cor.table
Also remember to remove any non-numeric columns from your dataset prior to running the function. Here's an example piece of code:
install.packages("picante")
library(picante)
#Insert the name of your dataset in the code below
cor.table(dataset, cor.method="pearson")
You can use the sjt.corr function of the sjPlot-package, which gives you a nicely formatted correlation table, ready for use in your Office application.
Simplest function call is just to pass the data frame:
sjt.corr(df)
See examples here.

automatically try different initial values in optim

I use optim(.) to try to find the best fitting parameters for some function fn(dat, par, out=FALSE) where par must be a vector of two elements and out determines the output format. I use
optim(par=c(1,1), fn, dat=dat)
to identify the best-fitting values of par. Depending on the data in dat, this either works ot throws an error that
function cannot be evaluated at initial parameters
which I understand requires different starting values for optim(.). My problem is that I apply the function to many data sets in parallel and wonder whether I indeed need to try different values by hand or whether there is some way of automatizing this along the lines of
if no error then great
if error try par=c(0.5,1)
if no error then great
if error try par=c(0.5,0.5)
...
You could run a grid search before you start and discard NA parameters. Here is an example.
A test function:
fn <- function(x) {
if (x[1] < 0)
NA
else
prod(x)
}
Now run a grid search.
library("NMOF")
res <- gridSearch(fn,
npar = 2, ## length of x
lower = -1, ## lower bound for x
upper = 3, ## upper bound for x
n = 5) ## number of levels per element in x
## 2 variables with 5, 5 levels: 25 function evaluations required.
The function shows you all the parameter combinations it tried.
res$levels
## [[1]]
## [1] -1 -1
##
## [[2]]
## [1] 0 -1
##
## [[3]]
## [1] 1 -1
##
## ....
And it provides the objective function values associated with these combinations.
res$values
## [1] NA 0 -1 -2 -3 NA 0 0 0 0 NA 0 1 2 3
## [16] NA 0 2 4 6 NA 0 3 6 9
## => many objective functions values are NA
The best (none-NA) solution:
res$minlevels
## [1] 3 -1
## => your starting value for optim:
##
## optim(gridSearch(fn, npar = 2,
## lower = -1, upper = 3, n = 5)$minlevels,
## fn, dat = dat)
Of course, this won't give you a guarantee that at least one none-NAvector is found, but the chances may improve.

comparing a vector to a probability distribution

I have a vector:
r <- runif(10)
r
[1] 0.52324423 0.89110751 0.44616915 0.70163640 0.63741495 0.31263977
[7] 0.73947973 0.83278799 0.04971461 0.01820381
I also have a probability distribution
p <- c(0, cumsum(rep(0.25, 4)))
p
[1] 0.00 0.25 0.50 0.75 1.00
I would like to assign factors to r based on the probability distribution in p.
In other words, I would like my output to be:
r
[1] 3 4 2 3 3 2 3 4 1 1
When I try this, I get a warning:
which( r >= p) -1
[1] 3
Warning message:
In r < p : longer object length is not a multiple of shorter object length
In other words, only the first value in r is compared to p.
How would I go about converting r into a vector of levels that I can then turn into factors?
You can use cut
as.integer(cut(r, breaks=p))

Pairwise Correlation Table

I'm new to R, so I apologize if this is a straightforward question, however I've done quite a bit of searching this evening and can't seem to figure it out. I've got a data frame with a whole slew of variables, and what I'd like to do is create a table of the correlations among a subset of these, basically the equivalent of "pwcorr" in Stata, or "correlations" in SPSS. The one key to this is that not only do I want the r, but I also want the significance associated with that value.
Any ideas? This seems like it should be very simple, but I can't seem to figure out a good way.
Bill Venables offers this solution in this answer from the R mailing list to which I've made some slight modifications:
cor.prob <- function(X, dfr = nrow(X) - 2) {
R <- cor(X)
above <- row(R) < col(R)
r2 <- R[above]^2
Fstat <- r2 * dfr / (1 - r2)
R[above] <- 1 - pf(Fstat, 1, dfr)
cor.mat <- t(R)
cor.mat[upper.tri(cor.mat)] <- NA
cor.mat
}
So let's test it out:
set.seed(123)
data <- matrix(rnorm(100), 20, 5)
cor.prob(data)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0000000 NA NA NA NA
[2,] 0.7005361 1.0000000 NA NA NA
[3,] 0.5990483 0.6816955 1.0000000 NA NA
[4,] 0.6098357 0.3287116 0.5325167 1.0000000 NA
[5,] 0.3364028 0.1121927 0.1329906 0.5962835 1
Does that line up with cor.test?
cor.test(data[,2], data[,3])
Pearson's product-moment correlation
data: data[, 2] and data[, 3]
t = 0.4169, df = 18, p-value = 0.6817
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3603246 0.5178982
sample estimates:
cor
0.09778865
Seems to work ok.
Here is something that I just made, I stumbled on this post because I was looking for a way to take every pair of variables, and get a tidy nX3 dataframe. Column 1 is a variable, Column 2 is a variable, and Column 3 and 4 are their absolute value and true correlation. Just pass the function a dataframe of numeric and integer values.
pairwiseCor <- function(dataframe){
pairs <- combn(names(dataframe), 2, simplify=FALSE)
df <- data.frame(Vairable1=rep(0,length(pairs)), Variable2=rep(0,length(pairs)),
AbsCor=rep(0,length(pairs)), Cor=rep(0,length(pairs)))
for(i in 1:length(pairs)){
df[i,1] <- pairs[[i]][1]
df[i,2] <- pairs[[i]][2]
df[i,3] <- round(abs(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]])),4)
df[i,4] <- round(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]]),4)
}
pairwiseCorDF <- df
pairwiseCorDF <- pairwiseCorDF[order(pairwiseCorDF$AbsCor, decreasing=TRUE),]
row.names(pairwiseCorDF) <- 1:length(pairs)
pairwiseCorDF <<- pairwiseCorDF
pairwiseCorDF
}
This is what the output is:
> head(pairwiseCorDF)
Vairable1 Variable2 AbsCor Cor
1 roll_belt accel_belt_z 0.9920 -0.9920
2 gyros_dumbbell_x gyros_dumbbell_z 0.9839 -0.9839
3 roll_belt total_accel_belt 0.9811 0.9811
4 total_accel_belt accel_belt_z 0.9752 -0.9752
5 pitch_belt accel_belt_x 0.9658 -0.9658
6 gyros_dumbbell_z gyros_forearm_z 0.9491 0.9491
I've found that the R package picante does a nice job dealing with the problem that you have. You can easily pass your dataset to the cor.table function and get a table of correlations and p-values for all of your variables. You can specify Pearson's r or Spearman in the function. See this link for help:
http://www.inside-r.org/packages/cran/picante/docs/cor.table
Also remember to remove any non-numeric columns from your dataset prior to running the function. Here's an example piece of code:
install.packages("picante")
library(picante)
#Insert the name of your dataset in the code below
cor.table(dataset, cor.method="pearson")
You can use the sjt.corr function of the sjPlot-package, which gives you a nicely formatted correlation table, ready for use in your Office application.
Simplest function call is just to pass the data frame:
sjt.corr(df)
See examples here.

Resources