Producing a grid of results in R

I am writing some code to do maximum likelihood estimation of some parameter values. I am trying to create a surface plot of the parameter values returned by the optim function, and I need to create a grid to do so. It is the grid-creation step that is confounding me.
My MLE function looks like:
loglike<-function(par,dat,scale)
{ ptp<-dat[1:(length(dat)-1)] # all but the last observation
ptp1<-dat[2:length(dat)]
r<-par['r']
k<-par['k']
sigma<-par['sigma']
if(scale=='log')
{
return(sum(dnorm(log(ptp1)-log(ptp)*exp(r-(ptp/k)),mean=0,sd=sigma,log=T)))
}
if (scale=='sqrt')
{
return(sum(dnorm(sqrt(ptp1)-sqrt(ptp)*exp(r-(ptp/k)),mean=0,sd=sigma,log=T)))
}
if (scale=='linear')
{
return(sum(dnorm(ptp1-ptp*exp(r-(ptp/k)),mean=0,sd=sigma,log=T)))
}
}
I have already created some data from the optim giving me corresponding parameter values
I have tried taking output from the optim function and putting it into the expand.grid function like:
gridlog<-expand.grid(logs[,"r"],logs[,"sigma"],logs[,"k"])
But all this does is create a large matrix filled with the same values.
The data going into the expand.grid function is filled like this:
logs<-list()
for(i in seq(1,300,0.1)){
logs[i]<-optim(par=c(r=i,k=i,sigma=i),fn=loglike,dat=dat,scale='log',method='Nelder-Mead',control=list(fnscale=-1))
}
logs<-do.call(rbind,logs)
This creates a 300-long matrix of corresponding sigma, r and k values.
My data is:
c(100, 128.675595618645, 75.436115414503, 146.398449792328, 102.419994706974,
207.397726741841, 23.4579309898438, 42.4085746569567, 119.498216389673,
59.7845591706614, 119.37201616882, 252.047672957539, 28.3165331949818,
57.4918213065119, 311.615538092141, 8.53779749227741, 31.5382580618134,
115.617013730077, 43.6907812963781, 70.9139870053552, 123.004040266686,
132.575148404208, 114.813947981006, 115.950032495637, 120.891472762661,
97.0207348527786, 235.618894638631, 17.0936655960759, 49.4419128844531,
112.476950569973, 58.3241789008329, 80.0300102105128, 103.248819284132,
99.1968765946717, 113.905769052605, 143.181386861766, 62.962989192695,
174.054591300157, 39.9156352770331, 81.8344415290292, 176.631480374326,
51.5564038694108, 131.542259464434, 72.5981749979889, 38.9733086158719,
126.808054274927, 73.6960412245896, 62.5484608101147, 55.539355637003,
137.888502803112, 106.921926717155, 140.000738390606, 162.512046122238,
26.2949484171288, 80.4110888678422, 74.0481779531392, 33.9890286552257,
142.477859644323, 55.1820570626643, 107.242498924143, 56.8497685792794,
143.676120209843, 84.2334844367379, 67.0330079913484, 109.96246704725,
157.216290273118, 59.4585552091703, 67.2986524284706, 55.2529503291083,
38.932960005221, 62.7454169122216, 210.687014199037, 38.7348882392115,
75.6645116341029, 115.924283193145, 117.772958122253, 45.5313134644358,
112.306998515583, 38.7001172906923, 66.1308507048062, 122.516808638813,
38.8283932430479, 168.014298040365, 38.0902373313928, 117.414876109978,
168.615976661456, 66.5037228223079, 94.4482610053865, 505.254990783834,
1.05181785078369, 1.77594058056118, 4.36034444400473, 12.1485473106491,
82.2373017835424, 58.9775202042162, 132.907299665772, 51.2346939236555,
123.251093218535, 143.077217943039, 96.1524852870813)
Any help anyone could give would be greatly appreciated!!

#find optimum:
fit<-optim(par=c(r=1,k=1,sigma=1),fn=loglike,dat=dat,scale='log',
method='Nelder-Mead',control=list(fnscale=-1))
fit$par
r k sigma
0.3911590 254.4989317 0.5159761
# make grid around optimum with few selected sigma values:
rs<-seq(0.01,1,length=30)
ks<-seq(230,280,length=30)
sigmas<-c(0.25,0.5159761,0.75)
# this will contain all parameter combinations
# and the corresponding likelihood values
mlegrid<-cbind(as.matrix(expand.grid(rs,ks,sigmas)),0) #Matrix
colnames(mlegrid)<-c('r','k','sigma','likelihood')
for(i in 1:nrow(mlegrid)){ #go through all combinations
mlegrid[i,4]<- loglike(par=mlegrid[i,1:3],dat=dat,scale='log')
}
mlegrid[which.max(mlegrid[,4]),]
r k sigma likelihood
0.3855172 257.5862069 0.5159761 -74.9940496
# almost the same as from optim
# (differences due to sparse grid, more dense gives more accurate results)
# rgl gives interactive plots; for static versions use the `persp` function
library(rgl)
persp3d(x=rs,y=ks,
z=matrix(mlegrid[mlegrid[,3]==sigmas[1],4],nrow=length(rs)),col=2)
#with sigma from optim
persp3d(x=rs,y=ks,
z=matrix(mlegrid[mlegrid[,3]==sigmas[2],4],nrow=length(rs)),col=2)
persp3d(x=rs,y=ks,
z=matrix(mlegrid[mlegrid[,3]==sigmas[3],4],nrow=length(rs)),col=2)
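The reshaping step above relies on the order in which expand.grid lists its combinations. Here is a minimal base-R sketch of that idea, assuming a made-up function toy_surface in place of the log-likelihood and a much smaller grid (both are illustrative only):

```r
# Toy surface standing in for the log-likelihood (hypothetical function).
toy_surface <- function(r, k) -(r - 0.4)^2 - ((k - 250) / 100)^2

rs <- seq(0.01, 1, length.out = 5)
ks <- seq(230, 280, length.out = 4)

# expand.grid varies its first argument fastest, so the rows run through
# all rs for ks[1], then all rs for ks[2], and so on.
grid <- expand.grid(r = rs, k = ks)
grid$value <- mapply(toy_surface, grid$r, grid$k)

# matrix() fills column by column, so nrow = length(rs) puts rs along the
# rows and ks along the columns, matching the x and y arguments of persp().
z <- matrix(grid$value, nrow = length(rs))

# The same surface computed directly, for comparison.
z_direct <- outer(rs, ks, toy_surface)
```

If the grid were built as expand.grid(ks, rs) instead, the matrix would need nrow = length(ks) and the axes would have to be swapped in the plot call.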

Related

Grid Search in R for Nonparametric Quantile Regression

I use a library called "quantreg" in R and try to estimate a fully nonparametric quantile regression on a time series basis. To get statistically significant results I try lots of variables and smoothing parameter values (lambda). But it's exhausting and very time consuming. Therefore, I want to apply a grid search; however, it is a little bit hard for me. I want to determine the best smoothing values, so I should construct a for loop. But I want that loop to try every combination. At the end I want to have the lambda values of the best model or models (condition: all variables' p-values < 0.05).
For example if I have three variables in my equation I've written something like that:
lambdas1<-rbind(1,2,3)
lambdas2<-rbind(1,2,3)
lambdas3<-rbind(1,2,3)
mylist<-list()
for (i in 1:3) {
for (j in 1:3) {
for (n in 1:3) {
f <-try(rqss(Y~qss(X1,lambda = lambdas1[i])+qss(X2,lambda = lambdas2[j])+qss(X3,lambda = lambdas3[n]), tau=0.05))
sf<-summary(f)
if( (sf[["qsstab"]]['X1','Pr(>F)']<0.05)&(sf[["qsstab"]]['X2','Pr(>F)']<0.05)&(sf[["qsstab"]]['X3','Pr(>F)']<0.05) ){
mylist[[i]]<-f$lambdas
}
}
}
}
How can I rearrange this code?
Is there any shortcut?
Any help will be appreciated.
Thank you in advance.
You can use base R's expand.grid to create a data.frame of all the possible combinations and then use apply(grid, MARGIN=1, ...) to loop through its rows. I also simplified the check: instead of testing each p-value separately, it uses all(p.vals < .05).
lambdas <- expand.grid(1:3,1:3,1:3)
check_lambdas <- function(lam){
f <-try(rqss(Y~qss(X1,lambda = lam[1])+qss(X2,lambda = lam[2])+qss(X3,lambda = lam[3]), tau=0.05))
if( inherits(f, "try-error") ) return(NULL) # skip combinations where the fit failed
if( all(summary(f)$qsstab[,'Pr(>F)']<0.05) ) f$lambdas else NULL
}
apply(lambdas, 1, check_lambdas) # MARGIN=1: one call per row, i.e. per combination
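To see the row-wise pattern without fitting any model, here is a small sketch in base R. The function check_combo and its condition are made up for illustration; they only stand in for the rqss fit and the p-value check.

```r
grid <- expand.grid(l1 = 1:3, l2 = 1:3, l3 = 1:3)

# Hypothetical stand-in for the model fit: returns the combination when it
# passes a made-up condition, and NULL otherwise.
check_combo <- function(lam) {
  if (sum(lam) <= 4) lam else NULL
}

# MARGIN = 1 calls check_combo once per row, i.e. once per combination.
res <- apply(grid, 1, check_combo)

# Drop the combinations that did not pass.
kept <- Filter(Negate(is.null), res)

n_rows <- nrow(grid)  # number of calls made with MARGIN = 1
n_cols <- ncol(grid)  # number of calls MARGIN = 2 would make instead
```

With MARGIN = 2 the function would be called once per column of the grid, so it would never see a single combination of lambdas.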

R `cor()` style results but with `Kendall's W`

I have a dataframe and want to calculate Kendall's W for each pair of variables.
This function kendall(df, correct = TRUE) calculates the W for the entire dataframe. I would like a table more like cor(df, method=c("kendall")) which compares every pair of variables, however that function uses Kendall's tau and not Kendall's W.
I would like to do this calculation:
install.packages("irr")
library(irr)
df<-iris
kendall(df[,1:4], correct=TRUE)
In this way:
cor(df[,1:4])
I am not sure whether I understand it correctly, is the code below helpful?
my.kendall <- function(df) {
func<-Vectorize(function(i,j){kendall(df[,c(i,j)])$value})
outer(X=1:ncol(df), Y=1:ncol(df), FUN=func)
}
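The same outer/Vectorize pattern can be checked on a statistic whose pairwise table is already known. This is only a sketch: pairwise_stat and its stat argument are names introduced here, and Pearson correlation is used purely because cor() gives a reference table to compare against.

```r
# Generic helper: apply a two-vector statistic to every pair of columns.
# `stat` is a placeholder argument introduced for this sketch.
pairwise_stat <- function(df, stat) {
  idx <- seq_len(ncol(df))
  # outer() passes whole index vectors to FUN, so the scalar function has
  # to be wrapped with Vectorize() to be called once per (i, j) pair.
  f <- Vectorize(function(i, j) stat(df[[i]], df[[j]]))
  m <- outer(idx, idx, f)
  dimnames(m) <- list(names(df), names(df))
  m
}

# Sanity check with Pearson correlation, where the answer is known.
m <- pairwise_stat(iris[, 1:4], cor)
reference <- cor(iris[, 1:4])
```

With irr loaded, passing something like function(a, b) kendall(cbind(a, b), correct = TRUE)$value as stat should give the corresponding table of W values, assuming kendall accepts a two-column matrix as in the answer above.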

R : library "parallel" necessary?

I am sorry to ask again.
I am doing a simulation study on combinations of sample sizes, variances and different distributions.
Now, I am wondering whether I need to include this
library(parallel)
in the beginning of the code?
##########################################################################
#to evaluate the same R function on many different sets of data
library(parallel)
rm(list=ls()) # clean the workspace
nSims<-10000
alpha<-0.05
#set nrow = nSims because we want to store every simulated p-value
#for gamma distribution with equal skewness
matrix2_equal <-matrix(0,nrow=nSims,ncol=5)
matrix5_unequal<-matrix(0,nrow=nSims,ncol=5)
matrix8_mann <-matrix(0,nrow=nSims,ncol=5)
# to ensure the reproducibility of the results,
# we set the random seed here
set.seed(1)
## Put the samples sizes into matrix then use a loop for sample sizes
sample_sizes<-matrix(c(10,10,10,25,25,25,25,50,25,100,50,25,50,100,100,25,100,100),
nrow=2)
#shape parameter for gamma distribution for equal skewness
#forty five cases for each skewness!!!!
sp1<-matrix(rep(c(16/9),each=45),ncol=1)
scp <- c(1,1.5,2,2.5,3)
##(use expand.grid)to create a data frame
ss_scp<- expand.grid(sample_sizes[2,],scp)
#create a matrix combining the forty five cases of combination of sample sizes,shape and scale parameter
all <- cbind(rep(sample_sizes[1,], 5),ss_scp[,1],sp1,ss_scp[,2])
# name the column samples 1 and 2 and standard deviation
colnames(all) <- c("m","n","sp","scp")
#set empty vectors with one entry per combination (nrow(all)) to store the number of rejections
equal<-unequal<-mann<-c(rep(0,nrow(all)))
#set nrow = nSims because we want to store every simulated p-value
#for gamma distribution with equal skewness
matrix2_equal <-matrix(0,nrow=nSims,ncol=5)
matrix5_unequal<-matrix(0,nrow=nSims,ncol=5)
matrix8_mann <-matrix(0,nrow=nSims,ncol=5)
##for the samples sizes into matrix then use a loop for sample sizes
# this loop steps through the rows of the `all` matrix
for(ss in 1:nrow(all))
{
#generate samples from the first column and second column
m<-all[ss,1]
n<-all[ss,2]
for (sim in 1:nSims)
{
#generate 2 random samples from gamma distribution with equal skewness
gamma1<-rgamma(m,1.777778,scale=all[ss,4])
gamma2<-rgamma(n,1.777778,scale=1)
#extract p-value out and store every p-value into matrix
p<-t.test(gamma1,gamma2,var.equal=TRUE)$p.value
q<-t.test(gamma1,gamma2,var.equal=FALSE)$p.value
r<-wilcox.test(gamma1,gamma2)$p.value
matrix2_equal[sim,1]<- p
matrix5_unequal[sim,1]<- q
matrix8_mann[sim,1] <- r
}
##store the result
equal[ss]<- sum(matrix2_equal[,1]<alpha)
unequal[ss]<-sum(matrix5_unequal[,1]<alpha)
mann[ss]<- sum(matrix8_mann[,1]<alpha)
}
Remove the package parallel (detach("package:parallel", unload=TRUE)) and run your code without loading the package. If you get an error stating that function xy could not be found, you might need the package.
However, I do not see any line in your code that seems to require the parallel package.
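Another way to check, without detaching anything, is to ask R where each function comes from. A minimal sketch, assuming the list below covers the non-trivial functions called in the question (the vector used is written by hand here, not extracted automatically):

```r
# Functions called in the simulation code in the question (listed by hand).
used <- c("rgamma", "t.test", "wilcox.test", "expand.grid", "set.seed")

# For each name, find the function on the search path and report the
# namespace it was defined in.
origin <- vapply(
  used,
  function(f) environmentName(environment(get(f, mode = "function"))),
  character(1)
)
print(origin)

# TRUE only if one of the listed functions is defined in the parallel package.
needs_parallel <- any(origin == "parallel")
```

This approach does not work for primitive functions such as sum, which have no enclosing environment, so they are left out of the list.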

Different results when performing PCA in R with princomp() and principal ()

I tried to use princomp() and principal() to do PCA in R with the data set USArrests. However, I got two different results for loadings/rotation and scores.
First, I centered and normalised the original data frame so it is easier to compare the outputs.
library(psych)
trans_func <- function(x){
x <- (x-mean(x))/sd(x)
return(x)
}
A <- USArrests
USArrests <- apply(USArrests, 2, trans_func)
princompPCA <- princomp(USArrests, cor = TRUE)
principalPCA <- principal(USArrests, nfactors=4 , scores=TRUE, rotate = "none",scale=TRUE)
Then I got the results for the loadings and scores using the following commands:
princompPCA$loadings
principalPCA$loadings
Could you please help me understand why there is a difference, and how to interpret these results?
At the very end of the help document of ?principal:
"The eigen vectors are rescaled by the sqrt of the eigen values to produce the component loadings more typical in factor analysis."
So principal returns the scaled loadings. In fact, principal produces a factor model estimated by the principal component method.
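What that rescaling means can be sketched in base R alone. The code below does not call principal; it only reproduces the rescaling described in the help text, using the USArrests data shipped with R and the correlation matrix.

```r
# Eigen-decomposition of the correlation matrix.
e <- eigen(cor(USArrests))

# princomp(cor = TRUE) reports the unit-length eigenvectors as loadings.
pc <- princomp(USArrests, cor = TRUE)
unit_loadings <- unclass(pc$loadings)

# Rescaling each eigenvector by the square root of its eigenvalue gives
# loadings in the factor-analysis sense described in ?principal.
scaled_loadings <- e$vectors %*% diag(sqrt(e$values))

# Column sums of squares: 1 for the unit loadings, and the eigenvalues
# for the rescaled ones.
ss_unit <- colSums(unit_loadings^2)
ss_scaled <- colSums(scaled_loadings^2)
```

The two sets of loadings point in the same directions; they differ only in length (and possibly in sign, which is arbitrary for eigenvectors).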
Four years on, I would like to provide a more accurate answer to this question. I use the iris data as an example.
data = iris[, 1:4]
First, do PCA by the eigen-decomposition
eigen_res = eigen(cov(data))
l = eigen_res$values
q = eigen_res$vectors
Then the eigenvector corresponding to the largest eigenvalue is the factor loadings
q[,1]
We can treat this as a reference or the correct answer. Now we check the results by different r functions.
First, by function 'princomp'
res1 = princomp(data)
res1$loadings[,1]
# compare with
q[,1]
No problem, this function actually just returns the same results as 'eigen'. Now move on to 'principal'
library(psych)
res2 = principal(data, nfactors=4, rotate="none")
# the loadings of the first PC is
res2$loadings[,1]
# compare it with the results by eigendecomposition
sqrt(l[1])*q[,1] # re-scale the eigen vector by sqrt of eigen value
You may find they are still different. The problem is that the 'principal' function performs the eigendecomposition on the correlation matrix by default. Note: PCA is not invariant to rescaling the variables. If you modify the code as
res2 = principal(data, nfactors=4, rotate="none", cor="cov")
# the loadings of the first PC is
res2$loadings[,1]
# compare it with the results by eigendecomposition
sqrt(l[1])*q[,1] # re-scale the eigen vector by sqrt of eigen value
Now, you will get the same results as 'eigen' and 'princomp'.
To summarize:
If you want to do PCA, you'd better use the 'princomp' function.
PCA is a special case of the factor model, or a simplified version of it. It is just equivalent to eigendecomposition.
We can apply PCA to get an approximation of a factor model. It doesn't care about the specific factors, i.e. the epsilons in a factor model. So, if you change the number of factors in your model, you will get the same estimates of the loadings. This is different from maximum likelihood estimation.
If you are estimating a factor model, you'd better use the 'principal' function, since it provides more features, like rotation, calculating the scores by different methods, and so on.
Rescaling the loadings of a PCA model doesn't affect the results much, since you still project the data onto the same optimal directions, i.e. those that maximize the variation in the resulting PCs.
ev <- eigen(R) # R is a correlation matrix of DATA
ev$vectors %*% diag(ev$values) %*% t(ev$vectors)
pc <- princomp(scale(DATA, center = F, scale = T),cor=TRUE)
p <-principal(DATA, rotate="none")
#eigen values
ev$values^0.5
pc$sdev
p$values^0.5
#eigen vectors - loadings
ev$vectors
pc$loadings
p$weights %*% diag(p$values^0.5)
pc$loadings %*% diag(pc$sdev)
p$loadings
#weights
ee <- diag(0,2)
for (j in 1:2) {
for (i in 1:2) {
ee[i,j] <- ev$vectors[i,j]/p$values[j]^0.5
}
};ee
#scores
s <- as.matrix(scale(DATA, center = T, scale = T)) %*% ev$vectors
scale(s)
p$scores
scale(pc$scores)

How to get N values along with pearson correlation?

I am using the code below to calculate the correlation map between two datasets. This code worked fine and I got the results.
I would also like to get another map displaying how many pairs were used in the calculation for each pixel, so that I get a map of N along with the map of correlation.
As per Paul Hiemstra, this function gives cor and N:
cor_withN = function(...) {
cor_obj = cor.test(...)
print(sprintf("N = %s", cor_obj$parameter + 2))
return(data.frame(cor = cor_obj$estimate, N = cor_obj$parameter + 2))
}
cor_withN(runif(100), runif(100))
[1] "N = 100"
cor N
cor 0.1718225 100
When I simply replaced cor with cor_withN I got this error:
Error in cor.test.default(...) : not enough finite observations
How can I apply this function in my code to get two maps, one of correlation and one of N values?
1. Error
Error in cor.test.default(...) : not enough finite observations
According to the cor.test source (http://svn.r-project.org/R/trunk/src/library/stats/R/cor.test.R) this error can appear in two cases:
You are using Pearson's correlation and have fewer than 3 finite pairs of observations.
You are using Kendall's or Spearman's correlation and have fewer than 2 pairs.
Indeed, cor.test(c(1,2), c(2,3)) causes exactly the same error, while cor(c(1,2), c(2,3)) gives an answer.
Note, that cor.test uses complete.cases(x,y) for calculations. So, look into your data - probably there are not enough pairs somewhere.
2. Function
cor returns a numeric value, while your function cor_withN returns a data.frame. So it doesn't look like you can simply replace one with the other.
As I understand it, you just need a matrix of size 1440x720 which will be plotted over the map. In this case you can use cor for the first plot, and a simple function returning the number of pairs used to calculate the correlation for the second. The function itself can be as simple as:
cor_withN <- function(...) {
cor.test(...)$parameter+2
}
UPDATE: After comment
If cor_withN must return NA when there are fewer than 3 pairs, it should be modified:
cor_withN <- function(...) {
res <- try(cor.test(...)$parameter+2, silent=TRUE)
if (inherits(res, "try-error")) NA else res
}
This function tries to compute the correlation; if that fails it returns NA, otherwise it returns the number of pairs.
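Since cor.test drops incomplete pairs with complete.cases, the number of pairs can also be counted directly, without running the test at all. A minimal sketch, where n_pairs is a name introduced here and the vectors are made-up data:

```r
# Count the pairs that are complete in both vectors, without cor.test().
n_pairs <- function(x, y) sum(complete.cases(x, y))

x <- c(1, 2, 3, 4, NA, 6)
y <- c(2, 1, 4, NA, 5, 7)

n_direct <- n_pairs(x, y)

# Degrees of freedom + 2, as in cor_withN (needs at least 3 pairs).
n_from_test <- unname(cor.test(x, y)$parameter) + 2

# With fewer than 3 pairs cor.test() stops with an error, but the direct
# count still works.
n_small <- n_pairs(c(1, NA), c(2, 3))
small_test <- try(cor.test(c(1, 2), c(2, 3)), silent = TRUE)
```

The direct count returns the actual number of pairs for pixels with too few observations, rather than NA, which may or may not be what you want on the map.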

Resources