passing multiple arguments via mapply to a formula function - r

I would like to run two loess regressions. The data is provided as a list containing two elements. Each element holds a pair of columns (x and y) on which I would like to run the loess regression. I would like to do this with the apply family, specifically mapply. However, loess takes a formula expression y ~ x, and I believe you cannot reference x and y in a formula the way you would in a non-formula function, where the variables could be supplied via mapply.
X <- c(3,4,3,2,3,4,5,6,7,7,6,5,4,3,3,5,3,6,3,5,6,3,6,3,4,5,5,4,3,4,5,3,5,5,4)
Y <- c(3,2,1,3,4,2,1,2,3,5,4,3,2,1,1,3,4,5,6,7,6,5,4,3,2,3,4,3,4,2,4,3,NA,NA,NA)
mydata<-data.frame(X,Y)
L <- seq(1:length(mydata))
n <- function(x) length(na.omit(mydata[,x]))
n <- lapply(L,n)
# sequence each (Variable time)
x <- function(x) seq(1:n[[x]])
x <- lapply(L,x)
y <- function(x) na.omit(mydata[,x])
y <- lapply(L,y)
# create a list with pairs of columns each
Data <- function (p) data.frame(y[[p]], x[[p]])
Data <- lapply(L,Data)
# In a written function you would do the following (p corresponds to the sequence of L and d to the columns in each element) and pass the arguments with mapply
d <- seq(1:length(mydata))
p <- seq(1:length(Data))
W1<-expand.grid(p=p,d=d)
# However, the formula interface y ~ x of loess does not let me pass varying x and y variables through mapply, which I would like to do to automate this process. Below I wrote the call the way x and y would be passed to a non-formula function, which does not work.
mapply(function(p,d) loess(y ~ x, span = 0.75, degree = 2, parametric = FALSE, drop.square = FALSE, normalize = FALSE, family = c("gaussian")), W1$p, W1$d)
How can I pass the different variables to the loess function using mapply?
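One possible approach, offered as a sketch rather than a definitive answer: keep the formula fixed as y ~ x and vary the data argument instead. If every list element is a data frame with columns named x and y, loess finds the variables inside whichever data frame it is handed. The names fits, fits2 and spans, and the second span value, are assumptions made for illustration.

```r
# Data from the question
X <- c(3,4,3,2,3,4,5,6,7,7,6,5,4,3,3,5,3,6,3,5,6,3,6,3,4,5,5,4,3,4,5,3,5,5,4)
Y <- c(3,2,1,3,4,2,1,2,3,5,4,3,2,1,1,3,4,5,6,7,6,5,4,3,2,3,4,3,4,2,4,3,NA,NA,NA)
mydata <- data.frame(X, Y)

# One data frame per column: y holds the non-missing values, x is a 1..n index
Data <- lapply(mydata, function(col) {
  y <- as.numeric(na.omit(col))
  data.frame(x = seq_along(y), y = y)
})

# loess looks up x and y inside `data`, so the formula itself never changes
fits <- lapply(Data, function(d) loess(y ~ x, data = d, span = 0.75, degree = 2))

# Map (a wrapper around mapply) pairs each data frame with its own span;
# the span values are illustrative assumptions
spans <- c(0.75, 0.9)
fits2 <- Map(function(d, s) loess(y ~ x, data = d, span = s, degree = 2), Data, spans)
```

The design choice is to iterate over data frames rather than over variable names, which avoids building formulas by hand.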

Related

Seeking an lapply like function for a list of lists

I have a list in R which looks something like this
b0=5;b1=2
f <- function(x) b0 + b1*x
Nsim <- 100
my.list <- vector("list", Nsim)
for(i in 1:Nsim){
  x <- rep(0,1000)
  y <- x
  y[1] <- f(x[1])
  for(j in 2:1000){
    x[j] <- x[j-1] + rnorm(1,0,0.1)
    y[j] <- f(x[j])  # assignment, not the comparison `<`
  }
  my.list[[i]]$x <- x
  my.list[[i]]$y <- y
}
In reality, f is the result of an optimisation routine and x tracks the input value over time and y is the function values which are generated. So in essence, I have Nsim time series. I want to plot metrics of these time series over time by averaging over the index i. For instance, the average performance of the algorithm over time.
At the moment I'm doing this with a bespoke function for each metric I want to calculate (e.g. mean squared error of x from the true value of x, another for generating error bars and so on). I want to use something like lapply to average over i so I can visualise how x and y evolve over time but that doesn't do the right thing.
What I want to output is a pointwise summary of the results. As an analogy, if my.list[[i]]$x were instead stored as a row of a matrix, I could take colMeans() to see the average value of x over "time".
Is there a function/package which is good for working with lists of lists?
At least for what has been presented there is no real reason to use a list of lists. The x's are all the same and equal to 1, 2, 3, ... so this could be represented by a matrix with the x component being implicit or represented by row names or we could represent this as a ts object or zoo object. In the last two cases if X is the object time(X) is the common x.
mat <- sapply(my.list, "[[", "y")
ts(mat)
library(zoo); zoo(mat)
Alternately, get rid of my.list and construct one of these directly in the code.
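As a rough sketch of the matrix approach: once the y components are bound into a matrix, pointwise summaries reduce to row-wise operations. The reduced Nsim and series length, and the names len, avg and sdev, are assumptions made to keep the example small.

```r
set.seed(1)
f <- function(x) 5 + 2 * x   # same b0 and b1 as in the question
Nsim <- 5                    # smaller than the question's 100, for illustration
len  <- 10                   # smaller than the question's 1000

# Build the list of lists: x is a random walk starting at 0, y = f(x)
my.list <- lapply(seq_len(Nsim), function(i) {
  x <- cumsum(c(0, rnorm(len - 1, 0, 0.1)))
  list(x = x, y = f(x))
})

# Rows are time points, columns are simulations
mat <- sapply(my.list, "[[", "y")

# Pointwise summaries over the simulations
avg  <- rowMeans(mat)        # mean of y at each time point
sdev <- apply(mat, 1, sd)    # spread at each time point, e.g. for error bars
```

Because sapply puts each simulation in a column, the average over i is rowMeans here rather than colMeans.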

Creating a Loss Function

I am trying to create the loss function below.
Here tts is the total sum of squares, x holds the values 1-100, and t is a given y hat. w0 and w1 are supposed to be par = c(0, 1), but I can't get the function right and I'm not sure why.
x
t
loss <- function(par){
th<-w0+w1*x
tts<-(t-th)^2
return(sum(tts))
}
results <- optim(par = c(0,1), fn = loss, method = 'BFGS')
results$par
The first argument to any function that you want to optimize with optim must be the vector of parameters that optim will search over. You named this vector par but then you didn't use par anywhere in your function. In my example below, I'm going to call the vector of parameters params so as not to mix it up with the first argument to optim and you'll see it gets used (ie, the loss function uses params[1], etc.):
# define loss function
loss <- function(params, x, y) {
  yhat <- params[1] + params[2]*x
  tss <- (y - yhat)^2
  return(sum(tss))
}
# generate fake data
n <- 100
x <- 1:n
w0_true <- 2
w1_true <- 3
y <- w0_true + w1_true*x + rnorm(n)
# find w0_hat and w1_hat with optim
optim(par=c(0,1), fn=loss, x=x, y=y)
# check with lm
summary(lm(y ~ x))

Calculation of DFFITS as diagnostic for Leverage and Influence in regression

I am trying to calculate DFFITS by hand. The value obtained should be equal to the first value obtained by dffits function. However there must be something wrong with my own calculation.
attach(cars)
x1 <- lm(speed ~ dist, data = cars) # all observations
x2 <- lm(speed ~ dist, data = cars[-1,]) # without first obs
x <- model.matrix(speed ~ dist) # x matrix
h <- diag(x%*%solve(crossprod(x))%*%t(x)) # hat values
num_dffits <- x1$fitted.values[1] - x2$fitted.values[1] #Numerator
denom_dffits <- sqrt(anova(x2)$`Mean Sq`[2]*h[1]) #Denominator
df_fits <- num_dffits/denom_dffits #DFFITS
dffits(x1)[1] # DFFITS function
Your numerator is wrong. Because you removed the first datum when fitting the second model, its predicted value is not in fitted(x2). Use predict(x2, cars[1, ]) in place of fitted(x2)[1].
Hat values can be efficiently computed by
h <- rowSums(qr.Q(x1$qr) ^ 2)
or using its R wrapper function
h <- hat(x1$qr, FALSE)
R also has a function that returns hat values as part of its influence measures:
h <- lm.influence(x1, FALSE)$hat
or the generic that wraps it:
h <- hatvalues(x1)
You also don't have to call anova to get MSE:
c(crossprod(x2$residuals)) / x2$df.residual
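Putting the pieces together, a minimal sketch of the corrected hand calculation might look like this. The names num, mse_del and by_hand are illustrative and do not come from the question.

```r
x1 <- lm(speed ~ dist, data = cars)         # all observations
x2 <- lm(speed ~ dist, data = cars[-1, ])   # without the first observation

h <- hatvalues(x1)                          # leverages from the full model

# Numerator: change in the fitted value for observation 1
num <- fitted(x1)[1] - predict(x2, cars[1, ])

# Denominator: leave-one-out residual standard error times sqrt(leverage)
mse_del <- c(crossprod(x2$residuals)) / x2$df.residual
by_hand <- num / sqrt(mse_del * h[1])

# Compare with the built-in function
all.equal(unname(by_hand), unname(dffits(x1)[1]))
```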

quantreg lm.fit.recursive in simple regression without constant

I am trying to use the function lm.fit.recursive in R's quantreg package to construct recursive residuals for a simple regression without a constant.
Here is a minimal example of an approach that does not work:
# some data
n <- 20
z <- rnorm(n)
x <- rnorm(n)
x.mat <- matrix(rnorm(2*n),ncol=2)
lm.fit.recursive(x, z, int=T) # works WITH intercept with one regressor
lm.fit.recursive(x.mat, z, int=F) # works WITHOUT intercept with two regressors
lm.fit.recursive(x, z, int=F) # what I actually want but which returns Error in 1:p : argument of length 0
My hunch is that the error is related to the regressor matrix in this case not being a matrix but a vector, which leads R to treat this variable differently.
Is that correct, or am I using the function incorrectly?
Indeed,
> lm.fit.recursive
function (X, y, int = TRUE)
{
if (int)
X <- cbind(1, X)
p <- ncol(X)
n <- nrow(X)
D <- qr(X[1:p, ])
...
}
so that p <- ncol(X) is NULL when X is a plain vector, and 1:p then fails with "argument of length 0". Hence,
lm.fit.recursive(as.matrix(x), z, int=F)
provides a workaround.
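The underlying behaviour can be checked in base R without quantreg. This is only a sketch of the mechanism; the names p, res and xm are illustrative.

```r
x <- rnorm(20)

# A plain vector has no dim attribute, so ncol() returns NULL
p <- ncol(x)
is.null(p)

# 1:NULL is what triggers the error; capture the message instead of stopping
res <- tryCatch(1:p, error = function(e) conditionMessage(e))
res

# as.matrix() turns the vector into a one-column matrix
xm <- as.matrix(x)
dim(xm)
ncol(xm)
```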

Integration of a vector return one value

I am using R to do some multivariate analysis. For this work I need to integrate a trivariate PDF. Since I want to use this in an MLE, I want a vector of integrals. Is there a way to make integrate return a vector instead of a single value?
Here is simple example:
f1=function(x, y, z) {dmvnorm(x=as.matrix(cbind(x,y,z)), mean=c(0,0,0), sigma=sigma)}
f1(x=c(1,1,1), y=c(1,1,1), z=c(1,1,1))
integrate(Vectorize(function(x) {f1(x=c(1,1,1), y=c(1,1,1), z=c(1,1,1))}), lower = - Inf, upper = -1)$value
Error in integrate(Vectorize(function(x) { : evaluation of function gave a result of wrong length
To integrate a function of one variable, with vector values,
you can transform the function into n functions with real values,
and integrate each of them.
This is very inefficient (when integrating the i-th function,
I evaluate all the functions, and discard all but one value).
# Function to integrate
d <- rnorm(10)
f <- function(x) dnorm(d, mean=x)
# Integrate those n functions separately.
n <- length(f(1))
r <- sapply( 1:n,
  function(i) integrate(
    Vectorize(function(x) f(x)[i]),
    lower=-Inf, upper=0
  )$value
)
r
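As a sanity check on this approach, this particular integrand has a closed form: integrating dnorm(d, mean = x) over x from -Inf to 0 gives the upper tail probability pnorm(d, lower.tail = FALSE). The sketch below repeats the computation with a fixed seed and compares the two; the seed and the name expected are assumptions for illustration.

```r
set.seed(42)
d <- rnorm(10)
f <- function(x) dnorm(d, mean = x)

# Integrate each component of the vector-valued function separately
r <- sapply(seq_along(d), function(i) {
  integrate(Vectorize(function(x) f(x)[i]), lower = -Inf, upper = 0)$value
})

# Closed form for this specific integrand
expected <- pnorm(d, lower.tail = FALSE)

# Agreement up to the numerical tolerance of integrate()
all.equal(r, expected, tolerance = 1e-4)
```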
For 2-dimensional integrals, you can check pracma::integral2,
but the same manipulation (transforming a bivariate function with vector values
into n bivariate functions with real values) will probably be needed.

Resources