Linear regression using a list of functions - R

I have a dataset with X and Y values obtained from a calibration, and I have to fit them with a predefined list of polynomial functions and choose the one with the best R2.
The most naive version of the function would be
try<-function(x,y){
f1<- x + I(x^2.0) - I(x^3.0)
f2<- x + I(x^1.5) - I(x^3.0)
...
f20<- I(x^2.0) - I(x^2.5) + I(x^0.5)
r1<- lm(y~f1)
r2<- lm(y~f2)
...
r20<-lm(y~f20)
v1<-summary(r1)$r.squared
v2<-summary(r2)$r.squared
...
v20<-summary(r20)$r.squared
v<-c(v1,v2,...,v20)
return(v)
}
I'd like to make this function shorter and smarter (especially from the definition of r1 to the end). I'd also like to let the user choose one of the functions f1 to f20 (by typing the desired row number of v) and see the output of print and plot for that fit.
Could you please help me?
Thank you.
@mso: the idea of using sapply is nice, but unfortunately this way I don't use a polynomial for the regression: my x vector is transformed into the f1 vector according to the formula and then used for the regression. I obtain just one parameter instead of 3 (in this case).

Create F as a list and proceed:
F = list(f1, f2, ...., f20)
r = lapply(F, function(x) lm(y~x))
v = sapply(r, function(x) summary(x)$r.squared)
return(v)
lapply takes each element of F, fits lm against y, and stores the fitted models in the list r. (sapply is not suitable for this first step, because it tries to simplify the lm objects into a matrix.) In the next line, sapply takes every element of r, computes its summary, and collects the R-squared values in the vector v.
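To keep one coefficient per polynomial term, as the comment in the question asks, one option is to store the candidates as formulas rather than as precomputed vectors. The following is only a sketch under assumptions: the data are synthetic, the three formulas stand in for the full list of twenty, and the names dat, forms, fits and r2 are made up for illustration. All terms are joined with + because lm estimates the sign of each coefficient itself (a - in a formula removes a term).

```r
# Synthetic calibration data (assumption, for illustration only)
set.seed(1)
x <- seq(1, 10, length.out = 30)
y <- 2 * x + 0.5 * x^2 + rnorm(30, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Candidate models stored as formulas, so every term gets its own coefficient
forms <- list(
  f1 = y ~ x + I(x^2.0) + I(x^3.0),
  f2 = y ~ x + I(x^1.5) + I(x^3.0),
  f3 = y ~ I(x^2.0) + I(x^2.5) + I(x^0.5)
)

# Fit every candidate and collect the R-squared values
fits <- lapply(forms, function(f) lm(f, data = dat))
r2 <- sapply(fits, function(m) summary(m)$r.squared)

# Name of the candidate with the highest R-squared
best <- names(which.max(r2))
print(r2)
print(summary(fits[[best]]))
```

To inspect a model chosen by the user, index the list, for example print(fits[[2]]), and draw it with plot(dat$x, dat$y) followed by lines(dat$x, fitted(fits[[2]])).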

Related

Returning 'traditional' notations of functions in the context of fourier interpolation

In numerical analysis, we students are required to implement code in R that, given a function f(x), finds its Fourier interpolation tN(x) and computes the interpolation error
$\|f(x)-t_{N}(x)\| = \int_{0}^{2\pi} |f(x)-t_{N}(x)|^2 \, dx$
for a variety of different $N$.
I first tried to compute the d-coefficients according to this formula:
$d = \frac{1}{N} M y$
with M denoting the DFT matrix and y denoting a series of equidistant function values with
$y_j = f(x_j)$ and
$x_j = e^{\frac{2\pi i}{N} j}$
for $j = 1,\dots,N-1$.
My goal was to come up with a sum that can be described by:
$t_{N}(x) = \sum_{k=0}^{N-1} d_k e^{i k x}$
which would be easier to integrate later, in a sort of term-by-term additive notation.
f <- function(x) 3/(6+4*cos(x)) #first function to compare with
g <- function(x) sin(32*x) #second one
xj <- function(x,n) 2*pi*x/n
M <- function(n){
w = exp(-2*pi*1i/n)
m = outer(0:(n-1),0:(n-1))
return(w^m)
}
y <- function(n){
f(xj(0:(n-1),n))
}
transformFunction <- function(n, f){
d = 1/n * t(M(n)) %*% f(xj(0:(n-1),n))
script <- paste(d[1])
for(i in 2:n)
script <- paste0(script,paste0("+",d[i],"*exp(1i*x*",i,")"))
#trans <- sum(d[1:n] * exp(1i*x*(0:(n-1))))
return(script)
}
The main purpose of the transform function was, initially, to return a function (or rather, a mathematical expression) which could then be used to declare my Fourier interpolation function. The problem is that, with my fairly limited knowledge, I cannot integrate functions that still have sums nested in them (which is why I commented out the corresponding line in the code).
Out of absolute desperation, I then tried to paste each of the summands together as text, only to parse them again as an expression.
So the main question that remains is: how do I return mathematical expressions in a manner that allows me to use them as a function and integrate them later on?
I am sincerely sorry for any misunderstanding or confusion, as well as my seemingly amateurish coding.
Thanks in advance!
A function in R can return an object of any class, including objects of class function. Hence, you can make trans a function of x and return that.
Since the integrate function requires a vectorized function, we apply Vectorize before returning it.
transformFunction <- function(n, f){
d = 1/n * t(M(n)) %*% f(xj(0:(n-1),n))
## Output function
trans <- function(x) sum(d[1:n] * exp(1i*x*(0:(n-1))))
## Vectorize output for the integrate function
Vectorize(trans)
}
To integrate, now simply make a new variable with the output of transformFunction:
myint <- transformFunction(n = 10,f = f)
Test: (integrate can only handle real-valued functions)
integrate(function(x) Re(myint(x)),0,2)$value +
1i*integrate(function(x) Im(myint(x)),0,2)$value
# [1] 1.091337-0.271636i
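The same closure pattern can be tried in isolation, without the DFT step. This is a minimal sketch: make_trig_poly and the coefficient vector d are hypothetical, chosen only so the result can be checked by hand (over a full period only the k = 0 term survives, so the integral of the real part should be close to 2*pi*d[1]).

```r
# Hypothetical factory: turns a coefficient vector into a vectorized function
make_trig_poly <- function(d) {
  k <- seq_along(d) - 1                       # exponents 0, 1, ..., length(d) - 1
  trans <- function(x) sum(d * exp(1i * k * x))
  Vectorize(trans)                            # integrate() needs vector in, vector out
}

tN <- make_trig_poly(c(1, 0.5, 0.25))         # made-up coefficients

# integrate() only handles real-valued integrands, so take the real part
re_part <- integrate(function(x) Re(tN(x)), 0, 2 * pi)$value

# Squared modulus is real-valued, so it can be integrated directly
sq_norm <- integrate(function(x) Mod(tN(x))^2, 0, 2 * pi)$value
```

The squared-modulus integral is the building block for the interpolation error in the question: replace tN(x) with f(x) - tN(x) inside Mod().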

Problems with Gaussian Quadrature in R

I'm using the gaussquad package to evaluate some integrals numerically.
I thought the ghermite.h.quadrature command worked by evaluating a function f(x) at points x1, ..., xn and then constructing the sum w1*f(x1) + ... + wn*f(xn), where x1, ..., xn and w1, ..., wn are nodes and weights supplied by the user.
Thus I thought the commands
ghermite.h.quadrature(f,rule)
sum(sapply(rule$x,f)*rule$w)
would yield the same output for any function f, where "rule" is a data frame which stores the nodes in a column labelled "x" and the weights in a column labelled "w". For many functions the output is indeed the same, but for some functions I get very different results. Can someone please help me understand this discrepancy?
Thanks!
Code:
n.quad = 50
rule = hermite.h.quadrature.rules(n.quad)[[n.quad]]
f <- function(z){
f1 <- function(x,y) pnorm(x+y)
f2 <- function(y) ghermite.h.quadrature(f1,rule,y = y)
g <- function(x,y) x/(1+y) / f2(y)*pnorm(x+y)
h <- function(y) ghermite.h.quadrature(g,rule,y=y)
h(z)
}
ghermite.h.quadrature(f,rule)
sum(sapply(rule$x,f)*rule$w)
OK, that problem got me interested.
I've looked into the gaussquad sources, and the author clearly does not run sapply internally: every integrand is expected to return a vector when given a vector argument.
This is clearly stated in the documentation:
functn an R function which should take a numeric argument x and possibly some parameters.
The function returns a numerical vector value for the given argument x
The package's built-in functions are written that way, so everything works for them.
You have to rewrite your function so that it accepts a vector argument and returns a vector.
UPDATE
Vectorize() fixes the problem for me, as does a simple wrapper with sapply:
vf <- function(z) {
sapply(z, f)
}
After either of those changes, results are identical: 0.2029512
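The effect can be reproduced without the package. The sketch below is not gaussquad's implementation: nodes, weights, quad_vec and quad_sapply are hypothetical stand-ins, with quad_vec mimicking a quadrature routine that passes the whole node vector to the integrand in one call. The integrand inner contains its own sum, like the nested quadrature in the question, so a vector argument is silently collapsed to a scalar.

```r
# Hypothetical three-point rule (not real Hermite nodes or weights)
nodes   <- c(-1, 0, 1)
weights <- c(0.25, 0.5, 0.25)

# Integrand with an inner weighted sum: only correct for scalar y
inner <- function(y) sum(weights * exp(nodes * y))

# Quadrature that hands the whole node vector to the integrand at once
quad_vec <- function(fn) sum(weights * fn(nodes))

# Quadrature that calls the integrand once per node
quad_sapply <- function(fn) sum(weights * sapply(nodes, fn))

a <- quad_vec(inner)             # inner() receives a vector and returns one number
b <- quad_sapply(inner)          # inner() receives one node at a time
v <- quad_vec(Vectorize(inner))  # vectorized wrapper restores the per-node behaviour

print(c(vector_call = a, sapply_call = b, vectorized = v))
```

In this sketch, the vector call and the sapply call disagree, while the Vectorize() wrapper agrees with sapply, which matches the fix described above.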

Integration of a vector returns one value

I am using R to do some multivariate analysis. For this work I need to integrate a trivariate PDF. Since I want to use this in an MLE, I want a vector of integrals. Is there a way to make integrate return a vector instead of a single value?
Here is simple example:
f1=function(x, y, z) {dmvnorm(x=as.matrix(cbind(x,y,z)), mean=c(0,0,0), sigma=sigma)}
f1(x=c(1,1,1), y=c(1,1,1), z=c(1,1,1))
integrate(Vectorize(function(x) {f1(x=c(1,1,1), y=c(1,1,1), z=c(1,1,1))}), lower = - Inf, upper = -1)$value
Error in integrate(Vectorize(function(x) { : evaluation of function gave a result of wrong length
To integrate a function of one variable, with vector values,
you can transform the function into n functions with real values,
and integrate each of them.
This is very inefficient (when integrating the i-th function,
I evaluate all the functions, and discard all but one value).
# Function to integrate
d <- rnorm(10)
f <- function(x) dnorm(d, mean=x)
# Integrate those n functions separately.
n <- length(f(1))
r <- sapply( 1:n,
function(i) integrate(
Vectorize(function(x) f(x)[i]),
lower=-Inf, upper=0
)$value
)
r
For 2-dimensional integrals, you can check pracma::integral2,
but the same manipulation (transforming a bivariate function with vector values
into n bivariate functions with real values) will probably be needed.

Vectorize function to avoid loop

I'm trying to speed up my code because it's running very long. I already found out where the problem lies. Consider the following example:
x<-c((2+2i),(3+1i),(4+1i),(5+3i),(6+2i),(7+2i))
P<-matrix(c(2,0,0,3),nrow=2)
out<-sum(c(0.5,0.5)%*%mtx.exp(P%*%(matrix(c(x,0,0,x),nrow=2)),5))
I have a vector x with complex values; the vector has 12^11 entries, and I want to calculate the sum in the third line. (I need the function mtx.exp, from the package Biodem, because this is a complex matrix power; I found out that the %^% function does not support complex arguments.)
So my problem is that if I try
sum(c(0.5,0.5)%*%mtx.exp(P%*%(matrix(c(x,0,0,x),nrow=2)),5))
I get an error: "Error in pot %*% pot : non-conformable arguments." So my solution was to use a loop:
tmp<-NULL
for (i in 1:length(x)){
tmp[length(tmp)+1]<-sum(c(0.5,0.5)%*%mtx.exp(P%*%matrix(c(x[i],0,0,x[i]),nrow=2),5))
}
But as I said, this takes very long. Do you have any ideas on how to speed up the code? I also tried sapply, but that takes just as long as the loop.
I hope you can help me, because I have to run this function approximately 500 times, and the first try took more than 3 hours, which is not very satisfying.
Thank you very much.
The code can be sped up by pre-allocating your vector,
tmp <- rep(NA,length(x))
but I do not really understand what you are trying to compute:
in the first example,
you are trying to take the power of a non-square matrix,
in the second, you are taking the power of a diagonal matrix
(which can be done with ^).
The following seems to be equivalent to your computations:
sum(P^5/2) * x^5
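The claimed equivalence can be checked on the small example from the question. This sketch swaps Biodem's mtx.exp for a hypothetical base-R helper, mat_pow, that multiplies the matrix by itself repeatedly; it relies on P being diagonal, so the elementwise P^5 equals the matrix power.

```r
# Hypothetical helper: k-th matrix power by repeated multiplication
# (stands in for Biodem::mtx.exp, which is not used here)
mat_pow <- function(A, k) {
  out <- diag(nrow(A))
  for (j in seq_len(k)) out <- out %*% A
  out
}

x <- c(2+2i, 3+1i, 4+1i, 5+3i, 6+2i, 7+2i)
P <- matrix(c(2, 0, 0, 3), nrow = 2)

# Element-by-element version, as in the loop from the question
loop_result <- vapply(
  x,
  function(u) sum(c(0.5, 0.5) %*% mat_pow(P %*% matrix(c(u, 0, 0, u), nrow = 2), 5)),
  complex(1)
)

# Closed form: one vectorized expression over all of x
closed_form <- sum(P^5 / 2) * x^5

print(all.equal(loop_result, closed_form))
```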
EDIT
If P is not diagonal and C not scalar,
I do not see any easy simplification of mtx.exp( P %*% C, 5 ).
You could try something like
y <- sapply(x, function(u)
sum(
c(0.5,0.5)
%*%
mtx.exp( P %*% matrix(c(u,0,0,u),nrow=2), 5 )
)
)
but if your vector really has 12^11 entries,
that will take an insanely long time.
Alternatively, since you have a very large number
of very small (2*2) matrices,
you can explicitly compute the product P %*% C
and its 5th power (using some computer algebra system:
Maxima, Sage, Yacas, Maple, etc.)
and use the resulting formulas:
these are just (50 lines of) straightforward operations on vectors.
/* Maxima code */
p: matrix([p11,p12], [p21,p22]);
c: matrix([c1,0],[0,c2]);
display2d: false;
factor(p.c . p.c . p.c . p.c . p.c);
I then copy and paste the result in R:
c1 <- dnorm(abs(x),0,1); # C is still a diagonal matrix
c2 <- dnorm(abs(x),1,3);
p11 <- P[1,1]
p12 <- P[1,2]
p21 <- P[2,1]
p22 <- P[2,2]
# Result of the Maxima computations:
# I just add all the elements of the resulting 2*2 matrix,
# but you may want to do something slightly different with them.
c1*(c2^4*p12*p21*p22^3+2*c1*c2^3*p11*p12*p21*p22^2
+2*c1*c2^3*p12^2*p21^2*p22
+3*c1^2*c2^2*p11^2*p12*p21*p22
+3*c1^2*c2^2*p11*p12^2*p21^2
+4*c1^3*c2*p11^3*p12*p21+c1^4*p11^5)
+
c2*p12
*(c2^4*p22^4+c1*c2^3*p11*p22^3+3*c1*c2^3*p12*p21*p22^2
+c1^2*c2^2*p11^2*p22^2+4*c1^2*c2^2*p11*p12*p21*p22
+c1^3*c2*p11^3*p22+c1^2*c2^2*p12^2*p21^2
+3*c1^3*c2*p11^2*p12*p21+c1^4*p11^4)
+
c1*p21
*(c2^4*p22^4+c1*c2^3*p11*p22^3+3*c1*c2^3*p12*p21*p22^2
+c1^2*c2^2*p11^2*p22^2+4*c1^2*c2^2*p11*p12*p21*p22
+c1^3*c2*p11^3*p22+c1^2*c2^2*p12^2*p21^2
+3*c1^3*c2*p11^2*p12*p21+c1^4*p11^4)
+
c2*(c2^4*p22^5+4*c1*c2^3*p12*p21*p22^3
+3*c1^2*c2^2*p11*p12*p21*p22^2
+3*c1^2*c2^2*p12^2*p21^2*p22
+2*c1^3*c2*p11^2*p12*p21*p22
+2*c1^3*c2*p11*p12^2*p21^2+c1^4*p11^3*p12*p21)

R: looping to search for max of non-monotonic function

Refer to the R code below. The function (someRfunction) operates on a vector and returns a scalar value. The data are pairs (x,y), where x and y are vectors of length n, which may be large.
I want to know the value of x* such that the result of someRfunction on y where {x>x*} is maximized. The function operates on y values and is non-monotonic in x*. I need to evaluate for all x* (i.e. each element of x). Speed is not an issue if executed once, but the code would be executed many times in a simulation. Is there any way to make this code more efficient/faster?
### x and y are vectors of length n
### sort x and y such that they are ordered by descending x
xord <- x[order(-x)]
yord <- y[order(-x)]
maxf <- -99999
maxcut <- NA
for (i in 1:n) {
### yi is a subvector of y that corresponds to y[x>x{i}]
### where x{i} is the (n-i+1)th order statistic of x
yi <- yord[1:(i-1)]
fxi <- someRfunction(yi)
if (fxi>maxf) {
maxf <- fxi
maxcut <- xord[i]
}
}
Thanks.
Edit: let someRfunction(yi)=t.test(yi)$statistic.
If you can say anything more about the function, particularly whether it is smooth and whether its gradient can be determined, you will get a better answer. At the moment, the only speed-up available is modest: preallocate a vector to hold the results, drop the if-max clause, and then use which.max() on that vector. You might also want to look at the function optimx in the package "optimx".
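The suggestion above can be sketched as follows, using the t-statistic from the edit to the question. The data are synthetic, and the loop starts at i = 3 on the assumption that t.test needs at least two observations. The second half is not part of the answer: it is an additional idea that works only for this particular statistic, computing every t-value from cumulative sums instead of calling t.test n times.

```r
# Synthetic data (assumption, for illustration only)
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rnorm(n, mean = 0.2)

# Sort once, by descending x
ord  <- order(-x)
xord <- x[ord]
yord <- y[ord]

someRfunction <- function(v) unname(t.test(v)$statistic)

# Collect all results in a preallocated vector, then take which.max()
idx <- 3:n                                   # subvectors of length 2, ..., n - 1
fx  <- vapply(idx, function(i) someRfunction(yord[seq_len(i - 1)]), numeric(1))
best   <- which.max(fx)
maxf   <- fx[best]
maxcut <- xord[idx[best]]

# Extra idea for the one-sample t-statistic only: cumulative sums
m  <- idx - 1                                # subvector sizes
s1 <- cumsum(yord)[m]                        # running sums of y
s2 <- cumsum(yord^2)[m]                      # running sums of y^2
mu <- s1 / m
v  <- (s2 - m * mu^2) / (m - 1)              # running sample variances
fx_fast <- mu / sqrt(v / m)

print(all.equal(fx, fx_fast))
```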
