'Non-conformable arguments' in R code [closed]


Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 4 years ago.
I previously wrote an R function, "LeastSquaresDegreeN.R", that computes a least-squares polynomial of arbitrary order for whatever data I pass to it. The code works in the sense that I can reproduce results I got previously. However, when I pass new data to it I get a "non-conformable arguments" error:
"Error in Conj(t(Q))%*%t(b) : non-conformable arguments"
An extremely simple example of data that should work:
t <- seq(1,100,1)
fifthDegree <- t^5
LeastSquaresDegreeN(t,fifthDegree,5)
This should output and plot a polynomial f(t) = t^5 (up to rounding errors).
However I get "Non-conformable arguments" error even if I explicitly make these vectors:
t <- as.vector(t)
fifthDegree <- as.vector(fifthDegree)
LeastSquaresDegreeN(t,fifthDegree,5)
I've tried putting in the transpose of these vectors too - but nothing works.
Surely the solution is really simple. Help!? Thank you!
Here's the function:
LeastSquaresDegreeN <- function(t, b, deg)
{
# Usage: t is independent variable vector, b is function data
# i.e., b = f(t)
# deg is desired polynomial order
# deg <- deg + 1 is a little adjustment to make the R loops index correctly.
deg <- deg + 1
t <- t(t)
dataSize <- length(b)
A <- mat.or.vec(dataSize, deg) # Built-in R function to create zero
# matrix or zero vector of arbitrary size
# Given basis phi(z) = 1 + z + z^2 + z^3 + ...
# Define matrix A
for (i in 0:deg-1) {
A[1:dataSize,i+1] = t^i
}
# Compute QR decomposition of A. Pull Q and R out of QRdecomp
QRdecomp <- qr(A)
Q <- qr.Q(QRdecomp, complete=TRUE)
R <- qr.R(QRdecomp, complete=TRUE)
# Perform Q^* b^T (Conjugate transpose of Q)
c <- Conj(t(Q))%*%t(b)
# Find x. R isn't square - so we have to use qr.solve
x <- qr.solve(R, c)
# Create xPlot (which is general enough to plot any degree
# polynomial output)
xPlot = x[1,1]
for (i in 1:deg-1){
xPlot = xPlot + x[i+1,1]*t^i
}
# Now plot it. Least squares "l" plot first, then the points in red.
plot(t, xPlot, type='l', xlab="independent variable t", ylab="function values f(t)", main="Data Plotted with Nth Degree Least Squares Polynomial", col="blue")
points(t, b, col="red")
} # End
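One likely cause, offered as a sketch rather than a confirmed diagnosis: inside the function, t <- t(t) turns t into a 1 x n matrix and t(b) does the same to b, while Q is n x n, so Conj(t(Q)) %*% t(b) multiplies an n x n matrix by a 1 x n one. Separately, 0:deg-1 parses as (0:deg) - 1, so the loop starts at -1. The version below keeps b as a plain vector and builds the basis with outer(). The name fit_poly_ls and the use of qr.coef() are my own choices for illustration, not the original author's.

```r
# Hypothetical helper: least-squares polynomial fit via QR.
fit_poly_ls <- function(t, b, deg) {
  t <- as.numeric(t)            # plain vectors, no transposes
  b <- as.numeric(b)
  stopifnot(length(t) == length(b), deg >= 0)
  # Column j holds t^(j-1), i.e. the basis 1, t, t^2, ..., t^deg.
  A <- outer(t, 0:deg, `^`)
  # qr.coef() solves the least-squares problem A x ~ b from the QR factors.
  x <- qr.coef(qr(A), b)
  unname(x)
}

# Small, well-scaled check: recover 1 + 2 t - 3 t^2.
t_demo <- seq(-1, 1, length.out = 21)
b_demo <- 1 + 2 * t_demo - 3 * t_demo^2
coef_demo <- fit_poly_ls(t_demo, b_demo, 2)
print(round(coef_demo, 6))

# Operator precedence: `:` binds tighter than binary minus.
print(0:3 - 1)      # -1 0 1 2, same as (0:3) - 1
print(0:(3 - 1))    # 0 1 2
```

With t = 1:100 and degree 5 the columns of A range from 1 to 1e10, so the fit may be poorly conditioned; rescaling t first is one thing to try if the coefficients look off.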

Related

I am beginner in R and I'm trying to solve a system of equations but when i run i get error in R [duplicate]

This question already has an answer here:
Simple for loop in R producing "replacement has length zero" in R
(1 answer)
Closed 4 years ago.
# my error : Error in F[1] <- n/(X[0]) - sum(log(1 + Y^exp(X[1] + X[2] * x))) : replacement has length zero
set.seed(16)
#Inverse Transformation on CDF
n=100
SimRRR.f <- function(100, lambda=1,tau)) {
x= rnorm(100,0,1)
tau= exp(-1-x)
u=runif(100)
y= (1/(u^(1/lambda)-1))^(1/tau)
y
}
Y<-((1/u)-1)^exp(-1-x)
# MLE for Simple Linear Regresion
# System of equations
library(rootSolve)
library(nleqslv)
model <- function(X){
F <- numeric(length(X))
F[1] <- n/(X[0])-sum(log(1+Y^exp(X[1]+X[2]*x)))
F[2] <- 2*n -(X[0]+1)*sum(exp(X[1]+X[2]*x))*Y^( exp(X[1]+X[2]*x))*log(Y)/(1+ Y^( exp(X[1]+X[2]*x)))
F[3] <- sum(x) + sum(x*log(Y))*exp(X[1]+X[2]*x) -(X[0]+1)*X[1]*sum(exp(X[1]+X[2]*x)*Y^(exp(X[1]+X[2]*x)*log(Y)))/(1+ Y^( exp(X[1]+X[2]*x)))
# Solution
F
}
startx <- c(0.5,3,1) # start the answer search here
answers<-as.data.frame(nleqslv(startx,model))

The problem is that you define x, u, tau and y inside the SimRRR function, but are trying to define Y in terms of them outside the function.
A function takes input and returns output. All the other variables defined while the function does its job go away when it returns. As it stands, the line defining Y refers to u and x, which don't exist outside the function, so it should fail with an "object not found" error (unless you defined those variables in the global environment while you were working on your function...)
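A tiny illustration of that scoping rule, using a made-up function (scope_demo and inner_value are hypothetical names, not part of the question's code):

```r
# `inner_value` exists only while scope_demo() is running.
scope_demo <- function() {
  inner_value <- 42
  inner_value * 2               # only this value is returned
}

result <- scope_demo()
print(result)                   # 84
print(exists("inner_value"))    # FALSE in a fresh session
```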
Try the following functions, see if they do the job:
# I usually put all my library calls together at the beginning of the script.
library(rootSolve)
library(nleqslv)
set.seed(16)
n <- 100
x <- rnorm(n, 0, 1) # see below for why this is pulled out.
# 100 can't be by itself in the argument list; every argument needs a name.
# tau is computed from x inside the function, so it is not an argument here.
SimRRR.f <- function(x, lambda = 1) {
  n <- length(x)
  tau <- exp(-1 - x)
  u <- runif(n)
  y <- (1/(u^(1/lambda) - 1))^(1/tau)
  y
}
Y_sim <- SimRRR.f(x, lambda = 1)
Your second function has more issues. Namely, it relies on x, which is not defined anywhere that can be found. Either you need x from the previous function, or you really meant X. I'm going to assume you do need the values of x, since X is only of length 3. This is why I pulled it out of the last function call - we need it now.
[Update]
It's also been pointed out in the comments that the indexing here is wrong. I didn't catch that previously (and the F elements are defined correctly). I think I've fixed the indexing issues too now:
model <- function(X, Y, x){ # If you use x and Y in the function, define them here.
n <- length(x)
F <- numeric(length(X))
F[1] <- n/(X[1])-sum(log(1+Y^exp(X[2]+X[3]*x)))
F[2] <- 2*n -(X[1]+1)*sum(exp(X[2]+X[3]*x))*Y^( exp(X[2]+X[3]*x))*log(Y)/(1+ Y^( exp(X[2]+X[3]*x)))
F[3] <- sum(x) + sum(x*log(Y))*exp(X[2]+X[3]*x) -(X[1]+1)*X[2]*sum(exp(X[2]+X[3]*x)*Y^(exp(X[2]+X[3]*x)*log(Y)))/(1+ Y^( exp(X[2]+X[3]*x)))
# Solution
F
}
I'm not familiar with the nleqslv package, but unless there is a method defined to convert it to a data frame, that might not go so well. I'd make sure everything else is working before the conversion.
startx <- c(0.5,3,1) # start the answer search here
answers <- nleqslv(startx,model, Y = Y_sim, x = x)
answer_df <- as.data.frame(answers)
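For reference, the original "replacement has length zero" message can be reproduced in isolation. R vectors are 1-based, so X[0] selects nothing, and anything computed from it has length zero as well. The variable names below are only for illustration.

```r
X <- c(0.5, 3, 1)
print(length(X[0]))   # 0: index 0 selects nothing
print(X[1])           # 0.5: the first element

res <- numeric(3)
msg <- tryCatch({
  res[1] <- 10 / X[0] # right-hand side has length zero, so this fails
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```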

Problems in Numerical Integration through R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have the following function
f(x) ∝ |x| exp(-|x|/2) + 1/(1 + (x-40)^4), x ∈ R
I want to find E(X) and E(X^3) using Simpson's method (numerical integration), the standard Monte Carlo approach, acceptance-rejection sampling, importance sampling, the Metropolis-Hastings algorithm, Gibbs sampling, and then a Bayesian model using MCMC (I have not decided yet).
How can I validate the results obtained from the different methods?
I have tried to solve E(X) analytically but failed to find a closed form. The function can be split into two parts:
|x| times a double-exponential density, plus another function that uses the fourth power of x in inverse form.
Because of |x| and the range [-Inf, Inf], the integral always has to be split over [-Inf, 0] and [0, Inf]. Through integration by parts I was able to see the first part as (absolute (x) + (x^2/2) over infinite range), but I could not find the integral of the second part analytically.
So I used the following code to get a numerical integration result:
library(stats)
integrand <- function(x) {x*(abs(x)* exp(-0.5*abs(x))+(1/(1+(x-40)^4)))}
integrate(integrand, lower = -Inf, upper = Inf)
which gives E(X) = 88.85766 with absolute error < 0.004.
The results I obtain from the other methods are not similar. For instance:
(i) Through Simpson's method I got E(X) = 0.3222642 and E(X^3) = 677.0711.
simpson_v2 <- function(fun, a, b, n=100) {
# numerical integral using Simpson's rule
# assume a < b and n is an even positive integer
if (a == -Inf & b == Inf) {
f <- function(t) (fun((1-t)/t) + fun((t-1)/t))/t^2
s <- simpson_v2(f, 0, 1, n)
} else if (a == -Inf & b != Inf) {
f <- function(t) fun(b-(1-t)/t)/t^2
s <- simpson_v2(f, 0, 1, n)
} else if (a != -Inf & b == Inf) {
f <- function(t) fun(a+(1-t)/t)/t^2
s <- simpson_v2(f, 0, 1, n)
} else {
h <- (b-a)/n
x <- seq(a, b, by=h)
y <- fun(x)
y[is.nan(y)]=0
s <- y[1] + y[n+1] + 2*sum(y[seq(2,n,by=2)]) + 4 *sum(y[seq(3,n-1, by=2)])
s <- s*h/3
}
return(s)
}
EX <- function(x) x*(abs(x)* exp(-0.5*abs(x))+(1/(1+(x-40)^4)))
simpson_v2(EX, -Inf, Inf, n=100)
EX3 <- function(x) (x^3)*(abs(x)* exp(-0.5*abs(x))+(1/(1+(x-40)^4)))
simpson_v2(EX3, -Inf, Inf, n=100)
(ii) Importance Sampling
My proposal density is normal with mean 0 and standard deviation 4. The importance sampling process I am applying is, in summary, as follows.
Suppose I can't sample from f(x), which is true, as it has no well-known form and R has no built-in function for sampling from it. So I propose another distribution with log-concave tails, N(0, 4), to sample from, so that instead of estimating E(x) I estimate E(x*f(x)/g(x)), where g is the N(0, 4) density. I use the following code for this, which takes 100000 samples from N(0, 4):
X <- rnorm(1e5, sd=4)
Y <- (X)*(abs(X)*exp(-0.5*abs(X))+(1/(1+(x-40)^4)))/(dnorm(X, sd=4))
mean(Y)
Since this code relies on random sampling from the normal distribution, I get a different answer each time, but it is somewhere around -0.1710694, which is close to the 0.3222642 I got from Simpson's method. But these results are very different from the E(X) = 88.85766 given by integrate(). Note that integrate() uses adaptive quadrature. Is this method different from Simpson's method and importance sampling? How similar should I expect the results of these methods to be?
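For comparison, here is a minimal composite Simpson's rule on a finite interval with the textbook weights 1, 4, 2, 4, ..., 4, 1. In simpson_v2 above, the indices seq(2, n, by=2) are the odd interior nodes, which conventionally get weight 4 rather than 2, so the two weights look swapped. The function name simpson_finite is hypothetical, and this sketch deliberately leaves out the infinite-range substitution.

```r
# Composite Simpson's rule on a finite interval [a, b]; n must be even.
simpson_finite <- function(fun, a, b, n = 100) {
  stopifnot(is.finite(a), is.finite(b), n %% 2 == 0, n >= 2)
  h <- (b - a) / n
  x <- seq(a, b, length.out = n + 1)
  y <- fun(x)
  odd_nodes  <- seq(2, n, by = 2)   # x_1, x_3, ..., x_{n-1}: weight 4
  even_nodes <- if (n > 2) seq(3, n - 1, by = 2) else integer(0)  # weight 2
  (h / 3) * (y[1] + y[n + 1] + 4 * sum(y[odd_nodes]) + 2 * sum(y[even_nodes]))
}

# Sanity checks against integrals with known values.
s_square <- simpson_finite(function(x) x^2, 0, 1, n = 10)  # exact value 1/3
s_sine   <- simpson_finite(sin, 0, pi, n = 100)            # exact value 2
print(c(s_square, s_sine))
```

Even with correct weights, n = 100 points spread over a transformed infinite range may be too coarse to resolve the narrow bump near x = 40, so a larger n or a split of the range around 40 is probably worth trying.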

First, the definition of EX and EX3 is wrong: you are missing the minus sign in the exponent.
Here are some simplifications:
If you integrate x*(abs(x)* exp(-0.5*abs(x))) from -infinity to infinity, the result is 0, because the integrand is odd.
If you integrate x^3*(abs(x)* exp(-0.5*abs(x))) from -infinity to infinity, the result is also 0, for the same reason.
The integral of x^3/(1+(x-40)^4) from -infinity to infinity would be infinite, I would venture: you get logarithms, which are infinite at infinity; see http://integrals.wolfram.com/index.jsp?expr=%28xxx%29%2F%281+%2B+%28x-a%29%5E4%29&random=false
The integral of x/(1+(x-40)^4) looks like something resembling an inverse tangent, though the online integrator gives ugly output: http://integrals.wolfram.com/index.jsp?expr=x%2F%281+%2B+%28x-a%29%5E4%29&random=false
UPDATE
Looks like your EX would be 40*\pi / \sqrt{2}
And EX3 is not infinity, I might be wrong here
UPDATE 2
Yep, EX3 is finite, should be a^2*EX + \pi*a*3/\sqrt{2}, where a is equal to 40
UPDATE 3
As noted, there is also a normalization required to get true values of EX and EX3
N = 8 + \pi/\sqrt{2}
Computed integrals to be divided by N to get proper moments.
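These closed forms can be checked numerically. The sketch below splits the real line at 0 and around the bump at 40 before calling integrate(), which is my own precaution rather than something the answer prescribes; integrate_pieces and the break points are illustrative choices.

```r
# Unnormalized density from the question.
g <- function(x) abs(x) * exp(-0.5 * abs(x)) + 1 / (1 + (x - 40)^4)

# Integrate h over consecutive pieces and add the results.
integrate_pieces <- function(h, breaks) {
  pieces <- vapply(seq_len(length(breaks) - 1), function(i) {
    integrate(h, breaks[i], breaks[i + 1], subdivisions = 500L)$value
  }, numeric(1))
  sum(pieces)
}

breaks <- c(-Inf, 0, 30, 50, Inf)   # split at the kink and around the bump
N_num  <- integrate_pieces(g, breaks)
EX_num <- integrate_pieces(function(x) x * g(x), breaks)

print(c(numeric = N_num,  closed_form = 8 + pi / sqrt(2)))
print(c(numeric = EX_num, closed_form = 40 * pi / sqrt(2)))
print(EX_num / N_num)               # normalized E(X)
```

The unnormalized value 40*pi/sqrt(2) is about 88.86, which agrees with the integrate() result quoted in the question; dividing by N gives the actual mean. The EX3 expression relies on the odd term x^3/(1+x^4) cancelling symmetrically, so it is best read as a symmetric-limit value.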

To find the distance between two roots in R

Suppose I have a function f(x) that is well defined on an interval I. I want to find the greatest and smallest roots of f(x) and then take their difference. What is a good way to program this?
To be precise, f can at worst be a rational function like (1+x)/(1-x). Most of the time it should be a (high-degree) polynomial. I only need to know the result numerically to some precision.
I am thinking about the following:
1. Convert f(x) into a form recognizable by R. (I can do this.)
2. Use R to list all roots of f(x) on I. (I found that the uniroot function only gives me one root.)
3. Use R to find the maximum and minimum elements in the list. (This should be possible once I have converted it to a vector.)
4. Take the difference of the two roots. (This should be trivial.)
I am stuck on step (2) and do not know what to do. My professor gave a brute-force solution, suggesting that I:
1. Divide the interval I into one million pieces.
2. Evaluate f at each end point and find the end points where f>=0.
3. Choose the maximum and minimum elements from the set formed in step 2.
4. Take the difference between them.
I feel this approach is not very efficient and might not work for all f in general, but I am having trouble implementing it even for quadratics. I do not know how to do step (2) here either. So I want to ask for a hint or some toy examples.
At this point I am trying to implement the following code:
Y=rep(0,200)
dim(Y)=c(100,2)
for(i in 1:100){
X=rnorm(9,0,1)
Z=rnorm(16,0,1)
a=0.64
b=a*sum(Z^2)/sum(X^2)
root_intervals <- function(f, interval, n = 1e6) {
xvals <- seq(interval[1], interval[2], length = n)
yvals <- f(xvals)
ypos <- yvals > 0
x1 <- which(tail(ypos, -1) != head(ypos, -1))
x2 <- x1 + 1
## so all the zeroes we can see are between x1 and x2
return(cbind(xvals[x1], xvals[x2]))
}
Up to here everything is okay, but when I try to extract the roots into Y[i,1], Y[i,2] with
Y[i,1]=(ri<-root intervals(function(x)(x/(a*x+b))^{9/2}*(1/((1-a)+a*(1-a)/b*x))^4-0.235505, c(0,40),n=1e6)[1]
I found I cannot evaluate it anymore. R keeps telling me
Error: unexpected symbol in:
"}
Y[i,1]=(ri<-root intervals"
and I got stuck. I would really appreciate any help, as I am feeling lost.
I checked the function's expression many times using the plot function and it has no syntax mistakes. I also believe it is well defined for all x in the interval.

This should give you a good start on the brute force solution. You're right, it's not elegant, but for relatively simple univariate functions, evaluating 1 million points is trivial.
root_intervals <- function(f, interval, n = 1e6) {
xvals <- seq(interval[1], interval[2], length = n)
yvals <- f(xvals)
ypos <- yvals > 0
x1 <- which(ypos[-1] != head(ypos, -1))
x2 <- x1 + 1
## so all the zeroes we can see are between x1 and x2
return(cbind(xvals[x1], xvals[x2]))
}
This function returns a two column matrix of x values, where the function changes sign between column 1 and column 2:
f1 <- function (x) 0.05 * x^5 - 2 * x^4 + x^3 - x^2 + 1
> (ri <- root_intervals(f1, c(-10, 10), n = 1e6))
[,1] [,2]
[1,] -0.6372706 -0.6372506
[2,] 0.8182708 0.8182908
> f1(ri)
[,1] [,2]
[1,] -3.045326e-05 6.163467e-05
[2,] 2.218895e-05 -5.579081e-05
Wolfram Alpha confirms results on the specified interval.
The top and bottom rows will be the min and max intervals found. These intervals (over which the function changes sign) are precisely what uniroot wants for its interval, so you could use it to solve for the (more) exact roots. Of course, if the function changes sign twice within one interval (or any even number of times), it won't be picked up, so choose a big n!
Response to edited question:
Looks like you're trying to define a bunch of functions, but your edits have syntax errors. Here's what I think you're trying to do: (this first part might take some more work to work right)
my_funs <- list()
Y=rep(0,200)
dim(Y)=c(100,2)
for(i in 1:100){
X=rnorm(9,0,1)
Z=rnorm(16,0,1)
a=0.64
b=a*sum(Z^2)/sum(X^2)
my_funs[[i]] <- function(x){(x/(a*x+b))^{9/2}*(1/((1-a)+a*(1-a)/b*x))^4-0.235505}
}
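One thing that may need the extra work: functions created in a top-level for loop all look up a and b in the same environment when they are called, so every element of my_funs would end up using the last b. A possible fix, sketched with a hypothetical factory function make_fun, is to give each function its own copy of a and b:

```r
# Factory: each returned function keeps its own a and b.
make_fun <- function(a, b) {
  force(a); force(b)   # evaluate now so later changes to a, b don't leak in
  function(x) (x / (a * x + b))^(9 / 2) * (1 / ((1 - a) + a * (1 - a) / b * x))^4 - 0.235505
}

set.seed(1)
my_funs <- vector("list", 3)
b_used  <- numeric(3)
for (i in 1:3) {
  X <- rnorm(9, 0, 1)
  Z <- rnorm(16, 0, 1)
  a <- 0.64
  b <- a * sum(Z^2) / sum(X^2)
  b_used[i] <- b
  my_funs[[i]] <- make_fun(a, b)
}

# Each closure carries the b it was built with.
b_stored <- vapply(my_funs, function(f) environment(f)$b, numeric(1))
print(cbind(b_used, b_stored))
```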
Here's using the root_intervals on the first of your generated functions.
> root_intervals(my_funs[[1]], interval = c(0, 40))
[,1] [,2]
[1,] 0.8581609 0.8582009
[2,] 11.4401314 11.4401714
Notice the output, a matrix, with the roots of the function being between the first and second columns. Being a matrix, you can't assign it to a vector. If you want a single root, use uniroot using each row to set the upper and lower bounds. This is left as an exercise to the reader.
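One way the exercise could go, as a sketch: pass each bracketing row to uniroot and take the difference of the largest and smallest refined roots. The helper name refine_roots and the smaller grid size are my own choices, and f1 is the test polynomial from earlier in this answer.

```r
# Same bracketing idea as root_intervals above, with a smaller default grid.
root_intervals <- function(f, interval, n = 1e4) {
  xvals <- seq(interval[1], interval[2], length.out = n)
  ypos  <- f(xvals) > 0
  x1    <- which(ypos[-1] != ypos[-n])   # sign change between x1 and x1 + 1
  cbind(xvals[x1], xvals[x1 + 1])
}

# Hypothetical helper: refine every bracket with uniroot.
refine_roots <- function(f, brackets, tol = 1e-10) {
  apply(brackets, 1, function(br) {
    uniroot(f, lower = br[1], upper = br[2], tol = tol)$root
  })
}

f1 <- function(x) 0.05 * x^5 - 2 * x^4 + x^3 - x^2 + 1
brackets <- root_intervals(f1, c(-10, 10))
roots <- refine_roots(f1, brackets)
root_distance <- max(roots) - min(roots)
print(roots)
print(root_distance)
```

As with the brute-force step itself, this only sees roots where the function changes sign on the grid, so the same caveat about choosing n applies.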

Generating random sample from the quantiles of unknown density in R [duplicate]

This question already has answers here:
How do I best simulate an arbitrary univariate random variate using its probability function?
(4 answers)
Closed 9 years ago.
How can I generate random sample data from the quantiles of the unknown density f(x) for x between 0 and 4 in R?
f = function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))

If I understand you correctly (??) you want to generate random samples with the distribution whose density function is given by f(x). One way to do this is to generate a random sample from a uniform distribution, U[0,1], and then transform this sample to your density. This is done using the inverse cdf of f, a methodology which has been described before, here.
So, let
f(x) = your density function,
F(x) = cdf of f(x), and
F.inv(y) = inverse cdf of f(x).
In R code:
f <- function(x) {((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))}
F <- function(x) {integrate(f,0,x)$value}
F <- Vectorize(F)
F.inv <- function(y){uniroot(function(x){F(x)-y},interval=c(0,10))$root}
F.inv <- Vectorize(F.inv)
x <- seq(0,5,length.out=1000)
y <- seq(0,1,length.out=1000)
par(mfrow=c(1,3))
plot(x,f(x),type="l",main="f(x)")
plot(x,F(x),type="l",main="CDF of f(x)")
plot(y,F.inv(y),type="l",main="Inverse CDF of f(x)")
In the code above, since f(x) is only defined on [0,Inf], we calculate F(x) as the integral of f(x) from 0 to x. Then we invert that using the uniroot(...) function on F-y. The use of Vectorize(...) is needed because, unlike almost all R functions, integrate(...) and uniroot(...) do not operate on vectors. You should look up the help files on these functions for more information.
Now we just generate a random sample X drawn from U[0,1] and transform it with Z = F.inv(X)
X <- runif(1000,0,1) # random sample from U[0,1]
Z <- F.inv(X)
Finally, we demonstrate that Z is indeed distributed as f(x).
par(mfrow=c(1,2))
plot(x,f(x),type="l",main="Density function")
hist(Z, breaks=20, xlim=c(0,5))
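For this particular f there also seems to be a shortcut, which I offer as an observation rather than as part of the method above: the exponent x^3/3 - x^2 + x equals ((x-1)^3 + 1)/3, and its derivative is exactly (x-1)^2, so the CDF on [0, Inf) is 1 - exp(-((x-1)^3 + 1)/3) and can be inverted by hand. The names F_closed and F_inv_closed are hypothetical.

```r
f <- function(x) ((x - 1)^2) * exp(-(x^3 / 3 - 2 * x^2 / 2 + x))

# Closed-form CDF and inverse CDF, valid for x >= 0 and 0 <= y < 1.
F_closed <- function(x) 1 - exp(-((x - 1)^3 + 1) / 3)
F_inv_closed <- function(y) {
  s <- -3 * log(1 - y) - 1
  1 + sign(s) * abs(s)^(1 / 3)     # real cube root, also for negative s
}

# Compare against numerical integration at a few points.
x_check   <- c(0.5, 1, 2, 3)
F_numeric <- vapply(x_check, function(x) integrate(f, 0, x)$value, numeric(1))
max_diff  <- max(abs(F_numeric - F_closed(x_check)))
print(max_diff)

# Direct inverse-transform sampling without integrate() or uniroot().
set.seed(42)
Z_closed <- F_inv_closed(runif(1000))
print(summary(Z_closed))
```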

Rejection sampling is easy enough:
drawF <- function(n) {
f <- function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
x <- runif(n, 0 ,4)
z <- runif(n)
subset(x, z < f(x)) # Rejection
}
Not the most efficient but it gets the job done.
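Note that drawF returns fewer than n values, since rejected draws are dropped. A variant that keeps drawing until it has exactly n accepted values might look like the sketch below (draw_f_exact is a hypothetical name). It relies on f(x) <= 1 on [0, 4], which holds here because f(0) = 1 is the maximum, and it ignores the very small probability mass beyond 4.

```r
draw_f_exact <- function(n) {
  f <- function(x) ((x - 1)^2) * exp(-(x^3 / 3 - 2 * x^2 / 2 + x))
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, 0, 4)            # proposals, uniform on [0, 4]
    z <- runif(n)                  # uniform heights under the envelope 1
    out <- c(out, x[z < f(x)])     # keep the accepted proposals
  }
  out[seq_len(n)]
}

set.seed(7)
samples <- draw_f_exact(500)
print(length(samples))             # 500
print(range(samples))
```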

Use sample. Generate a vector of probabilities from your existing function f, normalized properly. From the help page:
sample(x, size, replace = FALSE, prob = NULL)
Arguments
x Either a vector of one or more elements from which to choose, or a positive integer. See ‘Details.’
n a positive number, the number of items to choose from. See ‘Details.’
size a non-negative integer giving the number of items to choose.
replace Should sampling be with replacement?
prob A vector of probability weights for obtaining the elements of the vector being sampled.
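A minimal sketch of that idea, assuming a fixed grid on [0, 4] is fine enough for the purpose (the grid spacing is an arbitrary choice of mine). The weights passed to prob do not have to sum to one, since sample() rescales them internally.

```r
f <- function(x) ((x - 1)^2) * exp(-(x^3 / 3 - 2 * x^2 / 2 + x))

grid    <- seq(0, 4, length.out = 4001)   # candidate values
weights <- f(grid)                        # unnormalized probabilities

set.seed(99)
draws <- sample(grid, size = 1000, replace = TRUE, prob = weights)
print(summary(draws))
```

The result is a discretized approximation: every draw lies on the grid, so this is best suited to cases where that granularity does not matter.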

In R, how do I find the optimal variable to minimise the correlation between two datasets [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
In R, how do I find the optimal variable to maximize or minimize correlation between several datasets
This can be done in Excel, but my dataset has gotten too large. In Excel, I would use Solver.
I have 5 variables and I want to create a weighted average of them that has the lowest correlation to a 6th variable.
Column A,B,C,D,E = random numbers
Column F = random number (which I want to minimise the correlation to)
Column G = A*wi1 + B*wi2 + C*wi3 + D*wi4 + E*wi5
where wi1 to wi5 are the coefficients produced by Solver. In a separate cell, I would have correl(F,G).
This is all achieved with the following constraints in mind:
1. A,B,C,D, E have to be between 0 and 1
2. A+B+C+D+E= 1
I'd like to print the results of this so that I can have an efficient frontier type chart.
How can I do this in R? Thanks for the help.

I looked at the other thread mentioned by Vincent and I think I have a better solution. I hope it is correct. As Vincent points out, your biggest problem is that the optimization tools for such non-linear problems do not offer a lot of flexibility for dealing with your constraints. Here, you have two types of constraints: 1) all your weights must be >= 0, and 2) they must sum to 1.
The optim function has a lower option that can take care of your first constraint. For the second constraint, you have to be a bit creative: you can force your weights to sum to one by scaling them inside the function to be minimized, i.e. rewrite your correlation function as function(w) cor(X %*% w / sum(w), Y).
# create random data
n.obs <- 100
n.var <- 6
X <- matrix(runif(n.obs * n.var), nrow = n.obs, ncol = n.var)
Y <- matrix(runif(n.obs), nrow = n.obs, ncol = 1)
# function to minimize
correl <- function(w)cor(X %*% w / sum(w), Y)
# initial guess
w0 <- rep(1 / n.var, n.var)
# optimize
opt <- optim(par = w0, fn = correl, method = "L-BFGS-B", lower = 0)
optim.w <- opt$par / sum(opt$par)
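A side note on the scaling trick, as a small check rather than a change to the approach: correlation is unaffected by multiplying one argument by a positive constant, so the raw and the normalized weights give the same objective value, and normalizing once at the end (as in the last line above) yields weights that sum to 1. The data and weights below are made up for illustration.

```r
set.seed(123)
X <- matrix(runif(100 * 5), nrow = 100, ncol = 5)
Y <- runif(100)

w_raw  <- c(0.2, 0.5, 1.0, 0.1, 0.7)      # arbitrary positive weights
w_norm <- w_raw / sum(w_raw)              # rescaled to sum to 1

c_raw  <- as.numeric(cor(X %*% w_raw, Y))
c_norm <- as.numeric(cor(X %*% w_norm, Y))
print(c(c_raw, c_norm))
print(sum(w_norm))
```

One caveat with lower = 0: if the optimizer ever reached all-zero weights, sum(w) would be zero and the objective undefined, so a tiny positive lower bound may be safer.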