The following code chunk is for defining and integrating a function f1 involving matrix exponentials.
library(expm)
Lambdahat=rbind(c(-0.57,0.21,0.36,0,0),
c(0,-7.02,7.02,0,0),
c(1,0,-37.02,29,7.02),
c(0.03,0,0,-0.25,0.22),
c(0,0,0,0,0));
B=rbind(c(-1,1,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0))
f1<-function(tau1)
{
A=(expm(Lambdahat*tau1)%*%B%*%expm(Lambdahat*(5-tau1)));
return(A[1,5]);
}
out=integrate(f1,lower=0,upper=5)#integration of f1
The integration in the above line gives the following error:
Error in integrate(f1, lower = 0, upper = 5) :
evaluation of function gave a result of wrong length
In addition: Warning messages:
1: In Lambdahat * tau1 :
longer object length is not a multiple of shorter object length
2: In Lambdahat * (t[i] - tau1) :
longer object length is not a multiple of shorter object length
To check whether the inputs and outputs of f1 differ in length, I evaluated f1 at 11 evenly spaced inputs; the inputs and corresponding outputs are reported below. In every test case, the input and the output both had length 1.
sapply(X=seq(from=0,to=5,by=0.5),FUN=f1)
[1] 2.107718e-01 1.441219e-01 0.000000e+00 2.023337e+06 1.709569e+14
[6] 1.452972e+22 1.243012e+30 1.071096e+38 9.302178e+45 8.146598e+53
[11] 7.197606e+61
If anyone could share any hints or directions as to where the code may be going wrong, it would be very helpful. Thanks very much!
The problem is that the function passed to integrate needs to be vectorized, i.e. it should be able to receive a vector of input values and return a vector of output values of the same length. I think f1 <- Vectorize(f1) could solve your problem.
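As a minimal sketch of that suggestion, the toy integrand below stands in for f1 so that it runs without the expm package. The names M, g and g_vec are made up for illustration and are not part of the original code. Like f1, g only makes sense for a single input value, so it is wrapped with Vectorize before being passed to integrate.

```r
# Toy integrand that, like f1, is only meaningful for a single value of tau:
# it builds a matrix from tau and returns one entry of a matrix product.
M <- diag(2)                      # hypothetical 2 x 2 matrix (the identity)
g <- function(tau) {
  A <- (M * tau) %*% (M * tau)    # equals tau^2 times the identity for scalar tau
  A[1, 1]
}

# Vectorize() wraps g so that a vector of inputs is processed one value at a time
# and a vector of the same length is returned.
g_vec <- Vectorize(g)

res <- integrate(g_vec, lower = 0, upper = 1)
res$value                         # integral of tau^2 over [0, 1], i.e. about 1/3
```

The same pattern should apply to f1, although each call then evaluates two matrix exponentials per quadrature point, so it may be slow.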
I'm trying to calculate crps using the verification package in R. The data appears to read in ok, but I get an error when trying to compute the CRPS itself: "invalid 'times' argument", however all values are real, no negative values and I'm testing for nan/na values and ignoring those. Having searched around I can't find any solution which explains why I'm getting this error. I'm reading the data in from netcdf files into larger arrays, and then computing CRPS for each grid cell in those arrays.
Any help would be greatly appreciated!
The relevant snippet from the code I'm using is:
##for each grid cell, get obs (wbarray) and 25 ensemble members of forecast eps (fcstarray)
for(x in 1:3600){
for(y in 1:1500){
obs=wbarray[x,y]
eps=fcstarray[x,y,1:25]
if(!is.na(obs)){
print(obs)
print(eps)
print("calculating CRPS - real value found")
crpsfcst=(crpsDecomposition(obs,eps)$CRPS)
CRPSfcst[x,y,w]=crpsfcst}}}
(w is specified in an earlier loop)
And the output I get:
obs: 0.3850737
eps: 0.3382506 0.3466184 0.3508921 0.3428135 0.3416993 0.3423528 0.3307764
0.3372431 0.3394377 0.3398165 0.3414395 0.3531360 0.3319155 0.3453161
0.3362813 0.3449474 0.3340050 0.3278898 0.3380596 0.3379150 0.3429202
0.3467927 0.3419354 0.3472489 0.3550797
"calculating CRPS - real value found"
Error in rep(0, nObs * (nMember +1)) : invalid 'times' argument
Calls: crpsDecomposition
Execution halted
If you type crpsDecomposition on your R command prompt you'll get the source code for the function. The first few lines show:
function (obs, eps)
{
nMember = dim(eps)[2]
nObs <- length(obs)
Since your eps data object appears (from your output) to be a one-dimensional vector, dim(eps) is NULL, so dim(eps)[2] is NULL as well and nMember is set to NULL. Thus nObs*(nMember + 1) evaluates to a zero-length vector rather than a number, which rep rejects as an invalid 'times' argument. I imagine you simply need to re-examine what form eps should take, because it would appear that it needs to be a matrix where each column corresponds to a different "member" (whatever that means in this context).
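The behaviour described above can be checked in base R without the verification package. This is only a sketch: the values of obs and eps are made up, and reshaping eps into a one-row matrix is an assumption based on the two source lines quoted above, not a confirmed requirement of crpsDecomposition.

```r
eps <- c(0.34, 0.35, 0.33)        # made-up ensemble values for one grid cell
obs <- 0.385                      # made-up observation

dim(eps)                          # NULL: a plain vector has no dim attribute
n_member <- dim(eps)[2]           # therefore NULL as well
n_times <- length(obs) * (n_member + 1)
length(n_times)                   # 0: the product is an empty vector, not a number

# Assumed fix: one row per observation, one column per ensemble member.
eps_mat <- matrix(eps, nrow = 1)
dim(eps_mat)[2]                   # 3, the number of members
```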
I reproduce a warning I get in integrate in R.
r <- list()
r$t <- 1:4
xsq <- function(x,r_obj){
r_obj$t == x
return(x^2)
}
integrate(f=xsq,0,1, r_obj = r)
The fourth line has no function in this toy example (that I constructed from a more complex function in which the line is needed). The problem is that the code above gives a warning related to this line:
0.3333333 with absolute error < 3.7e-15
Warning message:
In r_obj$t == x :
longer object length is not a multiple of shorter object length
Note that evaluating xsq at any value is not problematic, e.g. xsq(1, r_obj = r) does not give any warning. Also when r$t has less than 4 elements, e.g. r$t <- 1:3, the problem disappears. However, from ?integrate f is
An R function taking a numeric first argument and returning a numeric vector of the same length.
In my understanding xsq does just that. Where do I go wrong or why do I get a warning?
I'm trying to figure out how to integrate the following function in R:
item.fill.rate <- function(x, lt, ib, S){
1-((((1/(factorial(S)))*((x*lt*ib)^S)))/
(sum(((1/(factorial(0:S)))*((x*lt*ib)^(0:S))))))}
Where x is a variable and lt, ib and S are input parameters
Based on a previous topic on here, I tried the following:
int.func <- function(lt, ib, S){
item.fill.rate <- function(x){
1-((((1/(factorial(S)))*((x*lt*ib)^S)))/(sum(((1/(factorial(0:S)))*((x*lt*ib)^(0:S))))))
}
return(item.fill.rate)
}
integrate(int.func(0.25, 1, 1), lower=0.25, upper=0.75)$value
When applying this, I get the following warnings:
> integrate(int.func(0.25, 1, 1), lower=0.25, upper=0.75)$value
[1] 0.4947184
Warning messages:
1: In (x * lt * ib)^(0:S) :
longer object length is not a multiple of shorter object length
2: In (1/(factorial(0:S))) * ((x * lt * ib)^(0:S)) :
longer object length is not a multiple of shorter object length
I evaluated the length of those objects, but that did not give me any indication where the error must be.
I tried to be as specific as possible, so hopefully someone is able to help me out with this!
The sum function is notorious for returning single items when a longer vector was expected, so integrand functions that have a call to sum generally need to be "vectorized" so they deliver the expected results (a vector of the same length as a provided "x"-vector) for integrate to succeed. The Vectorize function is a wrapper for mapply and is quite handy for this process. You can set the parameters in the call to integrate. (At the moment I think you may be integrating a constant over a domain of length 1/2.)
item.fill.rate <- function(x,lt, ib, S){
1-((((1/(factorial(S)))*((x*lt*ib)^S)))/(sum(((1/(factorial(0:S)))*((x*lt*ib)^(0:S))))))
}
vint <- Vectorize(item.fill.rate)
integrate(vint, S=1, lt=0.25, ib= 1, lower=0.25, upper=0.75)$value
#[1] 0.4449025
I try to run this line :
knn(mydades.training[,-7],mydades.test[,-7],mydades.training[,7],k=5)
but I always get this error:
Error in knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NAs introduced by coercion
2: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NAs introduced by coercion
Any ideas, please?
PS: mydades.training and mydades.test are defined as follows:
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
I suspect that your issue lies in having non-numeric data fields in 'mydades'. The error line:
NA/NaN/Inf in foreign function call (arg 6)
makes me suspect that the knn-function call to the C language implementation fails. Many functions in R actually call underlying, more efficient C implementations, instead of having an algorithm implemented in just R. If you type just 'knn' in your R console, you can inspect the R implementation of 'knn'. There exists the following line:
Z <- .C(VR_knn, as.integer(k), as.integer(l), as.integer(ntr),
as.integer(nte), as.integer(p), as.double(train), as.integer(unclass(clf)),
as.double(test), res = integer(nte), pr = double(nte),
integer(nc + 1), as.integer(nc), as.integer(FALSE), as.integer(use.all))
where .C means that we're calling a C function named 'VR_knn' with the provided function arguments. Since you have two of the warnings
NAs introduced by coercion
I think two of the as.double/as.integer calls fail, and introduce NA values. If we start counting the parameters, the 6th argument is:
as.double(train)
that may fail in cases such as:
# as.double can not translate text fields to doubles, they are coerced to NA-values:
> as.double("sometext")
[1] NA
Warning message:
NAs introduced by coercion
# while the following text is cast to double without an error:
> as.double("1.23")
[1] 1.23
You get two of the coercion warnings, which are probably produced by 'as.double(train)' and 'as.double(test)'. Since you did not provide exact details of what 'mydades' looks like, here are some of my best guesses (using artificial multivariate normal data):
library(MASS)
mydades <- mvrnorm(100, mu=c(1:6), Sigma=matrix(1:36, ncol=6))
mydades <- cbind(mydades, sample(LETTERS[1:5], 100, replace=TRUE))
# This breaks knn
mydades[3,4] <- Inf
# This breaks knn
mydades[4,3] <- -Inf
# These, however, do not introduce the coercion for NA-values error message
# This breaks knn and gives the same error; just some raw text
mydades[1,2] <- mydades[50,1] <- "foo"
mydades[100,3] <- "bar"
# ... or perhaps wrongly formatted exponential numbers?
mydades[1,1] <- "2.34EXP-05"
# ... or wrong decimal symbol?
mydades[3,3] <- "1,23"
# should be 1.23, as R uses '.' as decimal symbol and not ','
# ... or most likely a whole column is non-numeric, since the error is given twice (as.double problem both in training AND test set)
mydades[,1] <- sample(letters[1:5],100,replace=TRUE)
I would not keep both the numeric data and the class labels in a single matrix. Perhaps you could split the data as:
mydadesnumeric <- mydades[,1:6] # 6 first columns
mydadesclasses <- mydades[,7]
Using calls
str(mydades); summary(mydades)
may also help you/us in locating the problematic data entries and correct them to numeric entries or omitting non-numeric fields.
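Beyond str and summary, one way to pinpoint the entries that trigger the coercion warning is to attempt the conversion yourself and look for values that are present but turn into NA. This is a sketch on a made-up character vector x standing in for a single column of mydades. The names converted and bad are illustrative only.

```r
# Made-up character column standing in for one column of mydades.
x <- c("1.23", "foo", "1,23", "4", NA)

# Attempt the conversion quietly, then flag values that were present
# but could not be converted to a number.
converted <- suppressWarnings(as.numeric(x))
bad <- is.na(converted) & !is.na(x)

which(bad)   # positions of the offending entries
x[bad]       # the offending entries themselves
```

Applying the same check column by column (for example with apply over the columns of a character matrix) would show which columns need cleaning.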
The rest of the run code (after breaking the data), as provided by you:
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
# 7th column seems to be the class labels
knn(train=mydades.training[,-7],test=mydades.test[,-7],mydades.training[,7],k=5)
Great answer by @Teemu.
As this is a well-read question, I will give the same answer from an analytics perspective.
The KNN function classifies data points by calculating the Euclidean distance between the points. That's a mathematical calculation requiring numbers. All variables in KNN must therefore be coerce-able to numerics.
The data preparation for KNN often involves three tasks:
(1) Fix all NA or "" values
(2) Convert all factors into a set of booleans, one for each level in the factor
(3) Normalize the values of each variable to the range 0:1 so that no variable's range has an unduly large impact on the distance measurement.
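The three tasks above can be sketched in base R as follows. The data frame df, its column names and the helper rescale01 are all made up for illustration, and mean imputation is just one of several possible ways to handle missing values.

```r
# Made-up data frame: two numeric predictors (one with an NA) and one factor.
df <- data.frame(
  height = c(150, 160, NA, 180),
  weight = c(50, 65, 70, 90),
  colour = factor(c("red", "blue", "red", "green"))
)

# (1) Replace missing numeric values, here with the column mean.
df$height[is.na(df$height)] <- mean(df$height, na.rm = TRUE)

# (2) Expand the factor into one 0/1 indicator column per level.
dummies <- model.matrix(~ colour - 1, data = df)

# (3) Rescale each numeric column to the range [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
num <- sapply(df[, c("height", "weight")], rescale01)

# Purely numeric matrix that could be passed to knn as train or test data.
prepared <- cbind(num, dummies)
```

In practice the scaling constants (min and max) would normally be computed on the training set and reused for the test set, so that both are on the same scale.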
I would also point out that the function seemed to fail for me when using integers. I needed to convert everything into "num" type prior to calling the knn function. This included the target feature, for which most methods in R use the factor type. Thus, in my case as.numeric(my_frame$target_feature) was required.
I want to integrate a function fun_integrate that has a vector vec as an input parameter:
fun_integrate <- function(x, vec) {
y <- sum(x > vec)
dnorm(x) + y
}
#Works like a charm
fun_integrate(0, rnorm(100))
integrate(fun_integrate, upper = 3, lower = -3, vec = rnorm(100))
300.9973 with absolute error < 9.3e-07
Warning message:
In x > vec :
longer object length is not a multiple of shorter object length
As far as I can see, the problem is the following: integrate calls fun_integrate for a vector of x that it computes based on upper and lower. This vectorized call seems not to work with another vector being passed as an additional argument. What I want is that integrate calls fun_integrate for each x that it computes internally and compares that single x to the vector vec and I'm pretty sure my above code doesn't do that.
I know that I could implement an integration routine myself, i.e. compute nodes between lower and upper and evaluate the function on each node separately. But that wouldn't be my preferred solution.
Also note that I checked Vectorize, but this seems to apply to a different problem, namely that the function doesn't accept a vector for x. My problem is that I want an additional vector as an argument.
integrate(Vectorize(fun_integrate,vectorize.args='x'), upper = 3, lower = -3, vec = rnorm(100),subdivisions=10000)
304.2768 with absolute error < 0.013
#testing with an easier function
test<-function(x,y) {
sum(x-y)
}
test(1,c(0,0))
[1] 2
test(1:5,c(0,0))
[1] 15
Warning message:
In x - y :
longer object length is not a multiple of shorter object length
Vectorize(test,vectorize.args='x')(1:5,c(0,0))
[1] 2 4 6 8 10
#with y=c(0,0) this is f(x)=2x and the integral easy to solve
integrate(Vectorize(test,vectorize.args='x'),1,2,y=c(0,0))
3 with absolute error < 3.3e-14 #which is correct
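As an alternative to Vectorize, the function itself can be written so that it handles a vector x directly. The sketch below uses outer to compare every x with every element of vec. The name fun_integrate_vec and the test values v and xs are made up, and the approach assumes that a length(x) by length(vec) matrix fits comfortably in memory.

```r
fun_integrate_vec <- function(x, vec) {
  # outer() builds a length(x) by length(vec) logical matrix of x[i] > vec[j];
  # rowSums() then counts, for each x[i], how many elements of vec lie below it.
  y <- rowSums(outer(x, vec, ">"))
  dnorm(x) + y
}

# Original scalar-only version from the question, for comparison.
fun_integrate <- function(x, vec) dnorm(x) + sum(x > vec)

v  <- c(-1, 0, 1)                 # made-up vec
xs <- c(-2, -0.5, 0.5, 2)         # made-up evaluation points
same <- all.equal(
  fun_integrate_vec(xs, v),
  Vectorize(fun_integrate, vectorize.args = "x")(xs, v)
)
same
```

Since fun_integrate_vec returns one value per element of x, it can be passed to integrate in the same way as the Vectorize version, with vec supplied as an extra argument.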
Roland's answer looks good. I just wanted to point out that the warning is thrown by the expression inside sum, not by integrate itself.
Rgames> xf <- 1:10
Rgames> vf <- 4:20
Rgames> sum(xf>vf)
[1] 0
Warning message:
In xf > vf :
longer object length is not a multiple of shorter object length
The fact that the answer you got is not the correct value is what suggests that integrate is not sending the x-vector you expected to your function.
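To see what integrate actually sends to the integrand, a small probe function can record the length of each x it receives. This is only a sketch. The names seen and probe are made up, and the exact lengths reported depend on the quadrature rule R uses internally.

```r
seen <- integer(0)                # lengths of the x vectors received so far

probe <- function(x) {
  seen <<- c(seen, length(x))     # record how many points arrived in this call
  x^2                             # simple integrand, already vectorized
}

res <- integrate(probe, lower = 0, upper = 1)
res$value                         # about 1/3
unique(seen)                      # each call received a whole vector of points
```

Because each call delivers many x values at once, any expression in the integrand that combines x with another vector (such as x > vec) is recycled element-wise rather than applied to one x at a time.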