I observed the following phenomenon when using R and Matlab.
When I apply log to a negative number in R, I get NaN and the following warning message:
Warning message:
In log(-1) : NaNs produced
However, when I apply log to a negative number in Matlab, I get complex numbers, e.g.:
log(-1): 0.0000 + 3.1416i
log(-5): 1.6094 + 3.1416i
Is there any way to achieve the same behavior in R? Or is there anything in favor of the default option in R?
log gives you a complex result when you give it a complex argument in the first place.
log(-1+0i)
# [1] 0+3.141593i
log(-5+0i)
# [1] 1.609438+3.141593i
I don't know why it doesn't offer an option to do this by default, but then again I don't work with complex numbers all the time.
If you want to do this programmatically, you can use as.complex:
log(as.complex(-1))
# [1] 0+3.141593i
or even write a helper function to simplify it for you:
mylog <- function(x, ...) log(as.complex(x), ...)
mylog(-1)
# [1] 0+3.141593i
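If you only want the complex promotion when the input actually contains negative values, a small variant of the helper (just a sketch; mylog2 is an illustrative name) could be:
mylog2 <- function(x, ...) {
  # promote to complex only if any value is negative, otherwise keep numeric output
  if (is.numeric(x) && any(x < 0, na.rm = TRUE)) x <- as.complex(x)
  log(x, ...)
}
mylog2(-5)  # 1.609438+3.141593i
mylog2(5)   # 1.609438, stays numeric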
The function callmultmoments computes moments of the normal distribution.
The function automatically prints "Sum of powers is odd. Moment is 0." if the sum of the powers is odd. Is there any way to suppress that, under the condition that the original function stays untouched?
Ex:
require(symmoments)
# Compute the moment for the 4-dimensional moment c(1,1,3,4):
m.1134 <- callmultmoments(c(1,1,3,4))
EDIT:
As described here we can use
## Windows
sink("nul")
...
sink()
## UNIX
sink("/dev/null") # now suppresses
.... # do stuff
sink() # to undo prior suppression, back to normal now
However, I am writing a package so I want it to be platform independent. Any ideas what to do instead?
The issue is due to the fact that the function has multiple print statements, where stop, warning, or message would have been appropriate so that people can use suppressWarnings or suppressMessages.
You can work around it by wrapping invisible(capture.output()) around your whole assignment (not just the right-hand side).
f1 <- function(n, ...){
print("Random print statement")
cat("Random cat statement\n")
rnorm(n = n, ...)
}
f1(2)
#> [1] "Random print statement"
#> Random cat statement
#> [1] -0.1115004 -1.0830523
invisible(capture.output(x <- f1(2)))
x
#> [1] 0.0464493 -0.1453540
See also suppress messages displayed by "print" instead of "message" or "warning" in R.
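If you would rather stick with the sink() approach from the question, nullfile() (added in R 3.6.0) returns the null device path for the current platform, so something like the following should be platform independent:
sink(nullfile())   # nullfile() is "/dev/null" on Unix-alikes and "nul:" on Windows
m.1134 <- callmultmoments(c(1,1,3,4))
sink()             # restore normal output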
This message from callmultmoments can be suppressed by simply avoiding moments whose powers do not sum to an even number. Any odd central moment, such as c(1,1,3,4) in your example, has an expected value of 0 mathematically. That is, the expected value of a central moment such as E[X^1 Y^1 Z^3 W^4], where the sum of the powers (here 1+1+3+4 = 9) is odd, is automatically 0.
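For example (a sketch reusing the symmoments setup from the question), choosing powers with an even sum avoids the message entirely:
require(symmoments)
# 2+2+4+4 = 12 is even, so no "Sum of powers is odd" message should be printed
m.2244 <- callmultmoments(c(2,2,4,4))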
The following code chunk is for defining and integrating a function f1 involving matrix exponentials.
library(expm)
Lambdahat=rbind(c(-0.57,0.21,0.36,0,0),
c(0,-7.02,7.02,0,0),
c(1,0,-37.02,29,7.02),
c(0.03,0,0,-0.25,0.22),
c(0,0,0,0,0));
B=rbind(c(-1,1,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0))
f1<-function(tau1)
{
A=(expm(Lambdahat*tau1)%*%B%*%expm(Lambdahat*(5-tau1)));
return(A[1,5]);
}
out=integrate(f1,lower=0,upper=5)#integration of f1
The integration in the above line gives the following error:
Error in integrate(f1, lower = 0, upper = 5) :
evaluation of function gave a result of wrong length
In addition: Warning messages:
1: In Lambdahat * tau1 :
longer object length is not a multiple of shorter object length
2: In Lambdahat * (t[i] - tau1) :
longer object length is not a multiple of shorter object length
To check whether the inputs and outputs of f1 differ in length, 11 evenly spaced inputs and the corresponding outputs of f1 are reported below. The input and output lengths for all test cases were equal to 1.
sapply(X=seq(from=0,to=5,by=0.5),FUN=f1)
[1] 2.107718e-01 1.441219e-01 0.000000e+00 2.023337e+06 1.709569e+14
[6] 1.452972e+22 1.243012e+30 1.071096e+38 9.302178e+45 8.146598e+53
[11] 7.197606e+61
If anyone could share any hint or directions where the code may be going erroneous, it would be very helpful. Thanks very much!
The problem is that the function passed to integrate needs to be vectorized, i.e. it should be able to receive a vector of input values and return a vector of output values of the same length. I think f1 <- Vectorize(f1) could solve your problem.
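For instance, a minimal sketch reusing the f1 from the question (f1v is just an illustrative name):
# Vectorize() wraps f1 so a vector of tau1 values is evaluated element-wise
f1v <- Vectorize(f1)
out <- integrate(f1v, lower = 0, upper = 5)
out$value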
I am trying to do an association network using some expression data I have, the data is really huge: 300 samples and ~30,000 genes. I would like to apply a Gaussian graphical model to my data using the huge R package.
Here is the code I am using
dim(data)
#[1] 317 32291
huge.out <- huge.npn(data)
huge.stars <- huge.select(huge.out, criterion="stars")
However in this last step I got an error:
Error in cor(x) : ling....in progress:10%
Missing values present in input variable 'x'. Consider using use = 'pairwise.complete.obs'
Any help would be very appreciated.
You posted this exact question on Rhelp today. Both SO and Rhelp deprecate cross-posting but if you do choose to switch venues it is at the very least courteous to inform the readership.
You responded to the suggestion here on SO that there were missing data in your data-object named 'data' by claiming there were no missing data. So what does this code return:
lapply(data, function(x) sum(is.na(x)))
That would be a first-level check, but the error could also be caused by a later step that encountered a missing value in the matrix of correlation coefficients computed from 'huge.out'. That could happen if there were a) infinities in the calculations, or b) a column that is constant:
> cor(c(1:10,Inf), 1:11)
[1] NaN
> cor(rep(2,7), rep(2,7))
[1] NA
Warning message:
In cor(rep(2, 7), rep(2, 7)) : the standard deviation is zero
So the next check is:
sum( is.na(huge.out) )
That will at least give you some basis for defending your claim of no missing values and will also give you a plausible theory as to the source of the error. To locate a column that is entirely constant, you might do something like this (assuming it is a data frame):
which(sapply(data, function(x) length(unique(x))) == 1)
If it's a matrix, you need to use apply.
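A rough equivalent for a matrix would be (a sketch; the 2 means apply over columns):
which(apply(data, 2, function(x) length(unique(x))) == 1)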
I am trying to run this line:
knn(mydades.training[,-7],mydades.test[,-7],mydades.training[,7],k=5)
but I always get this error:
Error in knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NAs introduced by coercion
2: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NAs introduced by coercion
Any ideas, please?
PS: mydades.training and mydades.test are defined as follows:
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
I suspect that your issue lies in having non-numeric data fields in 'mydades'. The error line:
NA/NaN/Inf in foreign function call (arg 6)
makes me suspect that the knn function's call to the underlying C-language implementation fails. Many functions in R actually call more efficient C implementations under the hood, instead of implementing the algorithm purely in R. If you type just 'knn' in your R console, you can inspect the R implementation of 'knn'; it contains the following line:
Z <- .C(VR_knn, as.integer(k), as.integer(l), as.integer(ntr),
as.integer(nte), as.integer(p), as.double(train), as.integer(unclass(clf)),
as.double(test), res = integer(nte), pr = double(nte),
integer(nc + 1), as.integer(nc), as.integer(FALSE), as.integer(use.all))
where .C means that we're calling a C function named 'VR_knn' with the provided function arguments. Since you have two of the errors
NAs introduced by coercion
I think two of the as.double/as.integer calls fail, and introduce NA values. If we start counting the parameters, the 6th argument is:
as.double(train)
that may fail in cases such as:
# as.double can not translate text fields to doubles, they are coerced to NA-values:
> as.double("sometext")
[1] NA
Warning message:
NAs introduced by coercion
# while the following text is cast to double without an error:
> as.double("1.23")
[1] 1.23
You get two of the coercion errors, which are probably given by 'as.double(train)' and 'as.double(test)'. Since you did not provide us with exact details of what 'mydades' looks like, here are some of my best guesses (using artificial multivariate normal data):
library(MASS)
mydades <- mvrnorm(100, mu=c(1:6), Sigma=matrix(1:36, ncol=6))
mydades <- cbind(mydades, sample(LETTERS[1:5], 100, replace=TRUE))
# This breaks knn
mydades[3,4] <- Inf
# This breaks knn
mydades[4,3] <- -Inf
# These, however, do not produce the 'NAs introduced by coercion' warning message
# This breaks knn and gives the same error; just some raw text
mydades[1,2] <- mydades[50,1] <- "foo"
mydades[100,3] <- "bar"
# ... or perhaps wrongly formatted exponential numbers?
mydades[1,1] <- "2.34EXP-05"
# ... or wrong decimal symbol?
mydades[3,3] <- "1,23"
# should be 1.23, as R uses '.' as decimal symbol and not ','
# ... or most likely a whole column is non-numeric, since the error is given twice (as.double problem both in training AND test set)
mydades[,1] <- sample(letters[1:5],100,replace=TRUE)
I would not keep both the numeric data and the class labels in a single matrix; perhaps you could split the data as:
mydadesnumeric <- mydades[,1:6] # 6 first columns
mydadesclasses <- mydades[,7]
Using calls
str(mydades); summary(mydades)
may also help you (and us) locate the problematic data entries, so they can be corrected to numeric values or the non-numeric fields omitted.
The rest of the run code (after breaking the data), as provided by you:
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
# 7th column seems to be the class labels
knn(train=mydades.training[,-7],test=mydades.test[,-7],mydades.training[,7],k=5)
Great answer by @Teemu.
As this is a well-read question, I will give the same answer from an analytics perspective.
The KNN function classifies data points by calculating the Euclidean distance between the points. That's a mathematical calculation requiring numbers. All variables in KNN must therefore be coercible to numerics.
The data preparation for KNN often involves three tasks:
(1) Fix all NA or "" values
(2) Convert all factors into a set of booleans, one for each level in the factor
(3) Normalize the values of each variable to the range 0:1 so that no variable's range has an unduly large impact on the distance measurement.
I would also point out that the function seems to fail when using integers. I needed to convert everything into "num" type prior to calling the knn function. This includes the target feature, for which most methods in R use the factor type. Thus, as.numeric(my_frame$target_feature) is required.
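A rough data-preparation sketch along these lines (assuming, as in the question, that mydades is a data frame whose columns 1:6 are the numeric predictors and column 7 holds the class labels; normalize01 is just an illustrative helper name):
# (3) Min-max scale each predictor to 0:1 so no variable dominates the distance
normalize01 <- function(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
mydades[, 1:6] <- lapply(mydades[, 1:6], function(x) normalize01(as.numeric(x)))
# (2) For factor predictors, model.matrix(~ f - 1) creates one 0/1 column per level
# (1) Check for remaining NA or "" values before calling knn
sum(is.na(mydades)); sum(mydades == "", na.rm = TRUE)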
I am experimenting with clustering in R for the first time and have been looking at the basic R help online and tried to compare the outcome of 2 cluster solutions.
I copied and pasted the script, being careful to make sure that I had named the relevant data sets correctly first, but keep getting an error message that I don't understand.
Any ideas?
The script is simply:
# comparing 2 cluster solutions
library(fpc)
cluster.stats (d, fit1$cluster, fit2$cluster)
and the error message I am getting is:
> library(fpc)
> cluster.stats(d, fit1$cluster, fit2$cluster)
Error in as.matrix.dist(d) :
length of 'dimnames' [1] not equal to array extent
In addition: Warning messages:
1: In as.dist.default(d) : NAs introduced by coercion
2: In as.dist.default(d) : non-square matrix
3: In as.matrix.dist(d) :
number of items to replace is not a multiple of replacement length
Thanks
The d object should contain a matrix of distances (usually a symmetrical matrix with zeroes on the diagonal). In R you can obtain the distance matrix using
d <- dist(mydata)   # where mydata is the data that was clustered, not the fitted cluster object
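A minimal end-to-end sketch (assuming mydata is the numeric data that was clustered and, purely for illustration, that the two solutions come from kmeans):
library(fpc)
fit1 <- kmeans(mydata, centers = 2)
fit2 <- kmeans(mydata, centers = 3)
d <- dist(mydata)                              # distance matrix computed from the data itself
cluster.stats(d, fit1$cluster, fit2$cluster)   # compare the two clusterings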