Comparing two numerical variables in R

I ran R's stl() function and want to use the residuals it generates in a Grubbs test. The code is the following:
library(outliers) # provides grubbs.test()
stl.res = stl(dataset, s.window='periodic')
residuals = as.numeric(stl.res$time.series[, "remainder"])
grubbs.result = grubbs.test(residuals)
strsplit(grubbs.result$alternative," ")[[1]][3]
## [1] "38.4000349179783"
outlier = as.numeric(strsplit(grubbs.result$alternative," ")[[1]][3])
outlier
## [1] 38.40003492
which(residuals == outlier)
## integer(0)
My question is why which() returns an empty result (integer(0)). Actually residuals[1920] = 38.4000349179783, so the call to which() should return 1921, not an empty result. I guess this is a problem with precision. I have tried many ways, but have not managed to solve it.

If it's really a precision issue (which would be R FAQ 7.31), then there are various ways to get around it.
# this is an approximation of your test
x <- c(1, 2, 38.4000349179783, 4, 5)
y <- 38.40003492
> x == y
[1] FALSE FALSE FALSE FALSE FALSE
# so which doesn't return anything
# one basic approach
> which(abs(x - y) < .00001)
[1] 3
You could also rig up something using all.equal(), but checking for a difference less than your pre-selected limit is probably easiest.

Maybe you can use isTRUE() and all.equal() as below:
which(sapply(residuals, function(v) isTRUE(all.equal(v, outlier))))
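A minimal sketch of the tolerance-based lookup discussed above, wrapped in a helper. The name which_near and its default tolerance are assumptions for illustration, not part of base R; pick a tolerance that suits the scale of your data.

```r
# Hypothetical helper: indices of x that lie within `tol` of `target`.
which_near <- function(x, target, tol = 1e-8) {
  which(abs(x - target) < tol)
}

x <- c(1, 2, 38.4000349179783, 4, 5)
y <- 38.40003492              # rounded copy of x[3], as in the example above

which(x == y)                 # exact comparison finds nothing
which_near(x, y, tol = 1e-5)  # tolerance-based comparison finds position 3
```

If the goal is only to locate the value that the Grubbs test flagged, something like which.max(abs(residuals - mean(residuals))) may avoid parsing text altogether. This assumes the default one-outlier test, which looks at the value farthest from the mean.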

Related

How do I solve a function for a given x?

I have been browsing through quite some help pages, but I did not find the solution for my - probably - simple problem.
I defined a function
funB <- function(x) (0.8042851 +
((3.9417843-0.8042851)/(1+((x/0.4039609)^(-3.285016)))))
and would like to solve it for a given x (say, x = 0.2).
How do I do that? I have looked at uniroot() and polyroot(), but they did not seem to fit my function.
Just to be sure that there is a root where you are expecting it, plot the graph of funB.
curve(funB)
Define an auxiliary function, f, taking an extra argument and solve this new function for a = <target_value>.
f <- function(x, a) funB(x) - a
uniroot(f, interval = c(0, 1e3), a = 2)
#$root
#[1] 0.3485097
#
#$f.root
#[1] -0.0001305644
#
#$iter
#[1] 12
#
#$init.it
#[1] NA
#
#$estim.prec
#[1] 6.103516e-05
If you wanted to find the value of x such that funB(x) was equal to 0.2, you would do something like this:
funB <- function(x) (0.8042851 +
((3.9417843-0.8042851)/(1+((x/0.4039609)^(-3.285016)))))
target <- 0.2
uniroot(function(x) funB(x)-target, interval=c(-5,10))
but there's a problem. It's up to you to pick an interval value that brackets a root (i.e. funB(x)<0.2 for the lower value and >0.2 for the upper value, or vice versa). funB is NaN for x<0, 0.8042851 for x==0, and increasing for x>0 (try curve(funB, from=-5, to=100, n=1001) for example). So the solution you want (if I've guessed right about the meaning of your question) doesn't seem to exist.
Note: in general a negative value raised to a non-integer power is NaN in R (even in cases where the answer "should" be defined, e.g. (-8)^(1/3) is the cube root of -8, which is -2 ...). If you're sure you know what you're doing you could replace (x/a)^b with sign(x)*(abs(x)/a)^b ... (if you make this change, the function appears well-behaved for x>-0.4 and funB(x)-0.2 does have a root between -0.3 and -0.2 ... but I have no idea if this makes sense for your application or not)
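The bracketing requirement can be checked before calling uniroot(). The following is only a sketch: solve_for is a hypothetical helper name, and returning NA when no sign change is found is a design choice made here, not something uniroot() does itself (it raises an error instead).

```r
funB <- function(x) 0.8042851 + (3.9417843 - 0.8042851) / (1 + (x / 0.4039609)^(-3.285016))

# Hypothetical helper: solve f(x) == target on `interval`,
# returning NA when the endpoints do not bracket a root.
solve_for <- function(f, target, interval) {
  g <- function(x) f(x) - target
  ends <- g(interval)
  if (anyNA(ends) || prod(sign(ends)) > 0) return(NA_real_)
  uniroot(g, interval = interval, tol = 1e-9)$root
}

solve_for(funB, 2, c(0, 10))    # a root exists: funB rises from about 0.80 towards 3.94
solve_for(funB, 0.2, c(0, 10))  # NA: funB never drops to 0.2 for x >= 0
```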
Well, I guess I must like doing things the hard way. I just rearranged your function to find its inverse:
funC <- function(y) (((3.137499)/(y - 0.8042851) - 1)^(-1/3.285016)) * 0.4039609
So if I want to know when funB(x) == 3.7 I can do:
funC(3.7)
#> [1] 0.860193
and sure enough
funB(0.860193)
#> [1] 3.7
or indeed
funB(funC(1))
#> [1] 1
And as others have pointed out, x doesn't have a real value at funB(x) == 0.2 as you can see in this plot:
curve(funC, 0, 4)
Now, if you really want to know the complex root where funB(x) == 0.2 then you can modify funC like so:
funC <- function(y) (((3.137499)/(as.complex(y) - 0.8042851) - 1)^(-1/3.285016)) * 0.4039609
So now:
funC(0.2)
#> [1] 0.1336917+0.1894797i
And therefore the answer to your question is 0.1336917 +/- 0.1894797i
funB(complex(real = 0.133691691, imaginary = 0.1894797))
#> [1] 0.1999996+0i
Close enough.
funB <- function(x) (0.8042851 + ((3.9417843-0.8042851)/(1+((x/0.4039609)^(-3.285016)))))
# call the function with desired input
funB(0.2)
...and the output:
> funB(0.2)
[1] 1.087758

How to transform the object of a function in r?

I want to create a function that transforms its object.
I have tried to transform the variable as you would normally, but within the function.
This works:
vec <- c(1, 2, 3, 3)
vec <- (-1*vec)+1+max(vec, na.rm = T)
[1] 3 2 1 1
This doesn't work:
vec <- c(1, 2, 3, 3)
func <- function(x){
x <- (-1*x)+1+max(x, na.rm = T)
}
func(vec)
vec
[1] 1 2 3 3
R is functional so normally one returns the output. If you want to change
the value of the input variable to take on the output value then it is normally done by the caller, not within the function. Using func from the question it would normally be done like this:
vec <- func(vec)
Furthermore, while you can overwrite variables it is, in general, not a good
idea. It makes debugging difficult. Is the current value of vec the
input or output and if it is the output what is the value of the input? We
don't know since we have overwritten it.
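As a small sketch of the return-a-value style described above: the function below computes the transformed vector and leaves its input alone, so both versions stay available. The name flip_scale is hypothetical.

```r
# Hypothetical name `flip_scale`: reverse-scores x and returns the result.
flip_scale <- function(x) {
  max(x, na.rm = TRUE) + 1 - x
}

vec <- c(1, 2, 3, 3)
flipped <- flip_scale(vec)  # the input is left untouched

vec      # still the original values
flipped  # the transformed values
```

Assigning the result to a new name is what keeps debugging simple: the input and the output can be inspected side by side.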
func_overwrite
That said if you really want to do this despite the comments above then:
# works but not recommended
func_overwrite <- function(x) eval.parent(substitute({
x <- (-1*x)+1+max(x, na.rm = TRUE)
}))
# test
v <- c(1, 2, 3, 3)
func_overwrite(v)
v
## [1] 3 2 1 1
Replacement functions
Despite R's functional nature it actually does provide one facility for overwriting although the function in the question is not really a good candidate for it so let us change the example to provide a function incr which increments the input variable by a given value. That is, it does this:
x <- x + b
We can write this in R as:
`incr<-` <- function(x, value) x + value
# test
xx <- 3
incr(xx) <- 10
xx
## [1] 13
T vs. TRUE
One other comment. Do not use T for true. Always write it out. TRUE is a reserved name in R but T is a valid variable name so it can lead to hard to find errors such as when someone uses T for temperature.
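To make the hazard concrete, here is a hedged sketch. The scenario of storing a temperature of 0 in T is made up for illustration, and the code runs inside local() so that nothing in the global environment is changed.

```r
# Inside local() so the global environment is not modified.
shadowed <- local({
  T <- 0                        # e.g. someone stores a temperature in T
  mean(c(1, NA, 3), na.rm = T)  # na.rm is now 0, which is treated as FALSE
})
shadowed                        # NA, because the NA was not removed

# TRUE itself cannot be reassigned; attempting it raises an error.
reassign <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "error"
)
reassign
```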

Is there an elegant, built-in way to do modulo indexing in R?

Currently, I have
extract_modulo = function(x, n, fn=`[`) fn(x, (n-1L) %% length(x) + 1L)
`%[mod%` = function (x, n) extract_modulo(x, n)
And then:
seq(12) %[mod% 14
#[1] 2
Is this already built into R somewhere? I would think so, because R has several functions that recycle values (e.g., paste). However, I'm not finding anything with help('[['), ??index, or ??mod. I would think an R notation for this would be something like seq(12)[/14/] or as.list(seq(12))[[/14/]], for example.
rep_len() is a fast .Internal function, and appropriate for this use or when recycling arguments in your own function. For this particular case, where you're looking for the value at an index position beyond the length of a vector, rep_len(x, n)[n] will always do what you're looking for, for any nonnegative whole number 'n', and any non NULL x.
rep_len(seq(12), 14)[14]
# [1] 2
rep_len(letters, 125)[125]
# [1] "u"
And if it turns out you didn't need to recycle x, it works just as well with an n value that is less than length(x):
rep_len(seq(12), 5)[5]
# [1] 5
rep_len(seq(12), 0)[0]
# integer(0)
# as would be expected, there is nothing there
You could of course create a wrapper if you'd like:
recycle_index <- function(x, n) rep_len(x, n)[n]
recycle_index(seq(12), 14)
# [1] 2
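For comparison, a sketch of the arithmetic approach from the question written as a plain helper. Unlike rep_len(x, n)[n], it does not build an intermediate vector of length n, and it accepts several indices at once. The name mod_index is hypothetical, and the helper assumes x is non-empty.

```r
# Hypothetical helper: 1-based modulo indexing, vectorised over i.
mod_index <- function(x, i) {
  x[(i - 1L) %% length(x) + 1L]
}

mod_index(seq(12), 14)                    # wraps around to the 2nd element
mod_index(seq(12), c(1, 12, 13, 14, 25))  # several indices at once
mod_index(letters, 125)                   # same element as rep_len(letters, 125)[125]
```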

Reduce with less than symbol

I never think to use Reduce but I have a problem I thought it would be good for. I want to make sure the size of each iterative element of a vector is equal to or larger than the previous element. I can do this with sapply but my attempt with Reduce fails. How can I use this with Reduce?
#This works
y <- c(1,2,3,2,4,4)
sapply(seq_along(y)[-length(y)], function(i) y[i] <= y[i+1])
#attempts
Reduce('<', c(1,2,3,2,4,4), accumulate = TRUE)
Reduce('<', c(1,2,3,2,4,4))
The diff() function would be a logical choice here (others having explained nicely why Reduce() is not appropriate). It is already set up to compare the differences between elements of a vector and is already vectorised.
> !diff(y) < 0
[1] TRUE TRUE FALSE TRUE TRUE
Desperately bored? I was:
myFun <- function(x,z){
  # carry the previous value along in the names so the next step can compare against it
  if(is.null(names(z))) names(z) <- z
  if(is.null(names(x))) names(x) <- x
  if(as.numeric(names(x)) <= as.numeric(names(z))) res <- TRUE else res <- FALSE
  names(res) <- names(z)
  return(res)
}
as.logical(Reduce(myFun, y, accumulate = TRUE)[-1])
# [1] TRUE TRUE FALSE TRUE TRUE
It is my understanding from ?Reduce that Reduce first compares the first and second elements. Since 1 < 2 returns TRUE, that result (coerced to 1) is reused and compared with the third element, and so on. This means you end up comparing 1 < y[3:length(y)], which turns out to be always true. Alternatively you could try:
head(y,-1) <= tail(y, -1)
I don't think it can be used as Reduce will in general end up with something like f(f(x[1],x[2]),x[3]), so your comparison for the third element will be TRUE < 3.
identical(y,sort(y))
would appear to be a more efficient solution for this problem.
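The vectorised suggestions above can be put side by side. This is only a sketch using the vector from the question; step_ok is a name introduced here.

```r
y <- c(1, 2, 3, 2, 4, 4)

# Element-wise: is each value >= the one before it?
step_ok <- y[-1] >= y[-length(y)]
step_ok

# Same result via diff()
identical(step_ok, diff(y) >= 0)

# Single yes/no answer for the whole vector
all(step_ok)
```

The element-wise form tells you where the order breaks, while all(step_ok) answers the same yes/no question as identical(y, sort(y)).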

How to count TRUE values in a logical vector

In R, what is the most efficient/idiomatic way to count the number of TRUE values in a logical vector? I can think of two ways:
z <- sample(c(TRUE, FALSE), 1000, rep = TRUE)
sum(z)
# [1] 498
table(z)["TRUE"]
# TRUE
# 498
Which do you prefer? Is there anything even better?
The safest way is to use sum with na.rm = TRUE:
sum(z, na.rm = TRUE) # best way to count TRUE values
which gives 1 for the example vector z <- c(TRUE, FALSE, NA) defined below.
There are some problems with other solutions when logical vector contains NA values.
See for example:
z <- c(TRUE, FALSE, NA)
sum(z) # gives you NA
table(z)["TRUE"] # gives you 1
length(z[z == TRUE]) # f3lix answer, gives you 2 (because NA indexing returns values)
Additionally table solution is less efficient (look at the code of table function).
Also, you should be careful with the "table" solution, in case there are no TRUE values in the logical vector. See for example:
z <- c(FALSE, FALSE)
table(z)["TRUE"] # gives you `NA`
or
z <- c(NA, FALSE)
table(z)["TRUE"] # gives you `NA`
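The edge cases above can be collected into one short sketch. count_true is a hypothetical helper name; the table() variant with fixed factor levels is one possible workaround, shown as an assumption rather than a recommendation.

```r
# Hypothetical helper: count TRUE values, ignoring NA.
count_true <- function(z) sum(z, na.rm = TRUE)

count_true(c(TRUE, FALSE, NA))  # one TRUE, NA ignored
count_true(c(FALSE, FALSE))     # zero rather than NA
count_true(c(NA, FALSE))        # zero rather than NA

# If a table is wanted anyway, fixing the levels keeps a "TRUE" cell even when it is empty.
tab <- table(factor(c(FALSE, FALSE), levels = c(FALSE, TRUE)))
tab[["TRUE"]]
```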
Another option which hasn't been mentioned is to use which:
length(which(z))
Just to actually provide some context on the "which is faster question", it's always easiest just to test yourself. I made the vector much larger for comparison:
z <- sample(c(TRUE,FALSE),1000000,rep=TRUE)
system.time(sum(z))
user system elapsed
0.03 0.00 0.03
system.time(length(z[z==TRUE]))
user system elapsed
0.75 0.07 0.83
system.time(length(which(z)))
user system elapsed
1.34 0.28 1.64
system.time(table(z)["TRUE"])
user system elapsed
10.62 0.52 11.19
So clearly using sum is the best approach in this case. You may also want to check for NA values as Marek suggested.
Just to add a note regarding NA values and the which function:
> which(c(T, F, NA, NULL, T, F))
[1] 1 4
> which(!c(T, F, NA, NULL, T, F))
[1] 2 5
Note that which only reports the positions that are TRUE: NA is skipped, and NULL never reaches which() at all because c() drops it (which is why the second TRUE is reported at position 4).
Another way is
> length(z[z==TRUE])
[1] 498
While sum(z) is nice and short, for me length(z[z==TRUE]) is more self explaining. Though, I think with a simple task like this it does not really make a difference...
If it is a large vector, you probably should go with the fastest solution, which is sum(z). length(z[z==TRUE]) is about 10x slower and table(z)["TRUE"] is about 200x slower than sum(z).
Summing up, sum(z) is the fastest to type and to execute.
Another option is to use summary function. It gives a summary of the Ts, Fs and NAs.
> summary(hival)
Mode FALSE TRUE NA's
logical 4367 53 2076
which() is a good alternative, especially when you operate on matrices (check ?which and notice the arr.ind argument). But I suggest that you stick with sum, because of its na.rm argument, which can handle NAs in a logical vector.
For instance:
# create dummy variable
set.seed(100)
x <- round(runif(100, 0, 1))
x <- x == 1
# create NA's
x[seq(1, length(x), 7)] <- NA
If you type in sum(x) you'll get NA as a result, but if you pass na.rm = TRUE in sum function, you'll get the result that you want.
> sum(x)
[1] NA
> sum(x, na.rm=TRUE)
[1] 43
Is your question strictly theoretical, or you have some practical problem concerning logical vectors?
There's also a package called bit that is specifically designed for fast boolean operations. It's especially useful if you have large vectors or need to do many boolean operations.
z <- sample(c(TRUE, FALSE), 1e8, rep = TRUE)
system.time({
sum(z) # 0.170s
})
system.time({
bit::sum.bit(z) # 0.021s, ~10x improvement in speed
})
I've been doing something similar a few weeks ago. Here's a possible solution, it's written from scratch, so it's kind of beta-release or something like that. I'll try to improve it by removing loops from code...
The main idea is to write a function that will take 2 (or 3) arguments. First one is a data.frame which holds the data gathered from questionnaire, and the second one is a numeric vector with correct answers (this is only applicable for single choice questionnaire). Alternatively, you can add third argument that will return numeric vector with final score, or data.frame with embedded score.
fscore <- function(x, sol, output = 'numeric') {
  if (ncol(x) != length(sol)) {
    stop('Number of items differs from length of correct answers!')
  } else {
    inc <- matrix(ncol=ncol(x), nrow=nrow(x))
    for (i in 1:ncol(x)) {
      inc[,i] <- x[,i] == sol[i]
    }
    if (output == 'numeric') {
      res <- rowSums(inc)
    } else if (output == 'data.frame') {
      res <- data.frame(x, result = rowSums(inc))
    } else {
      stop('Type not supported!')
    }
  }
  return(res)
}
I'll try to do this in a more elegant manner with some *ply function. Notice that I didn't put na.rm argument... Will do that
# create dummy data frame - values from 1 to 5
set.seed(100)
d <- as.data.frame(matrix(round(runif(200,1,5)), 10))
# create solution vector
sol <- round(runif(20, 1, 5))
Now apply a function:
> fscore(d, sol)
[1] 6 4 2 4 4 3 3 6 2 6
If you pass data.frame argument, it will return modified data.frame.
I'll try to fix this one... Hope it helps!
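One possible loop-free variant of the scoring idea, offered as a sketch rather than as the author's eventual solution. The name fscore_vec and the toy data are hypothetical, and treating a missing answer as incorrect (via na.rm = TRUE) is an assumption made here.

```r
# Hypothetical name `fscore_vec`: number of correct answers per respondent.
fscore_vec <- function(x, sol) {
  stopifnot(ncol(x) == length(sol))
  # Compare every column j of x with sol[j], then count the matches in each row.
  correct <- sweep(as.matrix(x), 2, sol, FUN = "==")
  unname(rowSums(correct, na.rm = TRUE))
}

# Toy data: two respondents, three single-choice items.
answers <- data.frame(q1 = c(1, 2), q2 = c(3, 3), q3 = c(5, 4))
key <- c(1, 3, 5)
fscore_vec(answers, key)  # first respondent matches all three items, second only q2
```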
I've just had a particular problem where I had to count the number of true statements from a logical vector and this worked best for me...
length(grep(TRUE, (gene.rep.matrix[i,1:6] > 1))) > 5
So this takes a subset of the gene.rep.matrix object and applies a logical test, returning a logical vector. This vector is passed as an argument to grep, which returns the locations of any TRUE entries. length() then counts how many entries grep finds, thus giving the number of TRUE entries.
