R: f(x) != sapply(x,f) -- bug or feature?

> f = function(x) as.Date(as.character(x), format='%Y%m%d')
> f(20110606)
[1] "2011-06-06"
> sapply(20110606, f)
[1] 15131
Why are the two returned values not the same? I need to apply this function to a long vector of dates, but I'm not getting dates with sapply()!

The functions you use to create f are already vectorized. There's no need to use sapply, unless you work for the Department of Redundancy Department.
> f <- function(x) as.Date(as.character(x), format='%Y%m%d')
> d <- 20110606 + 0:10
> f(d)
[1] "2011-06-06" "2011-06-07" "2011-06-08" "2011-06-09"
[5] "2011-06-10" "2011-06-11" "2011-06-12" "2011-06-13"
[9] "2011-06-14" "2011-06-15" "2011-06-16"

> lapply(20110606, f)
[[1]]
[1] "2011-06-06"
> unlist(lapply(20110606, f))
[1] 15131
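sapply simplifies the result of lapply (effectively unlisting it), and in doing so drops the Date class, leaving the underlying number of days since 1970-01-01: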
> unclass(lapply(20110606, f)[[1]])
[1] 15131
> class(lapply(20110606, f)[[1]])
[1] "Date"
As @Joshua Ulrich noted, there is no need to use apply-type functions here. However, for interest,
d <- 20110606 + 0:10
do.call("c",lapply(d, f))
would be one possible way to "unlist" the dates while keeping the Date class.
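To make the class loss concrete, here is a minimal sketch comparing the three routes discussed above. The variable names (direct, via_sapply, restored, via_docall) are made up for illustration; restoring the class with as.Date(..., origin = "1970-01-01") is one option among several, not something the answers above prescribe.

```r
f <- function(x) as.Date(as.character(x), format = "%Y%m%d")
d <- 20110606 + 0:2

# Vectorised call: keeps the Date class
direct <- f(d)

# sapply() simplifies to a plain numeric vector (days since 1970-01-01)
via_sapply <- sapply(d, f)

# Option 1: put the class back by giving as.Date() the origin
restored <- as.Date(via_sapply, origin = "1970-01-01")

# Option 2: combine the list elements with c(), which dispatches on Date
via_docall <- do.call("c", lapply(d, f))

class(direct)      # "Date"
class(via_sapply)  # "numeric"
```

In practice the direct vectorised call is the simplest choice; the other two only matter when the function being applied is not vectorised.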


Looping efficiently over several lists without creating x for loops in R

I am trying to loop simultaneously over several lists. However, I am using several for-loops to reach the desired result. I am wondering if there is a better, more efficient way to do this.
Here's my code
for (i in list(3,6)){
  for (n in list("rmse.","mape.", "mpe.")){
    for (b in list("base.", "rev.")){
      a <- paste0(b, "fcst")
      c <- paste0(n, a)
      x <- paste0(c, i)
      print(x)
    }
  }
}
And here the result:
[1] "rmse.base.fcst3"
[1] "rmse.rev.fcst3"
[1] "mape.base.fcst3"
[1] "mape.rev.fcst3"
[1] "mpe.base.fcst3"
[1] "mpe.rev.fcst3"
[1] "rmse.base.fcst6"
[1] "rmse.rev.fcst6"
[1] "mape.base.fcst6"
[1] "mape.rev.fcst6"
[1] "mpe.base.fcst6"
[1] "mpe.rev.fcst6"
I am pretty sure that there is a more efficient way, maybe with lapply?
Storing the results in one list would be very helpful as well, so any suggestions are more than welcome.
This will generate a data frame whose last column, x, is what you need (it relies on dplyr for %>% and mutate):
library(dplyr)
df = expand.grid(i = c(3,6),
                 n = c("rmse.", "mape.", "mpe."),
                 b = c("base.", "rev.")) %>%
  mutate(x = paste0(n, b, "fcst", i))
Just do
df %>% select(x)
to view them. You can store it somewhere as well to refer to it.
zz <- expand.grid(n,b,"fcst",i)
do.call(paste0, zz)
# [1] "rmse.base.fcst3" "mape.base.fcst3" "mpe.base.fcst3" "rmse.rev.fcst3"
# [5] "mape.rev.fcst3" "mpe.rev.fcst3" "rmse.base.fcst6" "mape.base.fcst6"
# [9] "mpe.base.fcst6" "rmse.rev.fcst6" "mape.rev.fcst6" "mpe.rev.fcst6"
Input:
n <- c("rmse.","mape.", "mpe.")
b <- c("base.", "rev.")
i <- c(3,6)
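Since the question also asks for the results in a single list and in the same order the nested loops print them, here is a minimal base-R sketch. The names grid and res are made up for illustration. It relies on expand.grid varying its first argument fastest, so the innermost loop variable (b) is listed first.

```r
n <- c("rmse.", "mape.", "mpe.")
b <- c("base.", "rev.")
i <- c(3, 6)

# First argument varies fastest, mirroring the innermost for loop
grid <- expand.grid(b = b, n = n, i = i, stringsAsFactors = FALSE)

# paste0() is vectorised, so one call builds all 12 names
x <- paste0(grid$n, grid$b, "fcst", grid$i)

# Store the names as a list, as requested in the question
res <- as.list(x)
head(x, 3)
```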

Rounding Error when converting from character to numeric

I have a data.table of numbers stored as character strings that I am trying to convert to numeric. The issue is that the numbers are very long, and I want to retain all of the digits without any rounding by R. For example, the first 5 elements of the data.table:
> TimeO[1]
[1] "20110630224701281482"
> TimeO[2]
[1] "20110630224701281523"
> TimeO[3]
[1] "20110630224701281533"
> TimeO[4]
[1] "20110630224701281548"
> TimeO[5]
[1] "20110630224701281762"
I wrote a function to convert from a character into numeric:
convert_time_fast <- function(tim){
  b <- tim - tim%/%10^12*10^12
  # hhmmssffffff
  ms <- b%%10^6; b <-(b-ms)/10^6
  ss <- b%%10^2; b <-(b-ss)/10^2
  mm <- b%%10^2; hh <-(b-mm)/10^2
  # if hours>=22, subtract 24 (previous day)
  hh <- hh - (hh>=22)*24
  return(hh+mm/60+ss/3600+ms/(3600*10^6))
}
However, rounding occurs in R, so the data points now appear to have the same time. See the first 5 elements after converting:
TimeOC <--convert_time_fast(as.numeric(TimeO))
> TimeOC[1]
[1] 1.216311
> TimeOC[2]
[1] 1.216311
> TimeOC[3]
[1] 1.216311
> TimeOC[4]
[1] 1.216311
> TimeOC[5]
[1] 1.216311
Any help figuring this out would be greatly appreciated!
You should test to see if they are really equal (all.equal()).
Usually R limits the number of digits it prints (usually to 7), but they are still there.
See also this example:
> as.numeric("1.21631114")
[1] 1.216311
> as.numeric("1.21631118")
[1] 1.216311
> all.equal(as.numeric("1.21631114"), as.numeric("1.21631118"))
[1] "Mean relative difference: 3.288632e-08" # which indicates they're not the same
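Separately from how many digits are printed, a double holds only about 15 to 16 significant decimal digits, so a 20-digit string cannot be converted in one piece without losing its trailing digits. A minimal sketch of one workaround, assuming the layout yyyymmddhhmmssffffff suggested by the comment in the question's function, is to split the string with substr() before converting. The variable names below are made up for illustration, and the sample values are the first two from the question.

```r
TimeO <- c("20110630224701281482", "20110630224701281523")

# Convert each field separately so no number exceeds a few digits
hh <- as.numeric(substr(TimeO,  9, 10))   # hours
mm <- as.numeric(substr(TimeO, 11, 12))   # minutes
ss <- as.numeric(substr(TimeO, 13, 14))   # seconds
us <- as.numeric(substr(TimeO, 15, 20))   # fractional part (ffffff)

# Same convention as the question: hours >= 22 belong to the previous day
hh <- hh - (hh >= 22) * 24
hours <- hh + mm / 60 + ss / 3600 + us / (3600 * 10^6)

# Ask for more digits to see that the two values differ
print(hours, digits = 15)
```

Whether this sign convention matches the intended output depends on how the result is used downstream; the point of the sketch is only that the fractional digits survive.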

ifelse() with unexpected results in R

I have several bins in my data frame.
[1] "bin1" "bin2" "bin3" "bin4" "bin5" "bin6"
I have a bin number, and would like to exclude everything EXCEPT that bin and the previous bin. If bin=1, I would like to exclude everything except bin1 (bin0 does not exist).
To produce a vector of names of bins to exclude later from my data frame, I produce:
BinsToDelete <- ifelse(i>1, paste("bin",1:6,sep="")[-((i-1):i)],paste("bin",1:6,sep="")[-i])
For ease of understanding
> i=3
> paste("bin",1:6,sep="")[-((i-1):i)]
[1] "bin1" "bin4" "bin5" "bin6"
> paste("bin",1:6,sep="")[-i]
[1] "bin1" "bin2" "bin4" "bin5" "bin6"
Weirdly an ifelse statement produces this:
> i=3
> BinsToDelete <- ifelse(i==1, paste("bin",1:6,sep="")[-i],paste("bin",1:6,sep="")[-((i-1):i)])
> BinsToDelete
[1] "bin1"
What happened there?
A normal if-else statement gives the desired results:
> if(i==1){
    BinsToDelete <- paste("bin",1:6,sep="")[-i]
  } else {
    BinsToDelete <- paste("bin",1:6,sep="")[-((i-1):i)]
  }
> BinsToDelete
[1] "bin1" "bin4" "bin5" "bin6"
Thanks for helping me understand how ifelse arrives at this conclusion.
From ?ifelse
Value:
A vector of the same length and attributes (including dimensions
and ‘"class"’) as ‘test’ and data values from the values of ‘yes’
or ‘no’.
In your case:
> i <- 3
> length(i)
[1] 1
So you got a length-1 output: only the first element of the selected branch is returned.
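The difference can be reproduced side by side. This is a minimal sketch using the question's values; the names bins, res_ifelse and res_if are made up for illustration.

```r
i <- 3
bins <- paste0("bin", 1:6)

# ifelse() is vectorised over 'test': a length-1 test gives a length-1 result
res_ifelse <- ifelse(i == 1, bins[-i], bins[-((i - 1):i)])

# if/else is control flow: it returns the whole chosen branch
res_if <- if (i == 1) bins[-i] else bins[-((i - 1):i)]

length(res_ifelse)  # 1
length(res_if)      # 4
```

Because if/else is an expression in R, its value can be assigned directly, which avoids repeating the assignment in each branch.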
I generally avoid ifelse when possible because I think the resulting code is aesthetically unpleasing. The fact that it removes class attributes and makes handling factors, Dates and date-times difficult is a further reason to avoid it. The grep function is designed to return a vector suitable for indexed selection (here x is assumed to hold the bin names):
> x <- paste0("bin", 1:6)
> z=3
> grep( paste0("bin",(z-1):z, collapse="|") , x)
[1] 2 3
> x[ grep( paste0("bin",(z-1):z, collapse="|") , x)]
[1] "bin2" "bin3"
> z=1
> x[ grep( paste0("bin",(z-1):z, collapse="|") , x)]
[1] "bin1"
My understanding is that the dplyr if_else function addresses some of those issues.

How to compare two vectors in R

I want to compare two vectors but it is not working, kindly tell me how two vectors can be compared:
x <- c(1,2,3,4)
y <- c(5,6,7,8)
if (x==y) print("same") else print("different")
Using all() works here:
> all(x==y)
[1] FALSE
> y1=c(5,6,7,8)
> all(y==y1)
[1] TRUE
EDIT
It is best to use isTRUE(all.equal(x,y)), because all(x==y) silently recycles the shorter vector.
Recycling:
> x=c(5,6,5,6)
> y=c(5,6)
> all(x==y)
[1] TRUE
A better way:
> isTRUE(all.equal(x,y))
[1] FALSE
> isTRUE(all.equal(y,y1))
[1] TRUE
> x=c(5,6,5,6)
> y=c(5,6)
> isTRUE(all.equal(x,y))
[1] FALSE
When it comes to array comparison, all and any are your friends. If you mean an unordered collection of values rather than a geometric vector, you should also sort both sides before comparing:
> all(sort(x)==sort(y))
Try:
x <- c(1,2,3,4)
y <- c(5,6,7,8)
if(identical(x,y)) print("identical") else print("not identical")
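The three approaches suggested above answer slightly different questions, so it can help to see them side by side. This is a minimal sketch; the helper names (by_all, by_all_equal, by_identical and the rest) are made up for illustration.

```r
x <- c(5, 6, 5, 6)
y <- c(5, 6)

by_all       <- all(x == y)              # TRUE: y is recycled to length 4
by_all_equal <- isTRUE(all.equal(x, y))  # FALSE: lengths differ
by_identical <- identical(x, y)          # FALSE: lengths differ

# identical() is strict about type; all.equal() compares numerically
dbl  <- c(1, 2)
int  <- c(1L, 2L)
strict_types <- identical(dbl, int)          # FALSE: double vs integer
loose_types  <- isTRUE(all.equal(dbl, int))  # TRUE

# all.equal() also tolerates tiny floating-point differences
strict_float <- identical(0.1 + 0.2, 0.3)         # FALSE
loose_float  <- isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE
```

As a rough guide, identical() suits exact checks, while isTRUE(all.equal()) suits numeric results that may differ by rounding error.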

p.adjust error: 'orderVector1'

I'm having a problem with the function p.adjust. I have a list containing 741 p-values and I want to use the p.adjust() function to correct for multiple testing (FDR testing). This is what I have so far:
> x <- as.vector(pvalues1)
> p.adjust(x, method="fdr", n=length(x))
But I get the following error
Error in order (p, decreasing = TRUE) :
unimplemented type 'list' in 'orderVector1'
Can anyone help me with this?
The problem is that your list of p-values is already a vector, so as.vector() returns it unchanged. What you want is a numeric vector. A list is just a generic vector:
> l <- list(A = runif(1), B = runif(1))
> l
$A
[1] 0.7053136
$B
[1] 0.7053284
> as.vector(l)
$A
[1] 0.7053136
$B
[1] 0.7053284
> is.vector(l)
[1] TRUE
One option is to unlist() the list, to produce a numeric vector:
> unlist(l)
A B
0.7053136 0.7053284
The benefit of that is that it preserves the names. An alternative is plain old as.numeric(), which loses the names but otherwise gives the same result as unlist() here, where every list element has length 1:
> as.numeric(l)
[1] 0.7053136 0.7053284
For big vectors, you might not want unlist() to build the names at all, so an alternative that speeds it up is:
> unlist(l, use.names = FALSE)
[1] 0.7053136 0.7053284
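Putting this together with the original call, a minimal sketch might look like the following. The list pvalues1 here is a made-up stand-in built from random numbers, not the asker's data.

```r
set.seed(1)
pvalues1 <- as.list(runif(5))   # hypothetical stand-in for the list of p-values

# Flatten the list to a plain numeric vector before adjusting
x <- unlist(pvalues1, use.names = FALSE)

# "fdr" is an alias for the Benjamini-Hochberg method in p.adjust()
adj <- p.adjust(x, method = "fdr")
adj
```

The n argument defaults to length(x), so it can be omitted unless the number of comparisons differs from the number of p-values supplied.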
