I'm wondering why my first if statement returns an error when my input data is an object of class numeric.
I have clearly specified that the first if statement should only apply if the data class is "data.frame", but when the data class is numeric, this first if statement returns an error. Am I missing anything here?
Update:
I have changed instances of & to &&, but when data is a data.frame, the function doesn't produce any output. For example, run: standard(mtcars)
standard <- function(data){
if(class(data) == "data.frame" && ncol(data) > 1){
data[paste0(names(data), ".s")] <- scale(data)
data
}
if(class(data) == "data.frame" && ncol(data) == 1){
colnames(data) <- paste0(names(data), ".s")
data <- scale(data)
data
}
if(class(data) != "data.frame"){
d <- as.data.frame(data)
colnames(d) <- paste0("Var", ncol(d), ".s")
data <- scale(d)
data
}
}
###### EXAMPLES: #######
standard(mtcars[,2]) ##Problem: `Error in if(class(data) == "data.frame" & ncol(data) > 1)`
standard(mtcars["wt"]) ## OK
standard(mtcars) ## after UPDATE, doesn't give any output
Am I missing anything here?
& always evaluates both operands, while && evaluates the second one only if the first does not already determine the result:
FALSE && stop("boh")
#R> [1] FALSE
TRUE && stop("boh")
#R> Error: boh
FALSE & stop("boh")
#R> Error: boh
See help("Logic")
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined.
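After your edits:
You do not get any output because you neither call return() nor use if ... else. A function returns the value of the last expression it evaluates; in standard() that is the third if statement, whose condition is FALSE for a data frame, so the function returns an invisible NULL. See help("function") and help("if"). Here is a small example: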
f1 <- function(x){
if(x < 0){
x <- -1
x
}
if(x > 0){
x <- 1
x
}
}
f1(-1)  # prints nothing: the second if() is FALSE, so f1 returns an invisible NULL
f2 <- function(x){
if(x < 0){
x <- -1
x
}
else if(x > 0){
x <- 1
x
}
}
f2(-1)
#R> [1] -1
f3 <- function(x){
if(x < 0){
x <- -1
return(x)
}
if(x > 0){
x <- 1
return(x)
}
}
f3(-1)
#R> [1] -1
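Applied to the question, one way to make standard() return a result for every kind of input is to turn the three independent if statements into a single if / else if / else chain, so the value of whichever branch runs becomes the return value. This is a sketch assuming the behaviour intended in the question; it also swaps class(data) == "data.frame" for inherits(data, "data.frame"), which always returns a single TRUE or FALSE.

```r
# Sketch of standard() as one if / else if / else chain:
# exactly one branch runs, and its last value is what the function returns.
standard <- function(data) {
  if (inherits(data, "data.frame") && ncol(data) > 1) {
    # Add a scaled copy of every column, suffixed ".s"
    data[paste0(names(data), ".s")] <- scale(data)
    data
  } else if (inherits(data, "data.frame") && ncol(data) == 1) {
    # Single-column data frame: rename, then return the scaled matrix
    colnames(data) <- paste0(names(data), ".s")
    scale(data)
  } else {
    # Anything else (e.g. a numeric vector) is wrapped in a data frame first
    d <- as.data.frame(data)
    colnames(d) <- paste0("Var", ncol(d), ".s")
    scale(d)
  }
}

head(standard(mtcars), 3)
head(standard(mtcars["wt"]), 3)
head(standard(mtcars[, 2]), 3)
```

With this structure, standard(mtcars) returns the augmented data frame instead of an invisible NULL.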
tl;dr: you should use && rather than & for flow control, because & always evaluates its second argument, while && short-circuits when the first argument is FALSE. If the argument isn't a data frame (or matrix), then ncol(x) doesn't make sense: see e.g. this question for more information.
Let's unpack it with a simple example.
x <- 1:5
The first part is fine:
class(x) ## "integer"
class(x)=="data.frame" ## FALSE
(although note that you have to be careful, because class(x) might be a vector with more than one element: inherits(x,"data.frame") is safer).
The second part causes the problem:
ncol(x) ## NULL (uh-oh)
ncol(x)>1 ## logical(0) (uh-oh)
Put them together:
class(x)=="data.frame" & ncol(x)>1 ## logical(0)
What does this do?
if (logical(0)) print("hello")
Error in if (logical(0)) print("hello") : argument is of length zero
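The same example with && and inherits(), as suggested above. && stops after the FALSE left-hand side, so the zero-length ncol(x) > 1 is never evaluated; and inherits() gives one TRUE/FALSE even for objects with several classes, such as a matrix (class c("matrix", "array") since R 4.0.0), where class(m) == "data.frame" is a length-2 vector that if() cannot use as a condition.

```r
x <- 1:5
# && short-circuits: the left side is FALSE, so ncol(x) > 1 is never evaluated
wide_df <- inherits(x, "data.frame") && ncol(x) > 1
wide_df
if (wide_df) print("hello")  # no error: the condition is a single FALSE

# A matrix has two classes, so comparing class() with == gives two values
m <- matrix(1:4, nrow = 2)
class(m) == "data.frame"
# inherits() collapses this to a single TRUE/FALSE
inherits(m, "data.frame")
```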
Related
Logically this code should make sense. I'm primarily a Python programmer, and I'm unsure why it is not working; it does not return any errors. What I want is for this vector of zeros to be changed to a vector of only 1s and -1s (hence the sample function). My issue is that the values of the vector are not being updated: they just stay at 0, and I'm not sure why.
Y = numeric(100)
for (i in 100){
x <- sample(1:2, 1)
if (x == 2){
Y[i] = 1
}
else{
Y[i] = -1
}
}
I've also changed Y[i] = 1 to Y[i] <- 1, but this has not helped. I also know that x is either 1 or 2 because I tested it manually using x == 2, etc.
The only other issue I could think of is that x is an integer while 2 is a double, but I checked this (note that x = 2L after the loop exited):
> typeof(x)
[1] "integer"
> typeof(2)
[1] "double"
> x == 2
[1] TRUE
I don't think it is the problem.
Any suggestions?
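The values are not updated because the loop runs only once: for (i in 100) iterates over the single value 100, so only the last element of Y changes. You can see that change at the end of the output vector Y: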
tail(Y)
#[1] 0 0 0 0 0 -1
Instead, the loop should run over 1:100:
for(i in 1:100)
The second issue raised is the type of x. Here we are sampling from 1:2, which is an integer vector rather than a double vector, and sample() returns the same type as its input. According to ?':'
For numeric arguments, a numeric vector. This will be of type integer if from is integer-valued and the result is representable in the R integer type, otherwise of type "double"
typeof(1:2)
#[1] "integer"
typeof(c(1, 2))
#[1] "double"
If you want doubles from a range built with :, another option is to wrap it in as.numeric():
for (i in 1:100){
x <- sample(as.numeric(1:2), 1)
if (x == 2){
Y[i] = 1
}
else{
Y[i] = -1
}
}
Check the types:
typeof(Y)
#[1] "double"
typeof(x)
#[1] "double"
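On the integer-versus-double worry from the question: == compares values after coercing both sides to a common type, so an integer 2L and a double 2 compare equal; only identical(), which also checks the storage type, tells them apart. A small check:

```r
x <- sample(1:2, 1)
typeof(x)         # "integer", because 1:2 is an integer vector
# == coerces both sides to a common type before comparing
2L == 2           # TRUE
# identical() also compares the storage type
identical(2L, 2)  # FALSE
```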
Also, R is a vectorized language so this:
x<-sample(1:2, 100, replace = TRUE)
Y<-ifelse(x==2, 1, -1)
will run much faster than your loop, especially for long vectors.
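Putting both fixes together, a sketch: the loop iterates over every index with seq_len(), and the vectorised version draws all values in a single sample() call from c(-1, 1), which is already a double vector, so no as.numeric() is needed. The names n, Y_loop and Y_vec are only for illustration.

```r
set.seed(1)  # only to make the random draws reproducible
n <- 100

# Loop version: iterate over every index 1..n, not just the single value n
Y_loop <- numeric(n)
for (i in seq_len(n)) {
  Y_loop[i] <- if (sample(1:2, 1) == 2) 1 else -1
}

# Vectorised version: draw all n values at once
Y_vec <- sample(c(-1, 1), n, replace = TRUE)
table(Y_vec)
```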
x <- c(1:10)
only_even <- function(x){
if(x %% 2 == 0 && is.na(x) < 1){
return(x)
}else{
print("Not even or real")
}
}
only_even(x)
Returns
"Not even or real"
even though there are clearly even numbers in x (1:10).
x <- c(1:10)
only_even <- function(x){
if(x %% 2 == 0){
return(x)
}else{
print("Not even or real")
}
}
only_even(x)
Returns
Warning message:
In if (x%%2 == 0) { :
the condition has length > 1 and only the first element will be used
I'm confused by both results, especially the second warning, "the condition has length > 1 and only the first element will be used". When creating if statements, does the condition only apply to the vector/input as a whole, instead of going through each value individually? Is that why I'm getting the warning about the condition having length > 1?
As was mentioned in the comments, ifelse() is the vectorized version of if(). You're right that if() is for evaluating a single condition: when it is supplied with a logical vector, older versions of R use only the first element and warn, as above, while R 4.2.0 and later raise an error instead.
x <- 1:5
y <- rep(3, 5)
ifelse(x > y, "yes", "no")
## [1] "no" "no" "no" "yes" "yes"
if(x > y) "yes" else "no"
## [1] "no"
## Warning message:
## In if (x > y) "yes" else "no" :
## the condition has length > 1 and only the first element will be used
Of course, things like any(), all(), etc. can be used to collapse a boolean vector into a single boolean element for use with vanilla if().
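Applied to the only_even() question, a sketch (the name is_even is illustrative): build one logical value per element with the vectorised & rather than &&, then either subset with it, label each element with ifelse(), or collapse it with any() when a single if() decision is needed.

```r
x <- 1:10
# One TRUE/FALSE per element; NA values are treated as "not even"
is_even <- !is.na(x) & x %% 2 == 0
# Keep only the even values
x[is_even]
# Label every element
ifelse(is_even, "even", "not even or NA")
# Collapse to a single condition for a scalar if()
if (any(is_even)) print("x contains at least one even number")
```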
I have created a function that converts "YYYYQQ" to an integer YYYYMMDD. The function works well with individual values but not on the whole vector, and I am unable to understand the warning message.
library(magrittr)  # provides %>%
GetProperDate <- function(x) {
x <- as.character(x)
q<-substr(x, 5, 6)
y<-substr(x, 1,4) %>% as.numeric()
if(q=="Q1"){
x <- as.integer(paste0(y,"03","31"))
}
if(q=="Q2"){
x <- as.integer(paste0(y,"06","30"))
}
if(q=="Q3"){
x <- as.integer(paste0(y,"09","30"))
}
if(q=="Q4"){
x <- as.integer(paste0(y,"12","31"))
}
return(x)
}
> GetProperDate("2019Q1")
[1] 20190331
> GetProperDate("2019Q2")
[1] 20190630
> GetProperDate("2019Q3")
[1] 20190930
> GetProperDate("2019Q4")
[1] 20191231
> date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
> date.list.converted<- date.list %>% GetProperDate()
Warning messages:
1: In if (q == "Q1") { :
the condition has length > 1 and only the first element will be used
2: In if (q == "Q2") { :
the condition has length > 1 and only the first element will be used
3: In if (q == "Q3") { :
the condition has length > 1 and only the first element will be used
4: In if (q == "Q4") { :
the condition has length > 1 and only the first element will be used
> date.list.converted
[1] 20190331 20190331 20190331 20190331
As shown above, I am getting a warning message and the output is not as expected.
The issue is that GetProperDate is not vectorised: if() expects a single (scalar) condition, not a vector. You could rewrite the function with ifelse(), which is vectorised.
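A sketch of that ifelse() rewrite, keeping the question's function name and YYYYMMDD integer output; returning NA for an unrecognised quarter is an added assumption, not part of the original function.

```r
GetProperDate <- function(x) {
  x <- as.character(x)
  q <- substr(x, 5, 6)
  y <- substr(x, 1, 4)
  # Each ifelse() tests every element of q, so every input gets its own suffix
  mmdd <- ifelse(q == "Q1", "0331",
          ifelse(q == "Q2", "0630",
          ifelse(q == "Q3", "0930",
          ifelse(q == "Q4", "1231", NA))))
  # Keep unrecognised quarters as NA instead of pasting e.g. "2019NA"
  as.integer(ifelse(is.na(mmdd), NA, paste0(y, mmdd)))
}

GetProperDate(c("2019Q1", "2019Q2", "2019Q3", "2019Q4"))
# [1] 20190331 20190630 20190930 20191231
```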
Apart from that, you can also use as.yearqtr() from zoo, which handles quarterly dates, and get the last date of the quarter with frac = 1.
as.Date(zoo::as.yearqtr(date.list), frac = 1)
#[1] "2019-03-31" "2019-06-30" "2019-09-30" "2019-12-31"
When you pass a vector to the function, q == "Q1" compares a vector with a scalar and returns a logical vector, but if() uses only its first element. That's why you get the warning "the condition has length > 1 and only the first element will be used". Try this:
date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
date.list.converted <- sapply(date.list, function(s) GetProperDate(s))
Try this:
library(tidyverse)
GetProperDate <- function(x) {
x <- as.character(x)
q <- substr(x, 5, 6)
y <- substr(x, 1,4) %>%
as.numeric()
x <- case_when(
q=="Q1" ~ as.integer(paste0(y,"03","31")),
q =="Q2" ~ as.integer(paste0(y,"06","30")),
q == "Q3" ~ as.integer(paste0(y,"09","30")),
TRUE ~ as.integer(paste0(y,"12","31")))
return(x)
}
date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
GetProperDate(date.list)
#[1] 20190331 20190630 20190930 20191231
I want to identify which values in one vector are present in another vector. Sometimes, in my application, none of the values of the first vector are present; in such cases I would like NA. My current approach returns integer(0) when this occurs:
l <- 1:3
m <- 2:5
n <- 4:6
l[l %in% m]
[1] 2 3
l[l %in% n]
integer(0)
This post discusses how to capture integer(0) using length, but is there a way to avoid integer(0) in the first place, and do this operation in just one step? Answers to the previous question suggest that any could be used but I fail to see how that would work in this example.
You could catch the integer(0) with a custom function:
l <- 1:3
m <- 2:5
n <- 4:6
returnsafe <- function(a, b) {
result <- a[a %in% b]
if(length(result) == 0L) { # works for any vector type, not only integers
return(NA)
} else {
return(result)
}
}
> returnsafe(l, n)
[1] NA
You can index n by the positions that match() finds; each element of l that is absent from n becomes NA:
n[match(l, n)]
[1] NA NA NA
Or:
any(n[match(l, n)])
[1] NA
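On the question's remark about any(): any() is what turns the element-wise %in% test into the single TRUE/FALSE that if() needs, which gives a one-expression version. A sketch using the question's vectors (hits_m and hits_n are illustrative names):

```r
l <- 1:3
m <- 2:5
n <- 4:6
# any() collapses the element-wise test into one TRUE/FALSE for if()
hits_m <- if (any(l %in% m)) l[l %in% m] else NA
hits_n <- if (any(l %in% n)) l[l %in% n] else NA
hits_m
hits_n
```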
Main question
In what practical programming situations or R "idioms" would you only want to check the first element of each of two vectors for logical comparison? (I.e. disregarding the rest of each vector as in && and ||.)
I can see the use of & and | in R, where they do element-wise logical comparison of two vectors. But I cannot see a real life practical use of their sibling operators && and ||. Can anyone provide a clear example of their use?
The documentation, help("&&"), says:
The longer form evaluates left to right examining only the first element of each vector.
Evaluation proceeds only until the result is determined.
The longer form is appropriate for programming control-flow and typically preferred in if clauses.
The issue for me is the following: I interpret the documentation of && and || to say that for logical vectors x and y, the && and || operators only use x[1] and y[1] to provide a result.
> c(TRUE, FALSE, FALSE) && c(TRUE, FALSE)
[1] TRUE
> c(TRUE, FALSE, FALSE) && c(FALSE, FALSE)
[1] FALSE
> c(FALSE, FALSE, FALSE) && c(TRUE, FALSE)
[1] FALSE
> c(FALSE, FALSE, FALSE) && c(FALSE, FALSE)
[1] FALSE
I don't see any "programming control-flow" situations where I would have two logical vectors and I would disregard any values past the first element of each.
It seems that x && y acts like x[1] & y[1], and x || y acts like x[1] | y[1].
Benchmarks
Here's a test function that evaluates how often these formulations return the same result using randomly generated logical vectors of different lengths. This suggests that they are doing the same thing.
> test <- function( n, maxl=10 ) {
foo <- lapply( X=seq_len( n ), FUN=function(i) {
x <- runif( n=sample( size=1, maxl ) ) > 0.5
y <- runif( n=sample( size=1, maxl ) ) > 0.5
sameres <- all.equal( (x||y), (x[1]|y[1]) )
sameres
} )
table( unlist( foo ) )
}
test( 10000 )
Yields:
TRUE
10000
Here's a benchmarking test of which is faster. It starts by creating a list of lists, where each of the N items in dat is a list containing two randomly generated logical vectors. Then it applies each variant to the same data to see which is faster.
library(rbenchmark)
N <- 100
maxl <- 10
dat <- lapply( X=seq_len(N), FUN=function(i) {
list( runif( n=sample( size=1, maxl ) ) > 0.5,
runif( n=sample( size=1, maxl ) ) > 0.5) } )
benchmark(
columns=c("test","replications","relative"),
lapply(dat, function(L){ L[[1]] || L[[2]] } ),
lapply(dat, function(L){ L[[1]][1] | L[[2]][1] } )
)
Yields the following output (removed the \n characters and extra whitespace):
test replications relative
2 lapply(dat, function(L) { L[[1]][1] | L[[2]][1] }) 100 1.727
1 lapply(dat, function(L) { L[[1]] || L[[2]] }) 100 1.000
Clearly, the || formulation is faster than cherry picking the first element of each argument. But I'm still curious as to why one would need such an operator.
I guess that there are a couple of reasons, but probably the most important one is the short-circuit behavior. If a evaluates to FALSE in a && b, then b is not evaluated. Similarly, if a evaluates to TRUE in a || b, then b is not evaluated. This allows writing code like
v <- list(1, 2, 3, 4, 5)
idx <- 6
if (idx <= length(v) && v[[idx]] == 5) {
foo
} else {
bar
}
Otherwise one needs to write this (maybe) as
if (idx <= length(v)) {
if (v[[idx]] == 5) {
foo
} else {
bar
}
} else {
bar
}
which is 1) much less readable, and 2) repeats bar, which is bad if bar is a bigger piece of code.
You cannot use & in the if condition, because your index would be out of bounds, and this is not allowed for lists in R:
if (idx <= length(v) & v[[idx]] == 5) {
foo
} else {
bar
}
# Error in v[[idx]] : subscript out of bounds
Here is a small illustration of the short-circuit behavior:
t <- function() { print("t called"); TRUE }
f <- function() { print("f called"); FALSE }
f() && t()
# [1] "f called"
# [1] FALSE
f() & t()
# [1] "f called"
# [1] "t called"
# [1] FALSE
t() || f()
# [1] "t called"
# [1] TRUE
t() | f()
# [1] "t called"
# [1] "f called"
# [1] TRUE
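A common practical idiom, sketched below with a hypothetical describe() function: chain guards with && so that each test is evaluated only when the earlier ones have succeeded. Here x > 0 is reached only once x is known to be a single non-missing value; with & instead, describe(NULL) would build a logical(0) condition and if() would fail. Note that since R 4.3.0, passing a vector longer than one to && or || is an error, so the x[1] & y[1] equivalence explored in the question only holds in older versions.

```r
# Hypothetical helper: each && guard is evaluated only if the previous one was TRUE
describe <- function(x) {
  if (!is.null(x) && length(x) == 1 && !is.na(x) && x > 0) {
    "a single positive number"
  } else {
    "something else"
  }
}

describe(5)
describe(NULL)     # stops at !is.null(x), so NULL > 0 is never evaluated
describe(NA)       # stops at !is.na(x)
describe(c(1, 2))  # stops at length(x) == 1, before the length-2 x > 0
```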