lapply needs boolean after if-statement condition - r

I'm new to R. I wrote a function that works on a single number, and I want to apply it to a numeric of length 400. It goes:
EGIDS.to.IUCN <- function(x){
if(x==10){return(NA)} # 10 (Extinct)
if(x==9){return(NA)} # 9 (Dormant)
if(x==8.5){return(4)} # 8.5 (Nearly Extinct) → 4 (Critically endangered)
# 10 more similar lines here (no more NAs)
else{stop}
}
I tried using lapply, but then I get:
> austroIUCN <- lapply(austroEGIDS, EGIDS.to.IUCN)
Error in if (x == 10) { : missing value where TRUE/FALSE needed
Here austroEGIDS is a list of 400 numbers from 0 to 10. I'm totally lost here. Why does it expect a boolean after closing the if condition?

It would be more efficient to use a numeric vector and vectorized statements:
austroIUCN <- unlist(austroEGIDS)
austroIUCN[austroIUCN==10 | austroIUCN==9] <- NA
austroIUCN[austroIUCN==8.5] <- 4
...
Each statement sets all entries with the given level at once.
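The same recoding can also be done with a lookup table and match(), a swapped-in base R technique rather than the answer's own code. The sketch assumes only the three mappings shown in the question (10 and 9 to NA, 8.5 to 4); the other EGIDS levels would have to be added to the hypothetical egids and iucn vectors. It also sidesteps the error from the question: presumably austroEGIDS contains NA entries, and NA == 10 is NA, which if() cannot use, whereas match() simply returns NA for them.

```r
# Hypothetical lookup table: only the mappings shown in the question
egids <- c(10, 9, 8.5)   # EGIDS levels
iucn  <- c(NA, NA, 4)    # corresponding IUCN codes

recode_egids <- function(x) {
  # match() gives the position of each x in egids, or NA if x is absent or NA
  iucn[match(x, egids)]
}

NA == 10   # NA, which is why if (x == 10) fails when x is missing
recode_egids(c(8.5, 10, NA, 9, 8.5))
# [1]  4 NA NA NA  4
```

Because the whole vector is recoded in one call, no lapply is needed (apply unlist() first if the data is a list). match() compares exactly, which is fine for values such as 8.5 that are exactly representable in floating point.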

Without the stop, this should work:
EGIDS.to.IUCN <- function(x) {
if (is.na(x)){ NA } else
if (x == 10) { NA } else
if (x == 9) { NA } else
if(x == 8.5) { 4 } else
NA
}
or, more readably, with switch on the character form of x:
EGIDS.to.IUCN <- function(x){
switch (as.character(x), 'NA'=NA, '10'=NA, '9'=NA, '8.5'=4, NA)
}
austroEGIDS <- sample(seq(1, 10, .5), 400, replace = TRUE)
austroIUCN <- sapply(austroEGIDS, EGIDS.to.IUCN)
table(unlist(austroIUCN), useNA = "ifany")
#    4 <NA>
#   23  377
# (the counts vary between runs, since no seed is set before sample())
Or, if you want it to stop with an error when there is no match:
EGIDS.to.IUCN <- function(x){
switch (as.character(x), 'NA'=NA, '10'=NA, '9'=NA, '8.5'=4, stop("Not a match!"))
}
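One detail worth checking in the switch versions: switch() matches names only when EXPR is a character string. A numeric EXPR is converted to an integer and used as a position, so switch(8.5, ...) would pick the eighth alternative (or return NULL invisibly if there are fewer) instead of the one named '8.5'. A minimal sketch that converts x with as.character() first; the three mappings are the ones from the question, and the NA default stands in for the remaining levels:

```r
# A numeric EXPR selects by position, a character EXPR by name
switch(2, a = "first", b = "second", "default")     # "second"
switch("2", a = "first", b = "second", "default")   # "default": no element named "2"

EGIDS.to.IUCN <- function(x) {
  # as.character(8.5) is "8.5", so x is matched against the names
  switch(as.character(x), "10" = NA, "9" = NA, "8.5" = 4, NA)
}

sapply(c(10, 9, 8.5, 1), EGIDS.to.IUCN)
# [1] NA NA  4 NA
```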

Related

How to create a function with input from a data frame and apply it over all rows?

I am trying to write a function in R that takes several variables from a data frame as input and returns a vector of results.
Based on the post below, I wrote the function below.
How can create a function using variables in a dataframe
However, I receive this warning message:
the condition has length > 1 and only the first element will be used
I tried to solve it by following the post below and using sapply inside the function, but without success.
https://datascience.stackexchange.com/questions/33351/what-is-the-problem-with-the-condition-has-length-1-and-only-the-first-elemen
# a data frame with columns a, x, y and z:
myData <- data.frame(a=1:5,
x=(2:6),
y=(11:15),
z=3:7)
myFun3 <- function(df, col1 = "x", col2 = "y", col3 = "z"){
result <- 0
if(df[,col1] == 2){result <- result + 10
}
if(df[,col2] == 11){result <- result + 100
}
return(result)
}
myFun3(myData)
> Warning messages:
> 1: In if (df[, col1] == 2) { :
> the condition has length > 1 and only the first element will be used
> 2: In if (df[, col2] == 11) { :
> the condition has length > 1 and only the first element will be used
Can someone explain how I can apply the function over all rows of the data frame?
Thanks a lot!
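We need ifelse instead of if/else, because if/else is not vectorized: if() uses a single TRUE/FALSE (and since R 4.2.0 a condition of length > 1 is an error rather than a warning).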
myFun3 <- function(df, col1 = "x", col2 = "y", col3 = "z"){
result <- numeric(nrow(df))
ifelse(df[[col1]] == 2, result + 10,
ifelse(df[[col2]] == 11, result + 100, result))
}
myFun3(myData)
#[1] 10 0 0 0 0
Alternatively, the OP's code can be wrapped in Vectorize after a few changes, i.e. replacing the second if with an else if ladder:
myFun3 <- Vectorize(function(x, y){
result <- 0
if(x == 2) {
result <- result + 10
} else if(y == 11){
result <- result + 100
} else result <- 0
return(result)
})
myFun3(myData$x, myData$y)
#[1] 10 0 0 0 0
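Both versions above return only the first matching increment (10 for row 1). If instead both increments should apply when both conditions hold, as in the OP's original function with two independent if statements, the comparisons can be used directly as 0/1 vectors. A sketch under that assumption; myFun3_additive is a hypothetical name:

```r
myData <- data.frame(a = 1:5, x = 2:6, y = 11:15, z = 3:7)

# Hypothetical variant keeping the additive logic of the original myFun3:
# +10 where col1 == 2 and +100 where col2 == 11, both if both hold
myFun3_additive <- function(df, col1 = "x", col2 = "y") {
  # == is vectorized over rows; TRUE/FALSE become 1/0 in arithmetic
  10 * (df[[col1]] == 2) + 100 * (df[[col2]] == 11)
}

myFun3_additive(myData)
# [1] 110   0   0   0   0
```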
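Regarding the OP's doubt about several conditions being TRUE when only the first should apply: both nested ifelse (when there are more than two conditions) and if/else if/else (an else if ladder, i.e. nested if/else) work, because the conditions are checked in the order we specified them and the first TRUE one determines the result. With the if/else ladder, evaluation also stops at the first TRUE condition (ifelse, by contrast, evaluates its yes and no arguments for the whole vector whenever any element needs them). Suppose we have multiple conditions: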
if(expr1) {
1
} else if(expr2) {
2
} else if(expr3) {
3
} else if(expr4) {
4
} else {
5}
checks the first expression (expr1) first, then the second, and so on. As soon as one returns TRUE, it exits, i.e. it is equivalent to the nested condition
if(expr1) {
1
} else {
if(expr2) {
2
} else {
if(expr3) {
3
} else {
if(expr4) {
4
} else 5
}
}
}
There is a cost to this: for values that match expr1, only expr1 is evaluated, which saves time, but for values that fall through to the final else (5), all four conditions are checked. Putting the most frequent case first therefore tends to reduce the work.
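The cost described above can be made visible by counting how many conditions are evaluated. A small sketch; check(), classify() and the n_checked counter are hypothetical helpers for illustration only:

```r
n_checked <- 0   # counter of evaluated conditions

# Returns cond unchanged, but records that it was evaluated
check <- function(cond) {
  n_checked <<- n_checked + 1
  cond
}

classify <- function(v) {
  if (check(v == 1)) {
    1
  } else if (check(v == 2)) {
    2
  } else if (check(v == 3)) {
    3
  } else if (check(v == 4)) {
    4
  } else {
    5
  }
}

n_checked <- 0
classify(1)
n_checked   # 1: only expr1 was evaluated

n_checked <- 0
classify(5)
n_checked   # 4: every condition was checked before reaching the final else
```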

Why is the subscript out of bounds during for-loop execution?

In my script, d[[x]] can be "empty". I tried to handle this with else, but it does not work.
How do I write the else branch so that it handles the case where d has length zero?
for (x in 1:licznik3)
{
if(a[[x]] > d[[x]])
{
out3[[x]] <- wspolne3[[x]]
}
else (a[[x]] < d[[x]])
{
out3[[x]] <- NA
}
}
variables:
> a
[1] 0.1
> d
numeric(0)
> licznik3
[1] 16
Error in d[[x]] : subscript out of bounds
Example: I have 3 loops. If a[[x]] is greater than d[[x]], this value goes to out3, and the next loop checks a similar condition.
My problem is that in the second loop (the code shown), d[[x]] can be empty (when no value in the previous loop was greater than a[[x]]), which leaves a and d as shown above, with d being numeric(0).
Just add an additional check that your counter (aka licznik3 ;)) does not exceed the length of vector d. If it does, break out of the for loop.
a <- 1:10
licznik3 <- 7
out3 <- rep(NA, length(a))  # base R equivalent of rep_along(), which comes from rlang
wspolne3 <- 2:12
d <- -c(1, 4, 2)
for (x in 1:licznik3) {
if (x > length(d)) {
break
}
if (a[[x]] > d[[x]]) {
out3[[x]] <- wspolne3[[x]]
} else {
out3[[x]] <- NA
}
}
out3
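The underlying reason for the error is that [[ ]] stops with "subscript out of bounds" when the index is past the end of a vector, while single brackets [ ] return NA there. That suggests a loop-free alternative: pad d to the needed length with NA and compare element-wise. A sketch, assuming the goal is NA in out3 wherever d has no corresponding element (out3 here has length licznik3 rather than length(a)):

```r
a <- 1:10
d <- -c(1, 4, 2)
wspolne3 <- 2:12
licznik3 <- 7

i <- seq_len(licznik3)
# d[[4]] would be an error, but d[i] pads with NA past the end of d
d_pad <- d[i]
# !is.na() makes positions without a d value FALSE, and ifelse() returns NA for them
out3 <- ifelse(!is.na(d_pad) & a[i] > d_pad, wspolne3[i], NA)
out3
# [1]  2  3  4 NA NA NA NA
```

With d <- numeric(0), as in the question, d_pad is all NA and so is out3.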

About missing value where TRUE/FALSE needed in R

I want to return the number of times in a string vector v that the element at the next index has more characters than the element at the current index.
Here's my code:
BiggerPairs <- function (v) {
numberOfTimes <- 0
for (i in 1:length(v)) {
if((nchar(v[i+1])) > (nchar(v[i]))) {
numberOfTimes <- numberOfTimes + 1
}
}
return(numberOfTimes)
}
When I run it, I get the error "missing value where TRUE/FALSE needed".
I do not know why this happens.
The error you are getting says that your code is trying to evaluate a missing value (NA) where if() expects a single TRUE or FALSE. There are two likely reasons for this:
You have NAs in your vector v (I suspect this is not the actual issue).
Your loop runs over 1:length(v); on the last iteration it compares nchar(v[n+1]) with nchar(v[n]). There is no v[n+1], so it is a missing value, the comparison is NA, and you get the error.
To remove NAs, try the following code:
v <- na.omit(v)
To improve your loop, try the following code:
for(i in 1:(length(v) -1)) {
if(nchar(v[i + 1]) > nchar(v[i])) {
numberOfTimes <- numberOfTimes + 1
}
}
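Putting the fix back into the original function gives something like the sketch below. It replaces 1:(length(v) - 1) with seq_len(max(length(v) - 1, 0)), a change not in the answer: for a vector of length 1, 1:0 is c(1, 0), which would again index outside v, whereas seq_len(0) runs the loop zero times.

```r
BiggerPairs <- function(v) {
  numberOfTimes <- 0
  # Number of adjacent pairs; 0 for vectors of length 0 or 1
  n_pairs <- max(length(v) - 1, 0)
  for (i in seq_len(n_pairs)) {
    # v[i + 1] is always inside v here, so indexing produces no NA
    if (nchar(v[i + 1]) > nchar(v[i])) {
      numberOfTimes <- numberOfTimes + 1
    }
  }
  numberOfTimes
}

BiggerPairs(c("a", "abc", "ab", "abcd"))   # 2
BiggerPairs("a")                           # 0
```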
Here is some example dummy code.
# create 15 random numbers
set.seed(1)
v <- rnorm(15)
# accessing the 16th element produces an NA
v[16]
#[1] NA
# if we add an NA and compare, the result is NA (inside if() this gives the error)
v[10] <- NA
v[10] > v[9]
#[1] NA
# if we remove NAs and limit our loop to N-1, we should get a fair comparison
v <- na.omit(v)
numberOfTimes <- 0
for(i in 1:(length(v) -1)) {
if(nchar(v[i + 1]) > nchar(v[i])) {
numberOfTimes <- numberOfTimes + 1
}
}
numberOfTimes
#[1] 5
Is this what you're after? I don't think there is any need for a for loop.
I'm generating some sample data, since you don't provide any.
# Generate some sample data
set.seed(2017);
v <- sapply(sample(30, 10), function(x)
paste(sample(letters, x, replace = T), collapse = ""))
v;
#[1] "raalmkksyvqjytfxqibgwaifxqdc" "enopfcznbrutnwjq"
#[3] "thfzoxgjptsmec" "qrzrdwzj"
#[5] "knkydwnxgfdejcwqnovdv" "fxexzbfpampbadbyeypk"
#[7] "c" "jiukokceniv"
#[9] "qpfifsftlflxwgfhfbzzszl" "foltth"
The following vector marks with 1 the positions in v where an entry has more characters than the previous entry.
# The following vector has the same length as v and
# returns 1 at the index position i where
# nchar(v[i]) > nchar(v[i-1])
idx <- c(0, diff(nchar(v)) > 0);
idx;
# [1] 0 0 0 0 1 0 0 1 1 0
If you're just interested in whether there is any entry with more characters than the previous entry, you can do this:
# If you just want to test if there is any position where
# nchar(v[i+1]) > nchar(v[i]) you can do
any(idx == 1);
#[1] TRUE
Or count the number of occurrences:
sum(idx);
#[1] 3

Vectorized (non-loop) solution returns wrong result (solution with for-loop returns correct result)

I have two theoretically identical solutions: one vectorized and one with a for loop. But the vectorized solution returns the wrong result, and I want to understand why. The logic is simple: replace each NA with the previous non-NA value in the vector.
# vectorized
f1 <- function(x) {
idx <- which(is.na(x))
x[idx] <- x[ifelse(idx > 1, idx - 1, 1)]
x
}
# non-vectorized
f2 <- function(x) {
for (i in 2:length(x)) {
if (is.na(x[i]) && !is.na(x[i - 1])) {
x[i] <- x[i - 1]
}
}
x
}
v <- c(NA,NA,1,2,3,NA,NA,6,7)
f1(v)
# [1] NA NA 1 2 3 3 NA 6 7
f2(v)
# [1] NA NA 1 2 3 3 3 6 7
The two pieces of code are different.
The first one replaces each NA with the previous element of the original vector, so an NA that follows another NA stays NA.
The second one replaces NA with the previous element if that one is not NA, but the previous element can itself be the result of an earlier NA substitution.
Which one is correct depends on what you need. The second behaviour is more difficult to vectorize, but there are already implemented functions such as zoo::na.locf.
Or, if you only want to use base packages, you could have a look at this answer.
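For reference, the second behaviour (often called last observation carried forward) can be vectorized in base R with cummax(). This is a sketch of that one technique, not necessarily the approach in the linked answer; locf is a hypothetical name:

```r
locf <- function(x) {
  # Position of the most recent non-NA element at each index (0 before the first one)
  idx <- cummax(seq_along(x) * !is.na(x))
  # Leading NAs have nothing to carry forward, so they stay NA
  idx[idx == 0] <- NA
  x[idx]
}

v <- c(NA, NA, 1, 2, 3, NA, NA, 6, 7)
locf(v)
# [1] NA NA  1  2  3  3  3  6  7
```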
These two solutions are not equivalent. The first function is rather like:
f2_as_f1 <- function(x) {
y <- x # a copy of x
for (i in 2:length(x)) {
if (is.na(y[i])) {
x[i] <- y[i - 1]
}
}
x
}
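Note the use of the y vector: the replacements read from the unmodified copy, so a filled-in value is never carried further forward.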

FOR loop in R; not getting what I want

Just a general question:
When I run:
ok<-NULL
for (i in 1:3) {
ok[i]=i^2
i=i+1
}
The loop works (as expected).
> ok
[1] 1 4 9
Now when I try to do something like:
ok<-NULL
for (i in 1:3) {
ok[i]=i^2
x[i]<-ok[i]+1
y[i]<-cbind(ok[i],x)
i=i+1
}
And I want:
y = 1 2 4 5 9 10
Instead I get:
Warning messages:
1: In y[i] <- rbind(ok[i], x) :
number of items to replace is not a multiple of replacement length
2: In y[i] <- rbind(ok[i], x) :
number of items to replace is not a multiple of replacement length
3: In y[i] <- rbind(ok[i], x) :
number of items to replace is not a multiple of replacement length
4: In y[i] <- rbind(ok[i], x) :
number of items to replace is not a multiple of replacement length
5: In y[i] <- rbind(ok[i], x) :
number of items to replace is not a multiple of replacement length
Thanks in advance.
You should read up on R basics before starting to program.
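You don't have to increment i in the loop (actually it's quite confusing, since for() sets i to the next element of 1:3 at the start of each iteration anyway).
You don't need cbind or rbind to combine vectors here: they bind vectors as the columns or rows of a matrix (or data frame). To concatenate vectors, use c().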
y <- NULL
for(i in 1:3){ ok <- i^2; x <- ok + 1; y <- c(y, ok, x) }
or:
as.vector(sapply(1:3, function(i){ ok <- i^2; x <- ok + 1; c(ok, x) }))
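Growing y with c() copies the whole vector on every iteration. That is harmless for three iterations, but for longer loops a common alternative is to preallocate the result and assign by index. A sketch reproducing the desired output:

```r
n <- 3
y <- numeric(2 * n)        # preallocate: one slot for each square and each square + 1
for (i in seq_len(n)) {
  ok <- i^2
  y[2 * i - 1] <- ok       # odd positions hold the squares
  y[2 * i]     <- ok + 1   # even positions hold the squares plus one
}
y
# [1]  1  2  4  5  9 10
```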
With the command y[i]<-cbind(ok[i],x) you attempt to replace one element of the vector with several. This causes the warnings.
If you want to get 1:3 squared, you would use:
ok <- (1:3)^2
ok
# [1] 1 4 9
If you want to get 1:3 squared, along with the numbers right after them, you might try:
as.vector(rbind(ok, ok+1))
# [1]  1  2  4  5  9 10
for loops in R are often the wrong solution to your problem.
