I would like to write a function with multiple conditions within lapply. I know how to define multiple conditions using a for loop. However I would like to avoid looping this time.
For instance:
let's assume there is a vector (vctr) with numbers from -3 to 5:
set.seed(10)
vctr <- c(sample(-3:5), rep(0,3), sample(-3:5), 0)
and let's define two conditions:
condition_1 <- if the number is equal to 0 -> add 1 to the initial value
condition_2 <- if the number is not equal to 0 -> leave it be
And this works perfectly fine:
test_list <- lapply(1:length(vctr), function(x) if(vctr[x]==0) vctr[x] +1 else vctr[x])
However, what about the situation in which there would be multiple conditions? For instance:
condition_1 <- if the number is equal to 0 -> add 1
condition_2 <- if the number is negative -> replace it with absolute value
condition_3 <- if the number is greater than 0 but lower than 3 -> add 2 to the initial value
condition_4 <- if the number is equal or greater than 3 -> leave it be
I tried the following with conditions 1 and 2 plus "leave it be"; however, this syntax does not work:
test_list2 <- lapply(1:length(vctr), function(x) if(vctr[x]==0) vctr[x] +1 if else(vctr[x]<0) abs(vctr[x]) else vctr[x])
EDIT: I would like to ask for non-dplyr solutions.
You can replace sapply with lapply if you want a list output:
sapply(vctr, function(x) {
if (x == 0) y <- x + 1
if (x < 0) y <- abs(x)
if (x > 0 & x < 3) y <- x + 2
if (x >= 3) y <- x
return(y)
})
[1] 5 2 1 4 1 3 3 3 4 1 1 1 3 2 4 3 1 5 1 3 4 1
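The same four conditions can also be written as one if / else if chain, which is what the attempt in the question was reaching for (R spells it else if, not if else). The sketch below is only an illustration: recode_one and vctr_demo are made-up names, and the nested ifelse() call is a vectorised base R alternative rather than something taken from the answer above.

```r
# Hypothetical helper: applies the four conditions from the question to one number
recode_one <- function(x) {
  if (x == 0) {
    x + 1          # condition 1: zero -> add 1
  } else if (x < 0) {
    abs(x)         # condition 2: negative -> absolute value
  } else if (x < 3) {
    x + 2          # condition 3: between 0 and 3 -> add 2
  } else {
    x              # condition 4: 3 or more -> leave it be
  }
}

# Made-up input covering every branch
vctr_demo <- c(-3, 0, 1, 2, 3, 5)

# List output, one element per input value
res_list <- lapply(vctr_demo, recode_one)

# Vectorised alternative without lapply: nested ifelse() on the whole vector
res_vec <- ifelse(vctr_demo == 0, vctr_demo + 1,
           ifelse(vctr_demo < 0, abs(vctr_demo),
           ifelse(vctr_demo < 3, vctr_demo + 2, vctr_demo)))

unlist(res_list)   # 3 1 3 4 3 5
res_vec            # 3 1 3 4 3 5
```

Iterating over the values themselves (lapply(vctr_demo, ...)) avoids the 1:length(vctr) indexing used in the question.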
H_D<-function(level, zero, ...){
special<-c(0,0,0)
D<-list(special,...)
cell <- do.call(expand.grid, lapply(level, seq)) # create all cell
support <- apply(cell, 1, function(x) +(x != zero)) # create all support set
# provide subset H_D (support sets and given vectors matches
hd<-lapply(D, function (x) cell[colSums(support==x)==length(x),])
h_D<-do.call(rbind, hd)
rownames(h_D)<-1:nrow(h_D)
return(h_D)
}
level<-c(3,2,4)
zero<-c(1,2,1)
y<-c(0,1,1)
H_D(level,zero,y)
> H_D(level,zero,y)
Var1 Var2 Var3
1 1 2 1
2 1 1 2
3 1 1 3
4 1 1 4
My function works fine for the above situation, as colSums works for a data frame. But if my argument is a vector instead of a data frame, it does not work and I get the following error. My input argument could be a vector or a data frame. How can I handle both cases in the function above?
level = 3
zero = 2
y<-1
H_D(level,zero,y)
> H_D(level,zero,y)
Error in colSums(support == x) :
'x' must be an array of at least two dimensions
I tried drop = FALSE, but it did not work.
We could change the function with an if/else based on the number of columns of 'cell'. If it is one column, then just do the subset or else do the other part of computation
H_D <- function(level, zero, ...){
special <- c(0,0,0)
D <- list(special,...)
cell <- do.call(expand.grid, lapply(level, seq)) # create all cell
if(ncol(cell) == 1) {
h_D <- subset(cell, Var1 != zero)
} else {
support <- apply(cell, 1, function(x) +(x != zero)) # create all support set
# provide subset H_D (support sets and given vectors matches
hd <- lapply(D, function (x) cell[colSums(support==x)==length(x),])
h_D <- do.call(rbind, hd)
rownames(h_D) <- 1:nrow(h_D)
}
return(h_D)
}
Testing:
level <- 3
zero <- 2
y <- 1
H_D(level, zero, y)
# Var1
#1 1
#3 3
and the first case
level <- c(3,2,4)
zero <- c(1,2,1)
y <- c(0,1,1)
H_D(level,zero,y)
# Var1 Var2 Var3
#1 1 2 1
#2 1 1 2
#3 1 1 3
#4 1 1 4
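For context, the original error comes from apply(): when cell has a single column, each row yields one value, so apply() returns a plain vector instead of a matrix and colSums() has nothing two-dimensional to work on. Below is a minimal sketch of that behaviour and of one possible workaround. Reshaping with matrix() is my own assumption, not part of the answer above, and the names support_m, keep and kept are made up.

```r
level <- 3
zero  <- 2

# One column (Var1) with the values 1, 2, 3
cell <- do.call(expand.grid, lapply(level, seq))

# Each row gives a single 0/1 value, so the result is simplified to a vector
support <- apply(cell, 1, function(x) +(x != zero))
is.null(dim(support))   # TRUE: no dimensions, hence the colSums() error

# Assumed workaround: restore one row per column of 'cell'
support_m <- matrix(support, nrow = ncol(cell))

# Keep the rows whose support pattern equals y, staying a data frame
y <- 1
keep <- colSums(support_m == y) == length(y)
kept <- cell[keep, , drop = FALSE]
kept$Var1   # 1 3
```

With the reshaped matrix the same colSums() comparison works for one column or several, so the special case in the function may not be needed; whether that suits the real data is for the reader to check.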
I am trying to code the following statement in R with if and ifelse. (The sample data is trial, and x, y, and z are columns of trial.)
Statements to be coded
if (x>0) {
if (y>0) {
l=2
}else{
l=5
}
if (z>0) {
m=l+2
}else{
m=5
}
}
The R code using ifelse
trial$l<-with(trial, ifelse((x>0 &y>0),2,ifelse((x>0 &y<=0),5,???)))
trial$m<-with (trial,ifelse((x>0 &z>0),l+2,ifelse((x>0 &z<=0),5,???)))
where ??? marks the cases for which the statement above defines no value. In other words, for x<0 there are no values, whatever y is.
Next, I used a combination of if and ifelse to see whether that works:
if(trial$z>0){
trial$l<-with(trial, ifelse(y>0,2,5))
trial$m<-with(trial, ifelse(z>0,l+2,5))
}
This code is ok but there is a warning message (since z is a column vector)
In if (trial$z>0){
the condition has length>1 and only the first element will be used
I want to focus only on using ifelse since I am dealing with only vector. But, I have no luck in this regard. Any idea?
If you want to use ifelse and nest things you could do something like this
test <- data.frame(x = 2, y = 5, z = 3)
with(test, ifelse(z > 0 & (x > 0 | y > 3), "yes", "no"))
In this case you're using logical operators to guard the output. You'll still get "no" if z <= 0, but you can deal with that pretty easily.
with(test, ifelse(z > 0, ifelse(x > 0 | y > 3, "yes", "no"), NA))
Nested ifelse statements can get hard to follow in any language, so consider matching or switch statements if you end up with more than 3 of them.
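One detail worth checking when combining conditions this way: in R, & binds more tightly than |, so a & b | c is read as (a & b) | c. A minimal sketch on made-up rows (the data and the names flat, grouped and guarded are only for illustration) shows how the three forms differ:

```r
# Made-up rows chosen so that the three variants disagree
test <- data.frame(x = c(2, -1, 2), y = c(5, 5, 1), z = c(3, -3, -3))

# Without parentheses this is read as (z > 0 & x > 0) | y > 3
flat <- with(test, ifelse(z > 0 & x > 0 | y > 3, "yes", "no"))

# Parentheses make z > 0 a guard for the whole condition
grouped <- with(test, ifelse(z > 0 & (x > 0 | y > 3), "yes", "no"))

# Nesting returns NA instead of "no" when the guard fails
guarded <- with(test, ifelse(z > 0, ifelse(x > 0 | y > 3, "yes", "no"), NA))

flat      # "yes" "yes" "no"
grouped   # "yes" "no"  "no"
guarded   # "yes" NA    NA
```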
I would use transform twice for example:
trial <- data.frame(x=c(-1,1,2),y=c(1,-2,3),z=c(1,-5,5))
trial <- transform(trial,l = ifelse(x>0,ifelse(y > 0,2,5),NA))
transform(trial,m = ifelse(x>0,ifelse(z>0,l+2,5),NA))
x y z l m
1 -1 1 1 NA NA
2 1 -2 -5 5 5
3 2 3 5 2 4
Note that I assign NA for the case x <= 0. You can also use a single transform, like this for example:
trial <- data.frame(x=c(-1,1,2),y=c(1,-2,3),z=c(1,-5,5))
transform(trial,l <- ifelse(x>0,ifelse(y > 0,2,5),NA),
m = ifelse(x>0,ifelse(z>0,l+2,5),NA))
x y z c.NA..5..2. m
1 -1 1 1 NA NA
2 1 -2 -5 5 5
3 2 3 5 2 4
But personally I would prefer the first one for readability; besides, with the second one you will probably need to change the auto-generated column name.
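If the goal is a single call that still produces properly named columns, within() is another base R option. This is only a sketch on the same sample data, and trial2 is a made-up name. It is a different function from the transform() calls above, and the order in which within() appends the new columns may differ from the order of assignment, so the columns are selected by name at the end.

```r
trial <- data.frame(x = c(-1, 1, 2), y = c(1, -2, 3), z = c(1, -5, 5))

# Statements run in order inside within(), so m can use the l defined just before
trial2 <- within(trial, {
  l <- ifelse(x > 0, ifelse(y > 0, 2, 5), NA)
  m <- ifelse(x > 0, ifelse(z > 0, l + 2, 5), NA)
})

# Select by name to get a predictable column order
trial2[c("x", "y", "z", "l", "m")]
```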
I am trying to get all the indexes that meet a condition in a column. I've already done this for a single column like this:
# Get a 10% of samples labeled with a 1
indexPositive = sample(which(datafsign$result == 1), nrow(datafsign) * .1)
Is it possible to do the same operation, vectorized, for any number of columns in one line as well? I imagine that in that case indexPositive would be a list or array with the indexes for each column.
Data
The data frame is as follows:
x y f1 f2 f3 f4
1 76.71655 60.74299 1 1 -1 -1
2 -85.73743 -19.67202 1 1 1 -1
3 75.95698 -27.20154 1 1 1 -1
4 -82.57193 39.30717 1 1 1 -1
5 -45.32161 39.44898 1 1 -1 -1
6 -46.76636 -35.30635 1 1 1 -1
The seed I am using is set.seed(1000000007)
What I want is the set of indexes with value 1. In the case of only one column the result is:
head(indexPositive)
[1] 1398 873 3777 2140 133 3515
Thanks in advance.
Answer
Thanks to @David Arenburg I finally did it. Based on his comment I created this function:
getPercentageOfData <- function(x, condition = 1, percentage = .1){
# Get the percentage of samples that meet condition
#
# Args:
# x: A vector containing the data
# condition: Condition that the data need to satisfy
# percentage: What percentage of samples to get
#
# Returns:
# Indexes of the percentage of the samples that meet the condition
meetCondition = which(x == condition)
sample(meetCondition, length(meetCondition) * percentage)
}
And then I used like this:
# Get a 10% of samples labeled with a 1 in all 4 functions
indexPositive = lapply(datafunctions[3:6], getPercentageOfData)
# Change 1 by -1
datafunctions$f1[indexPositive$f1] = -1
datafunctions$f2[indexPositive$f2] = -1
datafunctions$f3[indexPositive$f3] = -1
datafunctions$f4[indexPositive$f4] = -1
It would be great to also assign the values -1 to each column at once instead of writing 4 lines, but I do not know how.
You can define your function as follows (you can also add replacement as a parameter)
getPercentageOfData <- function(x, condition = 1, percentage = .1, replacement = -1){
meetCondition <- which(x == condition)
replace(x, sample(meetCondition, length(meetCondition) * percentage), replacement)
}
Then select the columns you want to operate on and update datafunctions directly (without creating indexPositive and then manually updating)
cols <- 3:6
datafunctions[cols] <- lapply(datafunctions[cols], getPercentageOfData)
You can of course play around with the function's parameters within lapply, as in (for example)
datafunctions[cols] <- lapply(datafunctions[cols],
getPercentageOfData, percentage = .8, replacement = -100)
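A quick way to convince yourself that the column-wise update behaves as intended is to run it on a small made-up data frame and count the replaced values. Everything below (toy, the seed, the column contents) is invented for illustration; the function is the one defined above. One caveat to keep in mind: sample() treats a single number n as 1:n, so a column with exactly one matching position can be sampled incorrectly when the requested size is at least one.

```r
getPercentageOfData <- function(x, condition = 1, percentage = .1, replacement = -1){
  meetCondition <- which(x == condition)
  replace(x, sample(meetCondition, length(meetCondition) * percentage), replacement)
}

set.seed(1)   # arbitrary seed; the counts below do not depend on it

# Made-up data: f1 has twenty 1s, f2 has ten 1s and ten -1s
toy <- data.frame(f1 = rep(1, 20), f2 = rep(c(1, -1), 10))

cols <- c("f1", "f2")
toy[cols] <- lapply(toy[cols], getPercentageOfData)

# 10% of the 1s in each column were switched to -1
colSums(toy[cols] == -1)   # f1: 2, f2: 11 (10 original + 1 replaced)
```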
For each row of my dataframe, I want to calculate a value from numbers taken from columns of this dataframe. If the calculated value is above 2, I want to set another column's value to 0, else to 1.
x=(df$firstnumber+df$secondnumber)/2
if(x>2){
df$binaryValue=0}
else{ df$binaryValue=1}
this throws the error
the condition has length > 1 and only the first element will be used
because x is a vector
How can I solve this? One way would be to write this as a function and to apply it to the dataframe - are there any other options?
Also, how could I write this to work with apply()?
Thanks in advance
You could simply do...
df$BinaryValue <- ifelse( x > 2 , 0 , 1 )
So you get...
df <- data.frame( x = 1:5 , y = -2:2 )
x <- df$x + df$y
df$BinaryValue <- ifelse( x > 2 , 0 , 1 )
df
# x y BinaryValue
# 1 1 -2 1
# 2 2 -1 1
# 3 3 0 0
# 4 4 1 0
# 5 5 2 0
transform(df, BinaryValue = as.numeric(firstnumber + secondnumber <= 4))
There's no need to divide by two in the first place: the average is above 2 exactly when the sum is above 4. Since the value should be 0 in that case and 1 otherwise, you can check whether the sum is at most four. The function as.numeric is employed to transform boolean to numeric (0 and 1) values.
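To address the apply() part of the question: a row-wise version is possible, although the vectorised ifelse() is usually the simpler choice. Here is a minimal sketch on made-up data, assuming the columns are called firstnumber and secondnumber as in the question; row_wise is a made-up name.

```r
# Made-up data with the column names used in the question
df <- data.frame(firstnumber = c(1, 3, 5), secondnumber = c(1, 2, 4))

# Vectorised: one comparison over the whole column of averages
x <- (df$firstnumber + df$secondnumber) / 2   # 1, 2.5, 4.5
df$binaryValue <- ifelse(x > 2, 0, 1)

# Row-wise with apply(): MARGIN = 1 hands each row to the function as a vector
row_wise <- apply(df[c("firstnumber", "secondnumber")], 1,
                  function(r) if (mean(r) > 2) 0 else 1)

df$binaryValue     # 1 0 0
unname(row_wise)   # 1 0 0
```

Inside the anonymous function the condition has length one, so a plain if / else is safe there; the error in the question arises only when if receives the whole vector.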