Ifelse condition across multiple variables using paste0 function to call up variables - r

I want to use an ifelse condition across multiple variables using paste0("Var",c(1,3,5)) to call up variables.
Here is some data.
set.seed(123)
df <- data.frame(Var1 = sample(1:5,10,replace = T),
Var2 = sample(1:5,10,replace = T),
Var3 = sample(1:5,10,replace = T),
Var4 = sample(1:5,10,replace = T),
Var5 = sample(1:5,10,replace = T))
df
Var1 Var2 Var3 Var4 Var5
1 3 5 2 1 5
2 3 3 1 1 5
3 2 3 3 2 4
4 2 1 4 3 5
5 3 4 1 4 2
6 5 1 3 5 1
7 4 1 5 5 1
8 1 5 4 3 3
9 2 3 2 1 1
10 3 2 5 2 5
As an example, I'm interested in Var1, Var3 and Var5. Using ifelse, if the value is 4 or 5 the new variable gets the value 1, else 0.
I'm using code to get the variables I'm interested in.
paste0( "Var", c(1,3,5) )
[1] "Var1" "Var3" "Var5"
I tried both of these and know they don't work, but is it possible to write code similar to this?
newvar <- ifelse( paste0("Var", c(1,3,5)) %in% c(4,5) , 1, 0)
newvar <- ifelse( df[ , paste0( "Var", c(1:3) ) ] %in% c(4,5) , 1, 0)
Any help greatly appreciated. Thanks
EDITED: Apologies if this wasn't clear. I want the ifelse across multiple variables to create a single variable. "newvar" is a single variable; I didn't want to create a separate result for each variable (newvar1, newvar3, newvar5). If any of those variables contains a 4 or 5, newvar is 1, otherwise 0. Thanks
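To see why neither attempt can work, it may help to check what each expression actually compares. This is only an illustrative sketch: the data frame is typed in from the printout above rather than regenerated with sample(), the second attempt is shown with the Var1/Var3/Var5 selection, and attempt1 and attempt2 are names made up here.

```r
# Data typed in from the printed table above (only the three columns of interest).
df <- data.frame(Var1 = c(3, 3, 2, 2, 3, 5, 4, 1, 2, 3),
                 Var3 = c(2, 1, 3, 4, 1, 3, 5, 4, 2, 5),
                 Var5 = c(5, 5, 4, 5, 2, 1, 1, 3, 1, 5))
v <- paste0("Var", c(1, 3, 5))

# Attempt 1 compares the column *names* with 4 and 5, never the data.
attempt1 <- ifelse(v %in% c(4, 5), 1, 0)
attempt1          # one value per name, all 0

# Attempt 2 hands a whole data frame to %in%, which matches column by
# column rather than cell by cell, so there is one result per column.
attempt2 <- ifelse(df[, v] %in% c(4, 5), 1, 0)
length(attempt2)  # 3 (columns), not 10 (rows)
```

In both cases the test never looks at individual cells, which is why the answers apply `%in%` to each column separately.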

Use sapply:
v <- paste0("Var", c(1, 3, 5)) # or v <- c(1, 3, 5)
cbind(df, new = +sapply(df[v], `%in%`, 4:5))
giving:
Var1 Var2 Var3 Var4 Var5 new.Var1 new.Var3 new.Var5
1 3 5 2 1 5 0 0 1
2 3 3 1 1 5 0 0 1
3 2 3 3 2 4 0 0 1
4 2 1 4 3 5 0 1 1
5 3 4 1 4 2 0 0 0
6 5 1 3 5 1 1 0 0
7 4 1 5 5 1 1 1 0
8 1 5 4 3 3 0 1 0
9 2 3 2 1 1 0 0 0
10 3 2 5 2 5 0 1 1
Added
Regarding the comment below this answer try any of these. They all use v defined in the next line except for the last solution.
v <- paste0("Var", c(1, 3, 5)) # or v <- c(1, 3, 5)
transform(df, newvar = do.call("pmax", lapply(df[v], `%in%`, 4:5)))
transform(df, newvar = apply(df[v] == 4 | df[v] == 5, 1, max))
transform(df, newvar = apply(df[v], 1, function(x) +any(x %in% 4:5)))
transform(df, newvar = pmax(Var1 %in% 4:5, Var3 %in% 4:5, Var5 %in% 4:5))
any of which give:
Var1 Var2 Var3 Var4 Var5 newvar
1 3 5 2 1 5 1
2 3 3 1 1 5 1
3 2 3 3 2 4 1
4 2 1 4 3 5 1
5 3 4 1 4 2 0
6 5 1 3 5 1 1
7 4 1 5 5 1 1
8 1 5 4 3 3 1
9 2 3 2 1 1 0
10 3 2 5 2 5 1
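Another way to collapse the per-column tests into one flag, sketched here as a possible variant rather than part of the original answer, is to count the hits per row with rowSums. The data frame is typed in from the printout above, and hits is a name chosen for illustration.

```r
# Data typed in from the printed table in the question.
df <- data.frame(Var1 = c(3, 3, 2, 2, 3, 5, 4, 1, 2, 3),
                 Var2 = c(5, 3, 3, 1, 4, 1, 1, 5, 3, 2),
                 Var3 = c(2, 1, 3, 4, 1, 3, 5, 4, 2, 5),
                 Var4 = c(1, 1, 2, 3, 4, 5, 5, 3, 1, 2),
                 Var5 = c(5, 5, 4, 5, 2, 1, 1, 3, 1, 5))
v <- paste0("Var", c(1, 3, 5))

# Logical matrix: one row per observation, one column per selected variable.
hits <- sapply(df[v], `%in%`, 4:5)

# A row gets 1 when at least one selected variable is 4 or 5.
df$newvar <- as.integer(rowSums(hits) > 0)
df$newvar
```

Because rowSums works on the whole matrix at once, this avoids the row-by-row apply calls.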

Here's a solution based on the tidyverse.
library(tidyverse)
df %>%
mutate(
across(
c(Var1, Var3, Var5),
~ifelse(.x %in% c(4, 5), 1, 0),
.names="new{.col}"
)
)
Var1 Var2 Var3 Var4 Var5 newVar1 newVar3 newVar5
1 3 5 2 1 5 0 0 1
2 3 3 1 1 5 0 0 1
3 2 3 3 2 4 0 0 1
4 2 1 4 3 5 0 1 1
5 3 4 1 4 2 0 0 0
6 5 1 3 5 1 1 0 0
7 4 1 5 5 1 1 1 0
8 1 5 4 3 3 0 1 0
9 2 3 2 1 1 0 0 0
10 3 2 5 2 5 0 1 1
The across function runs the function defined in its second argument on each of the columns defined by its first argument. The .names argument is optional and provides names for the new columns (as opposed to overwriting the originals). {.col} is a placeholder for the name of the current column.
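across returns one new column per input column. If a single newvar is wanted, as the question's edit asks, one possible sketch uses dplyr's if_any together with all_of, so that the names built with paste0 can be used directly. This assumes dplyr 1.0.4 or later; the data frame is typed in from the printout above and out is a name chosen here.

```r
library(dplyr)

# Data typed in from the printed table in the question.
df <- data.frame(Var1 = c(3, 3, 2, 2, 3, 5, 4, 1, 2, 3),
                 Var2 = c(5, 3, 3, 1, 4, 1, 1, 5, 3, 2),
                 Var3 = c(2, 1, 3, 4, 1, 3, 5, 4, 2, 5),
                 Var4 = c(1, 1, 2, 3, 4, 5, 5, 3, 1, 2),
                 Var5 = c(5, 5, 4, 5, 2, 1, 1, 3, 1, 5))
v <- paste0("Var", c(1, 3, 5))

# if_any() is TRUE for a row when the predicate holds in at least one of
# the selected columns; as.integer() turns TRUE/FALSE into 1/0.
out <- df %>%
  mutate(newvar = as.integer(if_any(all_of(v), ~ .x %in% c(4, 5))))
out$newvar
```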

Here's a base R solution close to what OP tried:
df$newVar <- sapply(df[,c(1,3,5)], function(x) ifelse(x == 4|x == 5, 1, 0))
or, even closer:
df$newVar <- sapply(df[,c(1,3,5)], function(x) ifelse(x %in% c(4,5), 1, 0))
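Note that sapply over three columns returns a 10 x 3 matrix, so the assignments above store a matrix in newVar rather than one flag per row. If a single 0/1 column is wanted, a minimal sketch is to combine the per-column tests with Reduce and `|`. The data frame is typed in from the question's printout, and flags is a name chosen for illustration.

```r
# Data typed in from the printed table in the question.
df <- data.frame(Var1 = c(3, 3, 2, 2, 3, 5, 4, 1, 2, 3),
                 Var2 = c(5, 3, 3, 1, 4, 1, 1, 5, 3, 2),
                 Var3 = c(2, 1, 3, 4, 1, 3, 5, 4, 2, 5),
                 Var4 = c(1, 1, 2, 3, 4, 5, 5, 3, 1, 2),
                 Var5 = c(5, 5, 4, 5, 2, 1, 1, 3, 1, 5))

# One logical vector per selected column.
flags <- lapply(df[, c(1, 3, 5)], function(x) x %in% c(4, 5))

# Element-wise OR across the three vectors, then 1/0 via ifelse.
df$newVar <- ifelse(Reduce(`|`, flags), 1, 0)
df$newVar
```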

If you want to use both paste0() and ifelse(), here is an example:
x <- paste0("Var", c(1,3,5))
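newvar <- matrix(NA, nrow(df), length(x))
# one 0/1 indicator column per selected variable
for(j in seq_along(x)){
newvar[,j] <- ifelse(df[ ,x[j]] %in% c(4,5), 1, 0)}
# collapse to a single flag: 1 if any selected variable is 4 or 5
final.df <- data.frame(df,
newvar=ifelse(apply(newvar, 1, sum)>0, 1, 0
))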
final.df
Var1 Var2 Var3 Var4 Var5 newvar
1 3 5 2 1 5 1
2 3 3 1 1 5 1
3 2 3 3 2 4 1
4 2 1 4 3 5 1
5 3 4 1 4 2 0
6 5 1 3 5 1 1
7 4 1 5 5 1 1
8 1 5 4 3 3 1
9 2 3 2 1 1 0
10 3 2 5 2 5 1

Related

Create a new dataframe with 1's and 0's from summarized data?

I have the below dataset that I am working with in R:
df <- data.frame(day=seq(1,3,1), tot.infected=c(1,2,4), tot.ind=5)
df
And I would like to transform the tot.infected column into a binomial variable with 1's and 0's, such as the following dataframe:
df2 <- data.frame(year = c(rep(1,5), rep(2,5), rep(3,5)), infected = c(rep(1,1), rep(0,4), rep(1,2), rep(0,3), rep(1,4), rep(0,1)))
Is there a more elegant way to do this in R?
Thank you for your help!
I tried hard-coding a dataframe using rep(), but this is extremely time-consuming for large datasets and I was looking for a more elegant way to achieve this.
base R
tmp <- do.call(Map, c(list(f = function(y, inf, ind) data.frame(year = y, infected = replace(integer(ind), seq(ind) <= inf, 1L))), unname(df)))
do.call(rbind, tmp)
# year infected
# 1 1 1
# 2 1 0
# 3 1 0
# 4 1 0
# 5 1 0
# 6 2 1
# 7 2 1
# 8 2 0
# 9 2 0
# 10 2 0
# 11 3 1
# 12 3 1
# 13 3 1
# 14 3 1
# 15 3 0
dplyr
library(dplyr)
df %>%
rowwise() %>%
summarize(tibble(year = day, infected = replace(integer(tot.ind), seq(tot.ind) <= tot.infected, 1L)))
# # A tibble: 15 x 2
# year infected
# <dbl> <int>
# 1 1 1
# 2 1 0
# 3 1 0
# 4 1 0
# 5 1 0
# 6 2 1
# 7 2 1
# 8 2 0
# 9 2 0
# 10 2 0
# 11 3 1
# 12 3 1
# 13 3 1
# 14 3 1
# 15 3 0
We can do it this way:
library(dplyr)
df %>%
group_by(day) %>%
summarise(cur_data()[seq(unique(tot.ind)),]) %>%
#mutate(x = row_number())
mutate(tot.infected = ifelse(row_number() <= first(tot.infected),
first(tot.infected)/first(tot.infected), 0), .keep="used")
day tot.infected
<dbl> <dbl>
1 1 1
2 1 0
3 1 0
4 1 0
5 1 0
6 2 1
7 2 1
8 2 0
9 2 0
10 2 0
11 3 1
12 3 1
13 3 1
14 3 1
15 3 0
Using rep.int and replace, basically.
with(df, data.frame(
year=do.call(rep.int, unname(df[c(1, 3)])),
infected=unlist(Map(replace, Map(rep.int, 0, tot.ind), lapply(tot.infected, seq), 1))
))
# year infected
# 1 1 1
# 2 1 0
# 3 1 0
# 4 1 0
# 5 1 0
# 6 2 1
# 7 2 1
# 8 2 0
# 9 2 0
# 10 2 0
# 11 3 1
# 12 3 1
# 13 3 1
# 14 3 1
# 15 3 0
Data:
df <- structure(list(day = c(1, 2, 3), tot.infected = c(1, 2, 4), tot.ind = c(5,
5, 5)), class = "data.frame", row.names = c(NA, -3L))
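The same rep idea can be written without Map by giving rep a vector of run lengths. The following is a sketch under the assumption that tot.infected never exceeds tot.ind; runs and df2 are names chosen here.

```r
df <- data.frame(day = 1:3, tot.infected = c(1, 2, 4), tot.ind = 5)

# Run lengths in the order: infected, not infected, infected, not infected, ...
# rbind() stacks the two count vectors and c() reads them off column by column.
runs <- c(rbind(df$tot.infected, df$tot.ind - df$tot.infected))

df2 <- data.frame(
  year     = rep(df$day, times = df$tot.ind),
  infected = rep(rep(c(1L, 0L), times = nrow(df)), times = runs)
)
df2
```

Each day contributes tot.ind rows, with the first tot.infected of them set to 1.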

To find difference between unique levels

I have a column with unique levels, and I want to find the gap (the difference between the levels).
I have data
x=c(0,0,0,0,0,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,4)
The result for this should be :
1 2 3 4 5 1 2 3 1 2 3 4 1 2 3 4 1 2 3 4 5 6
This is not very clear code, but it gets the job done:
res = ifelse(x == 0, 0, ifelse(c(0, x[-length(x)]) != 0, 0, NA))
res[is.na(res)] = with(rle(x == 0), lengths[values])
res
# [1] 0 0 0 0 4 0 0 0 0 4 0 0 2 0
This is perhaps better:
res2 = x
res2[x != 0] = diff(c(0, which(x != 0))) - 1
res2
# [1] 0 0 0 0 4 0 0 0 0 4 0 0 2 0
Not the definitive answer, but here's an approach using rle...
x=c(0,0,0,0,1,0,0,0,0,2,0,0,3,4)
y <- rle(x)
> y
# Run Length Encoding
# lengths: int [1:7] 4 1 4 1 2 1 1
# values : num [1:7] 0 1 0 2 0 3 4
We can use ave and create a grouping variable with cumsum and diff to capture the difference in unique levels and create a sequence with seq_along
ave(x, c(0, cumsum(diff(x) != 0)), FUN = seq_along)
#[1] 1 2 3 4 5 1 2 3 1 2 3 4 1 2 3 4 1 2 3 4 5 6
For the given example, as suggested by @markus, this works
ave(x, x, FUN = seq_along)
#[1] 1 2 3 4 5 1 2 3 1 2 3 4 1 2 3 4 1 2 3 4 5 6
but what if the input is
x=c(0,0,0,0,0,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,4,0,0)
using
ave(x, x, FUN = seq_along) #gives
#[1] 1 2 3 4 5 1 2 3 1 2 3 4 1 2 3 4 1 2 3 4 5 6 6 7
whereas
ave(x, c(0, cumsum(diff(x) != 0)), FUN = seq_along) #gives
#[1] 1 2 3 4 5 1 2 3 1 2 3 4 1 2 3 4 1 2 3 4 5 6 1 2
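To make the difference between the two groupings concrete, here is a small sketch on a shorter vector (the vector and the names changed and grp are made up for illustration) that prints the intermediate grouping variable.

```r
x <- c(0, 0, 1, 1, 0)

# TRUE wherever the value differs from the previous element.
changed <- diff(x) != 0          # FALSE TRUE FALSE TRUE

# Running count of changes = id of the current run.
grp <- c(0, cumsum(changed))     # 0 0 1 1 2

by_run   <- ave(x, grp, FUN = seq_along)  # 1 2 1 2 1
by_value <- ave(x, x,   FUN = seq_along)  # 1 2 1 2 3
```

Grouping by x itself puts the trailing 0 in the same group as the leading zeros, whereas the run id starts a new group every time the value changes.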
We can use rleid from data.table
library(data.table)
ave(x, rleid(x), FUN = seq_along)
#[1] 1 2 3 4 5 1 2 3 1 2 3 4 1 2 3 4 1 2 3 4 5 6
Or convert to a data.table and then group by the rleid
data.table(x)[, seq_len(.N), by = rleid(x)]$V1
#[1] 1 2 3 4 5 1 2 3 1 2 3 4 1 2 3 4 1 2 3 4 5 6
Or after using rle, extract the lengths and apply sequence
sequence(rle(x)$lengths)
#[1] 1 2 3 4 5 1 2 3 1 2 3 4 1 2 3 4 1 2 3 4 5 6
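As a sketch of what the rle step produces, the intermediate pieces can be inspected on the longer input with the trailing zeros (r and res are names chosen here).

```r
x <- c(0,0,0,0,0,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,4,0,0)

r <- rle(x)
r$lengths   # 5 3 4 4 6 2 : length of each consecutive run
r$values    # 0 1 2 3 4 0 : the value repeated in each run

# sequence() concatenates 1:5, 1:3, 1:4, 1:4, 1:6, 1:2
res <- sequence(r$lengths)
res
```

Since rle works on consecutive runs, the two trailing zeros restart at 1 instead of continuing the count of the first five zeros.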

R: count number of values per row in data set and create a new column

Thanks in advance.
In a data frame, I need to count how often each existing value (0, 1, 2) occurs per row and create 3 columns, one holding the count for each value.
I used the example:
example <- data.frame(var1 = c(2,3,3,2,4,5),
var2 = c(2,3,5,4,2,5),
var3 = c(3,3,4,3,4,5))
example <- cbind(example, apply(example, 1, function(x)length(unique(x))))
But it returns only number of unique values.
Is this what you want?
all_vals = unique(unlist(example))
tt = t(apply(example, 1, function(x) table(factor(x, levels = all_vals))))
cbind(example, tt)
# var1 var2 var3 2 3 4 5
# 1 2 2 3 2 1 0 0
# 2 3 3 3 0 3 0 0
# 3 3 5 4 0 1 1 1
# 4 2 4 3 1 1 1 0
# 5 4 2 4 1 0 2 0
# 6 5 5 5 0 0 0 3
A good next step would be renaming the new columns.
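One possible way to do the renaming, sketched with an n_ prefix chosen purely for illustration (the values are also sorted here so that the columns come out in a predictable order):

```r
example <- data.frame(var1 = c(2, 3, 3, 2, 4, 5),
                      var2 = c(2, 3, 5, 4, 2, 5),
                      var3 = c(3, 3, 4, 3, 4, 5))

all_vals <- sort(unique(unlist(example)))

# One row of counts per row of 'example', one column per distinct value.
tt <- t(apply(example, 1, function(x) table(factor(x, levels = all_vals))))

# Prefix the value so the new columns have syntactic names.
colnames(tt) <- paste0("n_", all_vals)

result <- cbind(example, tt)
result
```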

Putting rowwise counts of value occurrences into new variables, how to do that in R with dplyr?

I have a large dataframe (df) that looks like this:
structure(list(var1 = c(1, 2, 3, 4, 2, 3, 4, 3, 2), var2 = c(2,
3, 4, 1, 2, 1, 1, 1, 3), var3 = c(4, 4, 2, 3, 3, 1, 1, 1, 4),
var4 = c(2, 2, 2, 2, 3, 2, 3, 4, 1), var5 = c(4, 4, 2, 3,
3, 1, 1, 1, 4)), .Names = c("var1", "var2", "var3", "var4",
"var5"), row.names = c(NA, -9L), class = "data.frame")
var1 var2 var3 var4 var5
1 1 2 4 2 4
2 2 3 4 2 4
3 3 4 2 2 2
4 4 1 3 2 3
5 2 2 3 3 3
6 3 1 1 2 1
7 4 1 1 3 1
8 3 1 1 4 1
9 2 3 4 1 4
Now I need to count the occurrence of values rowwise and make new variables of the counts. This should be the result:
var1 var2 var3 var4 var5 n_1 n_2 n_3 n_4
1 1 2 4 2 4 1 2 0 2
2 2 3 4 2 4 0 2 1 2
3 3 4 2 2 2 0 3 1 1
4 4 1 3 2 3 1 1 2 1
5 2 2 3 3 3 0 2 3 0
6 3 1 1 2 1 3 1 1 0
7 4 1 1 3 1 3 0 1 1
8 3 1 1 4 1 3 0 1 1
9 2 3 4 1 4 1 1 1 2
As you can see variable n_1 shows the rowcounts of the 1's, n_2 the row counts of the 2's, etc.
I tried some dplyr functions (because I like their speed), but haven't succeeded yet. I know this is definitely ugly code :-), but my approach would be something like this:
newdf <- mutate(rowwise(df), n_1 = sum(df==1))
Does anyone have an idea about how to deal with this problem?
Many thanks in advance!
This uses rowwise() and do() from dplyr but it's definitely ugly.
I'm not sure whether this can be modified so that you get a data.frame output directly, as shown at https://github.com/hadley/dplyr/releases.
interim_res <- df %>%
rowwise() %>%
do(out = sapply(min(df):max(df), function(i) sum(i==.)))
interim_res <- interim_res[[1]] %>% do.call(rbind,.) %>% as.data.frame(.)
Then to get intended result:
res <- cbind(df,interim_res)
This is a solution using base functions
dd <- t(apply(df, 1, function(x) table(factor(x, levels=1:4))))
colnames(dd) <- paste("n",1:4, sep="_")
cbind(df, dd)
Just use the table command across rows of your data.frame to get counts of each value from 1-4.
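A vectorised alternative, offered as a sketch rather than as part of the original answer, is to compare the whole data frame with each value in turn and take rowSums. The names vals, counts and res are chosen here for illustration.

```r
df <- data.frame(var1 = c(1, 2, 3, 4, 2, 3, 4, 3, 2),
                 var2 = c(2, 3, 4, 1, 2, 1, 1, 1, 3),
                 var3 = c(4, 4, 2, 3, 3, 1, 1, 1, 4),
                 var4 = c(2, 2, 2, 2, 3, 2, 3, 4, 1),
                 var5 = c(4, 4, 2, 3, 3, 1, 1, 1, 4))

vals <- 1:4

# df == k is a logical matrix; rowSums counts the TRUEs in each row.
counts <- sapply(vals, function(k) rowSums(df == k))
colnames(counts) <- paste0("n_", vals)

res <- cbind(df, counts)
res
```

This loops over the four values instead of over every row, which tends to matter on large data frames.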
Here is an approach using qdapTools package:
library(qdapTools)
data.frame(dat, setNames(mtabulate(split(dat, id(dat))), paste0("n_", 1:4)))
## var1 var2 var3 var4 var5 n_1 n_2 n_3 n_4
## 1 1 2 4 2 4 1 2 0 2
## 2 2 3 4 2 4 0 2 1 2
## 3 3 4 2 2 2 0 3 1 1
## 4 4 1 3 2 3 1 1 2 1
## 5 2 2 3 3 3 0 2 3 0
## 6 3 1 1 2 1 3 1 1 0
## 7 4 1 1 3 1 3 0 1 1
## 8 3 1 1 4 1 3 0 1 1
## 9 2 3 4 1 4 1 1 1 2

Using `rank` across columns to create new variable

I have a question I can't figure out, which I'm almost certain involves rank. Let's say that I have a df in wide form with 3 variables with integer values.
id var1 var2 var3
1 23 8 30
2 1 2 3
3 4 5 1
4 100 80 60
I'd like to create three new variables with the rank of the values for var1, var2, and var3 from largest to smallest. For example,
id var1 var2 var3 var1_rank var2_rank var3_rank
1 23 8 30 2 3 1
2 1 2 3 3 2 1
3 4 5 1 2 1 3
4 100 80 60 1 2 3
How would I go about doing this? Thanks!
Get the example data:
test <- read.table(text="id var1 var2 var3
1 23 8 30
2 1 2 3
3 4 5 1
4 100 80 60",header=TRUE)
Get the ranks part and rename appropriately (notice the -x to reverse the rank so it relates to decreasing instead of increasing size - this will be generalisable to any size of data.frame used as input):
ranks <- t(apply(test[,-1], 1, function(x) rank(-x) ))
colnames(ranks) <- paste(colnames(ranks), "_rank", sep="")
Join with the old data frame.
data.frame(test, ranks)
Result:
> data.frame(test,ranks)
id var1 var2 var3 var1_rank var2_rank var3_rank
1 1 23 8 30 2 3 1
2 2 1 2 3 3 2 1
3 3 4 5 1 2 1 3
4 4 100 80 60 1 2 3
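The negation trick, and what happens with ties, can be checked on single rows. This is a small sketch; the tied vector is made up for illustration, and ties.method is a standard argument of rank.

```r
# First row of the example: the largest value gets rank 1.
r1 <- rank(-c(23, 8, 30))                      # 2 3 1

# With ties, the default averages the tied ranks ...
r2 <- rank(-c(5, 5, 1))                        # 1.5 1.5 3.0

# ... while ties.method = "min" gives both the best rank.
r3 <- rank(-c(5, 5, 1), ties.method = "min")   # 1 1 3
```

If the data can contain equal values within a row, the choice of ties.method decides whether the rank columns stay whole numbers.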
To get to @mnel's answer using base R, you could also do something like:
testres <- data.frame(test["id"],stack(test[2:4]))
testres$rank <- ave(testres$values,testres$id,FUN=function(x) rank(-x) )
> testres
id values ind rank
1 1 23 var1 2
2 2 1 var1 3
3 3 4 var1 2
4 4 100 var1 1
5 1 8 var2 3
6 2 2 var2 2
7 3 5 var2 1
8 4 80 var2 2
9 1 30 var3 1
10 2 3 var3 1
11 3 1 var3 3
12 4 60 var3 3
I think it is easier to work in long format (and more memory efficient, as apply will coerce to a matrix). Here is an approach using reshape and data.table.
library(data.table)
tlong <- reshape(data.table(test), direction ='long', varying = list(2:4),
times = paste0('var',1:3), v.names = 'value')
# calculate the rank within each `id`
tlong[, rank := rank(-value), by = id]
tlong
## id time value rank
## 1: 1 var1 23 2
## 2: 2 var1 1 3
## 3: 3 var1 4 2
## 4: 4 var1 100 1
## 5: 1 var2 8 3
## 6: 2 var2 2 2
## 7: 3 var2 5 1
## 8: 4 var2 80 2
## 9: 1 var3 30 1
## 10: 2 var3 3 1
## 11: 3 var3 1 3
## 12: 4 var3 60 3
# reshape to wide (if you want)
oldname <- paste0('var',1:3)
twide <- reshape(tlong, direction = 'wide', timevar = 'time', idvar = 'id')
# reorder from value.var1, rank.var1,... to value.var1, value.var2,....rank.var1, rank.var2
setcolorder(twide, c('id', paste('value', oldname, sep ='.'), paste('rank', oldname, sep = '.')))
Here's one approach:
data.frame(dat, 4 - t(apply(dat[, -1], 1, rank)))
## > data.frame(dat, 4 - t(apply(dat[, -1], 1, rank)))
## id var1 var2 var3 var1.1 var2.1 var3.1
## 1 1 23 8 30 2 3 1
## 2 2 1 2 3 3 2 1
## 3 3 4 5 1 2 1 3
## 4 4 100 80 60 1 2 3
