Select for every row between two columns based on condition in another column in R

Could someone help me find an existing answer thread or suggest a method? I cannot find a solution.
What I want to do:
For every row if the value in column "x" is "A" then select the value in column "y" from the same row and if the value in column "x" is "B" then select the value in column "z" from the same row.
Ideally collected in a vector to include as a new column in the df afterwards.
df <- data.frame(x = c("A", "B", "B", "A"), y = c(1,2,3,4), z = c(4,3,2,1), fix.empty.names = FALSE)
df
x y z
1 A 1 4
2 B 2 3
3 B 3 2
4 A 4 1
result
[1] 1 3 2 4
Thank you very much in advance

If we can assume x is always "A" or "B":
ifelse(df$x == "A", df$y, df$z)
More generally:
ifelse(df$x == "A", df$y, ifelse(df$x == "B", df$z, NA))
You can, of course, assign this directly as a new column: df$result <- ifelse...
If you like dplyr:
library(dplyr)
df %>%
  mutate(
    result = case_when(
      x == "A" ~ y,
      x == "B" ~ z,
      TRUE ~ NA_real_
    )
  )
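If more than two source columns are involved, nested ifelse calls get unwieldy. Here is a minimal base R sketch of one alternative, using matrix indexing. The lookup vector col_for and the helper names vals and pick are hypothetical, introduced only for illustration; they are not part of the original answer.

```r
df <- data.frame(x = c("A", "B", "B", "A"), y = c(1, 2, 3, 4), z = c(4, 3, 2, 1),
                 stringsAsFactors = FALSE)

# Hypothetical lookup: which column to read for each value of x
col_for <- c(A = "y", B = "z")

# Numeric matrix holding only the candidate columns
vals <- as.matrix(df[c("y", "z")])

# One (row, column) index pair per row of df
pick <- cbind(seq_len(nrow(df)), match(col_for[df$x], colnames(vals)))

result <- vals[pick]
result
#> [1] 1 3 2 4
```

A value of x that is missing from col_for yields NA in the result, which mirrors the NA fallback in the answers above.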

Related

Trying to sort character variable into new variable with new value based on conditions

I want to sort a character variable into two categories in a new variable based on conditions; if the conditions are not met, I want it to return "other".
If variable x contains the 4 character values "A", "B", "C" and "D", I want to sort them into 2 categories, 1 and 0, in a new variable y, creating a dummy variable.
Ideally I want it to look like this
df <- data.frame(x = c("A", "B", "C" & "D")
y <- if x == "A" | "D" then assign 1 in y
if x == "B" | "C" then assign 0 in y
if x == other then assign NA in y
x y
1 "A" 1
2 "B" 0
3 "C" 0
4 "D" 1
library(dplyr)
df <- df %>% mutate ( y =case_when(
(x %in% df == "A" | "D") ~ 1 ,
(x %in% df == "B" | "C") ~ 1,
x %in% df == ~ NA
))
I got this error message
Error: replacement has 3 rows, data has 2
Here's the proper case_when syntax.
df <- data.frame(x = c("A", "B", "C", "D"))
library(dplyr)
df <- df %>%
  mutate(y = case_when(x %in% c("A", "D") ~ 1,
                       x %in% c("B", "C") ~ 0,
                       TRUE ~ NA_real_))
df
#> x y
#> 1 A 1
#> 2 B 0
#> 3 C 0
#> 4 D 1
You're combining syntaxes in a way that makes sense in speech but not in code.
Generally you can't use foo == "G" | "H". You need to use foo == "G" | foo == "H", or the handy shorthand foo %in% c("G", "H").
Similarly, x %in% df == "A" doesn't make sense. x %in% df makes sense. df == "A" makes sense. Putting them together as x %in% df == ... does not make sense to R. (Okay, it does make sense to R, but not the same sense it does to you. R will use its order of operations, which evaluates x %in% df first and gets a result from that, and then checks whether that result == "A", which is not what you want.)
Inside a dplyr function like mutate, you don't need to keep specifying df. You pipe in df and then just use the column x. x %in% df looks like you're testing whether the column x is in the data frame df, which you don't need to do. Instead use x %in% c("A", "D"). Aron's answer shows the full correct syntax; I hope this answer helps you understand why.
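To make the point about foo == "G" | foo == "H" versus %in% concrete, here is a small base R sketch on a plain vector. The names long_form and short_form are made up for this illustration.

```r
x <- c("A", "B", "C", "D")

# Spelled out: each comparison names the vector explicitly
long_form <- x == "A" | x == "D"

# Shorthand: test membership in a set of values
short_form <- x %in% c("A", "D")

long_form
#> [1]  TRUE FALSE FALSE  TRUE
identical(long_form, short_form)
#> [1] TRUE
```

The two forms only differ when x contains NA: NA == "A" is NA, whereas NA %in% c("A", "D") is FALSE.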

How to delete entire row for x if y appears at least once in same column?

I would like to delete every row where var4 is "x" if "y" appears at least once in the same column, var4. I can't find a solution in R. Below is what I tried.
In the code below, I tried to tell R that if var4 contains at least one y, all rows containing x should be filtered out/removed.
Example for df:
var1 var2 var3 var4
a b b a
b a b x
a b a x
a a a y
if (all(df$var4 %in% c("y"))) {
df <- filter(!var4 %in% c("x"))
}
So, I would like to delete rows 2&3 because y appears in var4. Unfortunately the code above doesn't return any change in df, even though y appears several times in var4.
Many thanks. I appreciate any kind of recommendation.
In the OP's code, the filter statement is never given the data, and all() is only TRUE when every value of var4 is "y". Instead, it can be
library(dplyr)
if ("y" %in% df$var4) {
  df <- df %>%
    filter(!var4 %in% "x")
}
df
# var1 var2 var3 var4
#1 a b b a
#2 a a a y
It can also be written as a single filter, though unlike the if version this returns zero rows when "y" is absent from var4
df %>%
  filter("y" %in% var4 & !var4 %in% 'x')
data
df <- structure(list(var1 = c("a", "b", "a", "a"), var2 = c("b", "a",
"b", "a"), var3 = c("b", "b", "a", "a"), var4 = c("a", "x", "x",
"y")), class = "data.frame", row.names = c(NA, -4L))
If you want to use base R commands, df[!df$var4 == "x", ] should do it. Note that this drops the "x" rows unconditionally, without checking whether "y" is present.
df$var4 == "x" will return a vector of TRUE/FALSE
> df$var4 == "x"
[1] FALSE TRUE TRUE FALSE
The ! in front of it flips each TRUE and FALSE
> !df$var4 == "x"
[1] TRUE FALSE FALSE TRUE
Then the bracket notation refers to subsetting the object by rows, then columns.
df[rows,columns]
Putting it all together, the following will subset rows based on the criteria supplied, and include all columns.
df[!df$var4 == "x", ]
Note that leaving nothing after the comma means include all columns.
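The base R subsetting above can be extended to honour the "only if y appears" condition from the question. This is a minimal sketch on a cut-down copy of the example data; the helper name drop is an assumption introduced here.

```r
df <- data.frame(var1 = c("a", "b", "a", "a"),
                 var4 = c("a", "x", "x", "y"),
                 stringsAsFactors = FALSE)

# TRUE only for "x" rows, and only when at least one "y" is present
drop <- any(df$var4 == "y") & df$var4 == "x"

result <- df[!drop, ]
result
#>   var1 var4
#> 1    a    a
#> 4    a    y
```

Because any() returns a single logical value, it is recycled across all rows, so when no "y" is present nothing is dropped.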

Replace strings in variable using lookup vector

I have a dataframe df with a character variable and the fromvec and tovec.
df <- tibble(var = c("A", "B", "C", "a", "E", "D", "b"))
fromvec <- c("A", "B", "C")
tovec <- c("X", "Y", "Z")
Use strings in fromvec, check them in df and then replace them with the corresponding strings in tovec so that "A" in df gets replaced with "X", "B" with "Y" and so on to get the desired_df.
desired_df <- tibble(var = c("X", "Y", "Z", "X", "E", "D", "Y"))
I tried the following, but I am not getting the desired result:
from_vec <- paste(fromvec, collapse="|")
to_vec <- paste(tovec, collapse="|")
undesired_df <- df %>%
  mutate(var = str_replace(str_to_upper(var), from_vec, to_vec))
i.e. this
tibble(var = c("X|Y|Z", "X|Y|Z", "X|Y|Z", "X|Y|Z", "E", "D", "X|Y|Z"))
How can I get the desired_df?
You could use chartr (this works here because every string to replace is a single character):
df$var <- chartr(paste(fromvec, collapse = ""),
                 paste(tovec, collapse = ""),
                 toupper(df$var))
# # A tibble: 7 x 1
# var
# <chr>
# 1 X
# 2 Y
# 3 Z
# 4 X
# 5 E
# 6 D
# 7 Y
Or we can use recode
library(dplyr)
df$var <- recode(toupper(df$var), !!!setNames(tovec,fromvec))
If you really want to use str_replace you could do:
library(purrr)
library(stringr)
df$var <- reduce2(fromvec, tovec, str_replace, .init=toupper(df$var))
The correct way to do this with stringr is with str_replace_all:
mutate(df, var = str_replace_all(str_to_upper(var), setNames(tovec, fromvec)))
(thanks, @Moody_Mudskipper!)
We can use base R
with(df, ifelse(toupper(var) %in% fromvec,
                setNames(tovec, fromvec)[toupper(var)], var))
#[1] "X" "Y" "Z" "X" "E" "D" "Y"
which can be also written in two lines by creating a logical condition
i1 <- toupper(df$var) %in% fromvec
df$var[i1] <- setNames(tovec, fromvec)[toupper(df$var)[i1]]
Or using data.table
library(data.table)
setDT(df)[toupper(var) %in% fromvec, var := setNames(tovec, fromvec)[toupper(var)]]
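The lookup idea behind these base R and data.table variants can also be sketched with match(), which returns the position of each value in fromvec. This is an illustrative variant on a plain vector rather than one of the original answers; idx and result are names chosen here.

```r
var <- c("A", "B", "C", "a", "E", "D", "b")
fromvec <- c("A", "B", "C")
tovec <- c("X", "Y", "Z")

# Position of each (upper-cased) value in fromvec, NA when absent
idx <- match(toupper(var), fromvec)

# Take the replacement where a match exists, otherwise keep the original
result <- ifelse(is.na(idx), var, tovec[idx])
result
#> [1] "X" "Y" "Z" "X" "E" "D" "Y"
```

Dropping the toupper() call makes the replacement case-sensitive, leaving "a" and "b" untouched.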
It's not clear whether the result should be case-insensitive.
In my opinion, replacement (update) operations that involve an indeterminate number of changes are best accomplished using JOINs. In this case, it also cements a good practice of tracking your changes in a separate dataframe.
Unfortunately, the tidyverse has no "update dataframe" function, a glaring omission. That means tidyverse users must use a work-around, coalesce.
#JOIN Operation
tibble(fromvec, tovec) %>%                        #< dataframe of changes
  right_join(df, by = c("fromvec" = "var")) %>%   #< join operation
  transmute(var = coalesce(tovec, fromvec))       #< coalesce work-around
# A tibble: 7 x 1
var
<chr>
1 X
2 Y
3 Z
4 a
5 E
6 D
7 b
If a case insensitive operation is preferred, consider inserting str_to_upper in the pipeline:
tibble(fromvec, tovec) %>%
  right_join(df %>% mutate(var = str_to_upper(var)), #< modify case
             by = c("fromvec" = "var")) %>%
  transmute(var = coalesce(tovec, fromvec))
# A tibble: 7 x 1
var
<chr>
1 X
2 Y
3 Z
4 X
5 E
6 D
7 Y

How to convert a factor to numeric in a predefined order in R

I have a factor column, with three values: "b", "c" and "free".
I did
df$new_col = as.numeric (df$factor_col)
But it will convert "b" to 1, "c" to 2 and "free" to 3.
But I want to convert "free" to 0, "b" to 2 and "c" to 5. How can I do it in R?
Thanks a lot
f <- factor(c("b", "c", "c", "free", "b", "free"))
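You can try renaming the factor levels. Note that f is still a factor afterwards, so as.numeric(as.character(f)) is needed to get the numbers themselves.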
levels(f)[levels(f)=="b"] <- 2
levels(f)[levels(f)=="c"] <- 5
levels(f)[levels(f)=="free"] <- 0
> f
#[1] 2 5 5 0 2 0
#Levels: 2 5 0
One option is to call factor again, specifying the levels and labels arguments in the custom order, and then convert to numeric after converting to character (or through the levels):
df$new_col <- as.numeric(as.character(factor(df$factor_col,
    levels = c('b', 'c', 'free'), labels = c(2, 5, 0))))
Another option is recode from library(car). The output will be of class factor. If we need to convert it to numeric, we can do this as in the earlier solution (as.numeric(as.character(...))).
library(car)
df$new_col <- with(df, recode(factor_col, "'b'=2; 'c'=5; 'free'=0"))
data
df <- data.frame(factor_col= c('b', 'c', 'b', 'free', 'c', 'free'))
You can use the following approach to create the new column:
# an example data frame
f <- data.frame(factor_col = c("b", "c", "free"))
# create new_col
f <- transform(f, new_col = (factor_col == "b") * 2 + (factor_col == "c") * 5)
The result (f):
factor_col new_col
1 b 2
2 c 5
3 free 0
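Another way to express the mapping from the question ("free" to 0, "b" to 2, "c" to 5) is a named lookup vector. This is a minimal base R sketch, not one of the original answers; lookup and new_col are names chosen for illustration.

```r
f <- factor(c("b", "c", "c", "free", "b", "free"))

# Hypothetical lookup table: level name -> desired number
lookup <- c(free = 0, b = 2, c = 5)

# Index by the level labels (as character), not by the integer codes
new_col <- unname(lookup[as.character(f)])
new_col
#> [1] 2 5 5 0 2 0
```

The as.character() step matters: indexing lookup with the factor itself would use its internal integer codes (1, 2, 3) as positions rather than matching by name.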

R: create new column with name coming from variable

I have a dataframe with several columns and would like to add a new column and name it according to a previous variable. For example:
df <- data.frame("A" = c(1, 2, 3, 4), "B" = c("a", "c", "d", "b"))
Variable <- "C"
This is part of a function where the variable will be changing and rather than each time specifying:
df$C <- NA
I would like a one-liner that uses "Variable" to name the additional column.
Try [ instead of $:
> df[, Variable] <- NA
> df
A B C
1 1 a NA
2 2 c NA
3 3 d NA
4 4 b NA
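Since the question mentions that this happens inside a function, here is a minimal sketch of how the bracket approach might be wrapped up, using [[ for a single column. The function name add_column and its arguments are hypothetical.

```r
# Hypothetical helper: add a column whose name is held in a variable
add_column <- function(data, name, value = NA) {
  data[[name]] <- value  # [[ accepts the column name as a string
  data
}

df <- data.frame(A = c(1, 2, 3, 4), B = c("a", "c", "d", "b"))
Variable <- "C"

df <- add_column(df, Variable)
names(df)
#> [1] "A" "B" "C"
```

The function returns a modified copy, so the result has to be assigned back to df.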
If the name of the data frame is also stored in a variable, this might be helpful.
df <- data.frame("A" = c(1, 2, 3, 4), "B" = c("a", "c", "d", "b") )
Variable<-"C"
dfname<-"df"
df <- within(assign(dfname, get(dfname)),
             assign(Variable, NA)
)
